# ray/rllib/tuned_examples/pong-dqn.yaml

# Expect a mean episode reward of ~20 within ~1.1M timesteps (about 2.1 hours) on a K80 GPU.
pong-deterministic-dqn:
    env: PongDeterministic-v4
    run: DQN
    # The trial stops as soon as either criterion below is met.
    stop:
        episode_reward_mean: 20
        time_total_s: 7200  # 2 hours of wall-clock time
    config:
        num_gpus: 1
        gamma: 0.99
        lr: 0.0001
        learning_starts: 10000
        buffer_size: 50000
        sample_batch_size: 4
        train_batch_size: 32
        exploration_config:
            epsilon_timesteps: 200000
            final_epsilon: 0.01
        model:
            grayscale: True
            zero_mean: False
            dim: 42
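
# Usage sketch, not part of the original file. It assumes the `rllib`
# command-line tool that ships with Ray is installed and that this file is
# saved locally as pong-dqn.yaml (the local path is an assumption):
#
#     rllib train -f pong-dqn.yaml
#
# The top-level key (pong-deterministic-dqn) names the experiment; `run`,
# `env`, `stop`, and `config` are passed through to it as written above.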