# ray/rllib/tuned_examples/dqn/pong-dqn.yaml

# Expect a mean episode reward of ~20 within ~1.1M timesteps (about 2.1 hours on a K80 GPU).
pong-deterministic-dqn:
    env: PongDeterministic-v4
    run: DQN
    stop:
        # The trial stops as soon as either criterion is met.
        episode_reward_mean: 20
        time_total_s: 7200  # 2 hours
    config:
        # Works for both torch and tf.
        framework: tf
        num_gpus: 1
        gamma: 0.99
        lr: .0001
        replay_buffer_config:
          type: MultiAgentPrioritizedReplayBuffer
          capacity: 50000
        num_steps_sampled_before_learning_starts: 10000
        rollout_fragment_length: 4
        train_batch_size: 32
        exploration_config:
          epsilon_timesteps: 200000
          final_epsilon: .01
        model:
          grayscale: True
          zero_mean: False
          dim: 42
        # Set compress_observations to True: otherwise few machines could
        # hold the replay buffer in memory.
        compress_observations: True