ray/rllib/tuned_examples/apex_dqn/pong-apex-dqn.yaml

# Reaches a mean episode reward of ~19 in under 40 minutes (~3M env steps) on an AWS p3.8xlarge instance.
# See https://app.wandb.ai/zplizzi/test/runs/ayuuhixr?workspace=user-zplizzi
# for training curves.
pong-apex:
    env: PongNoFrameskip-v4
    run: APEX
    stop:
        episode_reward_mean: 19.0
        timesteps_total: 4000000
    config:
        # Works with both the torch and tf frameworks.
        framework: tf
        target_network_update_freq: 20000
        num_workers: 4
        num_envs_per_worker: 8
        lr: .00005
        train_batch_size: 64
        replay_buffer_config:
            type: MultiAgentPrioritizedReplayBuffer
            capacity: 1000000
        gamma: 0.99
        training_intensity: 16
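
# Usage sketch (an assumption, not part of the original example): on a Ray
# release whose CLI still accepts the `rllib train -f <file>` form, this
# experiment could probably be launched from the repository root with:
#   rllib train -f rllib/tuned_examples/apex_dqn/pong-apex-dqn.yaml
# The exact CLI syntax differs between Ray versions, so check `rllib train --help`.
# With the settings above, sampling uses num_workers * num_envs_per_worker
# = 4 * 8 = 32 environment copies in parallel.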