ray/rllib/tuned_examples/ddpg/pendulum-td3.yaml

# This configuration is expected to reach a reward of -160 within 10k-20k timesteps.
pendulum-td3:
    env: Pendulum-v0
    run: TD3
    stop:
        episode_reward_mean: -900
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: tf
        # === Model ===
        actor_hiddens: [64, 64]
        critic_hiddens: [64, 64]
        # === Exploration ===
        learning_starts: 5000
        exploration_config:
            random_timesteps: 5000
        # === Evaluation ===
        evaluation_interval: 1
        evaluation_num_episodes: 5
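        # === Framework variant (illustrative sketch, not part of the original file) ===
        # The note above says this config works for both torch and tf. Assuming
        # that holds, switching backends should only require changing the
        # `framework` key set earlier in this block, for example:
        #   framework: torch
        # The other settings are assumed to carry over unchanged. The expected
        # reward noted at the top has not been verified here for the torch backend.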