ray/rllib/tuned_examples/ddpg/mujoco-td3.yaml

mujoco-td3:
    # Solve the latest versions of the four hardest MuJoCo tasks benchmarked in
    # the original TD3 paper. The average return over 10 trials after 1,000,000
    # timesteps (taken from Table 2 of the paper) is given in parentheses after
    # each environment name.
    #
    # Paper is at https://arxiv.org/pdf/1802.09477.pdf
    env:
        grid_search:
            - HalfCheetah-v2  # (9,532.99)
            - Hopper-v2  # (3,304.75)
            - Walker2d-v2  # (4,565.24)
            - Ant-v2  # (4,185.06)
    run: TD3
    stop:
        timesteps_total: 1000000
    config:
        # Works for both torch and tf.
        use_pytorch: false
        # === Exploration ===
        learning_starts: 10000
        exploration_config:
            random_timesteps: 10000

        # === Evaluation ===
        evaluation_interval: 5
        evaluation_num_episodes: 10
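
        # === Usage (sketch) ===
        # A minimal way to launch this grid search, assuming a Ray/RLlib install
        # that ships the `rllib` command-line tool and a working MuJoCo setup
        # for the gym `-v2` environments (both are assumptions about the local
        # machine, not part of this file):
        #   rllib train -f mujoco-td3.yaml
        # Each entry under `env.grid_search` then runs as its own trial until
        # `timesteps_total` reaches the value given under `stop`.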