ray/rllib/tuned_examples/td3/invertedpendulum-td3.yaml

invertedpendulum-td3:
    # This is a TD3 with stopping conditions and network size tuned specifically
    # for InvertedPendulum. Should be able to reach 1,000 reward (the maximum
    # achievable) in 10,000 to 20,000 steps.
    env: InvertedPendulum-v2
    run: TD3
    stop:
        episode_reward_mean: 9999.9
        time_total_s: 900 # 15 minutes
        timesteps_total: 1000000
    config:
        # Works for both torch and tf.
        framework: tf
        # === Model ===
        actor_hiddens: [32, 32]
        critic_hiddens: [32, 32]

        # === Exploration ===
        replay_buffer_config:
          learning_starts: 1000
        exploration_config:
            random_timesteps: 1000

        # === Evaluation ===
        evaluation_interval: 10
        evaluation_duration: 5
[rllib] TD3/DDPG improvements and MuJoCo benchmarks (#4694) * [rllib] Separate optimisers for DDPG actor & crit. * [rllib] Better names for DDPG variables & options Config changes: - noise_scale -> exploration_ou_noise_scale - exploration_theta -> exploration_ou_theta - exploration_sigma -> exploration_ou_sigma - act_noise -> exploration_gaussian_sigma - noise_clip -> target_noise_clip * [rllib] Make DDPG less class-y Used functions to replace three classes with only an __init__ method & a handful of unrelated attributes. * [rllib] Refactor DDPG noise * [rllib] Unify DDPG exploration annealing Added option "exploration_should_anneal" to enable linear annealing of exploration noise. By default this is off, for consistency with DDPG & TD3 papers. Also renamed "exploration_final_eps" to "exploration_final_scale" (that name seems to have been carried over from DQN, and doesn't really make sense here). Finally, tried to rename "eps" to "noise_scale" wherever possible. 2019-04-26 17:49:53 -07:00			`invertedpendulum-td3:`
			`# This is a TD3 with stopping conditions and network size tuned specifically`
			`# for InvertedPendulum. Should be able to reach 1,000 reward (the maximum`
			`# achievable) in 10,000 to 20,000 steps.`
			`env: InvertedPendulum-v2`
			`run: TD3`
			`stop:`
			`episode_reward_mean: 9999.9`
			`time_total_s: 900 # 15 minutes`
			`timesteps_total: 1000000`
			`config:`
[RLlib] Benchmark and regression test yaml cleanup and restructuring. (#8414) 2020-05-26 11:10:27 +02:00			`# Works for both torch and tf.`
[RLlib] Auto-framework, retire `use_pytorch` in favor of `framework=...` (#8520) 2020-05-27 16:19:13 +02:00			`framework: tf`
[rllib] TD3/DDPG improvements and MuJoCo benchmarks (#4694) * [rllib] Separate optimisers for DDPG actor & crit. * [rllib] Better names for DDPG variables & options Config changes: - noise_scale -> exploration_ou_noise_scale - exploration_theta -> exploration_ou_theta - exploration_sigma -> exploration_ou_sigma - act_noise -> exploration_gaussian_sigma - noise_clip -> target_noise_clip * [rllib] Make DDPG less class-y Used functions to replace three classes with only an __init__ method & a handful of unrelated attributes. * [rllib] Refactor DDPG noise * [rllib] Unify DDPG exploration annealing Added option "exploration_should_anneal" to enable linear annealing of exploration noise. By default this is off, for consistency with DDPG & TD3 papers. Also renamed "exploration_final_eps" to "exploration_final_scale" (that name seems to have been carried over from DQN, and doesn't really make sense here). Finally, tried to rename "eps" to "noise_scale" wherever possible. 2019-04-26 17:49:53 -07:00			`# === Model ===`
			`actor_hiddens: [32, 32]`
			`critic_hiddens: [32, 32]`

			`# === Exploration ===`
[RLlib] Replay Buffer API and Ape-X. (#24506) 2022-05-17 13:43:49 +02:00			`replay_buffer_config:`
			`learning_starts: 1000`
[RLlib] DDPG refactor and Exploration API action noise classes. (#7314) * WIP. * WIP. * WIP. * WIP. * WIP. * Fix * WIP. * Add TD3 quick Pendulum regresison. * Cleanup. * Fix. * LINT. * Fix. * Sort quick_learning test cases, add TD3. * Sort quick_learning test cases, add TD3. * Revert test_checkpoint_restore.py (debugging) changes. * Fix old soft_q settings in documentation and test configs. * More doc fixes. * Fix test case. * Fix test case. * Lower test load. * WIP. 2020-03-01 20:53:35 +01:00			`exploration_config:`
			`random_timesteps: 1000`
[rllib] TD3/DDPG improvements and MuJoCo benchmarks (#4694) * [rllib] Separate optimisers for DDPG actor & crit. * [rllib] Better names for DDPG variables & options Config changes: - noise_scale -> exploration_ou_noise_scale - exploration_theta -> exploration_ou_theta - exploration_sigma -> exploration_ou_sigma - act_noise -> exploration_gaussian_sigma - noise_clip -> target_noise_clip * [rllib] Make DDPG less class-y Used functions to replace three classes with only an __init__ method & a handful of unrelated attributes. * [rllib] Refactor DDPG noise * [rllib] Unify DDPG exploration annealing Added option "exploration_should_anneal" to enable linear annealing of exploration noise. By default this is off, for consistency with DDPG & TD3 papers. Also renamed "exploration_final_eps" to "exploration_final_scale" (that name seems to have been carried over from DQN, and doesn't really make sense here). Finally, tried to rename "eps" to "noise_scale" wherever possible. 2019-04-26 17:49:53 -07:00
			`# === Evaluation ===`
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00			`evaluation_interval: 10`
			`evaluation_duration: 5`