# This configuration can expect to reach 2000 reward in 150k-200k timesteps
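# Assuming RLlib is installed and this file is saved locally (e.g. as
# halfcheetah-ddpg.yaml), it can be launched via the RLlib CLI:
#   rllib train -f halfcheetah-ddpg.yaml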
halfcheetah-ddpg:
    env: HalfCheetah-v2
    run: DDPG
    stop:
        episode_reward_mean: 2000
        time_total_s: 5400 # 90 minutes
    config:
        # Works for both torch and tf.
        framework: tf

        # === Model ===
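        # actor_hiddens / critic_hiddens set the fully-connected layer sizes of the
        # policy and Q-networks; n_step is the n-step return horizon, gamma the
        # discount factor.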
        actor_hiddens: [64, 64]
        critic_hiddens: [64, 64]
        n_step: 1
        model: {}
        gamma: 0.99
        env_config: {}

        # === Exploration ===
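        # Ornstein-Uhlenbeck noise is added to the deterministic actions; its scale
        # anneals from initial_scale to final_scale over the first scale_timesteps steps.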
        exploration_config:
            initial_scale: 1.0
            final_scale: 0.02
            scale_timesteps: 10000
            ou_base_scale: 0.1
            ou_theta: 0.15
            ou_sigma: 0.2

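        # timesteps_per_iteration is the minimum number of env steps per training
        # iteration; tau is the soft-update coefficient for the target networks
        # (target_network_update_freq: 0 effectively updates them every train step).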
        timesteps_per_iteration: 1000
        target_network_update_freq: 0
        tau: 0.001

        # === Replay buffer ===
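        # Prioritized replay samples transitions in proportion to their TD error
        # (exponent alpha) and corrects the resulting bias via importance-sampling
        # weights (exponent beta).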
        buffer_size: 10000
        prioritized_replay: True
        prioritized_replay_alpha: 0.6
        prioritized_replay_beta: 0.4
        prioritized_replay_eps: 0.000001
        clip_rewards: False

        # === Optimization ===
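        # Actor and critic use separate learning rates; the critic loss can optionally
        # be a Huber loss (use_huber / huber_threshold), and l2_reg adds weight decay.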
        actor_lr: 0.001
        critic_lr: 0.001
        use_huber: false
        huber_threshold: 1.0
        l2_reg: 0.000001
        learning_starts: 500
        rollout_fragment_length: 1
        train_batch_size: 64

        # === Parallelism ===
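        # num_workers: 0 runs sampling in the local (driver) process only.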
        num_workers: 0
        num_gpus_per_worker: 0
        worker_side_prioritization: false

        # === Evaluation ===
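        # Run 10 evaluation episodes every 5 training iterations.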
        evaluation_interval: 5
        evaluation_num_episodes: 10