# ray/rllib/tuned_examples/ddpg/pendulum-ddpg.yaml

# This configuration can be expected to reach a reward of -160 within 10k-20k timesteps.
pendulum-ddpg:
    env: Pendulum-v1
    run: DDPG
    stop:
        episode_reward_mean: -320
        timesteps_total: 30000
    config:
        # Works for both torch and tf.
        seed: 42
        soft_horizon: false
        no_done_at_end: true
        framework: torch
        # === Model ===
        actor_hiddens: [64, 64]
        critic_hiddens: [64, 64]
        n_step: 1
        model: {}
        gamma: 0.99

        # === Exploration ===
        exploration_config:
            type: "OrnsteinUhlenbeckNoise"
            scale_timesteps: 10000
            initial_scale: 1.0
            final_scale: 0.02
            ou_base_scale: 0.1
            ou_theta: 0.15
            ou_sigma: 0.2

        min_sample_timesteps_per_reporting: 600
        target_network_update_freq: 0
        tau: 0.001

        # === Replay buffer ===
        replay_buffer_config:
            type: MultiAgentPrioritizedReplayBuffer
            capacity: 10000
            worker_side_prioritization: false
        clip_rewards: False

        # === Optimization ===
        actor_lr: 0.001
        critic_lr: 0.001
        use_huber: True
        huber_threshold: 1.0
        l2_reg: 0.000001
        learning_starts: 500
        rollout_fragment_length: 1
        train_batch_size: 64

        # === Parallelism ===
        num_workers: 0