# Can expect improvement to -140 reward in ~300-500k timesteps.
pendulum-ppo:
    env: Pendulum-v0
    run: PPO
    stop:
        episode_reward_mean: -500
        timesteps_total: 400000
    config:
        # Works for both torch and tf.
        framework: tf
        train_batch_size: 512
        vf_clip_param: 10.0
        num_workers: 0
        num_envs_per_worker: 20
        lambda: 0.1
        gamma: 0.95
        lr: 0.0003
        sgd_minibatch_size: 64
        num_sgd_iter: 6
        observation_filter: MeanStdFilter