# ray/rllib/tuned_examples/pendulum-ppo.yaml

# Can expect improvement to -140 reward in ~300k-500k timesteps.
pendulum-ppo:
    env: Pendulum-v0
    run: PPO
    config:
        train_batch_size: 2048
        vf_clip_param: 10.0
        num_workers: 0
        num_envs_per_worker: 10
        lambda: 0.1
        gamma: 0.95
        lr: 0.0003
        sgd_minibatch_size: 64
        num_sgd_iter: 10
        model:
            fcnet_hiddens: [64, 64]
        batch_mode: complete_episodes
        observation_filter: MeanStdFilter
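
# Usage sketch (not part of the tuned example itself): assuming Ray/RLlib is
# installed, a gym with Pendulum-v0 available, and this file saved locally as
# pendulum-ppo.yaml, the experiment can be launched with the RLlib CLI:
#
#   rllib train -f pendulum-ppo.yaml
#
# or, roughly equivalently, from Python via ray.tune. Note that the top-level
# "env" key is an RLlib-CLI convention, so it is moved into "config" before
# handing the spec to Tune (mirroring what `rllib train` does internally):
#
#   import yaml
#   import ray
#   from ray import tune
#
#   ray.init()
#   with open("pendulum-ppo.yaml") as f:
#       experiments = yaml.safe_load(f)
#   for exp in experiments.values():
#       # Fold the experiment-level "env" into the trainer config.
#       exp.setdefault("config", {})["env"] = exp.pop("env")
#   tune.run_experiments(experiments)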