ray/rllib/tuned_examples/pendulum-td3.yaml
Sven Mika 83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. (#7314)
2020-03-01 11:53:35 -08:00

# This configuration can expect to reach a mean reward of -160 within 10k-20k timesteps.
pendulum-ddpg:
    env: Pendulum-v0
    run: TD3
    stop:
        episode_reward_mean: -130
        time_total_s: 900  # 15 minutes
    config:
        # === Model ===
        actor_hiddens: [64, 64]
        critic_hiddens: [64, 64]
        # === Exploration ===
        learning_starts: 5000
        exploration_config:
            random_timesteps: 5000
        # === Evaluation ===
        evaluation_interval: 1
        evaluation_num_episodes: 5
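
The `stop`, `learning_starts` and `exploration_config.random_timesteps` keys interact during a run. The following is a minimal, stdlib-only Python sketch of that interaction. It is not RLlib's or Tune's implementation. The helper names (`should_stop`, `should_train`, `explore`) are hypothetical. The post-warm-up behaviour (policy action plus Gaussian noise with stddev 0.1) is an assumption made for illustration. The [-2, 2] torque bounds are Pendulum's action space. The stop check follows Tune's dict-style criteria, where a trial ends as soon as any listed metric reaches its threshold.

```python
import random

# Values copied by hand from the YAML above (not parsed from the file).
STOP = {"episode_reward_mean": -130, "time_total_s": 900}
LEARNING_STARTS = 5000
RANDOM_TIMESTEPS = 5000
ACTION_LOW, ACTION_HIGH = -2.0, 2.0  # Pendulum torque bounds
NOISE_STDDEV = 0.1  # assumption: Gaussian action-noise scale after warm-up


def should_stop(result, stop=STOP):
    """Hypothetical helper: True once ANY metric reaches its threshold (>=)."""
    return any(result.get(key, float("-inf")) >= value for key, value in stop.items())


def should_train(timesteps_sampled, learning_starts=LEARNING_STARTS):
    """Hypothetical helper: gradient updates start only after enough samples."""
    return timesteps_sampled >= learning_starts


def explore(timestep, policy_action, rng,
            random_timesteps=RANDOM_TIMESTEPS, stddev=NOISE_STDDEV):
    """Hypothetical helper: the action actually sent to the env at `timestep`."""
    if timestep < random_timesteps:
        # Warm-up phase: ignore the policy, sample uniformly within the bounds.
        return rng.uniform(ACTION_LOW, ACTION_HIGH)
    # Afterwards (assumed): deterministic policy action plus Gaussian noise,
    # clipped back into the valid action range.
    noisy = policy_action + rng.gauss(0.0, stddev)
    return min(max(noisy, ACTION_LOW), ACTION_HIGH)


if __name__ == "__main__":
    rng = random.Random(0)
    print(should_stop({"episode_reward_mean": -145.0, "time_total_s": 320.0}))  # False
    print(should_stop({"episode_reward_mean": -128.5, "time_total_s": 610.0}))  # True
    print(should_stop({"episode_reward_mean": -150.0, "time_total_s": 900.0}))  # True
    print(should_train(4999), should_train(5000))  # False True
    print(explore(6000, 5.0, rng))  # 2.0 (clipped to the upper bound)
```

In this config, `learning_starts` and `random_timesteps` are both 5000. So in this sketch the first 5000 timesteps fill the replay buffer with purely random actions before any update happens. The `time_total_s: 900` limit (15 minutes) acts as a fallback if the -130 reward target is never reached.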