ray/rllib/tuned_examples/mujoco-td3.yaml
Sven Mika 83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. (#7314)
* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix

* WIP.

* Add TD3 quick Pendulum regression.

* Cleanup.

* Fix.

* LINT.

* Fix.

* Sort quick_learning test cases, add TD3.

* Sort quick_learning test cases, add TD3.

* Revert test_checkpoint_restore.py (debugging) changes.

* Fix old soft_q settings in documentation and test configs.

* More doc fixes.

* Fix test case.

* Fix test case.

* Lower test load.

* WIP.
2020-03-01 11:53:35 -08:00

25 lines
819 B
YAML

mujoco-td3:
    # Solve latest versions of the four hardest MuJoCo tasks benchmarked in the
    # original TD3 paper. The average return over 10 trials at the end of
    # 1,000,000 timesteps (taken from Table 2 of the paper) is given in
    # parentheses after each environment name.
    #
    # Paper is at https://arxiv.org/pdf/1802.09477.pdf
    env:
        grid_search:
            - HalfCheetah-v2  # (9,532.99)
            - Hopper-v2  # (3,304.75)
            - Walker2d-v2  # (4,565.24)
            - Ant-v2  # (4,185.06)
    run: TD3
    stop:
        timesteps_total: 1000000
    config:
        # === Exploration ===
        learning_starts: 10000
        exploration_config:
            random_timesteps: 10000
        # === Evaluation ===
        evaluation_interval: 5
        evaluation_num_episodes: 10
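#
# Usage sketch (an assumption, not part of the original file): with RLlib
# installed, a tuned example such as this one can typically be launched with
# the `rllib train` CLI, pointing `-f` at wherever this file lives in your
# checkout, for example:
#
#   rllib train -f rllib/tuned_examples/mujoco-td3.yaml
#
# Because `env` is a grid_search over four environments, this should start
# one TD3 trial per environment, each stopping at 1,000,000 timesteps.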