[rllib] docs for td3 (#3381)

* td3 doc

* Update rllib-env.rst
Eric Liang 2018-11-22 13:36:47 -08:00 committed by GitHub
parent 41b6b50d09
commit 8b76bab25c
4 changed files with 9 additions and 8 deletions

@@ -119,12 +119,12 @@ SpaceInvaders 692 ~600
 :start-after: __sphinx_doc_begin__
 :end-before: __sphinx_doc_end__
-Deep Deterministic Policy Gradients (DDPG)
-------------------------------------------
+Deep Deterministic Policy Gradients (DDPG, TD3)
+-----------------------------------------------
 `[paper] <https://arxiv.org/abs/1509.02971>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/python/ray/rllib/agents/ddpg/ddpg.py>`__
-DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X.
+DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X. The improvements from `TD3 <https://spinningup.openai.com/en/latest/algorithms/td3.html>`__ are available though not enabled by default.
-Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__
+Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `TD3 configuration <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-td3.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__
 **DDPG-specific configs** (see also `common configs <rllib-training.html#common-parameters>`__):
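
Note on the TD3 improvements referenced in the doc text of this hunk: TD3 extends DDPG with twin Q-networks, target policy smoothing, and delayed policy updates. The following is a minimal NumPy sketch of the first two, as they appear in the critic's bootstrap target. It is not RLlib's implementation; the function name `td3_target` and all parameter names are hypothetical and chosen only for illustration, and the target Q-functions are assumed to be plain callables.

```python
import numpy as np


def td3_target(reward, done, next_obs, next_action, q1_target, q2_target,
               gamma=0.99, target_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Illustrative TD3 critic target (hypothetical helper, not an RLlib API).

    next_action is the target policy's action for next_obs; q1_target and
    q2_target are callables (obs, action) -> array of Q-values.
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # Target policy smoothing: add clipped Gaussian noise to the target
    # action, then clip the result back into the valid action range.
    noise = rng.normal(0.0, target_noise, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    smoothed = np.clip(next_action + noise, action_low, action_high)

    # Twin Q: bootstrap from the smaller of the two target estimates
    # to reduce overestimation.
    q_min = np.minimum(q1_target(next_obs, smoothed),
                       q2_target(next_obs, smoothed))

    # One-step target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min
```

The third improvement, delayed policy updates, is not shown: it amounts to updating the actor (and the target networks) only once every few critic updates.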

@@ -15,7 +15,7 @@ PPO **Yes** **Yes** **Yes** **Yes**
 PG **Yes** **Yes** **Yes** **Yes**
 IMPALA **Yes** No **Yes** **Yes**
 DQN, Rainbow **Yes** No **Yes** No
-DDPG No **Yes** **Yes** No
+DDPG, TD3 No **Yes** **Yes** No
 APEX-DQN **Yes** No **Yes** No
 APEX-DDPG No **Yes** **Yes** No
 ES **Yes** **Yes** No No

@@ -54,7 +54,7 @@ Algorithms
 - `Advantage Actor-Critic (A2C, A3C) <rllib-algorithms.html#advantage-actor-critic-a2c-a3c>`__
-- `Deep Deterministic Policy Gradients (DDPG) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg>`__
+- `Deep Deterministic Policy Gradients (DDPG, TD3) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3>`__
 - `Deep Q Networks (DQN, Rainbow) <rllib-algorithms.html#deep-q-networks-dqn-rainbow>`__

@@ -34,8 +34,9 @@ halfcheetah-ddpg:
 clip_rewards: False
 # === Optimization ===
-actor_lr: 0.0001
-critic_lr: 0.001
+lr: 0.001
+actor_loss_coeff: 0.1
+critic_loss_coeff: 1.0
 use_huber: False
 huber_threshold: 1.0
 l2_reg: 0.000001