[rllib] docs for td3 (#3381)

* td3 doc
* Update rllib-env.rst
Date: 2018-11-22 13:36:47 -08:00
commit 8b76bab25c
parent 41b6b50d09
4 changed files with 9 additions and 8 deletions
--- a/doc/source/rllib-algorithms.rst
+++ b/doc/source/rllib-algorithms.rst
@@ -119,12 +119,12 @@ SpaceInvaders  692                       ~600
   :start-after: __sphinx_doc_begin__
   :end-before: __sphinx_doc_end__

-Deep Deterministic Policy Gradients (DDPG)
------------------------------------------
+Deep Deterministic Policy Gradients (DDPG, TD3)
+-----------------------------------------------
 `[paper] <https://arxiv.org/abs/1509.02971>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/python/ray/rllib/agents/ddpg/ddpg.py>`__
-DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X.
+DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X. The improvements from `TD3 <https://spinningup.openai.com/en/latest/algorithms/td3.html>`__ are available, though they are not enabled by default.

-Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__
+Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `Pendulum-v0 (TD3) <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-td3.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__

 **DDPG-specific configs** (see also `common configs <rllib-training.html#common-parameters>`__):

--- a/doc/source/rllib-env.rst
+++ b/doc/source/rllib-env.rst
@@ -15,7 +15,7 @@ PPO             **Yes**           **Yes**             **Yes**      **Yes**
 PG              **Yes**           **Yes**             **Yes**      **Yes**
 IMPALA          **Yes**           No                  **Yes**      **Yes**
 DQN, Rainbow    **Yes**           No                  **Yes**      No
-DDPG            No                **Yes**             **Yes**      No
+DDPG, TD3       No                **Yes**             **Yes**      No
 APEX-DQN        **Yes**           No                  **Yes**      No
 APEX-DDPG       No                **Yes**             **Yes**      No
 ES              **Yes**           **Yes**             No           No
--- a/doc/source/rllib.rst
+++ b/doc/source/rllib.rst
@@ -54,7 +54,7 @@ Algorithms

   -  `Advantage Actor-Critic (A2C, A3C) <rllib-algorithms.html#advantage-actor-critic-a2c-a3c>`__

-   -  `Deep Deterministic Policy Gradients (DDPG) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg>`__
+   -  `Deep Deterministic Policy Gradients (DDPG, TD3) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3>`__

   -  `Deep Q Networks (DQN, Rainbow) <rllib-algorithms.html#deep-q-networks-dqn-rainbow>`__

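The link target in `rllib.rst` has to change along with the section heading, because Sphinx derives each section's anchor from its title text. The function below is a simplified sketch of that normalization. It lowercases the title, collapses runs of non-alphanumeric characters into one hyphen, and strips any leading or trailing hyphens. `section_anchor` is a hypothetical name. The function approximates the ID rules of `docutils.nodes.make_id` without copying them, and it ignores Unicode handling and leading-digit stripping. It does reproduce the old and new anchors used in this commit.

```python
import re


def section_anchor(title):
    """Simplified approximation of how docutils turns a section title into an id."""
    # Lowercase, then collapse every run of non-alphanumeric characters
    # (spaces, parentheses, commas) into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over from leading/trailing punctuation such as ")".
    return slug.strip("-")


if __name__ == "__main__":
    print(section_anchor("Deep Deterministic Policy Gradients (DDPG)"))
    # deep-deterministic-policy-gradients-ddpg
    print(section_anchor("Deep Deterministic Policy Gradients (DDPG, TD3)"))
    # deep-deterministic-policy-gradients-ddpg-td3
```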
--- a/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml
+++ b/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml
@@ -34,8 +34,9 @@ halfcheetah-ddpg:
        clip_rewards: False

        # === Optimization ===
-        actor_lr: 0.0001
-        critic_lr: 0.001
+        lr: 0.001
+        actor_loss_coeff: 0.1
+        critic_loss_coeff: 1.0
        use_huber: False
        huber_threshold: 1.0
        l2_reg: 0.000001
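The HalfCheetah tuned example above replaces the separate `actor_lr`/`critic_lr` settings with a single `lr` plus per-loss weights. This sketch assumes that each loss is multiplied by its coefficient before the gradient step; that assumption is not spelled out in this diff. Under plain SGD, weighting a loss by `c` is identical to scaling its learning rate by `c`. So `lr: 0.001` with `actor_loss_coeff: 0.1` gives the actor the old 0.0001 step size, and `critic_loss_coeff: 1.0` keeps the critic at 0.001. With an adaptive optimizer such as Adam, a constant weight largely cancels out for parameters that receive gradient from only that one loss, so the two parametrizations are not interchangeable there. `sgd_step` and `adam_first_step` are hypothetical helpers, and the choice of optimizer is an assumption.

```python
def sgd_step(theta, grad, lr, loss_coeff=1.0):
    """One plain-SGD step on loss_coeff * loss, given grad = d(loss)/d(theta)."""
    # The gradient of a weighted loss is the weight times the gradient.
    return theta - lr * (loss_coeff * grad)


def adam_first_step(grad, lr, loss_coeff=1.0, eps=1e-8):
    """Adam's first (bias-corrected) update on loss_coeff * loss."""
    g = loss_coeff * grad
    # On step 1, bias correction makes m_hat == g and v_hat == g * g,
    # so the update is about -lr * sign(g) whatever the loss weight.
    m_hat, v_hat = g, g * g
    return -lr * m_hat / (v_hat ** 0.5 + eps)


if __name__ == "__main__":
    theta, grad = 1.0, 2.0  # toy parameter and gradient
    # Old actor setting (actor_lr) vs new setting (lr * actor_loss_coeff) under SGD.
    print(sgd_step(theta, grad, lr=0.0001))
    print(sgd_step(theta, grad, lr=0.001, loss_coeff=0.1))
    # Under Adam the 0.1 weight barely changes the first update.
    print(adam_first_step(grad, lr=0.001, loss_coeff=0.1))
    print(adam_first_step(grad, lr=0.001, loss_coeff=1.0))
```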