parent 41b6b50d09
commit 8b76bab25c
4 changed files with 9 additions and 8 deletions

@@ -119,12 +119,12 @@ SpaceInvaders 692 ~600
 :start-after: __sphinx_doc_begin__
 :end-before: __sphinx_doc_end__
 
-Deep Deterministic Policy Gradients (DDPG)
-------------------------------------------
+Deep Deterministic Policy Gradients (DDPG, TD3)
+-----------------------------------------------
 `[paper] <https://arxiv.org/abs/1509.02971>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/python/ray/rllib/agents/ddpg/ddpg.py>`__
-DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X.
+DDPG is implemented similarly to DQN (below). The algorithm can be scaled by increasing the number of workers, switching to AsyncGradientsOptimizer, or using Ape-X. The improvements from `TD3 <https://spinningup.openai.com/en/latest/algorithms/td3.html>`__ are available, though not enabled by default.
 
-Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__
+Tuned examples: `Pendulum-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-ddpg.yaml>`__, `TD3 configuration <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/pendulum-td3.yaml>`__, `MountainCarContinuous-v0 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml>`__, `HalfCheetah-v2 <https://github.com/ray-project/ray/blob/master/python/ray/rllib/tuned_examples/halfcheetah-ddpg.yaml>`__
 
 **DDPG-specific configs** (see also `common configs <rllib-training.html#common-parameters>`__):
 
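The TD3 improvements described above are opt-in flags on the DDPG agent's config. A minimal sketch of switching them on via Tune, assuming the ``twin_q``, ``policy_delay``, and ``smooth_target_policy`` keys used by RLlib's DDPG config for these features (check ``rllib/agents/ddpg/ddpg.py`` for the authoritative names and defaults)::

    import ray
    from ray import tune

    ray.init()

    tune.run_experiments({
        "pendulum-td3-sketch": {
            "run": "DDPG",
            "env": "Pendulum-v0",
            # Flag names below are assumed from the TD3 additions;
            # they are off by default per the docs above.
            "config": {
                # Clipped double-Q: learn two Q functions and use the
                # smaller estimate when forming the critic target.
                "twin_q": True,
                # Delayed policy updates: one actor update per two
                # critic updates.
                "policy_delay": 2,
                # Target policy smoothing: add clipped noise to the
                # target action before evaluating the target Q.
                "smooth_target_policy": True,
            },
        },
    })

This is roughly what the linked ``pendulum-td3.yaml`` tuned example is expected to express in YAML form.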
@@ -15,7 +15,7 @@ PPO **Yes** **Yes** **Yes** **Yes**
 PG **Yes** **Yes** **Yes** **Yes**
 IMPALA **Yes** No **Yes** **Yes**
 DQN, Rainbow **Yes** No **Yes** No
-DDPG No **Yes** **Yes** No
+DDPG, TD3 No **Yes** **Yes** No
 APEX-DQN **Yes** No **Yes** No
 APEX-DDPG No **Yes** **Yes** No
 ES **Yes** **Yes** No No
@@ -54,7 +54,7 @@ Algorithms
 
 - `Advantage Actor-Critic (A2C, A3C) <rllib-algorithms.html#advantage-actor-critic-a2c-a3c>`__
 
-- `Deep Deterministic Policy Gradients (DDPG) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg>`__
+- `Deep Deterministic Policy Gradients (DDPG, TD3) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3>`__
 
 - `Deep Q Networks (DQN, Rainbow) <rllib-algorithms.html#deep-q-networks-dqn-rainbow>`__
 
@@ -34,8 +34,9 @@ halfcheetah-ddpg:
 clip_rewards: False
 
 # === Optimization ===
-actor_lr: 0.0001
-critic_lr: 0.001
+lr: 0.001
+actor_loss_coeff: 0.1
+critic_loss_coeff: 1.0
 use_huber: False
 huber_threshold: 1.0
 l2_reg: 0.000001
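The optimization change above folds the two separate learning rates into a single optimizer ``lr`` scaled by per-loss coefficients. Assuming the actor and critic losses are combined as a weighted sum before one optimizer step (the apparent intent of ``actor_loss_coeff``/``critic_loss_coeff``), the new values reproduce the old effective step sizes::

    import math

    lr = 0.001
    actor_loss_coeff = 0.1
    critic_loss_coeff = 1.0

    # Assumed combination: total_loss = actor_loss_coeff * actor_loss
    #                                 + critic_loss_coeff * critic_loss,
    # so each component's effective step size is lr * coeff.
    assert math.isclose(lr * actor_loss_coeff, 0.0001)   # old actor_lr
    assert math.isclose(lr * critic_loss_coeff, 0.001)   # old critic_lr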