* #7246 - Fixing broken links
* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

parent 23b6fdcda1
commit 3d0a8662b3
5 changed files with 9 additions and 9 deletions
@@ -241,7 +241,7 @@ RLlib DQN is implemented using the SyncReplayOptimizer. The algorithm can be sca

 DQN architecture

-Tuned examples: `PongDeterministic-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/pong-dqn.yaml>`__, `Rainbow configuration <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/pong-rainbow.yaml>`__, `{BeamRider,Breakout,Qbert,SpaceInvaders}NoFrameskip-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-basic-dqn.yaml>`__, `with Dueling and Double-Q <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-duel-ddqn.yaml>`__, `with Distributional DQN <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-dist-dqn.yaml>`__.
+Tuned examples: `PongDeterministic-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/pong-dqn.yaml>`__, `Rainbow configuration <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/pong-rainbow.yaml>`__, `{BeamRider,Breakout,Qbert,SpaceInvaders}NoFrameskip-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-dqn.yaml>`__, `with Dueling and Double-Q <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-duel-ddqn.yaml>`__, `with Distributional DQN <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/atari-dist-dqn.yaml>`__.

 .. tip::

     Consider using `Ape-X <#distributed-prioritized-experience-replay-ape-x>`__ for faster training with similar timestep efficiency.
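To make the tuned examples above actionable, here is a minimal sketch of launching a DQN run through Tune, assuming a Ray 0.8-era API. The real hyperparameters live in the linked yaml files; the stopping criterion and settings such as ``num_gpus``, ``double_q``, and ``dueling`` below are illustrative assumptions, not the tuned values.

.. code-block:: python

    from ray import tune

    # Minimal sketch: the linked atari-dqn.yaml holds the full tuned config;
    # only a few illustrative settings are shown here.
    tune.run(
        "DQN",
        stop={"episode_reward_mean": 20},
        config={
            "env": "PongDeterministic-v4",
            "num_gpus": 1,       # assumption: one GPU available
            "double_q": True,    # as in the dueling/double-Q example
            "dueling": True,
        },
    )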
@@ -275,7 +275,7 @@ Now let's take a look at the ``update_kl`` function. This is used to adaptively
 # multi-agent
 trainer.workers.local_worker().foreach_trainable_policy(update)

-The ``update_kl`` method on the policy is defined in `PPOTFPolicy <https://github.com/ray-project/ray/blob/master/rllib/agents/ppo/ppo_policy.py>`__ via the ``KLCoeffMixin``, along with several other advanced features. Let's look at each new feature used by the policy:
+The ``update_kl`` method on the policy is defined in `PPOTFPolicy <https://github.com/ray-project/ray/blob/master/rllib/agents/ppo/ppo_tf_policy.py>`__ via the ``KLCoeffMixin``, along with several other advanced features. Let's look at each new feature used by the policy:

 .. code-block:: python
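To make the multi-agent branch shown above concrete, here is a hedged sketch of the pattern: an ``update`` closure is broadcast to every trainable policy, and each policy's ``update_kl`` (supplied by a mixin such as ``KLCoeffMixin``) nudges its KL penalty coefficient toward a target. ``trainer`` and ``fetches`` come from the surrounding callback in the documentation's own example; the ``kl_target`` key and the 1.5x/0.5x adjustment factors are illustrative assumptions rather than RLlib's exact values.

.. code-block:: python

    class KLCoeffMixin:
        """Illustrative sketch of a policy mixin that adapts the KL coefficient."""

        def __init__(self, config):
            self.kl_coeff = config["kl_coeff"]
            self.kl_target = config["kl_target"]

        def update_kl(self, sampled_kl):
            # Tighten the penalty if the policy moved too far, relax it otherwise.
            if sampled_kl > 2.0 * self.kl_target:
                self.kl_coeff *= 1.5
            elif sampled_kl < 0.5 * self.kl_target:
                self.kl_coeff *= 0.5
            return self.kl_coeff


    def update(pi, pi_id):
        # fetches is assumed to map each policy id to its measured KL divergence.
        if pi_id in fetches:
            pi.update_kl(fetches[pi_id]["kl"])

    # multi-agent
    trainer.workers.local_worker().foreach_trainable_policy(update)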
@@ -165,7 +165,7 @@ If all the agents will be using the same algorithm class to train, then you can

 RLlib will create three distinct policies and route each agent's decisions to its bound policy. When an agent first appears in the env, ``policy_mapping_fn`` will be called to determine which policy it is bound to. RLlib reports separate training statistics for each policy in the return from ``train()``, along with the combined reward.

-Here is a simple `example training script <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_cartpole.py>`__ in which you can vary the number of agents and policies in the environment. For how to use multiple training methods at once (here DQN and PPO), see the `two-trainer example <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_two_trainers.py>`__. Metrics are reported for each policy separately, for example:
+Here is a simple `example training script <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__ in which you can vary the number of agents and policies in the environment. For how to use multiple training methods at once (here DQN and PPO), see the `two-trainer example <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py>`__. Metrics are reported for each policy separately, for example:

 .. code-block:: bash
 :emphasize-lines: 6,14,22
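A hedged sketch of the multi-agent configuration described above: three policies, with ``policy_mapping_fn`` binding each agent to one of them when it first appears. The policy names, the modulo mapping over integer agent ids, the registered env name, and ``obs_space``/``act_space`` are assumptions for illustration; the linked training script defines the real setup.

.. code-block:: python

    from ray import tune

    # obs_space and act_space are assumed to come from the environment.
    policies = {
        "policy_{}".format(i): (None, obs_space, act_space, {})
        for i in range(3)
    }

    tune.run(
        "PPO",
        config={
            "env": "multi_agent_cartpole",  # assumed registered env name
            "multiagent": {
                "policies": policies,
                # Called once per agent when it first appears in the env.
                "policy_mapping_fn": lambda agent_id: "policy_{}".format(agent_id % 3),
            },
        },
    )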
@@ -223,7 +223,7 @@ RLlib will create each policy's model in a separate ``tf.variable_scope``. Howev
 auxiliary_name_scope=False):
 <create the shared layers here>

-There is a full example of this in the `example training script <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_cartpole.py>`__.
+There is a full example of this in the `example training script <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__.

 Implementing a Centralized Critic
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
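A sketch of the TF1-style weight-sharing pattern that the ``auxiliary_name_scope=False`` context line above belongs to: entering a fixed, globally named variable scope with ``reuse=tf.AUTO_REUSE`` lets every policy's model create, or reuse, the same shared layer instead of building one per ``tf.variable_scope``. The scope name ``"shared"``, the layer size, and ``obs_input`` are assumptions for illustration.

.. code-block:: python

    import tensorflow as tf

    def build_shared_layers(obs_input):
        # Every policy model that calls this ends up sharing the same weights,
        # because the variables live in one global scope rather than in the
        # per-policy variable scope.
        with tf.variable_scope(
                tf.VariableScope(tf.AUTO_REUSE, "shared"),
                reuse=tf.AUTO_REUSE,
                auxiliary_name_scope=False):
            return tf.layers.dense(
                obs_input, 64, activation=tf.nn.relu, name="shared_fc")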
@@ -71,11 +71,11 @@ Multi-Agent and Hierarchical
 Example of customizing PPO to leverage a centralized value function.
 - `Centralized critic in the env <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic_2.py>`__:
 A simpler method of implementing a centralized critic by augmenting agent observations with global information.
-- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_custom_policy.py>`__:
+- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_custom_policy.py>`__:
 Example of running a custom hand-coded policy alongside trainable policies.
-- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_cartpole.py>`__:
+- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__:
 Example of how to define weight-sharing layers between two different policies.
-- `Multiple trainers <https://github.com/ray-project/ray/blob/master/rllib/examples/multiagent_two_trainers.py>`__:
+- `Multiple trainers <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py>`__:
 Example of alternating training between DQN and PPO trainers.
 - `Hierarchical training <https://github.com/ray-project/ray/blob/master/rllib/examples/hierarchical_training.py>`__:
 Example of hierarchical training using the multi-agent API.
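As a rough illustration of the "centralized critic in the env" idea listed above, the environment itself can concatenate other agents' observations into each agent's own, so that an otherwise decentralized value function effectively conditions on global state. The helper below is a toy sketch, not the linked example's code; agent ids and array shapes are assumptions.

.. code-block:: python

    import numpy as np

    def augment_observations(obs_dict):
        # obs_dict maps agent id -> that agent's local observation (1-D array).
        # Each agent additionally receives every other agent's observation.
        return {
            agent_id: np.concatenate(
                [obs] + [other for other_id, other in obs_dict.items()
                         if other_id != agent_id])
            for agent_id, obs in obs_dict.items()
        }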
@@ -149,7 +149,7 @@ In order to use this search algorithm, you will need to install Nevergrad via th

 Keep in mind that ``nevergrad`` is a Python 3.6+ library.

-This algorithm requires using an optimizer provided by ``nevergrad``, of which there are many options. A good rundown can be found in their README's `Optimization <https://github.com/facebookresearch/nevergrad/blob/master/docs/optimization.md#Choosing-an-optimizer>`__ section. You can use ``NevergradSearch`` as follows:
+This algorithm requires using an optimizer provided by ``nevergrad``, of which there are many options. A good rundown can be found in their README's `Optimization <https://github.com/facebookresearch/nevergrad/blob/master/docs/optimization.rst#choosing-an-optimizer>`__ section. You can use ``NevergradSearch`` as follows:

 .. code-block:: python
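For orientation, a sketch of wiring ``NevergradSearch`` into ``tune.run``. The exact constructor arguments on both the nevergrad and Tune sides have shifted between releases, and ``my_trainable``, the parameter names, and the budget are placeholders, so treat this as an assumption-laden outline rather than a drop-in snippet.

.. code-block:: python

    import nevergrad as ng
    from ray import tune
    from ray.tune.suggest.nevergrad import NevergradSearch

    # "height" and "width" are made-up hyperparameter names for illustration.
    optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
    algo = NevergradSearch(
        optimizer, ["height", "width"],
        metric="mean_loss", mode="min")

    tune.run(my_trainable, search_alg=algo, num_samples=20)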
@@ -172,7 +172,7 @@ In order to use this search algorithm, you will need to install Scikit-Optimize

 $ pip install scikit-optimize

-This algorithm requires using the `Scikit-Optimize ask and tell interface <https://scikit-optimize.github.io/notebooks/ask-and-tell.html>`__. This interface requires using the `Optimizer <https://scikit-optimize.github.io/#skopt.Optimizer>`__ provided by Scikit-Optimize. You can use SkOptSearch as follows:
+This algorithm requires using the `Scikit-Optimize ask and tell interface <https://scikit-optimize.github.io/stable/auto_examples/ask-and-tell.html>`__. This interface requires using the `Optimizer <https://scikit-optimize.github.io/#skopt.Optimizer>`__ provided by Scikit-Optimize. You can use SkOptSearch as follows:

 .. code-block:: python
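Similarly, a sketch of the ask-and-tell wiring for ``SkOptSearch``: a ``skopt.Optimizer`` is built over the search dimensions and handed to Tune. The parameter names and ``my_trainable`` are placeholders, and the ``SkOptSearch`` signature should be checked against the installed Tune version.

.. code-block:: python

    from skopt import Optimizer
    from ray import tune
    from ray.tune.suggest.skopt import SkOptSearch

    # Two made-up hyperparameters: "width" in [0, 20], "height" in [-100, 100].
    optimizer = Optimizer([(0, 20), (-100, 100)])
    algo = SkOptSearch(
        optimizer, ["width", "height"],
        metric="mean_loss", mode="min")

    tune.run(my_trainable, search_alg=algo, num_samples=10)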