RLLib: Ray's scalable reinforcement learning library
=====================================================

This document describes Ray's reinforcement learning library.
It currently supports the following algorithms:

- `Proximal Policy Optimization <https://arxiv.org/abs/1707.06347>`__, a
  proximal variant of `TRPO <https://arxiv.org/abs/1502.05477>`__.

- Evolution Strategies, as described in `this
  paper <https://arxiv.org/abs/1703.03864>`__. Our implementation
  borrows code from
  `here <https://github.com/openai/evolution-strategies-starter>`__.

- `The Asynchronous Advantage Actor-Critic <https://arxiv.org/abs/1602.01783>`__,
  based on `the OpenAI starter agent <https://github.com/openai/universe-starter-agent>`__.

Proximal Policy Optimization scales to hundreds of cores and several GPUs,
Evolution Strategies scales to clusters with thousands of cores, and
the Asynchronous Advantage Actor-Critic scales to dozens of cores
on a single node.

These algorithms can be run on any OpenAI gym MDP, including custom ones
written and registered by the user.

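For example, a custom environment can be made available by name through gym's
registry before training is started. The snippet below is only a rough sketch:
the module path ``my_package.my_envs`` and the class name ``MyEnv`` are
placeholders for your own code, and registration details may vary slightly
across gym versions.

::

    # Register a hypothetical custom environment with gym so that it can be
    # selected by name, e.g. with --env MyEnv-v0 on the command line.
    from gym.envs.registration import register

    register(
        id="MyEnv-v0",
        entry_point="my_package.my_envs:MyEnv",
    )
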
Getting Started
---------------

You can run training with

::

    python ray/python/ray/rllib/train.py --env CartPole-v0 --alg PPO --config '{"timesteps_per_batch": 10000}'

By default, the results will be logged to a subdirectory of ``/tmp/ray``.
This subdirectory will contain a file ``config.json`` which contains the
hyperparameters, a file ``result.json`` which contains a training summary
for each episode, and a TensorBoard file that can be used to visualize
the training process by running

::

    tensorboard --logdir=/tmp/ray

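The training summaries can also be inspected without TensorBoard. The sketch
below assumes that ``result.json`` holds one JSON record per line and that the
run's log subdirectory has been filled in by hand; both of these details are
assumptions rather than part of the documented interface.

::

    import json

    # <run-directory> is a placeholder for the subdirectory created under
    # /tmp/ray for this particular training run.
    result_file = "/tmp/ray/<run-directory>/result.json"

    # Assumes one JSON object per line; adjust if the file layout differs.
    with open(result_file) as f:
        for line in f:
            print(json.loads(line))
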
The ``train.py`` script has a number of options, which you can list by running

::

    python ray/python/ray/rllib/train.py --help

The most important options are ``--env``, which selects the environment
(any OpenAI gym environment, including ones registered by the user,
can be used), and ``--alg``, which selects the algorithm
(available options are ``PPO``, ``A3C``, ``ES`` and ``DQN``). Each algorithm
has specific hyperparameters that can be set with ``--config``; see the
``DEFAULT_CONFIG`` variable in
`PPO <https://github.com/ray-project/ray/blob/master/python/ray/rllib/ppo/ppo.py>`__,
`A3C <https://github.com/ray-project/ray/blob/master/python/ray/rllib/a3c/a3c.py>`__,
`ES <https://github.com/ray-project/ray/blob/master/python/ray/rllib/es/es.py>`__ and
`DQN <https://github.com/ray-project/ray/blob/master/python/ray/rllib/dqn/dqn.py>`__.

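To see which hyperparameters an algorithm accepts, you can also inspect its
``DEFAULT_CONFIG`` from Python and override individual values, mirroring what
``--config`` does on the command line. This is a small sketch using the PPO
defaults; ``timesteps_per_batch`` is the key used in the example above.

::

    import ray.rllib.ppo as ppo

    # Copy the default PPO hyperparameters and override a single value,
    # just like passing --config '{"timesteps_per_batch": 10000}'.
    config = ppo.DEFAULT_CONFIG.copy()
    print(sorted(config.keys()))
    config["timesteps_per_batch"] = 10000
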
Examples
--------

Some good hyperparameters and settings are available in
`the repository <https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/tuned_examples.sh>`__
(some of them are tuned to run on GPUs). If you find better settings or tune
an algorithm on a different domain, consider submitting a Pull Request!

The User API
------------

You will be using this part of the API if you run the existing algorithms
on a new problem. Note that the API is not considered to be stable yet.
Here is an example of how to use it:

::

    import ray
    import ray.rllib.ppo as ppo

    ray.init()

    config = ppo.DEFAULT_CONFIG.copy()
    alg = ppo.PPOAgent("CartPole-v1", config)

    # Can optionally call alg.restore(path) to load a checkpoint.

    for i in range(10):
        # Perform one iteration of the algorithm.
        result = alg.train()
        print("result: {}".format(result))
        print("checkpoint saved at path: {}".format(alg.save()))

The Developer API
-----------------

This part of the API will be useful if you need to change existing RL algorithms
or implement new ones. Note that the API is not considered to be stable yet.

Agents
~~~~~~

Agents implement a particular algorithm and can be used to run
some number of iterations of the algorithm, save and load the state
of training, and evaluate the current policy. All agents inherit from
a common base class:

.. autoclass:: ray.rllib.common.Agent
    :members:

Models
~~~~~~

Models are subclasses of the ``Model`` class:

.. autoclass:: ray.rllib.models.Model

Currently we support fully connected policies, convolutional policies and
LSTMs:

.. autofunction:: ray.rllib.models.FullyConnectedNetwork
.. autofunction:: ray.rllib.models.ConvolutionalNetwork
.. autofunction:: ray.rllib.models.LSTM

Action Distributions
~~~~~~~~~~~~~~~~~~~~

Actions can be sampled from different distributions, which share a common base
class:

.. autoclass:: ray.rllib.models.ActionDistribution
    :members:

Currently we support the following action distributions:

.. autofunction:: ray.rllib.models.Categorical
.. autofunction:: ray.rllib.models.DiagGaussian
.. autofunction:: ray.rllib.models.Deterministic

The Model Catalog
~~~~~~~~~~~~~~~~~

To make picking the right model and action distribution easier, there is
a mechanism that picks good default values for various gym environments.

.. autoclass:: ray.rllib.models.ModelCatalog
    :members:

Using RLLib on a cluster
------------------------

First create a cluster as described in `managing a cluster with parallel ssh`_.
You can then run RLLib on this cluster by passing the address of the main Redis
shard into ``train.py`` with ``--redis-address``.

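For example, assuming the head node's Redis server is reachable at
``<redis-address>`` (an ``ip:port`` pair), training could be launched with a
command of the following form:

::

    python ray/python/ray/rllib/train.py --env CartPole-v0 --alg PPO --redis-address <redis-address>
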
.. _`managing a cluster with parallel ssh`: http://ray.readthedocs.io/en/latest/using-ray-on-a-large-cluster.html