This guide walks you through the steps for implementing a new algorithm in RLlib. To use the algorithms that RLlib already provides, please see the `user docs <rllib.html>`__.
Recipe for an RLlib algorithm
-----------------------------
Here are the steps for implementing a new algorithm in RLlib:
1. Define an algorithm-specific `Policy evaluator class <#policy-evaluators-and-optimizers>`__ (the core of the algorithm). Evaluators encapsulate framework-specific components such as the policy and loss functions. For an example, see the `simple policy gradient evaluator example <https://github.com/ray-project/ray/blob/master/python/ray/rllib/pg/pg_evaluator.py>`__.
2. Pick an appropriate `Policy optimizer class <#policy-evaluators-and-optimizers>`__. Optimizers manage the parallel execution of the algorithm. RLlib provides several built-in optimizers for gradient-based algorithms. Advanced algorithms may benefit from implementing their own optimizers.
3. Wrap the two up in an `Agent class <#agents>`__. Agents are the user-facing API of RLlib. They provide the necessary "glue" and implement accessory functionality such as statistics reporting and checkpointing. A sketch of how these three pieces fit together follows this list.
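To make the recipe concrete, here is a minimal sketch of how the three pieces might fit together. The class names, import paths, and method signatures below (``MyEvaluator``, ``MyAgent``, ``LocalSyncOptimizer``, ``_init``, ``_train``, and so on) are illustrative assumptions rather than the exact RLlib interfaces; refer to the linked source files for the real ones.

.. code-block:: python

    # Sketch only: import paths, base classes, and method names are assumptions
    # for illustration; consult the linked RLlib sources for actual interfaces.
    import ray
    from ray.rllib.agent import Agent                    # assumed path
    from ray.rllib.optimizers import LocalSyncOptimizer  # assumed path


    class MyEvaluator(object):
        """Step 1: algorithm-specific evaluator holding the policy and loss."""

        def __init__(self, env_creator, config):
            self.env = env_creator()
            # Build the policy, loss, and sampling logic here (typically with
            # TensorFlow; see pg_evaluator.py for a worked example).

        def sample(self):
            """Return a batch of experiences gathered with the current policy."""

        def compute_gradients(self, samples):
            """Return gradients of the loss over the given batch."""

        def apply_gradients(self, grads):
            """Apply the given gradients to the local policy."""

        def get_weights(self):
            """Return policy weights (used to sync remote evaluators)."""

        def set_weights(self, weights):
            """Restore policy weights."""


    class MyAgent(Agent):
        """Step 3: user-facing agent gluing the evaluator and optimizer together."""

        def _init(self):
            self.local_evaluator = MyEvaluator(self.env_creator, self.config)
            remote_cls = ray.remote(MyEvaluator)
            self.remote_evaluators = [
                remote_cls.remote(self.env_creator, self.config)
                for _ in range(self.config["num_workers"])]
            # Step 2: a built-in optimizer coordinates parallel execution.
            self.optimizer = LocalSyncOptimizer(
                self.config.get("optimizer", {}), self.local_evaluator,
                self.remote_evaluators)

        def _train(self):
            self.optimizer.step()  # one round of sampling and gradient updates
            # Gather and return training statistics (e.g. mean episode reward).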
To help with implementation, RLlib provides common action distributions, preprocessors, and neural network models, found in `catalog.py <https://github.com/ray-project/ray/blob/master/python/ray/rllib/models/catalog.py>`__, which are shared by all algorithms. Note that most of these utilities are currently TensorFlow-specific.
The following APIs are the building blocks of RLlib algorithms (also take a look at the `user components overview <rllib.html#components-user-customizable-and-internal>`__).
The Model Catalog is the mechanism algorithms use to get canonical preprocessors, models, and action distributions for different gym environments. It enables these components to be easily reused across algorithms.
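For example, an evaluator might query the catalog for an action distribution and default model that match its environment. The snippet below is a rough sketch; the exact ``ModelCatalog`` method signatures and the ``model.outputs`` attribute are assumptions based on the description above, so check `catalog.py <https://github.com/ray-project/ray/blob/master/python/ray/rllib/models/catalog.py>`__ for the current interface.

.. code-block:: python

    # Rough sketch -- the exact ModelCatalog signatures may differ; see catalog.py.
    import gym
    import tensorflow as tf
    from ray.rllib.models import ModelCatalog  # assumed import path

    env = gym.make("CartPole-v0")

    # Pick an action distribution class that matches the env's action space,
    # along with the number of model outputs it requires.
    dist_cls, dist_dim = ModelCatalog.get_action_dist(env.action_space)

    # Build a default model (e.g. a fully connected network) with dist_dim outputs.
    obs_ph = tf.placeholder(
        tf.float32, [None] + list(env.observation_space.shape))
    model = ModelCatalog.get_model(obs_ph, dist_dim)

    # Wrap the model outputs in the action distribution to sample actions.
    action_dist = dist_cls(model.outputs)
    sample_op = action_dist.sample()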