RLlib Developer Guide
=====================

.. note::

    This guide takes you through the steps for implementing a new algorithm in RLlib. To apply existing algorithms already implemented in RLlib, please see the `user docs <rllib.html>`__.

Recipe for an RLlib algorithm
-----------------------------

Here are the steps for implementing a new algorithm in RLlib:

1. Define an algorithm-specific `Policy evaluator class <#policy-evaluators-and-optimizers>`__ (the core of the algorithm). Evaluators encapsulate framework-specific components such as the policy and loss functions. For an example, see the `simple policy gradient evaluator example <https://github.com/ray-project/ray/blob/master/python/ray/rllib/pg/pg_evaluator.py>`__.

2. Pick an appropriate `Policy optimizer class <#policy-evaluators-and-optimizers>`__. Optimizers manage the parallel execution of the algorithm. RLlib provides several built-in optimizers for gradient-based algorithms. Advanced algorithms may benefit from implementing their own optimizers.

3. Wrap the two up in an `Agent class <#agents>`__. Agents are the user-facing API of RLlib. They provide the necessary "glue" and implement accessory functionality such as statistics reporting and checkpointing. (A minimal skeleton showing how these pieces fit together is sketched below.)

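The following skeleton is a rough sketch of how the three pieces could fit together. It assumes the evaluator interface consists of ``sample()``, ``compute_gradients()``, ``apply_gradients()``, ``get_weights()``, and ``set_weights()``, and that agent subclasses implement ``_init()`` and ``_train()`` hooks; the ``MyAlgo*`` names are hypothetical, and the optimizer name and constructor arguments shown are assumptions rather than a definitive recipe.

.. code-block:: python

    from ray.rllib.agent import Agent
    from ray.rllib.optimizers import LocalSyncOptimizer  # assumed built-in optimizer


    class MyAlgoEvaluator(object):  # step 1: hypothetical evaluator
        """Holds the policy and loss; one copy runs on each worker."""

        def sample(self):
            pass  # roll out the current policy, return a SampleBatch

        def compute_gradients(self, samples):
            pass  # evaluate the loss on ``samples``, return gradients

        def apply_gradients(self, grads):
            pass  # update the local policy with ``grads``

        def get_weights(self):
            pass  # return the current policy weights

        def set_weights(self, weights):
            pass  # overwrite the current policy weights


    class MyAlgoAgent(Agent):  # step 3: hypothetical agent
        _agent_name = "MyAlgo"

        def _init(self):
            self.local_evaluator = MyAlgoEvaluator()
            self.remote_evaluators = []  # e.g. Ray actors wrapping MyAlgoEvaluator
            # Step 2: pick a built-in optimizer to drive the evaluators.
            self.optimizer = LocalSyncOptimizer(
                {}, self.local_evaluator, self.remote_evaluators)

        def _train(self):
            self.optimizer.step()  # one round of sampling and optimization
            # ...then return a training result with the statistics to report.

In practice the evaluator would follow the ``PolicyEvaluator`` interface described below, and the remote copies would be created as Ray actors.
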
To help with implementation, RLlib provides common action distributions, preprocessors, and neural network models, found in `catalog.py <https://github.com/ray-project/ray/blob/master/python/ray/rllib/models/catalog.py>`__, which are shared by all algorithms. Note that most of these utilities are currently TensorFlow specific.

.. image:: rllib-api.svg

The Developer API
-----------------

The following APIs are the building blocks of RLlib algorithms (also take a look at the `user components overview <rllib.html#components-user-customizable-and-internal>`__).

Agents
~~~~~~

Agents implement a particular algorithm and can be used to run some number of iterations of the algorithm, save and load the state of training, and evaluate the current policy. All agents inherit from a common base class:

.. autoclass:: ray.rllib.agent.Agent
    :members:

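For orientation, here is a rough sketch of how an agent is typically driven from user code: construct it, call ``train()`` repeatedly, and use ``save()`` and ``restore()`` for checkpointing. The ``PGAgent`` class, the ``DEFAULT_CONFIG`` dict, and the ``num_workers`` key are assumptions used for illustration; substitute the agent and config of the algorithm you are working with.

.. code-block:: python

    import ray
    from ray.rllib import pg  # assumed module layout for this sketch

    ray.init()

    # Hypothetical configuration; the available keys depend on the agent.
    config = pg.DEFAULT_CONFIG.copy()
    config["num_workers"] = 2

    agent = pg.PGAgent(config=config, env="CartPole-v0")

    for i in range(10):
        result = agent.train()  # run one iteration of the algorithm
        print("iteration", i, result)

    checkpoint = agent.save()   # save the training state...
    agent.restore(checkpoint)   # ...and load it back later
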
Policy Evaluators and Optimizers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.rllib.optimizers.policy_evaluator.PolicyEvaluator
    :members:

.. autoclass:: ray.rllib.optimizers.policy_optimizer.PolicyOptimizer
    :members:

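To make the division of labor concrete, here is a simplified, illustrative sketch of what one step of a synchronous gradient-based optimizer could look like in terms of the evaluator interface. It assumes evaluators expose ``sample()``, ``compute_gradients()``, ``apply_gradients()``, ``get_weights()``, and ``set_weights()``; it is not the implementation of any built-in optimizer, and in a real optimizer the remote evaluators are Ray actors, so these calls would go through ``.remote()`` and ``ray.get()``.

.. code-block:: python

    def simple_sync_step(local_evaluator, remote_evaluators):
        """One illustrative round of sampling and optimization."""
        # Collect a batch of experience from each remote evaluator.
        samples = [ev.sample() for ev in remote_evaluators]

        # Compute and apply gradients on the local evaluator.
        for batch in samples:
            grads = local_evaluator.compute_gradients(batch)
            local_evaluator.apply_gradients(grads)

        # Broadcast the updated weights back to the remote evaluators.
        weights = local_evaluator.get_weights()
        for ev in remote_evaluators:
            ev.set_weights(weights)
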
Sample Batches
~~~~~~~~~~~~~~

In order for Optimizers to manipulate sample data, Evaluators should return it in the SampleBatch format (a wrapper around a dict).

.. autoclass:: ray.rllib.optimizers.SampleBatch
    :members:

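For example, an evaluator might assemble a batch of experience like this (the column names ``obs``, ``actions``, and ``rewards`` are conventional choices, not required keys):

.. code-block:: python

    import numpy as np

    from ray.rllib.optimizers import SampleBatch

    # A small batch of three timesteps; each column maps to an array of equal length.
    batch = SampleBatch({
        "obs": np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
        "actions": np.array([0, 1, 0]),
        "rewards": np.array([1.0, 0.0, 1.0]),
    })

    print(batch.count)       # number of rows in the batch
    print(batch["rewards"])  # columns are accessed like dict entries

    # Batches from different evaluators can be combined before optimization.
    combined = batch.concat(batch)
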
Models and Preprocessors
~~~~~~~~~~~~~~~~~~~~~~~~

Algorithms share neural network models which inherit from the following class:

.. autoclass:: ray.rllib.models.Model
    :members:

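As an illustration, a custom TensorFlow model might look like the following. This sketch assumes, following the pattern of the built-in models, that subclasses implement an ``_init(inputs, num_outputs, options)`` hook that builds the graph and returns the output tensor together with the last hidden layer; treat the hook name and return values as assumptions and check the ``Model`` reference above for the exact contract.

.. code-block:: python

    import tensorflow as tf

    from ray.rllib.models import Model


    class MyModel(Model):
        """A hypothetical two-layer fully connected model."""

        def _init(self, inputs, num_outputs, options):
            hidden = tf.layers.dense(inputs, 64, activation=tf.nn.tanh)
            output = tf.layers.dense(hidden, num_outputs, activation=None)
            return output, hidden  # (output tensor, last hidden layer)
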
Currently we support fully connected and convolutional TensorFlow policies for all algorithms:

.. autoclass:: ray.rllib.models.FullyConnectedNetwork
.. autoclass:: ray.rllib.models.ConvolutionalNetwork

A3C also supports a TensorFlow LSTM policy.

.. autoclass:: ray.rllib.models.LSTM

Observations are transformed by Preprocessors before they are used in the model:

.. autoclass:: ray.rllib.models.preprocessors.Preprocessor
    :members:

Action Distributions
~~~~~~~~~~~~~~~~~~~~

Actions can be sampled from different distributions which have a common base class:

.. autoclass:: ray.rllib.models.ActionDistribution
    :members:

Currently we support the following action distributions:

.. autoclass:: ray.rllib.models.Categorical
.. autoclass:: ray.rllib.models.DiagGaussian
.. autoclass:: ray.rllib.models.Deterministic

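As a sketch of how these are used: a model's output layer parameterizes a distribution, from which actions are sampled and log-probabilities are computed for the loss. The snippet below assumes ``Categorical`` wraps a tensor of logits and exposes ``sample()``, ``logp()``, and ``entropy()`` ops; treat these details as assumptions and see the class reference above for the authoritative interface.

.. code-block:: python

    import tensorflow as tf

    from ray.rllib.models import Categorical

    # Logits as produced by a model's output layer (a placeholder here for illustration).
    logits = tf.placeholder(tf.float32, shape=[None, 4])

    dist = Categorical(logits)
    sample_op = dist.sample()                 # sample one action per row of logits
    logp_op = dist.logp(tf.constant([0, 2]))  # log-probability of given actions
    entropy_op = dist.entropy()               # entropy of the distribution
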
The Model Catalog
~~~~~~~~~~~~~~~~~

The Model Catalog is the mechanism for algorithms to get canonical preprocessors, models, and action distributions for varying gym environments. It enables easy reuse of these components across different algorithms.

.. autoclass:: ray.rllib.models.ModelCatalog
    :members:

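The intended usage pattern looks roughly like the snippet below: ask the catalog for an action distribution matching the environment's action space, then for a model whose output layer has the size that distribution expects. The exact method signatures shown here are assumptions for illustration; the class reference above is authoritative.

.. code-block:: python

    import gym
    import tensorflow as tf

    from ray.rllib.models import ModelCatalog

    env = gym.make("CartPole-v0")

    # Pick an action distribution class suited to the env's action space.
    dist_cls, dist_dim = ModelCatalog.get_action_dist(env.action_space)

    # Build a model whose output layer matches the distribution's input size.
    obs_ph = tf.placeholder(
        tf.float32, [None] + list(env.observation_space.shape))
    model = ModelCatalog.get_model(obs_ph, dist_dim)

    # Wrap the model output in the distribution to sample actions.
    dist = dist_cls(model.outputs)
    sample_op = dist.sample()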