RLlib Developer Guide
=====================

.. note::

    This guide will take you through the steps for implementing a new algorithm in RLlib. To apply existing algorithms already implemented in RLlib, please see the `user docs <rllib.html>`__.

Recipe for an RLlib algorithm
-----------------------------

Here are the steps for implementing a new algorithm in RLlib:

1. Define an algorithm-specific `Policy evaluator class <#policy-evaluators-and-optimizers>`__ (the core of the algorithm). Evaluators encapsulate framework-specific components such as the policy and loss functions. For an example, see the `simple policy gradient evaluator example <https://github.com/ray-project/ray/blob/master/python/ray/rllib/pg/pg_evaluator.py>`__.
2. Pick an appropriate `Policy optimizer class <#policy-evaluators-and-optimizers>`__. Optimizers manage the parallel execution of the algorithm. RLlib provides several built-in optimizers for gradient-based algorithms. Advanced algorithms may find it beneficial to implement their own optimizers.
3. Wrap the two up in an `Agent class <#agents>`__. Agents are the user-facing API of RLlib. They provide the necessary "glue" and implement accessory functionality such as statistics reporting and checkpointing. (A skeleton of how these pieces fit together is sketched below.)

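To make the three steps concrete, here is a minimal skeleton of how the pieces fit together. It is a sketch, not a runnable algorithm: the evaluator class name is illustrative, and its method names reflect the core ``PolicyEvaluator`` interface (sampling experience, computing and applying gradients, getting and setting weights); check the classes documented under `Policy Evaluators and Optimizers <#policy-evaluators-and-optimizers>`__ for the exact required signatures.

.. code-block:: python

    class MyAlgoEvaluator(object):
        """Step 1: encapsulates the policy model and loss for the new algorithm."""

        def sample(self):
            # Roll out the current policy and return the experience as a
            # SampleBatch (see "Sample Batches" below).
            raise NotImplementedError

        def compute_gradients(self, samples):
            # Return gradients of the algorithm's loss over the given batch.
            raise NotImplementedError

        def apply_gradients(self, grads):
            # Apply previously computed gradients to the local policy.
            raise NotImplementedError

        def get_weights(self):
            raise NotImplementedError

        def set_weights(self, weights):
            raise NotImplementedError


    # Step 2: pick a built-in policy optimizer from ray.rllib.optimizers
    # (or subclass PolicyOptimizer) and hand it the local and remote
    # evaluators.
    #
    # Step 3: subclass ray.rllib.agent.Agent, create the evaluators and the
    # optimizer when the agent is set up, and step the optimizer once per
    # training iteration; the Agent base class adds checkpointing and
    # statistics reporting on top.
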
To help with implementation, RLlib provides common action distributions, preprocessors, and neural network models, found in `catalog.py <https://github.com/ray-project/ray/blob/master/python/ray/rllib/models/catalog.py>`__, which are shared by all algorithms. Note that most of these utilities are currently TensorFlow specific.

.. image:: rllib-api.svg

The Developer API
-----------------

The following APIs are the building blocks of RLlib algorithms (also take a look at the `user components overview <rllib.html#components-user-customizable-and-internal>`__).

Agents
~~~~~~

Agents implement a particular algorithm and can be used to run some number of iterations of the algorithm, save and load the state of training, and evaluate the current policy. All agents inherit from a common base class:

.. autoclass:: ray.rllib.agent.Agent
    :members:

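As a quick usage illustration, the snippet below constructs an agent, trains it for a few iterations, checkpoints it, and restores it. Treat it as a sketch: the ``ppo.PPOAgent`` import and the ``env=`` keyword argument are assumptions based on the agents shipped with RLlib, while ``train()``, ``save()``, and ``restore()`` are the Agent methods described above.

.. code-block:: python

    import ray
    from ray.rllib import ppo  # assumption: substitute any built-in agent

    ray.init()

    # Construct a concrete Agent subclass for a gym environment.
    agent = ppo.PPOAgent(env="CartPole-v0")

    # Run a few training iterations, printing the reported stats each time.
    for _ in range(3):
        print(agent.train())

    # Checkpoint the training state, then restore it later (possibly in a
    # different process) to continue training or evaluate the policy.
    checkpoint_path = agent.save()
    agent.restore(checkpoint_path)
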
Policy Evaluators and Optimizers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.rllib.optimizers.policy_evaluator.PolicyEvaluator
    :members:

.. autoclass:: ray.rllib.optimizers.policy_optimizer.PolicyOptimizer
    :members:

Sample Batches
~~~~~~~~~~~~~~

In order for Optimizers to manipulate sample data, it should be returned from Evaluators in the SampleBatch format (a wrapper around a dict).

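For example, an evaluator might package a short rollout as a dict of equal-length columns. This is a hedged sketch: it assumes a ``SampleBatch`` can be built directly from such a dict (consistent with the wrapper-around-a-dict description), and the column names are common conventions rather than requirements.

.. code-block:: python

    import numpy as np
    from ray.rllib.optimizers import SampleBatch

    # A SampleBatch wraps a dict of equal-length columns, one row per timestep.
    batch = SampleBatch({
        "obs": np.array([[0.1, 0.2], [0.3, 0.4]]),
        "actions": np.array([0, 1]),
        "rewards": np.array([1.0, 0.5]),
    })
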
.. autoclass:: ray.rllib.optimizers.SampleBatch
    :members:

Models and Preprocessors
~~~~~~~~~~~~~~~~~~~~~~~~

Algorithms share neural network models which inherit from the following class:

.. autoclass:: ray.rllib.models.Model
    :members:

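Custom models typically subclass ``Model`` and override its layer-construction hook. The sketch below is an assumption-laden illustration: the ``_build_layers`` hook name, its signature, and the ``(output, last_hidden_layer)`` return value should be verified against the ``Model`` class documented above.

.. code-block:: python

    import tensorflow as tf
    from ray.rllib.models import Model

    class MyModel(Model):
        """Illustrative two-layer fully connected network."""

        # Assumed hook name and signature; check the Model class above for
        # the exact method to override and its expected return value.
        def _build_layers(self, inputs, num_outputs, options):
            hidden = tf.layers.dense(inputs, 64, activation=tf.nn.relu)
            output = tf.layers.dense(hidden, num_outputs, activation=None)
            # Return the output tensor and the last hidden layer tensor.
            return output, hidden
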
Currently we support fully connected and convolutional TensorFlow policies on all algorithms:

.. autoclass:: ray.rllib.models.FullyConnectedNetwork

A3C also supports a TensorFlow LSTM policy.

.. autoclass:: ray.rllib.models.LSTM

Observations are transformed by Preprocessors before being used in the model:

.. autoclass:: ray.rllib.models.preprocessors.Preprocessor
    :members:

Action Distributions
~~~~~~~~~~~~~~~~~~~~

Actions can be sampled from different distributions which have a common base class:

.. autoclass:: ray.rllib.models.ActionDistribution
    :members:

Currently we support the following action distributions:

.. autoclass:: ray.rllib.models.Categorical

.. autoclass:: ray.rllib.models.DiagGaussian

.. autoclass:: ray.rllib.models.Deterministic

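As a sketch of how a distribution is used, the snippet below builds a ``Categorical`` from a batch of logits and samples actions from it. The constructor-from-logits pattern and the ``sample()`` / ``logp()`` method names are assumptions to verify against the ``ActionDistribution`` interface above.

.. code-block:: python

    import tensorflow as tf
    from ray.rllib.models import Categorical

    # Logits for a batch of two states over three discrete actions.
    logits = tf.constant([[1.0, 0.5, 0.1], [0.1, 0.1, 2.0]])

    # Action distributions are constructed from the model's output tensor.
    dist = Categorical(logits)
    actions = dist.sample()    # tensor of sampled action ids
    logp = dist.logp(actions)  # log-probability of the sampled actions

    with tf.Session() as sess:
        print(sess.run([actions, logp]))
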
The Model Catalog
~~~~~~~~~~~~~~~~~

The Model Catalog is the mechanism for algorithms to get canonical preprocessors, models, and action distributions for varying gym environments. It enables easy reuse of these components across different algorithms.

.. autoclass:: ray.rllib.models.ModelCatalog
    :members:

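A typical use of the catalog, sketched here with assumed call signatures (the exact arguments may differ, so check the class reference above): given a gym environment, ask the catalog for a matching preprocessor and action distribution.

.. code-block:: python

    import gym
    from ray.rllib.models import ModelCatalog

    env = gym.make("CartPole-v0")

    # Assumed call signatures for illustration; verify them against the
    # ModelCatalog reference above.
    preprocessor = ModelCatalog.get_preprocessor(env)
    dist_class, dist_dim = ModelCatalog.get_action_dist(env.action_space)

    # The preprocessor turns raw observations into model-ready arrays, and
    # dist_class is the ActionDistribution to construct from a model output
    # of size dist_dim.
    obs = env.reset()
    model_input = preprocessor.transform(obs)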