ray/doc/source/rllib.rst
Eric Liang 8aa56c12e6
[rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00

73 lines
2.9 KiB
ReStructuredText

RLlib: Scalable Reinforcement Learning
======================================
RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.
.. image:: rllib-stack.svg
Learn more about RLlib's design by reading the `ICML paper <https://arxiv.org/abs/1712.09381>`__.
Installation
------------
RLlib has extra dependencies on top of ``ray``. First, you'll need to install either `PyTorch <http://pytorch.org/>`__ or `TensorFlow <https://www.tensorflow.org>`__. Then, install the Ray RLlib module:
.. code-block:: bash
pip install tensorflow # or tensorflow-gpu
pip install ray[rllib]
You might also want to clone the Ray repo for convenient access to RLlib helper scripts:
.. code-block:: bash
git clone https://github.com/ray-project/ray
cd ray/python/ray/rllib
Training APIs
-------------
* `Command-line <rllib-training.html>`__
* `Python API <rllib-training.html#python-api>`__
* `REST API <rllib-training.html#rest-api>`__
Environments
------------
* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized (Batch) <rllib-env.html#vectorized>`__
* `Multi-Agent <rllib-env.html#multi-agent>`__
* `Serving (Agent-oriented) <rllib-env.html#serving>`__
* `Offline Data Ingest <rllib-env.html#offline-data>`__
* `Batch Asynchronous <rllib-env.html#batch-asynchronous>`__
Algorithms
----------
* `Ape-X Distributed Prioritized Experience Replay <rllib-algorithms.html#ape-x-distributed-prioritized-experience-replay>`__
* `Asynchronous Advantage Actor-Critic <rllib-algorithms.html#asynchronous-advantage-actor-critic>`__
* `Deep Deterministic Policy Gradients <rllib-algorithms.html#deep-deterministic-policy-gradients>`__
* `Deep Q Networks <rllib-algorithms.html#deep-q-networks>`__
* `Evolution Strategies <rllib-algorithms.html#evolution-strategies>`__
* `Policy Gradients <rllib-algorithms.html#policy-gradients>`__
* `Proximal Policy Optimization <rllib-algorithms.html#proximal-policy-optimization>`__
Models and Preprocessors
-------------------------------
* `RLlib Models and Preprocessors Overview <rllib-models.html>`__
* `Built-in Models and Preprocessors <rllib-models.html#built-in-models-and-preprocessors>`__
* `Custom Models <rllib-models.html#custom-models>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
RL Building Blocks
------------------
* Policy Models, Losses, Postprocessing
* Policy Evaluation
* Policy Optimization
Package Reference
-----------------
* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.optimizers <rllib-package-ref.html#module-ray.rllib.optimizers>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__