RLlib: Scalable Reinforcement Learning
======================================

RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.

.. image:: rllib-stack.svg

Learn more about RLlib's design by reading the `ICML paper <https://arxiv.org/abs/1712.09381>`__.

Installation
------------

RLlib has extra dependencies on top of ``ray``. First, you'll need to install either `PyTorch <http://pytorch.org/>`__ or `TensorFlow <https://www.tensorflow.org>`__. Then, install the Ray RLlib module:

.. code-block:: bash

    pip install tensorflow  # or tensorflow-gpu
    pip install 'ray[rllib]'

You might also want to clone the Ray repo for convenient access to RLlib helper scripts:

.. code-block:: bash

    git clone https://github.com/ray-project/ray
    cd ray/python/ray/rllib

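With the repo cloned, those helper scripts can be invoked directly from the RLlib directory. As a sketch only (the flag names below may differ between versions; check ``python train.py --help`` in your checkout for the options it actually accepts):

.. code-block:: bash

    # Illustrative: launch a training run with the bundled train.py helper
    python train.py --run PPO --env CartPole-v0
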

Training APIs
-------------

* `Command-line <rllib-training.html>`__
* `Python API <rllib-training.html#python-api>`__
* `REST API <rllib-training.html#rest-api>`__


Environments
------------

* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized (Batch) <rllib-env.html#vectorized>`__
* `Multi-Agent <rllib-env.html#multi-agent>`__
* `Serving (Agent-oriented) <rllib-env.html#serving>`__
* `Offline Data Ingest <rllib-env.html#offline-data>`__
* `Batch Asynchronous <rllib-env.html#batch-asynchronous>`__


Algorithms
----------

* `Ape-X Distributed Prioritized Experience Replay <rllib-algorithms.html#ape-x-distributed-prioritized-experience-replay>`__
* `Asynchronous Advantage Actor-Critic <rllib-algorithms.html#asynchronous-advantage-actor-critic>`__
* `Deep Deterministic Policy Gradients <rllib-algorithms.html#deep-deterministic-policy-gradients>`__
* `Deep Q Networks <rllib-algorithms.html#deep-q-networks>`__
* `Evolution Strategies <rllib-algorithms.html#evolution-strategies>`__
* `Policy Gradients <rllib-algorithms.html#policy-gradients>`__
* `Proximal Policy Optimization <rllib-algorithms.html#proximal-policy-optimization>`__


Models and Preprocessors
------------------------

* `RLlib Models and Preprocessors Overview <rllib-models.html>`__
* `Built-in Models and Preprocessors <rllib-models.html#built-in-models-and-preprocessors>`__
* `Custom Models <rllib-models.html#custom-models>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__


RL Building Blocks
------------------

* Policy Models, Losses, Postprocessing
* Policy Evaluation
* Policy Optimization


Package Reference
-----------------

* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.optimizers <rllib-package-ref.html#module-ray.rllib.optimizers>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__