RLlib: Scalable Reinforcement Learning
======================================

RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.

.. image:: rllib-stack.svg

Learn more about RLlib's design by reading the `ICML paper <https://arxiv.org/abs/1712.09381>`__.

Installation
------------

RLlib has extra dependencies on top of ``ray``. First, you'll need to install either `PyTorch <http://pytorch.org/>`__ or `TensorFlow <https://www.tensorflow.org>`__. Then, install the RLlib module:

.. code-block:: bash

    pip install tensorflow  # or tensorflow-gpu
    pip install ray[rllib]  # also recommended: ray[debug]

You might also want to clone the `Ray repo <https://github.com/ray-project/ray>`__ for convenient access to RLlib helper scripts:

.. code-block:: bash

    git clone https://github.com/ray-project/ray
    cd ray/python/ray/rllib
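
For example, once inside the repo you can kick off training with the bundled ``train.py`` helper script. A quick sketch (see the training docs below for the full set of flags):

.. code-block:: bash

    # train PPO on CartPole-v0; requires TensorFlow (or PyTorch) and ray[rllib]
    python train.py --run PPO --env CartPole-v0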

Training APIs
-------------
* `Command-line <rllib-training.html>`__
* `Configuration <rllib-training.html#configuration>`__
* `Python API <rllib-training.html#python-api>`__ (see the sketch below)
* `Debugging <rllib-training.html#debugging>`__
* `REST API <rllib-training.html#rest-api>`__
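
As a taste of the Python API, here is a minimal sketch that trains PPO on CartPole (assuming TensorFlow is installed; the agent classes and config keys are documented on the Python API page):

.. code-block:: python

    import ray
    import ray.rllib.agents.ppo as ppo

    # start Ray on the local machine
    ray.init()

    # build a PPO agent for CartPole and run a few training iterations
    config = ppo.DEFAULT_CONFIG.copy()
    agent = ppo.PPOAgent(config=config, env="CartPole-v0")
    for i in range(3):
        result = agent.train()
        print("iteration", i, "reward", result["episode_reward_mean"])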

Environments
------------
* `RLlib Environments Overview <rllib-env.html>`__ (registration sketch below)
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
* `Multi-Agent <rllib-env.html#multi-agent>`__
* `Interfacing with External Agents <rllib-env.html#interfacing-with-external-agents>`__
* `Batch Asynchronous <rllib-env.html#batch-asynchronous>`__
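
Custom environments can be made available to every algorithm by registering an environment creator function. A minimal sketch (``MyEnv`` is a hypothetical Gym-style environment class; the pattern is described in the environments overview):

.. code-block:: python

    from ray.tune.registry import register_env

    # MyEnv is a stand-in for any class implementing the gym.Env interface
    def env_creator(env_config):
        return MyEnv(env_config)

    register_env("my_env", env_creator)
    # the registered name can then be used wherever an env id is expected,
    # e.g. ppo.PPOAgent(env="my_env")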

Algorithms
----------
* High-throughput architectures

  - `Distributed Prioritized Experience Replay (Ape-X) <rllib-algorithms.html#distributed-prioritized-experience-replay-ape-x>`__
  - `Importance Weighted Actor-Learner Architecture (IMPALA) <rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala>`__

* Gradient-based

  - `Advantage Actor-Critic (A2C, A3C) <rllib-algorithms.html#advantage-actor-critic-a2c-a3c>`__
  - `Deep Deterministic Policy Gradients (DDPG, TD3) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3>`__
  - `Deep Q Networks (DQN, Rainbow, Parametric DQN) <rllib-algorithms.html#deep-q-networks-dqn-rainbow-parametric-dqn>`__
  - `Policy Gradients <rllib-algorithms.html#policy-gradients>`__
  - `Proximal Policy Optimization (PPO) <rllib-algorithms.html#proximal-policy-optimization-ppo>`__

* Derivative-free

  - `Augmented Random Search (ARS) <rllib-algorithms.html#augmented-random-search-ars>`__
  - `Evolution Strategies <rllib-algorithms.html#evolution-strategies>`__
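
All of the algorithms above can be selected by name from the command line. A sketch (run names follow the algorithm pages; flags are described in the training docs):

.. code-block:: bash

    # the same helper script can run any of the algorithms above
    python train.py --run IMPALA --env CartPole-v0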

Models and Preprocessors
------------------------
* `RLlib Models and Preprocessors Overview <rllib-models.html>`__
* `Built-in Models and Preprocessors <rllib-models.html#built-in-models-and-preprocessors>`__
* `Custom Models <rllib-models.html#custom-models>`__ (see the sketch below)
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
* `Customizing Policy Graphs <rllib-models.html#customizing-policy-graphs>`__
* `Variable-length / Parametric Action Spaces <rllib-models.html#variable-length-parametric-action-spaces>`__
* `Model-Based Rollouts <rllib-models.html#model-based-rollouts>`__
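
Custom models are registered with the model catalog and then selected by name in an algorithm's config. A rough sketch (the ``Model`` subclass hooks are documented on the Custom Models page; the layer body here is purely illustrative):

.. code-block:: python

    import tensorflow as tf

    from ray.rllib.models import Model, ModelCatalog

    class MyModel(Model):
        def _build_layers(self, inputs, num_outputs, options):
            # illustrative body: one hidden layer, then the output layer;
            # returns (output layer, last hidden layer)
            hidden = tf.layers.dense(inputs, 64, activation=tf.nn.relu)
            output = tf.layers.dense(hidden, num_outputs)
            return output, hidden

    ModelCatalog.register_custom_model("my_model", MyModel)
    # then select it in an algorithm config:
    #   config["model"] = {"custom_model": "my_model"}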

RLlib Concepts
--------------
* `Policy Graphs <rllib-concepts.html>`__ (sketch below)
* `Policy Evaluation <rllib-concepts.html#policy-evaluation>`__
* `Policy Optimization <rllib-concepts.html#policy-optimization>`__
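
At the center of these concepts is the policy graph interface. An abbreviated sketch of a from-scratch policy graph (the full method signatures are on the concepts page):

.. code-block:: python

    from ray.rllib.evaluation import PolicyGraph

    class RandomPolicy(PolicyGraph):
        """Illustrative policy graph that acts uniformly at random."""

        def compute_actions(self, obs_batch, state_batches, **kwargs):
            # return (actions, RNN state outputs, extra info) for the batch
            return [self.action_space.sample() for _ in obs_batch], [], {}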

Package Reference
-----------------
* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.optimizers <rllib-package-ref.html#module-ray.rllib.optimizers>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__

Troubleshooting
---------------

If you encounter errors like
``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers,
try setting ``OMP_NUM_THREADS=1``. Similarly, check configured system limits with
``ulimit -a`` for other resource limit errors.
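
For example (a sketch; adjust to your own shell and training command):

.. code-block:: bash

    # cap the number of OpenMP threads each worker spawns
    export OMP_NUM_THREADS=1
    # inspect the current per-process resource limits
    ulimit -a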

For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump
the stack traces of all Ray workers on the current node. This requires ``py-spy`` to be installed.
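
A quick sketch:

.. code-block:: bash

    # py-spy is required for sampling the worker stack traces
    pip install py-spy
    # dump stack traces of all Ray workers on this node
    ray stack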