RLlib: Scalable Reinforcement Learning
======================================

RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones.

.. image:: rllib-stack.svg

Learn more about RLlib's design by reading the `ICML paper <https://arxiv.org/abs/1712.09381>`__.

Installation
------------

RLlib has extra dependencies on top of ``ray``. First, you'll need to install either `PyTorch <http://pytorch.org/>`__ or `TensorFlow <https://www.tensorflow.org>`__. Then, install the RLlib module:

.. code-block:: bash

    pip install tensorflow  # or tensorflow-gpu
    pip install ray[rllib]  # also recommended: ray[debug]

You might also want to clone the `Ray repo <https://github.com/ray-project/ray>`__ for convenient access to RLlib helper scripts:

.. code-block:: bash

    git clone https://github.com/ray-project/ray
    cd ray/python/ray/rllib

Training APIs
-------------

* `Command-line <rllib-training.html>`__
* `Configuration <rllib-training.html#configuration>`__
* `Python API <rllib-training.html#python-api>`__ (a minimal usage sketch appears at the end of this page)
* `Debugging <rllib-training.html#debugging>`__
* `REST API <rllib-training.html#rest-api>`__

Environments
------------

* `RLlib Environments Overview <rllib-env.html>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
* `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
* `Interfacing with External Agents <rllib-env.html#interfacing-with-external-agents>`__
* `Batch Asynchronous <rllib-env.html#batch-asynchronous>`__

Algorithms
----------

* High-throughput architectures

  - `Distributed Prioritized Experience Replay (Ape-X) <rllib-algorithms.html#distributed-prioritized-experience-replay-ape-x>`__
  - `Importance Weighted Actor-Learner Architecture (IMPALA) <rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala>`__
  - `Asynchronous Proximal Policy Optimization (APPO) <rllib-algorithms.html#asynchronous-proximal-policy-optimization-appo>`__

* Gradient-based

  - `Advantage Actor-Critic (A2C, A3C) <rllib-algorithms.html#advantage-actor-critic-a2c-a3c>`__
  - `Deep Deterministic Policy Gradients (DDPG, TD3) <rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3>`__
  - `Deep Q Networks (DQN, Rainbow, Parametric DQN) <rllib-algorithms.html#deep-q-networks-dqn-rainbow-parametric-dqn>`__
  - `Policy Gradients <rllib-algorithms.html#policy-gradients>`__
  - `Proximal Policy Optimization (PPO) <rllib-algorithms.html#proximal-policy-optimization-ppo>`__

* Derivative-free

  - `Augmented Random Search (ARS) <rllib-algorithms.html#augmented-random-search-ars>`__
  - `Evolution Strategies <rllib-algorithms.html#evolution-strategies>`__

* Multi-agent specific

  - `QMIX Monotonic Value Factorisation (QMIX, VDN, IQL) <rllib-algorithms.html#qmix-monotonic-value-factorisation-qmix-vdn-iql>`__

* Offline

  - `Advantage Re-Weighted Imitation Learning (MARWIL) <rllib-algorithms.html#advantage-re-weighted-imitation-learning-marwil>`__

Models and Preprocessors
------------------------

* `RLlib Models and Preprocessors Overview <rllib-models.html>`__
* `Built-in Models and Preprocessors <rllib-models.html#built-in-models-and-preprocessors>`__
* `Custom Models (TensorFlow) <rllib-models.html#custom-models-tensorflow>`__
* `Custom Models (PyTorch) <rllib-models.html#custom-models-pytorch>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
* `Customizing Policy Graphs <rllib-models.html#customizing-policy-graphs>`__
* `Variable-length / Parametric Action Spaces <rllib-models.html#variable-length-parametric-action-spaces>`__
* `Model-Based Rollouts <rllib-models.html#model-based-rollouts>`__

Offline Datasets
----------------

* `Working with Offline Datasets <rllib-offline.html>`__
* `Input API <rllib-offline.html#input-api>`__
* `Output API <rllib-offline.html#output-api>`__

Development
-----------

* `Development Install <rllib-dev.html#development-install>`__
* `Features <rllib-dev.html#features>`__
* `Benchmarks <rllib-dev.html#benchmarks>`__
* `Contributing Algorithms <rllib-dev.html#contributing-algorithms>`__

Concepts
--------

* `Policy Graphs <rllib-concepts.html#policy-graphs>`__
* `Policy Evaluation <rllib-concepts.html#policy-evaluation>`__
* `Policy Optimization <rllib-concepts.html#policy-optimization>`__

Package Reference
-----------------

* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.optimizers <rllib-package-ref.html#module-ray.rllib.optimizers>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__

Troubleshooting
---------------

If you encounter errors like ``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers, try setting ``OMP_NUM_THREADS=1``. Similarly, check the configured system limits with ``ulimit -a`` for other resource limit errors.

For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump the stack traces of all Ray workers on the current node. This requires ``py-spy`` to be installed.
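
As a concrete illustration of the workarounds above, the following shell snippet shows how they might be applied on a typical Linux machine before launching a large training run (a sketch; adapt it to your own setup):

.. code-block:: bash

    # Limit each worker process to one OpenMP thread, so that launching
    # many workers does not exhaust the system's thread limit.
    export OMP_NUM_THREADS=1

    # Inspect the current resource limits (e.g., max user processes, open
    # files) if you hit other "Resource temporarily unavailable" errors.
    ulimit -a

    # Dump the stack traces of all Ray workers on this node
    # (requires py-spy: pip install py-spy).
    ray stack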
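
Finally, for readers heading to the Python API docs linked above, here is a minimal sketch of what training looks like in code. It uses the ``PPOAgent`` class from ``ray.rllib.agents.ppo`` on the built-in ``CartPole-v0`` Gym environment; the config override shown is illustrative, not required:

.. code-block:: python

    import ray
    from ray.rllib.agents.ppo import PPOAgent

    # Start Ray on the local machine.
    ray.init()

    # Create a PPO agent for the CartPole-v0 Gym environment.
    # "num_workers" controls how many parallel rollout workers to launch.
    agent = PPOAgent(env="CartPole-v0", config={"num_workers": 2})

    # Each call to train() runs one iteration of rollout collection plus
    # optimization and returns a dict of training statistics.
    for i in range(10):
        result = agent.train()
        print("iter", i, "mean episode reward:", result["episode_reward_mean"])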