RLlib Table of Contents
=======================

Training APIs
-------------

* `Command-line `__
* `Configuration `__

  - `Specifying Parameters `__
  - `Specifying Resources `__
  - `Common Parameters `__
  - `Tuned Examples `__

* `Python API `__

  - `Custom Training Workflows `__
  - `Accessing Policy State `__
  - `Accessing Model State `__
  - `Global Coordination `__
  - `Callbacks and Custom Metrics `__
  - `Rewriting Trajectories `__
  - `Curriculum Learning `__

* `Debugging `__

  - `Gym Monitor `__
  - `Eager Mode `__
  - `Episode Traces `__
  - `Log Verbosity `__
  - `Stack Traces `__

* `REST API `__

Environments
------------

* `RLlib Environments Overview `__
* `Feature Compatibility Matrix `__
* `OpenAI Gym `__
* `Vectorized `__
* `Multi-Agent and Hierarchical `__
* `Interfacing with External Agents `__
* `Advanced Integrations `__

Models, Preprocessors, and Action Distributions
-----------------------------------------------

* `RLlib Models, Preprocessors, and Action Distributions Overview `__
* `TensorFlow Models `__
* `PyTorch Models `__
* `Custom Preprocessors `__
* `Custom Action Distributions `__
* `Supervised Model Losses `__
* `Variable-length / Parametric Action Spaces `__
* `Autoregressive Action Distributions `__

Algorithms
----------

* High-throughput architectures

  - `Distributed Prioritized Experience Replay (Ape-X) `__
  - `Importance Weighted Actor-Learner Architecture (IMPALA) `__
  - `Asynchronous Proximal Policy Optimization (APPO) `__

* Gradient-based

  - `Advantage Actor-Critic (A2C, A3C) `__
  - `Deep Deterministic Policy Gradients (DDPG, TD3) `__
  - `Deep Q Networks (DQN, Rainbow, Parametric DQN) `__
  - `Policy Gradients `__
  - `Proximal Policy Optimization (PPO) `__
  - `Soft Actor Critic (SAC) `__

* Derivative-free

  - `Augmented Random Search (ARS) `__
  - `Evolution Strategies `__

* Multi-agent specific

  - `QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) `__
  - `Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) `__

* Offline

  - `Advantage Re-Weighted Imitation Learning (MARWIL) `__

Offline Datasets
----------------

* `Working with Offline Datasets `__
* `Input Pipeline for Supervised Losses `__
* `Input API `__
* `Output API `__

Concepts and Custom Algorithms
------------------------------

* `Policies `__

  - `Policies in Multi-Agent `__
  - `Building Policies in TensorFlow `__
  - `Building Policies in TensorFlow Eager `__
  - `Building Policies in PyTorch `__
  - `Extending Existing Policies `__

* `Policy Evaluation `__
* `Policy Optimization `__
* `Trainers `__

Examples
--------

* `Tuned Examples `__
* `Training Workflows `__
* `Custom Envs and Models `__
* `Serving and Offline `__
* `Multi-Agent and Hierarchical `__
* `Community Examples `__

Development
-----------

* `Development Install `__
* `API Stability `__
* `Features `__
* `Benchmarks `__
* `Contributing Algorithms `__

Package Reference
-----------------

* `ray.rllib.agents `__
* `ray.rllib.env `__
* `ray.rllib.evaluation `__
* `ray.rllib.models `__
* `ray.rllib.optimizers `__
* `ray.rllib.utils `__

Troubleshooting
---------------

If you encounter errors like ``blas_thread_init: pthread_create: Resource temporarily unavailable`` when using many workers, try setting ``OMP_NUM_THREADS=1``. Similarly, check configured system limits with ``ulimit -a`` for other resource limit errors.

If you encounter out-of-memory errors, consider setting ``redis_max_memory`` and ``object_store_memory`` in ``ray.init()`` to reduce memory usage.
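For example, both limits can be passed directly to ``ray.init()``; the sketch below is minimal and the byte values are only illustrative, not recommendations:

.. code-block:: python

    import ray

    # Cap memory usage when starting Ray locally. Both values are in bytes;
    # tune them to your machine rather than copying these numbers.
    ray.init(
        object_store_memory=2 * 1024 ** 3,  # 2 GiB for the object store
        redis_max_memory=1 * 1024 ** 3,     # 1 GiB for Redis
    )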
For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump the stack traces of all Ray workers on the current node, and ``ray timeline`` to dump a timeline visualization of tasks to a file.

TensorFlow 2.0
~~~~~~~~~~~~~~

RLlib currently runs in ``tf.compat.v1`` mode. This means eager execution is disabled by default, and RLlib imports TF with ``import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()``. Eager execution can be enabled manually by calling ``tf.enable_eager_execution()`` or by setting the ``"eager": True`` trainer config.
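As a minimal sketch of the config route, using PPO as an example (the environment name is just a placeholder):

.. code-block:: python

    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()

    # "eager": True switches RLlib's TF policies to eager execution, which
    # makes them easier to debug at some cost in performance.
    trainer = PPOTrainer(env="CartPole-v0", config={"eager": True})
    print(trainer.train())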