RLlib Table of Contents
=======================
Training APIs
-------------
* `Command-line <rllib-training.html>`__

  - `Evaluating Trained Policies <rllib-training.html#evaluating-trained-policies>`__

* `Configuration <rllib-training.html#configuration>`__

  - `Specifying Parameters <rllib-training.html#specifying-parameters>`__
  - `Specifying Resources <rllib-training.html#specifying-resources>`__
  - `Common Parameters <rllib-training.html#common-parameters>`__
  - `Scaling Guide <rllib-training.html#scaling-guide>`__
  - `Tuned Examples <rllib-training.html#tuned-examples>`__

* `Basic Python API <rllib-training.html#basic-python-api>`__

  - `Computing Actions <rllib-training.html#computing-actions>`__
  - `Accessing Policy State <rllib-training.html#accessing-policy-state>`__
  - `Accessing Model State <rllib-training.html#accessing-model-state>`__

* `Advanced Python APIs <rllib-training.html#advanced-python-apis>`__

  - `Custom Training Workflows <rllib-training.html#custom-training-workflows>`__
  - `Global Coordination <rllib-training.html#global-coordination>`__
  - `Callbacks and Custom Metrics <rllib-training.html#callbacks-and-custom-metrics>`__
  - `Customizing Exploration Behavior <rllib-training.html#customizing-exploration-behavior>`__
  - `Customized Evaluation During Training <rllib-training.html#customized-evaluation-during-training>`__
  - `Rewriting Trajectories <rllib-training.html#rewriting-trajectories>`__
  - `Curriculum Learning <rllib-training.html#curriculum-learning>`__

* `Debugging <rllib-training.html#debugging>`__

  - `Gym Monitor <rllib-training.html#gym-monitor>`__
  - `Eager Mode <rllib-training.html#eager-mode>`__
  - `Episode Traces <rllib-training.html#episode-traces>`__
  - `Log Verbosity <rllib-training.html#log-verbosity>`__
  - `Stack Traces <rllib-training.html#stack-traces>`__

* `External Application API <rllib-training.html#external-application-api>`__

Environments
------------
* `RLlib Environments Overview <rllib-env.html>`__
* `Feature Compatibility Matrix <rllib-env.html#feature-compatibility-matrix>`__
* `OpenAI Gym <rllib-env.html#openai-gym>`__
* `Vectorized <rllib-env.html#vectorized>`__
* `Multi-Agent and Hierarchical <rllib-env.html#multi-agent-and-hierarchical>`__
* `External Agents and Applications <rllib-env.html#external-agents-and-applications>`__

  - `External Application Clients <rllib-env.html#external-application-clients>`__

* `Advanced Integrations <rllib-env.html#advanced-integrations>`__

Models, Preprocessors, and Action Distributions
-----------------------------------------------
* `RLlib Models, Preprocessors, and Action Distributions Overview <rllib-models.html>`__
* `TensorFlow Models <rllib-models.html#tensorflow-models>`__
* `PyTorch Models <rllib-models.html#pytorch-models>`__
* `Custom Preprocessors <rllib-models.html#custom-preprocessors>`__
* `Custom Action Distributions <rllib-models.html#custom-action-distributions>`__
* `Supervised Model Losses <rllib-models.html#supervised-model-losses>`__
* `Self-Supervised Model Losses <rllib-models.html#self-supervised-model-losses>`__
* `Variable-length / Parametric Action Spaces <rllib-models.html#variable-length-parametric-action-spaces>`__
* `Autoregressive Action Distributions <rllib-models.html#autoregressive-action-distributions>`__

Algorithms
----------
* High-throughput architectures

  - |tensorflow| :ref:`Distributed Prioritized Experience Replay (Ape-X) <apex>`
  - |tensorflow| :ref:`Importance Weighted Actor-Learner Architecture (IMPALA) <impala>`
  - |tensorflow| :ref:`Asynchronous Proximal Policy Optimization (APPO) <appo>`
  - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) <ddppo>`
  - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) <alphazero>`

* Gradient-based

  - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) <a3c>`
  - |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) <ddpg>`
  - |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) <dqn>`
  - |pytorch| |tensorflow| :ref:`Policy Gradients <pg>`
  - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) <ppo>`
  - |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`

* Derivative-free

  - |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`
  - |tensorflow| :ref:`Evolution Strategies <es>`

* Multi-agent specific

  - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) <qmix>`
  - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) <maddpg>`

* Offline

  - |tensorflow| :ref:`Advantage Re-Weighted Imitation Learning (MARWIL) <marwil>`

* Contextual bandits

  - |pytorch| :ref:`Linear Upper Confidence Bound (contrib/LinUCB) <linucb>`
  - |pytorch| :ref:`Linear Thompson Sampling (contrib/LinTS) <lints>`

Offline Datasets
----------------
* `Working with Offline Datasets <rllib-offline.html>`__
* `Input Pipeline for Supervised Losses <rllib-offline.html#input-pipeline-for-supervised-losses>`__
* `Input API <rllib-offline.html#input-api>`__
* `Output API <rllib-offline.html#output-api>`__

Concepts and Custom Algorithms
------------------------------
* `Policies <rllib-concepts.html>`__

  - `Policies in Multi-Agent <rllib-concepts.html#policies-in-multi-agent>`__
  - `Building Policies in TensorFlow <rllib-concepts.html#building-policies-in-tensorflow>`__
  - `Building Policies in TensorFlow Eager <rllib-concepts.html#building-policies-in-tensorflow-eager>`__
  - `Building Policies in PyTorch <rllib-concepts.html#building-policies-in-pytorch>`__
  - `Extending Existing Policies <rllib-concepts.html#extending-existing-policies>`__

* `Policy Evaluation <rllib-concepts.html#policy-evaluation>`__
* `Policy Optimization <rllib-concepts.html#policy-optimization>`__
* `Trainers <rllib-concepts.html#trainers>`__

Examples
--------
* `Tuned Examples <rllib-examples.html#tuned-examples>`__
* `Training Workflows <rllib-examples.html#training-workflows>`__
* `Custom Envs and Models <rllib-examples.html#custom-envs-and-models>`__
* `Serving and Offline <rllib-examples.html#serving-and-offline>`__
* `Multi-Agent and Hierarchical <rllib-examples.html#multi-agent-and-hierarchical>`__
* `Community Examples <rllib-examples.html#community-examples>`__

Development
-----------
* `Development Install <rllib-dev.html#development-install>`__
* `API Stability <rllib-dev.html#api-stability>`__
* `Features <rllib-dev.html#feature-development>`__
* `Benchmarks <rllib-dev.html#benchmarks>`__
* `Contributing Algorithms <rllib-dev.html#contributing-algorithms>`__

Package Reference
-----------------
* `ray.rllib.agents <rllib-package-ref.html#module-ray.rllib.agents>`__
* `ray.rllib.env <rllib-package-ref.html#module-ray.rllib.env>`__
* `ray.rllib.evaluation <rllib-package-ref.html#module-ray.rllib.evaluation>`__
* `ray.rllib.models <rllib-package-ref.html#module-ray.rllib.models>`__
* `ray.rllib.optimizers <rllib-package-ref.html#module-ray.rllib.optimizers>`__
* `ray.rllib.utils <rllib-package-ref.html#module-ray.rllib.utils>`__

Troubleshooting
---------------
If you encounter errors like
`blas_thread_init: pthread_create: Resource temporarily unavailable` when using many workers,
try setting ``OMP_NUM_THREADS=1``. Similarly, check configured system limits with
`ulimit -a` for other resource limit errors.

If you encounter out-of-memory errors, consider setting ``redis_max_memory`` and ``object_store_memory`` in ``ray.init()`` to reduce memory usage.

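For example, a minimal sketch of capping memory at startup (the byte values below are illustrative placeholders, not tuned recommendations):

.. code-block:: python

    import ray

    # A minimal sketch: limit Redis and object store memory when starting Ray.
    # The values (in bytes) are illustrative; tune them to your machine.
    ray.init(
        redis_max_memory=1 * 1024**3,      # ~1 GiB for the Redis shards
        object_store_memory=2 * 1024**3,   # ~2 GiB for the object store
    )
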
For debugging unexpected hangs or performance problems, you can run ``ray stack`` to dump
the stack traces of all Ray workers on the current node, ``ray timeline`` to dump
a timeline visualization of tasks to a file, and ``ray memory`` to list all object
references in the cluster.

TensorFlow 2.0
~~~~~~~~~~~~~~
RLlib currently runs in ``tf.compat.v1`` mode. This means eager execution is disabled by default, and RLlib imports TF with ``import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()``. Eager execution can be enabled manually by calling ``tf.enable_eager_execution()`` or setting the ``"eager": True`` trainer config.
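For example, a minimal sketch of enabling eager mode through the trainer config (PPO and ``CartPole-v0`` are used here purely for illustration):

.. code-block:: python

    import ray
    from ray.rllib.agents.ppo import PPOTrainer  # any built-in trainer works similarly

    ray.init()

    # Enable eager execution via the config flag instead of calling
    # tf.enable_eager_execution() yourself.
    trainer = PPOTrainer(env="CartPole-v0", config={"eager": True})
    print(trainer.train())
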
.. |tensorflow| image:: tensorflow.png
    :width: 16

.. |pytorch| image:: pytorch.png
    :width: 16