- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
  This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
  A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
  Discussion of how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learned along the way.
- `Rendering and recording of an environment <https://github.com/ray-project/ray/blob/master/rllib/examples/env_rendering_and_recording.py>`__:
  Example showing how to switch on rendering and recording of an environment (see the config sketch after this list).
- `Coin Game Example <https://github.com/ray-project/ray/blob/master/rllib/examples/coin_game_env.py>`__:
  Coin Game environment example (provided by the "Center on Long Term Risk").
- `DMLab Watermaze example <https://github.com/ray-project/ray/blob/master/rllib/examples/dmlab_watermaze.py>`__:
  Example of how to use a DMLab environment (Watermaze).
- `RecSim environment example (for recommender systems) using the SlateQ algorithm <https://github.com/ray-project/ray/blob/master/rllib/examples/recsim_with_slateq.py>`__:
  Script showing how to train a SlateQTrainer on a RecSim environment.
- `SUMO (Simulation of Urban MObility) environment example <https://github.com/ray-project/ray/blob/master/rllib/examples/sumo_env_local.py>`__:
  Example demonstrating how to use the SUMO simulator in connection with RLlib.
- `VizDoom example script using RLlib's auto-attention wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/vizdoom_with_attention_net.py>`__:
  Script showing how to run PPO with an attention net against a VizDoom gym environment.
- Example of how to ensure subprocesses spawned by environments are killed when RLlib exits.
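
The rendering and recording entry above is driven by trainer-config settings. A minimal sketch, assuming the ``render_env`` and ``record_env`` config keys used by that example (their exact names and accepted values may differ between RLlib versions):

.. code-block:: python

    # Minimal sketch (assumed config keys: "render_env", "record_env").
    # CartPole stands in for any env whose render() method works on the workers.
    import ray
    from ray import tune

    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "num_workers": 1,
            "framework": "torch",
            "render_env": True,      # render episodes on the rollout workers
            "record_env": "videos",  # write recorded episodes into this directory
        },
    )
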
Custom- and Complex Models
--------------------------
- `Attention Net (GTrXL) learning the "repeat-after-me" environment <https://github.com/ray-project/ray/blob/master/rllib/examples/attention_net.py>`__:
  Example showing how to use the auto-attention wrapper for your default and custom models in RLlib (see the model-config sketch after this list).
- `LSTM model learning the "repeat-after-me" environment <https://github.com/ray-project/ray/blob/master/rllib/examples/lstm_auto_wrapping.py>`__:
  Example showing how to use the auto-LSTM wrapper for your default and custom models in RLlib (also covered by the sketch after this list).
- Example of how to output custom training metrics to TensorBoard (see the callbacks sketch after this list).
- `Custom Policy class (TensorFlow) <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py>`__:
  How to set up a custom TFPolicy (see the policy-template sketch after this list).
- `Custom Policy class (PyTorch) <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_torch_policy.py>`__:
  How to set up a custom TorchPolicy.
- `Using rollout workers directly for control over the whole training workflow <https://github.com/ray-project/ray/blob/master/rllib/examples/rollout_worker_custom_workflow.py>`__:
  Example of how to use RLlib's lower-level building blocks to implement a fully customized training workflow.
- `Custom execution plan function handling two different Policies (DQN and PPO) at the same time <https://github.com/ray-project/ray/blob/master/rllib/examples/two_trainer_workflow.py>`__:
  Example of how to use a Trainer's execution plan to train two different policies in parallel (also using the multi-agent API).
- How to run a custom Ray Tune experiment with RLlib, with custom training and evaluation phases.
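
As referenced in the attention-net and LSTM entries above, both auto-wrappers are switched on purely through the model config. A minimal sketch, assuming the standard ``use_lstm``/``use_attention`` model-config keys (CartPole is only a placeholder for the "repeat-after-me" env used by the actual examples):

.. code-block:: python

    # Sketch: RLlib's auto-wrappers. Set "use_lstm" OR "use_attention" in the
    # model config and RLlib wraps the default (or your custom) model for you.
    import ray
    from ray import tune

    ray.init()
    tune.run(
        "PPO",
        stop={"timesteps_total": 50000},
        config={
            "env": "CartPole-v0",
            "framework": "torch",
            "model": {
                # LSTM auto-wrapping:
                "use_lstm": True,
                "lstm_cell_size": 64,
                "max_seq_len": 20,
                # Alternative: GTrXL (attention) auto-wrapping instead of the LSTM:
                # "use_attention": True,
                # "attention_num_transformer_units": 1,
                # "attention_dim": 64,
                # "attention_memory_inference": 50,
                # "attention_memory_training": 50,
            },
        },
    )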
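
For the custom-metrics entry above, a minimal sketch using RLlib's ``DefaultCallbacks`` API; the metric name below is made up for illustration, and anything written to ``episode.custom_metrics`` ends up in the Tune/TensorBoard results:

.. code-block:: python

    # Sketch: logging a custom per-episode metric (shows up in TensorBoard).
    import ray
    from ray import tune
    from ray.rllib.agents.callbacks import DefaultCallbacks


    class MyCallbacks(DefaultCallbacks):
        def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
            # Custom metrics are averaged per training iteration and reported.
            # "episode_len_squared" is a made-up metric for illustration only.
            episode.custom_metrics["episode_len_squared"] = float(episode.length) ** 2


    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "framework": "torch",
            "callbacks": MyCallbacks,  # pass the class, not an instance
        },
    )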
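
For the custom Policy entries above, the examples build on RLlib's policy templates. A rough sketch of the TensorFlow variant, assuming the ``build_tf_policy``/``build_trainer`` template helpers and using a plain policy-gradient loss purely for illustration (the PyTorch version is analogous via ``build_torch_policy``):

.. code-block:: python

    # Sketch: a custom TF policy built from RLlib's policy template and wired
    # into a Trainer. The naive policy-gradient loss is for illustration only.
    import ray
    from ray import tune
    from ray.rllib.agents.trainer_template import build_trainer
    from ray.rllib.policy.sample_batch import SampleBatch
    from ray.rllib.policy.tf_policy_template import build_tf_policy
    from ray.rllib.utils.framework import try_import_tf

    tf1, tf, tfv = try_import_tf()


    def policy_gradient_loss(policy, model, dist_class, train_batch):
        # loss = -E[log pi(a|s) * R]
        logits, _ = model.from_batch(train_batch)
        action_dist = dist_class(logits, model)
        return -tf.reduce_mean(
            action_dist.logp(train_batch[SampleBatch.ACTIONS])
            * train_batch[SampleBatch.REWARDS]
        )


    MyTFPolicy = build_tf_policy(name="MyTFPolicy", loss_fn=policy_gradient_loss)
    MyTrainer = build_trainer(name="MyTrainer", default_policy=MyTFPolicy)

    ray.init()
    tune.run(
        MyTrainer,
        stop={"training_iteration": 2},
        config={"env": "CartPole-v0", "num_workers": 1},
    )
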
Evaluation:
-----------
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__:
  Example of how to write a custom evaluation function that is called instead of the default evaluation routine, which runs n episodes using the evaluation worker set (see the sketch after this list).
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/parallel_evaluation_and_training.py>`__:
  Example showing how the evaluation workers and the "normal" rollout workers can run (to some extent) in parallel to speed up training.
- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`__:
  Set up RLlib to run any algorithm in (independent) multi-agent mode against a multi-agent environment.
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py>`__:
  Set up RLlib to run any algorithm in (shared-parameter) multi-agent mode against a multi-agent environment (see the PettingZoo sketch after this list).
- Example of how to handle variable-length or parametric action spaces (see also `this example here <https://github.com/ray-project/ray/blob/master/rllib/examples/random_parametric_agent.py>`__).
- How to filter raw observations coming from the environment for further processing by the agent's model(s).
- `Using the "Repeated" space of RLlib for variable-length observations <https://github.com/ray-project/ray/blob/master/rllib/examples/complex_struct_space.py>`__:
  How to use RLlib's `Repeated` space to handle variable-length observations (see the sketch after this list).
- `Autoregressive action distribution example <https://github.com/ray-project/ray/blob/master/rllib/examples/autoregressive_action_dist.py>`__:
  Learning with autoregressive action dependencies (e.g., 2 action components; the distribution for the 2nd component depends on the 1st component's actually sampled value).
- A multi-agent AI research environment inspired by Massively Multiplayer Online (MMO) role-playing games:
  self-contained worlds featuring thousands of agents per persistent macrocosm, diverse skilling systems, local and global economies, complex emergent social structures,
  and ad-hoc, high-stakes single- and team-based conflict.
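
For the custom-evaluation entry above, a minimal sketch of how such a function is plugged in; it assumes the ``custom_eval_function`` config key with the ``(trainer, eval_workers)`` call signature used by the example, and simply samples a couple of rounds of episodes on the evaluation workers:

.. code-block:: python

    # Sketch: a custom evaluation function replacing RLlib's default eval loop.
    import ray
    from ray import tune
    from ray.rllib.evaluation.metrics import collect_episodes, summarize_episodes


    def my_eval_fn(trainer, eval_workers):
        workers = eval_workers.remote_workers()
        # Run two rounds of sampling on all evaluation workers.
        for _ in range(2):
            ray.get([w.sample.remote() for w in workers])
        episodes, _ = collect_episodes(remote_workers=workers, timeout_seconds=99999)
        metrics = summarize_episodes(episodes)
        metrics["my_custom_marker"] = 1.0  # any extra value you want to report
        return metrics


    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "custom_eval_function": my_eval_fn,
            "evaluation_interval": 1,
            "evaluation_num_workers": 1,
            # Related (parallel evaluation and training example above):
            # "evaluation_parallel_to_training": True,
        },
    )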
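
For the shared-parameter multi-agent entry above, a rough sketch of the parameter-sharing pattern; it assumes PettingZoo's ``waterworld_v3`` module and RLlib's ``PettingZooEnv`` wrapper (module names vary across PettingZoo/RLlib versions), and maps every agent onto one shared policy:

.. code-block:: python

    # Sketch: parameter sharing, i.e. all PettingZoo agents train one shared policy.
    # Assumes pettingzoo's waterworld_v3 and RLlib's PettingZooEnv wrapper.
    import ray
    from ray import tune
    from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
    from pettingzoo.sisl import waterworld_v3


    def env_creator(env_config):
        return PettingZooEnv(waterworld_v3.env())


    tune.register_env("waterworld", env_creator)

    # All agents share identical spaces; grab them once to define the policy.
    tmp_env = env_creator({})
    obs_space = tmp_env.observation_space
    act_space = tmp_env.action_space

    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 5},
        config={
            "env": "waterworld",
            "multiagent": {
                "policies": {"shared_policy": (None, obs_space, act_space, {})},
                # Every agent id maps to the same policy -> parameter sharing.
                "policy_mapping_fn": lambda agent_id, *a, **kw: "shared_policy",
            },
        },
    )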
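
For the ``Repeated`` space entry above, a small sketch of declaring a variable-length observation space; the toy env and item layout are made up for illustration (see ``complex_struct_space.py`` for how a model then consumes such observations):

.. code-block:: python

    # Sketch: a variable-length observation space via RLlib's Repeated space.
    import gym
    import numpy as np
    from ray.rllib.utils.spaces.repeated import Repeated

    # Each observation is a list of 0..10 "items", each item a 4-dim float vector.
    ITEM_SPACE = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    OBS_SPACE = Repeated(ITEM_SPACE, max_len=10)


    class VariableItemsEnv(gym.Env):
        """Toy env emitting a variable number of items per observation."""

        observation_space = OBS_SPACE
        action_space = gym.spaces.Discrete(2)

        def reset(self):
            return self.observation_space.sample()

        def step(self, action):
            return self.observation_space.sample(), 1.0, True, {}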