- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
  This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
  A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
  Discussion of how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learned along the way.
- `Rendering and recording of an environment <https://github.com/ray-project/ray/blob/master/rllib/examples/env_rendering_and_recording.py>`__:
  Example showing how to switch on rendering and recording of an environment (see the config sketch after this list).
- `Coin Game Example <https://github.com/ray-project/ray/blob/master/rllib/examples/coin_game_env.py>`__:
  Coin Game environment example (provided by the "Center on Long Term Risk").
- `DMLab Watermaze example <https://github.com/ray-project/ray/blob/master/rllib/examples/dmlab_watermaze.py>`__:
  Example of how to use a DMLab environment (Watermaze).
- `RecSim environment example (for recommender systems) using the SlateQ algorithm <https://github.com/ray-project/ray/blob/master/rllib/examples/recsim_with_slateq.py>`__:
  Script showing how to train a SlateQTrainer on a RecSim environment.
- `SUMO (Simulation of Urban MObility) environment example <https://github.com/ray-project/ray/blob/master/rllib/examples/sumo_env_local.py>`__:
  Example demonstrating how to use the SUMO simulator in connection with RLlib.
- `VizDoom example script using RLlib's auto-attention wrapper <https://github.com/ray-project/ray/blob/master/rllib/examples/vizdoom_with_attention_net.py>`__:
  Script showing how to run PPO with an attention net against a VizDoom gym environment.
- Example of how to ensure subprocesses spawned by environments are killed when RLlib exits.
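
The rendering and recording entry above is driven by trainer-config settings. A minimal sketch, assuming the ``render_env`` and ``record_env`` config keys used by that example (their exact names and accepted values may differ between RLlib versions):

.. code-block:: python

    # Minimal sketch (assumed config keys: "render_env", "record_env").
    # CartPole stands in for any env whose render() method works on the workers.
    import ray
    from ray import tune

    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "num_workers": 1,
            "framework": "torch",
            "render_env": True,      # render episodes on the rollout workers
            "record_env": "videos",  # write recorded episodes into this directory
        },
    )
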
Custom- and Complex Models
--------------------------
- `Attention Net (GTrXL) learning the "repeat-after-me" environment <https://github.com/ray-project/ray/blob/master/rllib/examples/attention_net.py>`__:
  Example showing how to use the auto-attention wrapper for your default and custom models in RLlib (see the model-config sketch after this list).
- `LSTM model learning the "repeat-after-me" environment <https://github.com/ray-project/ray/blob/master/rllib/examples/lstm_auto_wrapping.py>`__:
  Example showing how to use the auto-LSTM wrapper for your default and custom models in RLlib (also covered by the sketch after this list).
- Example of how to output custom training metrics to TensorBoard (see the callbacks sketch after this list).
- `Custom Policy class (TensorFlow) <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py>`__:
  How to set up a custom TFPolicy (see the policy-template sketch after this list).
- `Custom Policy class (PyTorch) <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_torch_policy.py>`__:
  How to set up a custom TorchPolicy.
- `Using rollout workers directly for control over the whole training workflow <https://github.com/ray-project/ray/blob/master/rllib/examples/rollout_worker_custom_workflow.py>`__:
  Example of how to use RLlib's lower-level building blocks to implement a fully customized training workflow.
- `Custom execution plan function handling two different Policies (DQN and PPO) at the same time <https://github.com/ray-project/ray/blob/master/rllib/examples/two_trainer_workflow.py>`__:
  Example of how to use a Trainer's execution plan to train two different policies in parallel (also using the multi-agent API).
- How to run a custom Ray Tune experiment with RLlib, with custom training and evaluation phases.
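
As referenced in the attention-net and LSTM entries above, both auto-wrappers are switched on purely through the model config. A minimal sketch, assuming the standard ``use_lstm``/``use_attention`` model-config keys (CartPole is only a placeholder for the "repeat-after-me" env used by the actual examples):

.. code-block:: python

    # Sketch: RLlib's auto-wrappers. Set "use_lstm" OR "use_attention" in the
    # model config and RLlib wraps the default (or your custom) model for you.
    import ray
    from ray import tune

    ray.init()
    tune.run(
        "PPO",
        stop={"timesteps_total": 50000},
        config={
            "env": "CartPole-v0",
            "framework": "torch",
            "model": {
                # LSTM auto-wrapping:
                "use_lstm": True,
                "lstm_cell_size": 64,
                "max_seq_len": 20,
                # Alternative: GTrXL (attention) auto-wrapping instead of the LSTM:
                # "use_attention": True,
                # "attention_num_transformer_units": 1,
                # "attention_dim": 64,
                # "attention_memory_inference": 50,
                # "attention_memory_training": 50,
            },
        },
    )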
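
For the custom-metrics entry above, a minimal sketch using RLlib's ``DefaultCallbacks`` API; the metric name below is made up for illustration, and anything written to ``episode.custom_metrics`` ends up in the Tune/TensorBoard results:

.. code-block:: python

    # Sketch: logging a custom per-episode metric (shows up in TensorBoard).
    import ray
    from ray import tune
    from ray.rllib.agents.callbacks import DefaultCallbacks


    class MyCallbacks(DefaultCallbacks):
        def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
            # Custom metrics are averaged per training iteration and reported.
            # "episode_len_squared" is a made-up metric for illustration only.
            episode.custom_metrics["episode_len_squared"] = float(episode.length) ** 2


    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "framework": "torch",
            "callbacks": MyCallbacks,  # pass the class, not an instance
        },
    )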
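
For the custom Policy entries above, the examples build on RLlib's policy templates. A rough sketch of the TensorFlow variant, assuming the ``build_tf_policy``/``build_trainer`` template helpers and using a plain policy-gradient loss purely for illustration (the PyTorch version is analogous via ``build_torch_policy``):

.. code-block:: python

    # Sketch: a custom TF policy built from RLlib's policy template and wired
    # into a Trainer. The naive policy-gradient loss is for illustration only.
    import ray
    from ray import tune
    from ray.rllib.agents.trainer_template import build_trainer
    from ray.rllib.policy.sample_batch import SampleBatch
    from ray.rllib.policy.tf_policy_template import build_tf_policy
    from ray.rllib.utils.framework import try_import_tf

    tf1, tf, tfv = try_import_tf()


    def policy_gradient_loss(policy, model, dist_class, train_batch):
        # loss = -E[log pi(a|s) * R]
        logits, _ = model.from_batch(train_batch)
        action_dist = dist_class(logits, model)
        return -tf.reduce_mean(
            action_dist.logp(train_batch[SampleBatch.ACTIONS])
            * train_batch[SampleBatch.REWARDS]
        )


    MyTFPolicy = build_tf_policy(name="MyTFPolicy", loss_fn=policy_gradient_loss)
    MyTrainer = build_trainer(name="MyTrainer", default_policy=MyTFPolicy)

    ray.init()
    tune.run(
        MyTrainer,
        stop={"training_iteration": 2},
        config={"env": "CartPole-v0", "num_workers": 1},
    )
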
Evaluation:
-----------
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__:
  Example of how to write a custom evaluation function that is called instead of the default evaluation routine, which runs n episodes using the evaluation worker set (see the sketch after this list).
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/parallel_evaluation_and_training.py>`__:
  Example showing how the evaluation workers and the "normal" rollout workers can run (to some extent) in parallel to speed up training.
- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`__:
  Set up RLlib to run any algorithm in (independent) multi-agent mode against a multi-agent environment.
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py>`__:
  Set up RLlib to run any algorithm in (shared-parameter) multi-agent mode against a multi-agent environment (see the PettingZoo sketch after this list).
- Example of how to handle variable-length or parametric action spaces (see also `this example here <https://github.com/ray-project/ray/blob/master/rllib/examples/random_parametric_agent.py>`__).
- How to filter raw observations coming from the environment for further processing by the agent's model(s).
- `Using the "Repeated" space of RLlib for variable-length observations <https://github.com/ray-project/ray/blob/master/rllib/examples/complex_struct_space.py>`__:
  How to use RLlib's `Repeated` space to handle variable-length observations (see the sketch after this list).
- `Autoregressive action distribution example <https://github.com/ray-project/ray/blob/master/rllib/examples/autoregressive_action_dist.py>`__:
  Learning with autoregressive action dependencies (e.g., 2 action components; the distribution for the 2nd component depends on the 1st component's actually sampled value).
- A multi-agent AI research environment inspired by Massively Multiplayer Online (MMO) role-playing games:
  self-contained worlds featuring thousands of agents per persistent macrocosm, diverse skilling systems, local and global economies, complex emergent social structures,
  and ad-hoc, high-stakes single- and team-based conflict.
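
For the custom-evaluation entry above, a minimal sketch of how such a function is plugged in; it assumes the ``custom_eval_function`` config key with the ``(trainer, eval_workers)`` call signature used by the example, and simply samples a couple of rounds of episodes on the evaluation workers:

.. code-block:: python

    # Sketch: a custom evaluation function replacing RLlib's default eval loop.
    import ray
    from ray import tune
    from ray.rllib.evaluation.metrics import collect_episodes, summarize_episodes


    def my_eval_fn(trainer, eval_workers):
        workers = eval_workers.remote_workers()
        # Run two rounds of sampling on all evaluation workers.
        for _ in range(2):
            ray.get([w.sample.remote() for w in workers])
        episodes, _ = collect_episodes(remote_workers=workers, timeout_seconds=99999)
        metrics = summarize_episodes(episodes)
        metrics["my_custom_marker"] = 1.0  # any extra value you want to report
        return metrics


    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 2},
        config={
            "env": "CartPole-v0",
            "custom_eval_function": my_eval_fn,
            "evaluation_interval": 1,
            "evaluation_num_workers": 1,
            # Related (parallel evaluation and training example above):
            # "evaluation_parallel_to_training": True,
        },
    )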
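
For the shared-parameter multi-agent entry above, a rough sketch of the parameter-sharing pattern; it assumes PettingZoo's ``waterworld_v3`` module and RLlib's ``PettingZooEnv`` wrapper (module names vary across PettingZoo/RLlib versions), and maps every agent onto one shared policy:

.. code-block:: python

    # Sketch: parameter sharing, i.e. all PettingZoo agents train one shared policy.
    # Assumes pettingzoo's waterworld_v3 and RLlib's PettingZooEnv wrapper.
    import ray
    from ray import tune
    from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
    from pettingzoo.sisl import waterworld_v3


    def env_creator(env_config):
        return PettingZooEnv(waterworld_v3.env())


    tune.register_env("waterworld", env_creator)

    # All agents share identical spaces; grab them once to define the policy.
    tmp_env = env_creator({})
    obs_space = tmp_env.observation_space
    act_space = tmp_env.action_space

    ray.init()
    tune.run(
        "PPO",
        stop={"training_iteration": 5},
        config={
            "env": "waterworld",
            "multiagent": {
                "policies": {"shared_policy": (None, obs_space, act_space, {})},
                # Every agent id maps to the same policy -> parameter sharing.
                "policy_mapping_fn": lambda agent_id, *a, **kw: "shared_policy",
            },
        },
    )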
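
For the ``Repeated`` space entry above, a small sketch of declaring a variable-length observation space; the toy env and item layout are made up for illustration (see ``complex_struct_space.py`` for how a model then consumes such observations):

.. code-block:: python

    # Sketch: a variable-length observation space via RLlib's Repeated space.
    import gym
    import numpy as np
    from ray.rllib.utils.spaces.repeated import Repeated

    # Each observation is a list of 0..10 "items", each item a 4-dim float vector.
    ITEM_SPACE = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    OBS_SPACE = Repeated(ITEM_SPACE, max_len=10)


    class VariableItemsEnv(gym.Env):
        """Toy env emitting a variable number of items per observation."""

        observation_space = OBS_SPACE
        action_space = gym.spaces.Discrete(2)

        def reset(self):
            return self.observation_space.sample()

        def step(self, action):
            return self.observation_space.sample(), 1.0, True, {}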