# Decentralized Distributed Proximal Policy Optimization (DDPPO)
## Overview
[PPO](https://arxiv.org/abs/1707.06347) is a model-free, on-policy RL algorithm that works
well for both discrete and continuous action-space environments. PPO uses an
actor-critic framework with two networks: an actor (policy) network and a
critic (value function) network.
## Distributed PPO Algorithms
### Distributed baseline PPO
[See implementation here](https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo.py)
### Asynchronous PPO (APPO)
[See implementation here](https://github.com/ray-project/ray/blob/master/rllib/algorithms/appo/appo.py)
### Decentralized Distributed PPO (DDPPO)
DDPPO removes the assumption that gradient updates must be done on a central node.
Instead, gradients are computed remotely on each data-collection node and all-reduced
at each mini-batch using torch.distributed. This allows each worker's GPU to be used
both for sampling and for training.
[See implementation here](https://github.com/ray-project/ray/blob/master/rllib/algorithms/ddppo/ddppo.py)
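Below is a minimal usage sketch, assuming the `DDPPOConfig` builder API from
`ray.rllib.algorithms.ddppo` as shipped in Ray 2.x releases that still include DDPPO;
exact option names (e.g. `num_rollout_workers`, result keys) may differ between versions.

```python
# Minimal DDPPO sketch (assumes a Ray 2.x release that still ships DDPPO).
# DDPPO is torch-only: each rollout worker computes gradients locally and
# all-reduces them with its peers, so the driver itself does no learning.
from ray.rllib.algorithms.ddppo import DDPPOConfig

config = (
    DDPPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)  # workers that both sample and compute gradients
    .resources(num_gpus=0)            # DDPPO expects no GPUs on the driver
    .training(lr=3e-4, num_sgd_iter=10)
)

algo = config.build()
for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
algo.stop()
```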
## Documentation & Implementation:
### [Decentralized Distributed Proximal Policy Optimization (DDPPO)](https://arxiv.org/abs/1911.00357)
**[Detailed Documentation](https://docs.ray.io/en/master/rllib-algorithms.html#decentralized-distributed-proximal-policy-optimization-dd-ppo)**
**[Implementation](https://github.com/ray-project/ray/blob/master/rllib/algorithms/ddppo/ddppo.py)**