
Decentralized Distributed Proximal Policy Optimization (DDPPO)

Overview

PPO is a model-free, on-policy RL algorithm that works well in both discrete and continuous action spaces. PPO uses an actor-critic framework with two networks: an actor (the policy network) and a critic (the value-function network).
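The core of PPO's policy update is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below illustrates that objective for a single action in plain Python; the names (`ratio`, `advantage`, `epsilon`) are illustrative and not part of RLlib's API.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the probability ratio pi_new(a|s) / pi_old(a|s) and A the advantage."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)

# A ratio well above 1 + epsilon is clipped, capping the incentive to
# push the policy further in that direction.
print(clipped_surrogate(1.5, 1.0))   # clipped: uses 1.2 instead of 1.5
print(clipped_surrogate(0.9, -1.0))  # unclipped: ratio already inside the band
```

Taking the minimum makes the objective a pessimistic bound, so gradient ascent on it never benefits from moving the new policy far from the old one.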

Distributed PPO Algorithms

Distributed baseline PPO

See implementation here

Asynchronous PPO (APPO)

See implementation here

Decentralized Distributed PPO (DDPPO) ..

.. removes the assumption that gradient updates must be performed on a central node. Instead, gradients are computed remotely on each data-collection node and all-reduced at each mini-batch using torch.distributed. This allows each worker's GPUs to be used both for sampling and for training.

See implementation here
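The all-reduce step above can be pictured as every worker averaging its locally computed gradients with all other workers, so that each applies an identical update without a central parameter server. The sketch below simulates that averaging with plain Python lists rather than torch.distributed; `all_reduce_mean` and the gradient values are illustrative, not RLlib code.

```python
def all_reduce_mean(worker_grads: list[list[float]]) -> list[float]:
    """Average per-parameter gradients across workers (the effect of an
    all-reduce with a mean reduction)."""
    n_workers = len(worker_grads)
    return [sum(g) / n_workers for g in zip(*worker_grads)]

# Each worker computes gradients locally on its own collected samples ...
grads = [
    [0.2, -0.4],  # worker 0
    [0.6, 0.0],   # worker 1
    [0.1, 0.1],   # worker 2
]

# ... then every worker ends up holding the same averaged gradient and
# applies the same optimizer step, keeping all policies in sync.
avg = all_reduce_mean(grads)
print(avg)  # approximately [0.3, -0.1]
```

In the real implementation this averaging happens on-GPU via NCCL/Gloo collectives, overlapping with sampling, which is what lets DDPPO scale without a centralized learner.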

Documentation & Implementation:

Decentralized Distributed Proximal Policy Optimization (DDPPO)

Detailed Documentation

Implementation