# Decentralized Distributed Proximal Policy Optimization (DDPPO)

## Overview
PPO is a model-free, on-policy RL algorithm that works well for both discrete and continuous action space environments. PPO uses an actor-critic framework with two networks: an actor (the policy network) and a critic (the value function network).
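
To make the actor-critic split concrete, here is a minimal PyTorch sketch of the two networks. The layer sizes and the choice of separate (rather than shared) trunks are illustrative assumptions, not RLlib's actual model definitions.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: maps an observation to action logits."""

    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, num_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class Critic(nn.Module):
    """Value network: maps an observation to a scalar state-value estimate."""

    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)
```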
## Distributed PPO Algorithms
- **Distributed baseline PPO**
- **Asynchronous PPO (APPO)**
- **Decentralized Distributed PPO (DDPPO)** removes the assumption that gradient updates must be done on a central node. Instead, gradients are computed remotely on each data-collection node and all-reduced at each mini-batch using `torch.distributed`. This allows each worker's GPU to be used both for sampling and for training (see the sketch after this list).
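
Below is a conceptual sketch of the per-mini-batch gradient all-reduce that each data-collection worker performs after its local backward pass. It is not RLlib's actual DDPPO implementation; the helper name `allreduce_gradients` and the surrounding training-loop assumptions are only illustrative of the `torch.distributed` pattern described above.

```python
import torch
import torch.distributed as dist


def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all workers in the process group."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this parameter's gradient over all workers...
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            # ...then divide by the number of workers to get the mean gradient.
            param.grad.div_(world_size)
```

In a DDPPO-style loop, every worker would call such a helper between `loss.backward()` and `optimizer.step()`, so all workers apply the same averaged update and stay in sync without a central parameter server or driver-side gradient aggregation.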