# Asynchronous Proximal Policy Optimization (APPO)
## Overview
PPO is a model-free, on-policy RL algorithm that works well for both discrete and continuous action-space environments. PPO uses an actor-critic framework consisting of two networks: an actor (the policy network) and a critic (the value-function network).
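To make the actor-critic split concrete, here is a minimal sketch of the two-network layout in PyTorch. This is not RLlib's actual model code; the class name, layer sizes, and shared encoder are illustrative assumptions about a common PPO layout.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Hypothetical actor-critic pair: a policy head (actor) and a
    value head (critic) on top of a shared observation encoder."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, num_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)             # critic: state-value estimate

    def forward(self, obs: torch.Tensor):
        features = self.encoder(obs)
        return self.policy_head(features), self.value_head(features).squeeze(-1)

# Usage: sample an action and read the value estimate for one observation.
model = ActorCritic(obs_dim=4, num_actions=2)
logits, value = model(torch.randn(1, 4))
action = torch.distributions.Categorical(logits=logits).sample()
```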
## Distributed PPO Algorithms
### Distributed baseline PPO
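For reference, baseline PPO can be launched through the trainer API. The sketch below assumes the pre-2.0 `ray.rllib.agents` layout this directory belongs to; the environment name and config keys are illustrative, not prescribed by this README.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
# num_workers sets how many parallel rollout workers collect samples;
# in baseline PPO the learner waits on all of them each iteration.
trainer = PPOTrainer(
    env="CartPole-v0",
    config={"num_workers": 2, "framework": "torch"},
)
for _ in range(3):
    result = trainer.train()  # one synchronous round of sampling + SGD
    print(result["episode_reward_mean"])
```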
### Asynchronous PPO (APPO)

APPO opts to imitate IMPALA as its distributed execution plan. Data collection nodes gather data asynchronously, and this data is stored in a circular replay buffer. A target network and a doubly importance-sampled surrogate objective are introduced to enforce training stability in the asynchronous data-collection setting. See implementation here.
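A matching usage sketch for APPO, again assuming the pre-2.0 `ray.rllib.agents` API; the environment and config values are illustrative.

```python
import ray
from ray.rllib.agents.ppo import APPOTrainer

ray.init()
# Unlike baseline PPO, APPO's rollout workers sample asynchronously
# (IMPALA-style): the learner consumes batches as they arrive rather
# than waiting for every worker each iteration.
trainer = APPOTrainer(
    env="CartPole-v0",
    config={
        "num_workers": 4,      # asynchronous rollout workers
        "framework": "torch",
    },
)
for _ in range(3):
    print(trainer.train()["episode_reward_mean"])
```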