# Asynchronous Proximal Policy Optimization (APPO)
## Overview
PPO is a model-free, on-policy RL algorithm that works well for both discrete and continuous action-space environments. PPO uses an actor-critic framework consisting of two networks: an actor (the policy network) and a critic (the value-function network).
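To make the actor-critic split concrete, here is a minimal sketch of the two-network layout in PyTorch. This is not RLlib's actual model code; the class name, layer sizes, and shared encoder are illustrative assumptions about a common PPO layout.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Hypothetical actor-critic pair: a policy head (actor) and a
    value head (critic) on top of a shared observation encoder."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_head = nn.Linear(hidden, num_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)             # critic: state-value estimate

    def forward(self, obs: torch.Tensor):
        features = self.encoder(obs)
        return self.policy_head(features), self.value_head(features).squeeze(-1)

# Usage: sample an action and read the value estimate for one observation.
model = ActorCritic(obs_dim=4, num_actions=2)
logits, value = model(torch.randn(1, 4))
action = torch.distributions.Categorical(logits=logits).sample()
```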
## Distributed PPO Algorithms
### Distributed baseline PPO
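For reference, baseline PPO can be launched through the trainer API. The sketch below assumes the pre-2.0 `ray.rllib.agents` layout this directory belongs to; the environment name and config keys are illustrative, not prescribed by this README.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
# num_workers sets how many parallel rollout workers collect samples;
# in baseline PPO the learner waits on all of them each iteration.
trainer = PPOTrainer(
    env="CartPole-v0",
    config={"num_workers": 2, "framework": "torch"},
)
for _ in range(3):
    result = trainer.train()  # one synchronous round of sampling + SGD
    print(result["episode_reward_mean"])
```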
### Asynchronous PPO (APPO)

APPO opts to imitate IMPALA as its distributed execution plan. Data collection nodes gather data asynchronously, and this data is stored in a circular replay buffer. A target network and a doubly importance-sampled surrogate objective are introduced to enforce training stability in the asynchronous data-collection setting. See implementation here.
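A matching usage sketch for APPO, again assuming the pre-2.0 `ray.rllib.agents` API; the environment and config values are illustrative.

```python
import ray
from ray.rllib.agents.ppo import APPOTrainer

ray.init()
# Unlike baseline PPO, APPO's rollout workers sample asynchronously
# (IMPALA-style): the learner consumes batches as they arrive rather
# than waiting for every worker each iteration.
trainer = APPOTrainer(
    env="CartPole-v0",
    config={
        "num_workers": 4,      # asynchronous rollout workers
        "framework": "torch",
    },
)
for _ in range(3):
    print(trainer.train()["episode_reward_mean"])
```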