Model Agnostic Meta-learning (MAML)

Overview

MAML is an on-policy meta RL algorithm. Unlike standard RL algorithms, which aim to maximize the expected sum of future rewards on a single task (e.g. HalfCheetah), meta RL algorithms seek to maximize that objective across a whole distribution of tasks.
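
For reference, the meta-objective can be written as follows (a standard formulation in our own notation, not taken from this repo): J_T(theta) denotes the expected return of the policy with parameters theta on task T, alpha is the inner-loop (adaptation) step size, and a single adaptation step is shown.

```latex
% Meta-objective over the task distribution p(T), one adaptation step shown.
\max_{\theta} \;
\mathbb{E}_{\mathcal{T} \sim p(\mathcal{T})}
\left[ J_{\mathcal{T}}\!\left( \theta'_{\mathcal{T}} \right) \right],
\qquad
\theta'_{\mathcal{T}} \;=\; \theta + \alpha \, \nabla_{\theta} J_{\mathcal{T}}(\theta)
```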

At a high level, MAML seeks to learn quick adaptation across different tasks (e.g. different velocities for HalfCheetah). Quick adaptation is measured by the number of gradient steps it takes to adapt: MAML aims to maximize the RL objective for each task after X gradient steps on that task. Doing this partitions the algorithm into two steps. The first step is data collection, which gathers data for each task at every stage of adaptation (1, 2, ..., X). The second step is the meta-update, which takes all the data aggregated in the first step and computes the meta-gradient.
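
The sketch below illustrates this two-step structure on a toy 1-D objective instead of an RL return, using a first-order approximation of the meta-gradient. All names in it (sample_tasks, inner_lr, etc.) are illustrative only and are not part of RLlib's MAML implementation.

```python
# Minimal first-order sketch of the inner-adaptation / meta-update loop.
import numpy as np

def sample_tasks(n_tasks, rng):
    # Each "task" is just a target value; the per-task objective is
    # -(theta - target)^2, standing in for an RL return to be maximized.
    return rng.uniform(-2.0, 2.0, size=n_tasks)

def grad(theta, target):
    # Gradient of the per-task objective -(theta - target)^2 w.r.t. theta.
    return -2.0 * (theta - target)

theta = 0.0                      # meta-parameters
inner_lr, meta_lr = 0.1, 0.05    # inner- and outer-loop step sizes
inner_steps = 3                  # "X" adaptation steps per task

rng = np.random.default_rng(0)
for meta_iter in range(100):
    post_adapt_grads = []
    for target in sample_tasks(n_tasks=5, rng=rng):
        # Step 1: per-task adaptation -- take X gradient steps on this task,
        # starting from the shared meta-parameters.
        adapted = theta
        for _ in range(inner_steps):
            adapted += inner_lr * grad(adapted, target)
        # First-order approximation: use the post-adaptation gradient in
        # place of the full second-order meta-gradient.
        post_adapt_grads.append(grad(adapted, target))
    # Step 2: meta-update -- aggregate across tasks and step the
    # meta-parameters.
    theta += meta_lr * np.mean(post_adapt_grads)

print("meta-parameters after training:", theta)
```

In the full algorithm the post-adaptation gradient is differentiated through the inner updates (a second-order term), whereas the sketch above uses the cheaper first-order approximation purely for illustration.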

Documentation & Implementation:

MAML

- Detailed Documentation
- Implementation
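
A minimal usage sketch, assuming the MAMLConfig object provided in this directory. The specific training keys shown (inner_adaptation_steps, maml_optimizer_steps, inner_lr), the config methods, and the environment name are assumptions and should be checked against the detailed documentation; MAML also requires a meta-environment that can sample and set tasks, not a plain gym env.

```python
# Hedged sketch: method and parameter names are assumptions based on this
# directory's MAMLConfig and RLlib's config API; verify before use.
from ray.rllib.algorithms.maml import MAMLConfig

config = (
    MAMLConfig()
    # MAML needs a "meta-env" that can sample/set tasks (a plain gym env
    # will not work); "your_registered_meta_env" is a placeholder name.
    .environment(env="your_registered_meta_env")
    .framework("torch")
    .rollouts(num_rollout_workers=2)
    .training(
        inner_adaptation_steps=1,  # X adaptation steps per task (assumed key)
        maml_optimizer_steps=5,    # meta-update steps per iteration (assumed key)
        inner_lr=0.1,              # inner-loop learning rate (assumed key)
    )
)

# algo = config.build()   # requires the placeholder env to be registered
# print(algo.train())
```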