hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00


Sven Mika 732197e23a [RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393)		2021-03-08 15:41:27 +01:00
tests	MBMPO Cartpole (#11832)	2020-11-12 10:30:41 -08:00
__init__.py	[RLlib] Implementation of "Model-based Meta Policy Optimization" (MB MPO) (#9409)	2020-08-02 18:12:09 +02:00
mbmpo.py	[RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393)	2021-03-08 15:41:27 +01:00
mbmpo_torch_policy.py	[RLlib] Issue #13342: Add `validate_spaces` to MB-MPO. (#14038)	2021-02-11 11:36:53 +01:00
model_ensemble.py	[RLlib] Issue #13507: Fix MB-MPO CartPole Env's reward function as well as MB-MPO running into a traj. view API related issue. (#14037)	2021-02-11 18:58:46 +01:00
README.md	[RLLib] Readme.md Documentation for Almost All Algorithms in rllib/agents (#13035)	2020-12-29 18:45:55 -05:00
utils.py	[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). (#11033)	2020-10-06 20:28:16 +02:00

README.md

Model-based Meta-Policy Optimization (MB-MPO)

Code in this package is adapted from https://github.com/jonasrothfuss/model_ensemble_meta_learning.

Overview

MBMPO is an on-policy, model-based algorithm. At a high level, MBMPO is model-based MAML (Model-Agnostic Meta-Learning). On top of MAML, MBMPO learns an ensemble of dynamics models: the dynamics models are trained on real environment data, while the actor and critic networks are trained on synthetic data generated by those dynamics models. The actor and critic are updated with the MAML algorithm, which treats each model in the ensemble as a separate task, so the learned meta-policy can adapt quickly to any of the learned dynamics. In the distributed execution plan, MBMPO alternates between training the dynamics models and training the actor and critic networks.
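The alternation described above can be sketched in plain Python on a hypothetical 1-D linear system. Everything in this sketch is an assumption for illustration and not RLlib's implementation: the toy environment (`A_TRUE`, `B_TRUE`), the scalar linear policy u = -k*s, bootstrapped least-squares dynamics models in place of neural networks, and finite-difference gradients in place of the policy-gradient updates used in practice. All function names (`collect_real_data`, `fit_ensemble`, `maml_update`, `mbmpo_sketch`) are hypothetical. What it does mirror is the structure: real transitions train an ensemble, each ensemble member is treated as one MAML task, and the policy is updated on imagined rollouts only.

```python
import random

# Hypothetical toy setting (not RLlib): a 1-D linear system
# s' = A_TRUE * s + B_TRUE * u + noise, and a linear policy u = -k * s.
A_TRUE, B_TRUE = 0.9, 0.5


def collect_real_data(k, n, rng):
    """Roll out u = -k*s plus exploration noise in the 'real' environment."""
    data, s = [], rng.uniform(-1.0, 1.0)
    for _ in range(n):
        u = -k * s + rng.gauss(0.0, 0.3)
        s_next = A_TRUE * s + B_TRUE * u + rng.gauss(0.0, 0.01)
        data.append((s, u, s_next))
        # Reset if the state drifts too far (keeps the toy numerically tame).
        s = s_next if abs(s_next) < 5.0 else rng.uniform(-1.0, 1.0)
    return data


def fit_linear_model(data):
    """Least-squares fit of s' ~ a*s + b*u via the 2x2 normal equations."""
    ss = sum(s * s for s, _, _ in data)
    uu = sum(u * u for _, u, _ in data)
    su = sum(s * u for s, u, _ in data)
    sy = sum(s * y for s, _, y in data)
    uy = sum(u * y for _, u, y in data)
    det = ss * uu - su * su
    return (uu * sy - su * uy) / det, (ss * uy - su * sy) / det


def fit_ensemble(data, size, rng):
    """Fit each member on a bootstrap resample so the members differ slightly."""
    return [fit_linear_model([rng.choice(data) for _ in data]) for _ in range(size)]


def imagined_cost(k, model, horizon=5, starts=(-1.0, -0.5, 0.5, 1.0)):
    """Squared states summed over 'fake' rollouts of one model, averaged over starts."""
    a, b = model
    total = 0.0
    for s in starts:
        for _ in range(horizon):
            s = a * s - b * k * s
            total += s * s
    return total / len(starts)


def grad(f, x, eps=1e-4):
    """Central finite difference, standing in for a policy-gradient estimator."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def maml_update(k, ensemble, inner_lr=0.01, outer_lr=0.05):
    """One MAML meta-step that treats every ensemble member as a separate task."""

    def post_adaptation_cost(k0, model):
        # Inner loop: one gradient step adapts the policy to this model.
        k_adapted = k0 - inner_lr * grad(lambda x: imagined_cost(x, model), k0)
        return imagined_cost(k_adapted, model)

    # Outer loop: differentiating the composed function numerically also
    # captures the second-order term of the MAML meta-gradient.
    meta_grad = sum(grad(lambda x: post_adaptation_cost(x, m), k) for m in ensemble)
    return k - outer_lr * meta_grad / len(ensemble)


def mbmpo_sketch(iterations=20, ensemble_size=5, meta_steps=5, seed=0):
    """Alternate between fitting the dynamics ensemble and meta-training the policy."""
    rng = random.Random(seed)
    k, data = 0.0, []
    for _ in range(iterations):
        # 1) Real transitions from the current policy train the dynamics ensemble.
        data += collect_real_data(k, 50, rng)
        ensemble = fit_ensemble(data, ensemble_size, rng)
        # 2) Only imagined rollouts from the ensemble are used to update the policy.
        for _ in range(meta_steps):
            k = maml_update(k, ensemble)
    return k


if __name__ == "__main__":
    print(f"learned gain k = {mbmpo_sketch():.3f}")
```

For this toy system the cost-minimizing gain is A_TRUE / B_TRUE = 1.8, which drives the next state to zero; the meta-trained gain moves toward that region from k = 0. Bootstrap resampling is just one simple way to make ensemble members disagree and is a design choice of this sketch.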

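More details can be found in the original paper, "Model-Based Reinforcement Learning via Meta-Policy Optimization" (Clavera et al., 2018).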

Documentation & Implementation:

MBMPO.

Detailed Documentation

Implementation