Model-based Meta-Policy Optimization (MB-MPO)
Code in this package is adapted from https://github.com/jonasrothfuss/model_ensemble_meta_learning.
Overview
MBMPO is an on-policy, model-based algorithm. At a high level, MBMPO is model-based MAML: on top of MAML, it learns an ensemble of dynamics models. MBMPO trains the dynamics models on real environment data and trains the actor and critic networks on synthetic data generated by those dynamics models. The actor and critic are updated via the MAML algorithm. In the distributed execution plan, MBMPO alternates between training the dynamics models and training the actor and critic networks.
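The alternation described above can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions and is not the code in this package: the 1-D linear environment, the linear policy `a = theta * s`, the squared-state cost, the first-order approximation of the MAML meta-gradient, and every function name below are hypothetical choices made for illustration.

```python
import random

# Toy assumptions: true dynamics s' = TRUE_A * s + TRUE_B * a (plus noise),
# policy a = theta * s, cost = s'^2 (drive the state toward zero).
TRUE_A, TRUE_B = 0.9, 0.5


def real_step(s, a, rng):
    """One step in the 'real' environment, with a little noise."""
    return TRUE_A * s + TRUE_B * a + rng.gauss(0.0, 0.01)


def collect_real_data(theta, rng, n=200):
    """Roll out the current policy (plus exploration noise) in the real env."""
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = theta * s + rng.gauss(0.0, 0.3)
        data.append((s, a, real_step(s, a, rng)))
    return data


def fit_model(data, rng):
    """Fit s' ~ a_hat * s + b_hat * a by least squares on a bootstrap resample."""
    sample = [rng.choice(data) for _ in data]
    sss = sum(s * s for s, _, _ in sample)
    saa = sum(a * a for _, a, _ in sample)
    ssa = sum(s * a for s, a, _ in sample)
    ssy = sum(s * y for s, _, y in sample)
    say = sum(a * y for _, a, y in sample)
    det = sss * saa - ssa * ssa
    return ((ssy * saa - say * ssa) / det, (say * sss - ssy * ssa) / det)


def imagined_grad(theta, model, rng, n=64):
    """Gradient of the mean cost s'^2 w.r.t. theta on transitions imagined by `model`."""
    a_hat, b_hat = model
    grad = 0.0
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        s_next = (a_hat + b_hat * theta) * s  # synthetic transition
        grad += 2.0 * s_next * b_hat * s
    return grad / n


def train(iterations=20, ensemble_size=5, inner_lr=0.5, outer_lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        # Phase 1: fit the dynamics ensemble on real environment data.
        data = collect_real_data(theta, rng)
        ensemble = [fit_model(data, rng) for _ in range(ensemble_size)]
        # Phase 2: meta-update the policy on synthetic data only,
        # treating each learned model as one MAML task.
        for _ in range(10):
            meta_grad = 0.0
            for model in ensemble:
                # Inner step: adapt the policy to this model.
                adapted = theta - inner_lr * imagined_grad(theta, model, rng)
                # Outer step (first-order approximation): post-adaptation gradient.
                meta_grad += imagined_grad(adapted, model, rng)
            theta -= outer_lr * meta_grad / ensemble_size
    return theta


if __name__ == "__main__":
    print("learned theta:", train())
```

In this toy, the cost is minimized when `TRUE_A + TRUE_B * theta` is zero, so the learned `theta` should end up close to `-TRUE_A / TRUE_B`. The policy update never touches the real environment; it only sees transitions produced by the learned models, which is the property the paragraph above describes.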
More details can be found here.
Documentation & Implementation:
MBMPO.