Dreamer


Overview

Dreamer is a model-based, off-policy RL algorithm that learns by imagining ahead with a learned world model and works well in visual (image-based) environments. Like all model-based algorithms, Dreamer learns the environment's transition dynamics via a latent-space model called PlaNet. PlaNet learns to encode image observations into latent vectors, which can be used as pseudo-observations in the rest of Dreamer.
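
To make the PlaNet idea concrete, below is a minimal, illustrative PyTorch sketch (not the actual RLlib implementation; the class and module names are hypothetical) of encoding an image observation into a latent vector and predicting the next latent state and reward from it:

```python
import torch
import torch.nn as nn


class TinyWorldModel(nn.Module):
    """Toy PlaNet-style world model: image encoder + latent dynamics + reward head."""

    def __init__(self, latent_dim=32, action_dim=4):
        super().__init__()
        # Encoder: 3x64x64 image observation -> latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
        # Latent transition dynamics: (latent, action) -> next latent.
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ELU(),
            nn.Linear(128, latent_dim),
        )
        # Reward head: latent -> predicted scalar reward.
        self.reward = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ELU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs, action):
        z = self.encoder(obs)  # latent "pseudo-observation"
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        r_pred = self.reward(z_next)
        return z, z_next, r_pred
```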

Dreamer is a gradient-based RL algorithm: the agent imagines ahead using its learned transition-dynamics model (PlaNet) to discover new rewards and states. Because imagining ahead is fully differentiable, the RL objective (maximizing the sum of rewards) is also fully differentiable and does not need to be optimized indirectly, as it is in policy-gradient methods. This gradient-based learning, in conjunction with PlaNet, lets the agent learn in a latent space and achieve much better sample complexity and performance than other visual-based agents.
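
The practical consequence is that the return of an imagined rollout can be maximized by ordinary backpropagation. Here is a hedged sketch of that idea, reusing the hypothetical TinyWorldModel above (the real Dreamer additionally learns a value function and uses lambda-returns):

```python
import torch


def imagined_return(world_model, policy, z0, horizon=15, gamma=0.99):
    """Roll the learned latent dynamics forward and sum discounted predicted rewards.

    Every step is differentiable, so gradients of the return flow straight back
    into the policy parameters; no policy-gradient estimator is required.
    """
    z, total = z0, 0.0
    for t in range(horizon):
        a = policy(z)                                         # action from latent state
        z = world_model.dynamics(torch.cat([z, a], dim=-1))   # imagined transition
        total = total + (gamma ** t) * world_model.reward(z).squeeze(-1)
    return total.mean()


# Gradient ascent on the imagined return:
# loss = -imagined_return(world_model, policy, z0)
# loss.backward(); optimizer.step()
```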

For more details, there is a Ray/RLlib blog post that covers the components of PlaNet and the distributed execution plan in more depth.

Documentation & Implementation:

Dreamer.

Detailed Documentation

Implementation
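
As a quick-start sketch, assuming the DreamerConfig exported from this directory and a pixel-observation environment (the environment name below is a placeholder):

```python
from ray.rllib.algorithms.dreamer import DreamerConfig

config = (
    DreamerConfig()
    .environment("YourVisualEnv-v0")  # placeholder; Dreamer expects image observations
    .framework("torch")               # this Dreamer implementation is PyTorch-only
)
algo = config.build()
for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
```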