Soft Actor Critic (SAC)
Overview
SAC is a state-of-the-art, model-free, off-policy RL algorithm that performs remarkably well on continuous-control domains. SAC employs an actor-critic framework and addresses high sample complexity and training instability by learning within a maximum-entropy framework. Unlike the standard RL objective, which aims to maximize the sum of rewards into the future, SAC seeks to maximize the sum of rewards as well as the expected entropy of the current policy. In addition to optimizing an actor and a critic with entropy-based objectives, SAC also optimizes the entropy coefficient.
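As a concrete illustration of this entropy-regularized objective, the sketch below computes the soft TD target and the entropy-coefficient loss in plain NumPy. This is a minimal, framework-agnostic sketch, not RLlib's implementation (the actual losses live in `sac_tf_policy.py` and `sac_torch_policy.py`); the names `q1_next`, `q2_next`, `log_pi_next`, `log_alpha`, and `target_entropy` are assumptions chosen for clarity.

```python
import numpy as np

# Illustrative sketch of SAC's entropy-regularized objectives (not RLlib's code).
# All variable names are assumptions chosen for readability.

def soft_q_target(reward, done, gamma, q1_next, q2_next, log_pi_next, alpha):
    """Soft TD target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value_next = np.minimum(q1_next, q2_next) - alpha * log_pi_next
    return reward + gamma * (1.0 - done) * soft_value_next

def entropy_coeff_loss(log_pi, log_alpha, target_entropy):
    """Loss whose gradient tunes alpha so that E[-log pi] approaches target_entropy."""
    return -(np.exp(log_alpha) * (log_pi + target_entropy)).mean()
```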
Documentation & Implementation:
Soft Actor-Critic algorithm (SAC), with support for discrete actions in addition to continuous ones.
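For reference, a minimal training-loop sketch is shown below. It assumes the `SACTrainer` class exposed by `ray.rllib.agents.sac` in this version of the repo; the environment name and config values are illustrative, not documented defaults.

```python
import ray
from ray.rllib.agents.sac import SACTrainer  # trainer class defined in sac.py

ray.init()

# Illustrative config; see sac.py for the full set of options and their defaults.
trainer = SACTrainer(
    env="Pendulum-v1",          # any continuous- or discrete-action Gym env
    config={
        "framework": "torch",   # uses sac_torch_policy.py / sac_torch_model.py
        "gamma": 0.99,
    },
)

for i in range(10):
    result = trainer.train()
    print(f"iter {i}: episode_reward_mean={result['episode_reward_mean']}")

ray.shutdown()
```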