hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-10 05:16:49 -04:00

tests	[RLlib] Issue 11591: SAC loss does not use PR-weights in critic loss term. (#12394)	2020-11-25 11:28:46 -08:00
__init__.py	[RLlib] SAC algo cleanup. (#10825)	2020-09-20 11:27:02 +02:00
README.md	[RLlib] Improved Documentation for PPO, DDPG, and SAC (#12943)	2020-12-24 09:31:35 -05:00
sac.py	[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238)	2021-01-19 14:22:36 +01:00
sac_tf_model.py	[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363)	2021-01-14 14:44:33 +01:00
sac_tf_policy.py	[RLlib] Issue 11591: SAC loss does not use PR-weights in critic loss term. (#12394)	2020-11-25 11:28:46 -08:00
sac_torch_model.py	[RLlib] JAXPolicy prep PR #2 (move get_activation_fn (backward-compatibly), minor fixes and preparations). (#13091)	2020-12-30 22:30:52 -05:00
sac_torch_policy.py	[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238)	2021-01-19 14:22:36 +01:00

README.md

Soft Actor Critic (SAC)

Overview

SAC is a state-of-the-art, model-free, off-policy RL algorithm that performs remarkably well on continuous-control domains. SAC employs an actor-critic framework and combats high sample complexity and training instability by learning within a maximum-entropy framework. Unlike the standard RL objective, which aims to maximize the sum of future rewards, SAC seeks to maximize the sum of rewards plus the expected entropy of the current policy. In addition to optimizing the actor and critic with entropy-based objectives, SAC also optimizes the entropy coefficient.
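
To make the difference between the two objectives concrete, here is a minimal sketch in plain Python. It is not taken from the files in this directory: the function names (`discounted_return`, `soft_return`, `entropy_coefficient_grad`), the discount factor `gamma`, and the `target_entropy` parameter are illustrative assumptions. The sketch uses `-log pi(a|s)` of the sampled action as a single-sample estimate of the policy entropy, and a commonly used loss for the entropy coefficient `alpha`, `J(alpha) = mean(-alpha * (log_prob + target_entropy))`.

```python
def discounted_return(rewards, gamma):
    """Standard RL objective for one trajectory: sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def soft_return(rewards, log_probs, alpha, gamma):
    """Maximum-entropy objective for one trajectory.

    Each step adds an entropy bonus -alpha * log pi(a_t | s_t) to the reward,
    where log_probs[t] is the log-probability of the action actually taken.
    """
    return sum(
        gamma ** t * (r - alpha * lp)
        for t, (r, lp) in enumerate(zip(rewards, log_probs))
    )


def entropy_coefficient_grad(log_probs, target_entropy):
    """Gradient of J(alpha) = mean(-alpha * (log_prob + target_entropy)) w.r.t. alpha.

    A negative gradient means gradient descent raises alpha, which happens
    when the estimated entropy (-mean log_prob) is below the target.
    """
    n = len(log_probs)
    return -sum(lp + target_entropy for lp in log_probs) / n


# Hypothetical three-step trajectory.
rewards = [1.0, 1.0, 1.0]
log_probs = [-1.0, -1.0, -1.0]  # estimated entropy is 1.0 per step

plain = discounted_return(rewards, gamma=0.5)
soft = soft_return(rewards, log_probs, alpha=0.5, gamma=0.5)
grad = entropy_coefficient_grad(log_probs, target_entropy=2.0)
```

With `alpha = 0` the soft return reduces to the standard discounted return, so the entropy coefficient directly controls how strongly exploration is rewarded. Because the estimated entropy in this toy trajectory is below the assumed target, the gradient is negative and a descent step would increase `alpha`.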

Documentation & Implementation:

Soft Actor-Critic Algorithm (SAC), with additional support for discrete action spaces.

Detailed Documentation

Implementation