Soft Actor Critic (SAC)

Overview

SAC is a state-of-the-art, model-free, off-policy RL algorithm that performs remarkably well on continuous-control domains. SAC employs an actor-critic framework and combats high sample complexity and training instability by learning under a maximum-entropy framework. Unlike the standard RL objective, which aims to maximize only the expected sum of future rewards, SAC maximizes the expected sum of rewards plus the expected entropy of the current policy. In addition to optimizing the actor and critic under this entropy-augmented objective, SAC also automatically tunes the entropy coefficient.
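
In symbols, the maximum-entropy objective described above can be written as follows, where H denotes the policy entropy and alpha is the entropy coefficient (temperature) that SAC tunes automatically alongside the actor and critic:

```latex
J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```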

Documentation & Implementation:

Soft Actor-Critic algorithm (SAC), with discrete-action support.

Detailed Documentation

Implementation
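
As a quick-start illustration, the sketch below trains SAC on a continuous-control task through RLlib's trainer API. It is a minimal sketch assuming the Ray 1.x-era interface, in which the trainer is exposed as ray.rllib.agents.sac.SACTrainer; the environment name and config values are illustrative, not prescribed by this README.

```python
import ray
from ray.rllib.agents.sac import SACTrainer  # Ray 1.x-era import path (assumption)

ray.init()

# Minimal example config; keys follow RLlib's common trainer config.
config = {
    "env": "Pendulum-v0",   # example continuous-control environment (illustrative)
    "framework": "torch",   # or "tf" / "tf2"
    "num_workers": 0,       # sample on the driver only
}

trainer = SACTrainer(config=config)

# Run a few training iterations and report the mean episode reward.
for i in range(5):
    result = trainer.train()
    print(i, result["episode_reward_mean"])

ray.shutdown()
```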