ray/rllib/agents/sac
gjoliver 99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change:
1. Gets rid of the try...except clause around the execution_plan() call,
   and with it the deprecation warning it produced.
2. Fixes the execution_plan() call in Trainer._try_recover() as well.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents,
   e.g., a reservoir-sampling replay buffer for the APPO agent for Riot
   in the near future.
* Introduce explicit configuration for replay buffer types (see the sketch after this commit entry).
* Fix the is_training key error.
* Actually deprecate the buffer_size field.
2021-10-26 20:56:02 +02:00
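
A hypothetical sketch of what the explicit replay-buffer configuration described above could look like. The key names ("replay_buffer_config", "type", "capacity") follow common RLlib conventions but are assumptions here, not taken verbatim from this commit:

```python
# Illustrative sketch only; key names are assumptions, not this PR's exact API.
from ray.rllib.agents.sac import DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
config["replay_buffer_config"] = {
    # Select the buffer class explicitly instead of relying on a hard-coded one.
    "type": "MultiAgentReplayBuffer",
    # Replaces the deprecated top-level `buffer_size` field.
    "capacity": 1_000_000,
}
```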
tests [RLlib] Issue 18418: SAC w/ dict space not working. (#19101) 2021-10-06 09:05:50 +02:00
__init__.py [RLlib] Add RNN-SAC agent (#16577) 2021-07-25 10:04:52 -04:00
README.md [RLlib] Improved Documentation for PPO, DDPG, and SAC (#12943) 2020-12-24 09:31:35 -05:00
rnnsac.py [RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
rnnsac_torch_model.py [RLlib] Add RNN-SAC agent (#16577) 2021-07-25 10:04:52 -04:00
rnnsac_torch_policy.py [RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
sac.py [RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670) 2021-09-16 18:22:23 +02:00
sac_tf_model.py [RLlib] Unify the way we create local replay buffer for all agents (#19627) 2021-10-26 20:56:02 +02:00
sac_tf_policy.py [RLlib] Issue 18418: SAC w/ dict space not working. (#19101) 2021-10-06 09:05:50 +02:00
sac_torch_model.py [RLlib] Unify the way we create local replay buffer for all agents (#19627) 2021-10-26 20:56:02 +02:00
sac_torch_policy.py [RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00

Soft Actor Critic (SAC)

Overview

SAC is a state-of-the-art, model-free, off-policy RL algorithm that performs remarkably well on continuous-control domains. SAC employs an actor-critic framework and combats high sample complexity and training instability by learning within a maximum-entropy framework. Unlike the standard RL objective, which aims to maximize the expected sum of future rewards, SAC optimizes the sum of rewards together with the expected entropy of the current policy. In addition to optimizing the actor and critic with entropy-based objectives, SAC also learns the entropy coefficient.
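
Concretely, following the SAC paper (Haarnoja et al., 2018), the maximum-entropy objective augments the standard return with a policy-entropy term weighted by a temperature (the entropy coefficient alpha mentioned above):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
    \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left( \pi(\cdot \mid s_t) \right) \right]
```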

Documentation & Implementation:

Soft Actor-Critic (SAC) algorithm, with discrete-action support as well.

Detailed Documentation

Implementation
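
For reference, a minimal usage sketch (not part of this README) of training RLlib's SAC agent through the Python API. The environment name and config values are illustrative assumptions and may need adjusting for your Ray/gym versions:

```python
# Minimal, illustrative sketch of running RLlib's SAC trainer; the env name
# and config values are assumptions, not prescriptions from this README.
import ray
from ray.rllib.agents.sac import SACTrainer

ray.init()
trainer = SACTrainer(
    env="Pendulum-v0",  # any continuous- (or discrete-) action Gym env
    config={
        "framework": "torch",
        "gamma": 0.99,
        # With target_entropy="auto", the entropy coefficient (alpha) is learned.
        "target_entropy": "auto",
    },
)
for _ in range(5):
    print(trainer.train()["episode_reward_mean"])
```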