ray/rllib/models/torch
Sven Mika 165a86f1ab
[RLlib] SAC MuJoCo instability issues (tf and torch versions). (#8063)
SAC (both torch and tf versions) crashes due to numeric instabilities in the SquashedGaussian distribution (sampling and logp after extreme NN outputs).
This PR fixes these. Stable MuJoCo learning (HalfCheetah) has been confirmed on both tf and torch versions. A Distribution stability test (using extreme NN outputs) has been added for SquashedGaussian (can be used for any other type of distribution as well).
2020-04-19 10:20:23 +02:00
__init__.py [RLlib] Working/learning example: PPO + torch + LSTM. (#7797) 2020-03-31 22:00:28 -07:00
fcnet.py [RLlib] SAC Torch (incl. Atari learning) (#7984) 2020-04-15 13:25:16 +02:00
misc.py [RLlib] SAC Torch (incl. Atari learning) (#7984) 2020-04-15 13:25:16 +02:00
recurrent_torch_model.py [RLlib] Working/learning example: PPO + torch + LSTM. (#7797) 2020-03-31 22:00:28 -07:00
torch_action_dist.py [RLlib] SAC MuJoCo instability issues (tf and torch versions). (#8063) 2020-04-19 10:20:23 +02:00
torch_modelv2.py [RLlib] DQN torch version. (#7597) 2020-04-06 11:56:16 -07:00
visionnet.py [RLlib] SAC Torch (incl. Atari learning) (#7984) 2020-04-15 13:25:16 +02:00
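
The commit message for #8063 attributes the SAC crashes to sampling and logp in the SquashedGaussian distribution after extreme NN outputs. The following is a minimal scalar sketch of that kind of instability and one common way to guard against it. It is not the code in torch_action_dist.py: the bounds MIN_LOG_STD, MAX_LOG_STD and EPS, the helper clamp, and both function names are assumptions made for illustration, and the sketch uses only the Python standard library rather than torch.

```python
import math
import random

# Illustrative bounds (assumptions for this sketch, not values taken from the PR).
MIN_LOG_STD = -20.0
MAX_LOG_STD = 2.0
EPS = 1e-6


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def squashed_gaussian_sample(mean, log_std, rng=random):
    """Draw a = tanh(u) with u ~ N(mean, std^2).

    log_std is clamped first, so math.exp cannot overflow or
    underflow to zero on extreme network outputs.
    """
    std = math.exp(clamp(log_std, MIN_LOG_STD, MAX_LOG_STD))
    return math.tanh(rng.gauss(mean, std))


def squashed_gaussian_logp(action, mean, log_std):
    """Log-density of a tanh-squashed Gaussian at `action` in [-1, 1]."""
    log_std = clamp(log_std, MIN_LOG_STD, MAX_LOG_STD)
    std = math.exp(log_std)
    # Keep the action strictly inside (-1, 1); atanh(+-1) is undefined.
    a = clamp(action, -1.0 + EPS, 1.0 - EPS)
    u = math.atanh(a)
    # Gaussian log-density of the unsquashed value u.
    z = (u - mean) / std
    log_normal = -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for a = tanh(u): subtract log(1 - a^2).
    # EPS keeps the argument of log away from zero.
    return log_normal - math.log(1.0 - a * a + EPS)


if __name__ == "__main__":
    # Extreme "network outputs" that would break an unguarded version:
    # math.exp(1e6) raises OverflowError and math.atanh(1.0) raises ValueError.
    print(squashed_gaussian_logp(1.0, mean=1e6, log_std=-1e6))
    print(squashed_gaussian_sample(mean=1e6, log_std=1e6))
```

Feeding deliberately extreme means and log standard deviations into the distribution, as in the `__main__` block, is the same idea as the stability test the commit message describes: the returned values may be very large in magnitude, but they should stay finite.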