Mirror of https://github.com/vale981/ray (synced 2025-03-06 10:31:39 -05:00)
SAC (both torch and tf versions) is showing issues (crashes) due to numeric instabilities in the SquashedGaussian distribution (sampling and logp computation after extreme NN outputs). This PR fixes these crashes. Stable MuJoCo learning (HalfCheetah) has been confirmed on both the tf and torch versions. A distribution stability test (using extreme NN outputs) has been added for SquashedGaussian; it can be used for any other type of distribution as well.

| Name |
|---|
| `__init__.py` |
| `fcnet_v1.py` |
| `fcnet_v2.py` |
| `lstm_v1.py` |
| `misc.py` |
| `modelv1_compat.py` |
| `recurrent_tf_modelv2.py` |
| `tf_action_dist.py` |
| `tf_modelv2.py` |
| `visionnet_v1.py` |
| `visionnet_v2.py` |
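
The instability described in the commit message can be illustrated with a minimal, framework-free sketch. This is not the code from the PR or from `tf_action_dist.py`; it is a scalar illustration of one common way to keep a tanh-squashed Gaussian finite. The function names (`clip`, `sample`, `logp`) and the bounds `MIN_LOG_STD`, `MAX_LOG_STD`, and `EPS` are assumptions chosen for this example.

```python
import math
import random

# Assumed bounds, chosen for illustration only (not taken from the PR).
MIN_LOG_STD = -20.0
MAX_LOG_STD = 2.0
EPS = 1e-6


def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def sample(mean, log_std, rng=random):
    """Draw u ~ N(mean, std^2) and squash it into [-1, 1] with tanh."""
    # Clipping log_std keeps exp() from overflowing or collapsing to 0.
    std = math.exp(clip(log_std, MIN_LOG_STD, MAX_LOG_STD))
    return math.tanh(mean + std * rng.gauss(0.0, 1.0))


def logp(action, mean, log_std):
    """Log-density of a squashed action under the tanh-Gaussian."""
    log_std = clip(log_std, MIN_LOG_STD, MAX_LOG_STD)
    std = math.exp(log_std)
    # tanh saturates to exactly +/-1.0 for large inputs; atanh is
    # undefined there, so pull the action slightly inside (-1, 1).
    a = clip(action, -1.0 + EPS, 1.0 - EPS)
    u = math.atanh(a)
    z = (u - mean) / std
    gauss_logp = -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables: log|d tanh(u)/du| = log(1 - tanh(u)^2).
    return gauss_logp - math.log(1.0 - a * a)
```

A stability check in the spirit of the test mentioned above would feed extreme values for `mean` and `log_std` and assert that both the sample and its log-probability stay finite. Without the clipping step, a saturated sample of exactly `1.0` would make `math.atanh` raise a `ValueError`, which is the scalar analogue of the crash described in the commit message.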