Mirror of https://github.com/vale981/ray, synced 2025-03-06 18:41:40 -05:00.
Latest commit: [RLlib] Raise error for kl penalty ddppo

DDPPO doesn't support KL penalties like PPO-1. In order to support KL penalties, DDPPO would need to become undecentralized, which defeats the purpose of the algorithm. Users can still tune the entropy coefficient to control the policy entropy (similar to controlling the KL penalty). An illustrative sketch of such a config check follows the file list below.

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
| Name |
| --- |
| test_appo.py |
| test_ddppo.py |
| test_ppo.py |
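The commit above describes rejecting any DDPPO config that requests a KL penalty and pointing users to the entropy coefficient instead. The following is a minimal standalone sketch of that idea; the helper name `validate_ddppo_config` is hypothetical, and while `kl_coeff`, `kl_target`, and `entropy_coeff` are modeled on standard PPO config keys, this is not the actual code added to `rllib/agents/ppo/ddppo.py`.

```python
# Standalone sketch (hypothetical helper), not the actual RLlib implementation:
# reject configs that request a KL penalty and suggest entropy_coeff instead.
def validate_ddppo_config(config: dict) -> None:
    """Raise if the config asks DDPPO for a KL penalty it cannot apply."""
    # DDPPO learns in a fully decentralized way, so there is no central
    # update step where an adaptive KL coefficient could be maintained.
    if config.get("kl_coeff", 0.0) != 0.0 or config.get("kl_target", 0.0) != 0.0:
        raise ValueError(
            "DDPPO does not support a KL penalty (kl_coeff / kl_target). "
            "Tune entropy_coeff instead to regularize the policy."
        )


if __name__ == "__main__":
    # An entropy bonus is the suggested substitute for the KL term.
    validate_ddppo_config({"kl_coeff": 0.0, "kl_target": 0.0, "entropy_coeff": 0.01})

    # A nonzero kl_coeff is rejected up front.
    try:
        validate_ddppo_config({"kl_coeff": 0.2})
    except ValueError as err:
        print(err)
```

Failing fast at config-validation time, rather than silently ignoring the KL settings, matches the intent of the commit: users learn immediately that the penalty cannot take effect in a decentralized setup.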