ray/rllib/agents
Avnish Narayan 6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO (#18959)
* [RLlib] Raise error for KL penalty in DDPPO

DDPPO doesn't support KL penalties the way PPO does. Adapting the
KL coefficient requires a central learner, so supporting KL
penalties would mean re-centralizing DDPPO, which defeats the
purpose of the algorithm. Users can still tune the entropy
coefficient to regularize the policy, which has an effect similar
to a KL penalty.

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
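
For illustration, a minimal sketch of the kind of config check this
commit describes, written as a standalone helper. The function name
`validate_ddppo_config` and the defaulting via `dict.get` are
assumptions for this sketch, not necessarily the exact code merged
in #18959; the config keys `kl_coeff`, `kl_target`, and
`entropy_coeff` are RLlib's PPO config keys.

    def validate_ddppo_config(config: dict) -> None:
        """Reject PPO-style KL-penalty settings for DDPPO.

        Hypothetical helper: DDPPO has no central learner to adapt a
        shared KL coefficient, so non-zero `kl_coeff`/`kl_target`
        values are rejected instead of being silently ignored.
        """
        if config.get("kl_coeff", 0.0) != 0.0 or config.get("kl_target", 0.0) != 0.0:
            raise ValueError(
                "DDPPO doesn't support KL penalties! Set `kl_coeff=0.0` and "
                "`kl_target=0.0`, and tune `entropy_coeff` instead to "
                "regularize the policy."
            )

    # OK: KL penalty disabled; entropy bonus used for regularization instead.
    validate_ddppo_config({"kl_coeff": 0.0, "kl_target": 0.0, "entropy_coeff": 0.01})

    # PPO's default-style KL-penalty settings are rejected with an error.
    try:
        validate_ddppo_config({"kl_coeff": 0.2, "kl_target": 0.01})
    except ValueError as e:
        print(e)

Raising eagerly at config-validation time, rather than silently
ignoring the keys, surfaces the unsupported setting before any
training is launched.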
a3c [RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
ars [RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
cql [RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670) 2021-09-16 18:22:23 +02:00
ddpg [RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
dqn [RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
dreamer [RLlib] Dreamer fixes and reinstate Dreamer test. (#17821) 2021-08-18 18:47:08 +02:00
es [RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
impala [RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971) 2021-09-29 21:30:04 +02:00
maml [RLlib] CQL TensorFlow support (#15841) 2021-05-18 11:10:46 +02:00
marwil [RLlib] MARWIL + BC: Various fixes and enhancements. (#16218) 2021-06-03 22:29:00 +02:00
mbmpo [RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
pg [RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306) 2021-09-03 13:29:57 +02:00
ppo [RLlib] Raise error for KL penalty in DDPPO (#18959) 2021-09-30 10:56:22 +02:00
qmix [RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531) 2021-06-30 12:32:11 +02:00
sac [RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
slateq [RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
tests [RLlib] Faster remote worker space inference (don't infer if not required). (#18805) 2021-09-23 10:54:37 +02:00
__init__.py [RLlib] Fixing Memory Leak In Multi-Agent environments. Adding tooling for finding memory leaks in workers. (#15815) 2021-05-18 13:23:00 +02:00
callbacks.py [RLlib] Add policies arg to callback: on_episode_step (already exists in all other episode-related callbacks) (#18119) 2021-08-27 16:12:19 +02:00
mock.py [Testing] Split RLlib example scripts CI tests into 4 jobs (from 2). (#17331) 2021-07-26 10:52:55 -04:00
registry.py [RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. (#17530) 2021-08-03 18:30:02 -04:00
trainer.py [RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00
trainer_template.py [RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00