ray/rllib/agents
Avnish Narayan 6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO (#18959)
* [RLlib] Raise error for KL penalty in DDPPO

DDPPO doesn't support KL penalties the way PPO does. Adapting the
KL coefficient requires a central learner, so supporting KL
penalties would mean re-centralizing DDPPO, which defeats the
purpose of the algorithm. Users can still tune the entropy
coefficient to regularize the policy, which has an effect similar
to a KL penalty.

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
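
For illustration, a minimal sketch of the kind of config check this
commit describes, written as a standalone helper. The function name
`validate_ddppo_config` and the defaulting via `dict.get` are
assumptions for this sketch, not necessarily the exact code merged
in #18959; the config keys `kl_coeff`, `kl_target`, and
`entropy_coeff` are RLlib's PPO config keys.

    def validate_ddppo_config(config: dict) -> None:
        """Reject PPO-style KL-penalty settings for DDPPO.

        Hypothetical helper: DDPPO has no central learner to adapt a
        shared KL coefficient, so non-zero `kl_coeff`/`kl_target`
        values are rejected instead of being silently ignored.
        """
        if config.get("kl_coeff", 0.0) != 0.0 or config.get("kl_target", 0.0) != 0.0:
            raise ValueError(
                "DDPPO doesn't support KL penalties! Set `kl_coeff=0.0` and "
                "`kl_target=0.0`, and tune `entropy_coeff` instead to "
                "regularize the policy."
            )

    # OK: KL penalty disabled; entropy bonus used for regularization instead.
    validate_ddppo_config({"kl_coeff": 0.0, "kl_target": 0.0, "entropy_coeff": 0.01})

    # PPO's default-style KL-penalty settings are rejected with an error.
    try:
        validate_ddppo_config({"kl_coeff": 0.2, "kl_target": 0.01})
    except ValueError as e:
        print(e)

Raising eagerly at config-validation time, rather than silently
ignoring the keys, surfaces the unsupported setting before any
training is launched.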
a3c [RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
ars [RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
cql [RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670) 2021-09-16 18:22:23 +02:00
ddpg [RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
dqn [RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
dreamer [RLlib] Dreamer fixes and reinstate Dreamer test. (#17821) 2021-08-18 18:47:08 +02:00
es [RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
impala [RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971) 2021-09-29 21:30:04 +02:00
maml [RLlib] CQL TensorFlow support (#15841) 2021-05-18 11:10:46 +02:00
marwil [RLlib] MARWIL + BC: Various fixes and enhancements. (#16218) 2021-06-03 22:29:00 +02:00
mbmpo [RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
pg [RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306) 2021-09-03 13:29:57 +02:00
ppo [RLlib] Raise error for KL penalty in DDPPO (#18959) 2021-09-30 10:56:22 +02:00
qmix [RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531) 2021-06-30 12:32:11 +02:00
sac [RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
slateq [RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
tests [RLlib] Faster remote worker space inference (don't infer if not required). (#18805) 2021-09-23 10:54:37 +02:00
__init__.py [RLlib] Fixing Memory Leak In Multi-Agent environments. Adding tooling for finding memory leaks in workers. (#15815) 2021-05-18 13:23:00 +02:00
callbacks.py [RLlib] Add policies arg to callback: on_episode_step (already exists in all other episode-related callbacks) (#18119) 2021-08-27 16:12:19 +02:00
mock.py [Testing] Split RLlib example scripts CI tests into 4 jobs (from 2). (#17331) 2021-07-26 10:52:55 -04:00
registry.py [RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. (#17530) 2021-08-03 18:30:02 -04:00
trainer.py [RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00
trainer_template.py [RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00