a2c
[RLlib] Using PG when not doing microbatching kills A2C performance. ( #26844 )
2022-07-25 15:11:26 +02:00
a3c
[RLlib] Using PG when not doing microbatching kills A2C performance. ( #26844 )
2022-07-25 15:11:26 +02:00
alpha_star
[RLlib] Move IMPALA and APPO back to exec plan (for now; due to unresolved learning/performance issues). ( #25851 )
2022-06-29 08:41:47 +02:00
alpha_zero
[RLlib] Make QMix use the ReplayBufferAPI ( #25560 )
2022-06-23 22:55:22 -07:00
apex_ddpg
[RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. ( #25924 )
2022-06-20 19:53:47 +02:00
apex_dqn
[AIR] Remove ML code from ray.util ( #27005 )
2022-07-27 14:24:19 +01:00
appo
[RLlib] Unify gnorm mixin for tf and torch policies. ( #26102 )
2022-07-24 15:31:09 +02:00
ars
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
bandit
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
bc
[RLlib]: Raise deprecation warning in MARWIL OPE methods. ( #26893 )
2022-07-23 13:55:40 +02:00
cql
[RLlib]: Move OPE to evaluation config ( #25911 )
2022-07-12 11:04:34 -07:00
crr
[RLlib] Fixes CRR flakeyness ( #26770 )
2022-07-20 12:08:57 -07:00
ddpg
[RLlib] Fix a bunch of issues related to connectors. ( #26510 )
2022-07-13 18:55:20 +02:00
ddppo
[RLlib] Cleanup some deprecated metric keys and classes. ( #26036 )
2022-06-23 21:30:01 +02:00
dqn
[RLlib] Fix memory leak in APEX_DQN ( #26691 )
2022-07-19 16:16:24 -07:00
dreamer
[RLlib] Simplify agent collector ( #26803 )
2022-07-25 13:17:17 -07:00
es
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
impala
[RLlib] Unify gnorm mixin for tf and torch policies. ( #26102 )
2022-07-24 15:31:09 +02:00
maddpg
[RLlib] Save serialized PolicySpec. Extract num_gpus related logics into a util function. ( #25954 )
2022-06-30 11:38:21 +02:00
maml
[RLlib] Fix a bunch of issues related to connectors. ( #26510 )
2022-07-13 18:55:20 +02:00
marwil
[RLlib]: Raise deprecation warning in MARWIL OPE methods. ( #26893 )
2022-07-23 13:55:40 +02:00
mbmpo
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
pg
[RLlib] Fix a bunch of issues related to connectors. ( #26510 )
2022-07-13 18:55:20 +02:00
ppo
[RLlib] Unify gnorm mixin for tf and torch policies. ( #26102 )
2022-07-24 15:31:09 +02:00
qmix
[RLlib] Make QMix use the ReplayBufferAPI ( #25560 )
2022-06-23 22:55:22 -07:00
r2d2
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
sac
[RLlib] Migrating DDPG to PolicyV2. ( #26054 )
2022-06-28 15:52:56 +02:00
simple_q
[RLlib] Fix a bunch of issues related to connectors. ( #26510 )
2022-07-13 18:55:20 +02:00
slateq
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
td3
[RLlib] More Trainer -> Algorithm renaming cleanups. ( #25869 )
2022-06-20 15:54:00 +02:00
tests
[RLlib] Beef up worker failure test. ( #26953 )
2022-07-27 00:10:45 -07:00
__init__.py
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
algorithm.py
[RLlib] Beef up worker failure test. ( #26953 )
2022-07-27 00:10:45 -07:00
algorithm_config.py
[RLlib] Beef up worker failure test. ( #26953 )
2022-07-27 00:10:45 -07:00
callbacks.py
[RLlib] more connector polishes and fixes. ( #26645 )
2022-07-19 08:50:28 -07:00
mock.py
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
registry.py
[RLlib] Try to checkpoint a durable policy name ( #27016 )
2022-07-27 00:01:14 -07:00