ray/rllib/algorithms
Jun Gong ca5e0dcaf4
[RLLib] Record framework and algorithm used by an RLlib run. (#26956)
Automatically record framework and algorithm used by RLlib jobs.
For better planning.
2022-07-25 16:16:36 -07:00
a2c [RLlib] Using PG when not doing microbatching kills A2C performance. (#26844) 2022-07-25 15:11:26 +02:00
a3c [RLlib] Using PG when not doing microbatching kills A2C performance. (#26844) 2022-07-25 15:11:26 +02:00
alpha_star [RLlib] Move IMPALA and APPO back to exec plan (for now; due to unresolved learning/performance issues). (#25851) 2022-06-29 08:41:47 +02:00
alpha_zero [RLlib] Make QMix use the ReplayBufferAPI (#25560) 2022-06-23 22:55:22 -07:00
apex_ddpg [RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) 2022-06-20 19:53:47 +02:00
apex_dqn [RLlib] Fix memory leak in APEX_DQN (#26691) 2022-07-19 16:16:24 -07:00
appo [RLlib] Unify gnorm mixin for tf and torch policies. (#26102) 2022-07-24 15:31:09 +02:00
ars [RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
bandit [RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
bc [RLlib]: Raise deprecation warning in MARWIL OPE methods. (#26893) 2022-07-23 13:55:40 +02:00
cql [RLlib]: Move OPE to evaluation config (#25911) 2022-07-12 11:04:34 -07:00
crr [RLlib] Fixes CRR flakeyness (#26770) 2022-07-20 12:08:57 -07:00
ddpg [RLlib] Fix a bunch of issues related to connectors. (#26510) 2022-07-13 18:55:20 +02:00
ddppo [RLlib] Cleanup some deprecated metric keys and classes. (#26036) 2022-06-23 21:30:01 +02:00
dqn [RLlib] Fix memory leak in APEX_DQN (#26691) 2022-07-19 16:16:24 -07:00
dreamer [RLlib] Simplify agent collector (#26803) 2022-07-25 13:17:17 -07:00
es [RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
impala [RLlib] Unify gnorm mixin for tf and torch policies. (#26102) 2022-07-24 15:31:09 +02:00
maddpg [RLlib] Save serialized PolicySpec. Extract num_gpus related logics into a util function. (#25954) 2022-06-30 11:38:21 +02:00
maml [RLlib] Fix a bunch of issues related to connectors. (#26510) 2022-07-13 18:55:20 +02:00
marwil [RLlib]: Raise deprecation warning in MARWIL OPE methods. (#26893) 2022-07-23 13:55:40 +02:00
mbmpo [RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
pg [RLlib] Fix a bunch of issues related to connectors. (#26510) 2022-07-13 18:55:20 +02:00
ppo [RLlib] Unify gnorm mixin for tf and torch policies. (#26102) 2022-07-24 15:31:09 +02:00
qmix [RLlib] Make QMix use the ReplayBufferAPI (#25560) 2022-06-23 22:55:22 -07:00
r2d2 [RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
sac [RLlib] Migrating DDPG to PolicyV2. (#26054) 2022-06-28 15:52:56 +02:00
simple_q [RLlib] Fix a bunch of issues related to connectors. (#26510) 2022-07-13 18:55:20 +02:00
slateq [RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
td3 [RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
tests [RLlib] restart_failed_sub_environments now works for MA cases and crashes during reset(); +more tests and logging; add eval worker sub-env fault tolerance test. (#26276) 2022-07-15 08:55:14 +02:00
__init__.py [RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
algorithm.py [RLLib] Record framework and algorithm used by an RLlib run. (#26956) 2022-07-25 16:16:36 -07:00
algorithm_config.py [RLlib] more connector polishes and fixes. (#26645) 2022-07-19 08:50:28 -07:00
callbacks.py [RLlib] more connector polishes and fixes. (#26645) 2022-07-19 08:50:28 -07:00
mock.py [RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
registry.py [RLlib]: Fix OPE trainables (#26279) 2022-07-17 14:25:53 -07:00