| Name | Last commit message | Last commit date |
| --- | --- | --- |
| a2c | [RLlib] Unflake some CI-tests. (#25313) | 2022-06-03 14:51:50 +02:00 |
| a3c | [RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). (#25314) | 2022-06-01 09:29:16 +02:00 |
| alpha_star | [RLlib] Retry agents -> algorithms. with proper doc changes this time. (#24797) | 2022-05-16 09:45:32 +02:00 |
| alpha_zero | [RLlib] AlphaZero uses training_iteration API. (#24507) | 2022-05-18 09:58:25 +02:00 |
| apex_ddpg | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| apex_dqn | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| appo | [RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) | 2022-06-17 14:10:36 +02:00 |
| ars | [RLlib] Make sure torch and tf behave the same wrt conv2d nets. (#8785) | 2020-06-20 00:05:19 +02:00 |
| bandits | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00 |
| bc | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| cql | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| crr | [RLlib] Added expectation advantage_type option to CRR. (#26142) | 2022-06-28 15:40:09 +02:00 |
| ddpg | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| ddppo | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| dqn | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| dreamer | [RLlib] Dreamer (#10172) | 2020-08-26 13:24:05 +02:00 |
| es | [RLlib] 2 RLlib Flaky Tests (#14930) | 2021-03-30 19:21:13 +02:00 |
| impala | [RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) | 2022-06-17 14:10:36 +02:00 |
| maddpg | [RLlib] MADDPG: Move into main algorithms folder and add proper unit and learning tests. (#24579) | 2022-05-24 12:53:53 +02:00 |
| maml | [RLLib] MAML extension for all models except RNNs (#11337) | 2020-11-12 16:51:40 -08:00 |
| marwil | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| mbmpo | MBMPO Cartpole (#11832) | 2020-11-12 10:30:41 -08:00 |
| pg | [RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) | 2022-06-20 19:53:47 +02:00 |
| ppo | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| qmix | [RLlib] QMIX better defaults + added to CI learning tests (#21332) | 2022-01-04 08:54:41 +01:00 |
| r2d2 | [RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. (#25510) | 2022-06-07 10:29:56 +02:00 |
| sac | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| simple_q | [RLlib] Move all remaining algos into algorithms directory. (#25366) | 2022-06-04 07:35:24 +02:00 |
| slateq | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00 |
| td3 | [RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) | 2022-06-23 19:11:29 +02:00 |
| cleanup_experiment.py | [CI] Format Python code with Black (#21975) | 2022-01-29 18:41:57 -08:00 |
| compact-regression-test.yaml | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00 |
| create_plots.py | [RLlib] Benchmark and regression test yaml cleanup and restructuring. (#8414) | 2020-05-26 11:10:27 +02:00 |
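The folders above hold RLlib's tuned-example experiment YAMLs, one folder per algorithm. Below is a minimal sketch of how such a YAML is typically launched with Ray Tune; the file name `cartpole-ppo.yaml` is an assumption used only for illustration, so substitute any YAML that actually exists in the chosen folder.

```python
# Sketch: run one tuned-example YAML via Ray Tune.
# The path below is assumed for illustration; pick any YAML from the listed folders.
import yaml

import ray
from ray import tune

with open("rllib/tuned_examples/ppo/cartpole-ppo.yaml") as f:
    # Each tuned-example YAML maps an experiment name to its run/env/config/stop settings.
    experiments = yaml.safe_load(f)

ray.init()
tune.run_experiments(experiments)
ray.shutdown()
```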