Sven Mika
|
7c39aa5fac
|
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076)
|
2022-06-10 17:09:18 +02:00 |
|
Steven Morad
|
501d932449
|
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059)
|
2022-05-22 19:58:47 +02:00 |
|
Artur Niederfahrenhorst
|
fb2915d26a
|
[RLlib] Replay Buffer API and Ape-X. (#24506)
|
2022-05-17 13:43:49 +02:00 |
|
Sven Mika
|
f066180ed5
|
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. (#24372)
|
2022-05-02 12:51:14 +02:00 |
|
Avnish Narayan
|
0d2ba41e41
|
[RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685)
|
2022-02-04 14:59:56 +01:00 |
|
Sven Mika
|
63db0e3a7c
|
[RLlib] Fix SAC learning test flakiness introduced in PR: "Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer." (#20985)
|
2021-12-09 14:24:27 +01:00 |
|
Sven Mika
|
8a72824c63
|
[RLlib Testig] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591)
|
2021-09-15 22:16:48 +02:00 |
|
Sven Mika
|
53206dd440
|
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531)
|
2021-06-30 12:32:11 +02:00 |
|