Rohan Potdar
|
ab81c8e9ca
|
[RLlib]: Rename input_evaluation to off_policy_estimation_methods. (#25107)
|
2022-05-27 13:14:54 +02:00 |
|
Sven Mika
|
ec89fe5203
|
[RLlib] APEX-DQN and R2D2 config objects. (#25067)
|
2022-05-23 12:15:45 +02:00 |
|
Sven Mika
|
baf8c2fa1e
|
[RLlib] TD3 config objects. (#25065)
|
2022-05-23 10:07:13 +02:00 |
|
Sven Mika
|
09886d7ab8
|
[RLlib] Upgrade gym 0.23 (#24171)
|
2022-05-23 08:18:44 +02:00 |
|
Steven Morad
|
501d932449
|
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059)
|
2022-05-22 19:58:47 +02:00 |
|
Rohan Potdar
|
5a70b732e8
|
[RLlib] MARWIL and BC Config. (#24853)
|
2022-05-21 12:50:20 +02:00 |
|
Artur Niederfahrenhorst
|
fb2915d26a
|
[RLlib] Replay Buffer API and Ape-X. (#24506)
|
2022-05-17 13:43:49 +02:00 |
|
Sven Mika
|
5b61a00792
|
[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). (#24433)
|
2022-05-04 16:28:12 +02:00 |
|
Sven Mika
|
1bc6419e0e
|
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. (#24165)
|
2022-05-03 07:59:26 +02:00 |
|
Sven Mika
|
7cca7782f1
|
[RLlib] OPE (off policy estimator) API. (#24384)
|
2022-05-02 21:15:50 +02:00 |
|
Sven Mika
|
f066180ed5
|
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. (#24372)
|
2022-05-02 12:51:14 +02:00 |
|
Sven Mika
|
b2b1c95aa5
|
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). (#24332)
|
2022-04-30 09:51:09 +02:00 |
|
Sven Mika
|
ba14f0a41b
|
[RLlib] PGTrainer config object class (PGConfig). (#24295)
|
2022-04-28 22:25:16 +02:00 |
|
Sven Mika
|
c82f6c62c8
|
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739)
|
2022-04-08 15:33:28 +02:00 |
|
Sven Mika
|
2eaa54bd76
|
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491)
|
2022-03-31 18:26:12 +02:00 |
|