Commit graph

648 commits

Author SHA1 Message Date
Eric Liang
55d039af32
Annotate datasources and add API annotation check script (#24999)
Why are these changes needed?
Add API stability annotations for datasource classes, and add a linter to check all data classes have appropriate annotations.
2022-05-21 15:05:07 -07:00
Rohan Potdar
5a70b732e8
[RLlib] MARWIL and BC Config. (#24853) 2022-05-21 12:50:20 +02:00
Jun Gong
d5a6d46049
[RLlib] Migrate MAML, MB-MPO, MARWIL, and BC to use Policy sub-classing implementation. (#24914) 2022-05-20 14:10:59 +02:00
Kai Fricke
3e053c85ee
[RLlib] Fix broken links from agent -> algo conversion. (#25014) 2022-05-20 11:37:11 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896) 2022-05-19 18:30:42 +02:00
Sven Mika
628ee4b5f0
[RLlib] Bandit tf2 fix (+ add tf2 to test cases). (#24908) 2022-05-18 18:58:42 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. (#24507) 2022-05-18 09:58:25 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. (#24746) 2022-05-17 17:16:08 +02:00
Artur Niederfahrenhorst
c2a1e5abd1
[RLlib] Prioritized Replay (if required) in SimpleQ and DDPG. (#24866) 2022-05-17 13:53:07 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. (#24506) 2022-05-17 13:43:49 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. (#24545) 2022-05-17 10:31:07 +02:00
Sven Mika
0cd7bc4054
[RLlib] Re-establish dashboard performance tests. (#24728) 2022-05-16 13:13:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. (#24797) 2022-05-16 09:45:32 +02:00
Steven Morad
5c96e7223b
[RLlib] SimpleQ (minor cleanups) and DQN TrainerConfig objects. (#24584) 2022-05-15 16:14:43 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master (#24794) 2022-05-13 15:05:05 -07:00
Sven Mika
8fe3fd8f7b
[RLlib] QMix TrainerConfig objects. (#24775) 2022-05-13 18:50:28 +02:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents to algorithms - CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. (#24739) 2022-05-13 18:43:36 +02:00
Steven Morad
ebe6ab0afc
[RLlib] Bandits use TrainerConfig objects. (#24687) 2022-05-12 22:02:15 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. (#24339) 2022-05-12 16:12:42 +02:00
Artur Niederfahrenhorst
95d4a83a87
[RLlib] R2D2 Replay Buffer API integration. (#24473) 2022-05-10 20:36:14 +02:00
Sven Mika
44a51610c2
[RLlib] SlateQ config objects. (#24577) 2022-05-10 20:07:18 +02:00
Sven Mika
f243895ebb
[RLlib] Dreamer ConfigObject class. (#24650) 2022-05-10 16:19:42 +02:00
Sven Mika
6d94b2acbe
[RLlib] AlphaStar config objects. (#24576) 2022-05-10 14:01:00 +02:00
Amog Kamsetty
b5b48f6cc7
[RLlib] Switch Dreamer to training_iteration API. (#24488) 2022-05-10 08:37:34 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. (#24156) 2022-05-09 14:33:02 +02:00
Steven Morad
b76273357b
[RLlib] APEX-DQN replay buffer config validation fix. (#24588) 2022-05-09 09:59:04 +02:00
kourosh hakhamaneshi
69055f556d
[RLlib] Move agents.ars to algorithms.ars. (#24516) 2022-05-06 19:11:15 +02:00
kourosh hakhamaneshi
f48f1b252c
[RLlib] Moved agents.es to algorithms.es (#24511) 2022-05-06 14:54:22 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. (#24502) 2022-05-06 12:35:21 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. (#24501) 2022-05-06 12:29:53 +02:00
Sven Mika
f891a2b6f1
[RLlib] SlateQ + tf; release test fixes, related to TD-error not properly being formatted. (#24521) 2022-05-06 08:50:30 +02:00
Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn (#23454) 2022-05-05 16:11:08 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00
Sven Mika
5b61a00792
[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). (#24433) 2022-05-04 16:28:12 +02:00
Sven Mika
b48f63113b
[RLlib] SlateQ fixes: Release learning tests wrong yaml structure + TD-error torch issue (#24429) 2022-05-04 13:37:14 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. (#24165) 2022-05-03 07:59:26 +02:00
Sven Mika
7cca7782f1
[RLlib] OPE (off policy estimator) API. (#24384) 2022-05-02 21:15:50 +02:00
Sven Mika
0c5ac3b9e8
[RLlib] Issue 24075: Better error message for Bandit MultiDiscrete (suggest using our wrapper). (#24385) 2022-05-02 21:14:08 +02:00
Sven Mika
f53ca1cacb
[RLlib] ES + ARS TrainerConfig objects. (#24374) 2022-05-02 16:55:28 +02:00
Sven Mika
026849cd27
[RLlib] APPO TrainerConfig objects. (#24376) 2022-05-02 15:06:23 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. (#24372) 2022-05-02 12:51:14 +02:00
Sven Mika
950bd3fc3f
[RLlib] IMPALA TrainerConfig objects. (#24375) 2022-05-02 12:05:30 +02:00
Sven Mika
b2b1c95aa5
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). (#24332) 2022-04-30 09:51:09 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). (#24345)
Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead).

CQL does not perform sampling timesteps and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but should be translated (only for CQL and other purely offline RL algos) into min_train_timesteps_per_reporting.

If timesteps_per_iteration, CQL lever leaves the first iteration as it thinks it's not done yet (sample timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. (#24151) 2022-04-29 18:38:17 +02:00
Xuehai Pan
3c3dd5051f
[RLlib] Fix type hints for original_batches in callbacks. (#24214) 2022-04-29 10:33:53 +02:00
Xuehai Pan
9c76e21a5e
[RLlib] Ensure MultiCallbacks always implements all callback methods (#24254) 2022-04-29 10:30:24 +02:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). (#24295) 2022-04-28 22:25:16 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy as_eager() or with_tracing more than once. (#24271) 2022-04-28 13:43:21 +02:00
Sven Mika
c95dd79953
[RLlib] APPO eager fix (APPOTFPolicy gets wrapped as_eager() twice by mistake). (#24268) 2022-04-27 21:27:34 +02:00