Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. ( #25366 )
2022-06-04 07:35:24 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits ( #24896 )
2022-05-19 18:30:42 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. ( #24746 )
2022-05-17 17:16:08 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. ( #24545 )
2022-05-17 10:31:07 +02:00
Sven Mika
44a51610c2
[RLlib] SlateQ config objects. ( #24577 )
2022-05-10 20:07:18 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. ( #24501 )
2022-05-06 12:29:53 +02:00
Sven Mika
f891a2b6f1
[RLlib] SlateQ + tf; release test fixes, related to TD-error not properly being formatted. ( #24521 )
2022-05-06 08:50:30 +02:00
Sven Mika
b48f63113b
[RLlib] SlateQ fixes: Release learning tests wrong yaml structure + TD-error torch issue ( #24429 )
2022-05-04 13:37:14 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. ( #24165 )
2022-05-03 07:59:26 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). ( #24372 )
2022-05-02 12:51:14 +02:00
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. ( #24151 )
2022-04-29 18:38:17 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. ( #24164 )
2022-04-27 14:24:20 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
Sven Mika
22c9c4aa39
[RLlib] Slate-Q +GPU torch bug fix. ( #23464 )
2022-03-24 17:39:33 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes ( #23276 )
2022-03-18 13:45:16 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. ( #22738 )
2022-03-04 17:49:51 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. ( #22544 )
2022-02-25 21:58:16 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update ( #22543 )
2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net ( #21827 )
2022-02-04 17:01:12 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. ( #21452 )
2022-01-10 11:19:40 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer. ( #20725 )
2021-12-04 22:05:26 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) ( #20552 )
2021-11-19 11:57:37 +01:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents ( #19627 )
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.
This change
1. Get rid of the try...except clause when we call execution_plan(), and get rid of the Deprecation warning as a result.
2. Fix the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types of local replay buffers for all our agents. E.g., allow us to easily create a reservoir sampling replay buffer for APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). ( #19623 )
2021-10-25 09:42:39 +02:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. ( #13393 )
2021-03-08 15:41:27 +01:00
Sven Mika
8000258333
[RLlib] R2D2 Implementation. ( #13933 )
2021-02-25 12:18:11 +01:00
Sven Mika
99ae7bae05
[RLlib] JAXPolicy prep. PR #1. ( #13077 )
2020-12-26 20:14:18 -05:00
desktable
5af745c90d
[RLlib] Implement the SlateQ algorithm ( #11450 )
2020-11-03 09:52:04 +01:00