Artur Niederfahrenhorst
306853b5b8
[RLlib] Issue 22693: RNN-SAC fixes. ( #23814 )
2022-04-25 09:19:24 +02:00
Ben Kasper
531fdd50d4
[RLlib] Add 2 missing callbacks to MultiCallbacks
class (on_trainer_init
and on_sub_environment_created
) ( #24153 )
2022-04-25 09:18:03 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. ( #24118 )
...
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092 , #24035 , #24030 and #23096
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )" ( #24103 )
...
This reverts commit a337fd994e
.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )
2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion ( #22937 )
2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn ( #23906 )" ( #24030 )
...
The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does.
We'll need to fix the test then re-merge
Reverts #23906
2022-04-19 16:43:57 -07:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn ( #23906 )
2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration
method implementation (_disable_execution_plan_api=True
) ( #23735 )
2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True ( #23916 )
2022-04-15 13:51:12 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. ( #15412 )
2022-04-12 07:50:09 +02:00
Jun Gong
c61910487f
[RLlib] Fix typo in docstring of PGTorchPolicy ( #23818 )
2022-04-11 19:31:45 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. ( #23798 )
2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. ( #23673 )
2022-04-11 08:39:10 +02:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib ( #23817 )
2022-04-10 16:12:53 -07:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. ( #23739 )
2022-04-08 15:33:28 +02:00
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function
( #23671 )
2022-04-05 08:40:29 +02:00
Jiajun Yao
5f37231842
Remove yapf dependency ( #23656 )
...
Yapf has been replaced by black.
2022-04-04 21:50:04 -07:00
Sven Mika
0bb82f29b6
[RLlib] AlphaStar polishing (fix logger.info bug). ( #22281 )
2022-04-01 09:49:41 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). ( #23491 )
2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q ( #22842 )
2022-03-29 14:44:40 +02:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). ( #23488 )
2022-03-25 18:25:51 +01:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI ( #23418 )
2022-03-24 17:04:02 -07:00
Sven Mika
22c9c4aa39
[RLlib] Slate-Q +GPU torch bug fix. ( #23464 )
2022-03-24 17:39:33 +01:00
Avnish Narayan
5134e0dc12
[RLlib] Change type to tensortype for cql policies. ( #23438 )
2022-03-24 12:32:29 +01:00
Fabian Witter
2547055f38
[RLlib] Add support for complex observations in CQL ( #23332 )
2022-03-22 17:04:07 +01:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent ( #22838 )
2022-03-21 16:55:55 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes ( #23276 )
2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). ( #23128 )
2022-03-15 17:34:21 +01:00
Jeroen Bédorf
bc21a4593d
[RLlib] Fix crash when kl_coeff is set to 0 ( #23063 )
...
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
2022-03-11 12:24:52 -08:00
simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes
( #22152 )
2022-03-09 16:51:42 +01:00
Sven Mika
3fe6f3b3eb
[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. ( #22821 )
2022-03-04 19:16:30 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. ( #22738 )
2022-03-04 17:49:51 +01:00
Jun Gong
e8be45065e
[RLlib] Restore policies on eval_workers
as well. ( #22641 )
2022-03-01 08:38:14 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. ( #22544 )
2022-02-25 21:58:16 +01:00
Sven Mika
526fd6b5fb
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. ( #22590 )
2022-02-24 22:05:36 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update ( #22543 )
2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Jun Gong
2b6a0c71d7
[RLlib] Add a callback for when trainer finishes initialization: on_trainer_init
. ( #22493 )
2022-02-22 08:18:32 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1
check to __init__ ( #22521 )
2022-02-21 11:44:12 +01:00
Sven Mika
c58cd90619
[RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). ( #22465 )
2022-02-17 22:32:26 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
5ca6a56e16
[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. ( #22350 )
2022-02-15 09:32:43 +01:00
Steven Morad
5d52b599aa
[RLlib] Fix zero gradients for ppo-clipped vf ( #22171 )
2022-02-15 08:57:18 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). ( #22246 )
2022-02-10 02:58:43 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration
function instead of execution_plan
) and re-instate Pong learning test." ( #22250 )
...
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. ( #22078 )
2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00