Commit graph

586 commits

Author SHA1 Message Date
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) 2022-04-18 12:20:12 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) 2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True (#23916) 2022-04-15 13:51:12 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
Jun Gong
c61910487f
[RLlib] Fix typo in docstring of PGTorchPolicy (#23818) 2022-04-11 19:31:45 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. (#23798) 2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) 2022-04-11 08:39:10 +02:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817) 2022-04-10 16:12:53 -07:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) 2022-04-08 15:33:28 +02:00
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671) 2022-04-05 08:40:29 +02:00
Jiajun Yao
5f37231842
Remove yapf dependency (#23656)
Yapf has been replaced by black.
2022-04-04 21:50:04 -07:00
Sven Mika
0bb82f29b6
[RLlib] AlphaStar polishing (fix logger.info bug). (#22281) 2022-04-01 09:49:41 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491) 2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) 2022-03-29 14:44:40 +02:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). (#23488) 2022-03-25 18:25:51 +01:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI (#23418) 2022-03-24 17:04:02 -07:00
Sven Mika
22c9c4aa39
[RLlib] Slate-Q +GPU torch bug fix. (#23464) 2022-03-24 17:39:33 +01:00
Avnish Narayan
5134e0dc12
[RLlib] Change type to TensorType for CQL policies. (#23438) 2022-03-24 12:32:29 +01:00
Fabian Witter
2547055f38
[RLlib] Add support for complex observations in CQL (#23332) 2022-03-22 17:04:07 +01:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent (#22838) 2022-03-21 16:55:55 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) 2022-03-15 17:34:21 +01:00
Jeroen Bédorf
bc21a4593d
[RLlib] Fix crash when kl_coeff is set to 0 (#23063)
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
2022-03-11 12:24:52 -08:00
simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes (#22152) 2022-03-09 16:51:42 +01:00
Sven Mika
3fe6f3b3eb
[RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821) 2022-03-04 19:16:30 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. (#22738) 2022-03-04 17:49:51 +01:00
Jun Gong
e8be45065e
[RLlib] Restore policies on eval_workers as well. (#22641) 2022-03-01 08:38:14 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544) 2022-02-25 21:58:16 +01:00
Sven Mika
526fd6b5fb
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590) 2022-02-24 22:05:36 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) 2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) 2022-02-22 09:36:44 +01:00
Jun Gong
2b6a0c71d7
[RLlib] Add a callback for when trainer finishes initialization: on_trainer_init. (#22493) 2022-02-22 08:18:32 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ (#22521) 2022-02-21 11:44:12 +01:00
Sven Mika
c58cd90619
[RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465) 2022-02-17 22:32:26 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. (#22191) 2022-02-17 14:06:14 +01:00
Sven Mika
5ca6a56e16
[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. (#22350) 2022-02-15 09:32:43 +01:00
Steven Morad
5d52b599aa
[RLlib] Fix zero gradients for ppo-clipped vf (#22171) 2022-02-15 08:57:18 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) 2022-02-10 13:44:22 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246) 2022-02-10 02:58:43 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250)
Reverts ray-project/ray#22126

Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078) 2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) 2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) 2022-02-08 16:43:00 +01:00
Sven Mika
f6617506a2
[RLlib] Add on_sub_environment_created to DefaultCallbacks class. (#21893) 2022-02-04 22:22:47 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) 2022-02-04 17:01:12 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) 2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano
a258f9c692
[RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042) 2022-02-02 17:28:42 +01:00