Author | Commit | Message | Date
Sven Mika | eb54236d13 | [RLlib] DD-PPO training iteration fn (#23906) | 2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst | e57ce7efd6 | [RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) | 2022-04-18 12:20:12 +02:00
Sven Mika | 92781c603e | [RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) | 2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi | c38a29573f | [RLlib] Removed deprecated code with error=True (#23916) | 2022-04-15 13:51:12 +02:00
Sven Mika | a8494742a3 | [RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) | 2022-04-12 07:50:09 +02:00
Jun Gong | c61910487f | [RLlib] Fix typo in docstring of PGTorchPolicy (#23818) | 2022-04-11 19:31:45 +02:00
Sven Mika | a3d4fc74a6 | [RLlib] MARWIL: Move to training_iteration API. (#23798) | 2022-04-11 19:28:32 +02:00
Steven Morad | 00922817b6 | [RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) | 2022-04-11 08:39:10 +02:00
Eric Liang | 1ff874e8e8 | [spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817) | 2022-04-10 16:12:53 -07:00
Sven Mika | c82f6c62c8 | [RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) | 2022-04-08 15:33:28 +02:00
Steven Morad | 39841b65b3 | [RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671) | 2022-04-05 08:40:29 +02:00
Jiajun Yao | 5f37231842 | Remove yapf dependency (#23656). Yapf has been replaced by black. | 2022-04-04 21:50:04 -07:00
Sven Mika | 0bb82f29b6 | [RLlib] AlphaStar polishing (fix logger.info bug). (#22281) | 2022-04-01 09:49:41 +02:00
Sven Mika | 2eaa54bd76 | [RLlib] POC: Config objects instead of dicts (PPO only). (#23491) | 2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst | 9a64bd4e9b | [RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) | 2022-03-29 14:44:40 +02:00
Sven Mika | 7cb86acce2 | [RLlib] trainer_template.py: hard deprecation (error when used). (#23488) | 2022-03-25 18:25:51 +01:00
Max Pumperla | 60054995e6 | [docs] fix doctests and activate CI (#23418) | 2022-03-24 17:04:02 -07:00
Sven Mika | 22c9c4aa39 | [RLlib] Slate-Q + GPU torch bug fix. (#23464) | 2022-03-24 17:39:33 +01:00
Avnish Narayan | 5134e0dc12 | [RLlib] Change type to TensorType for CQL policies. (#23438) | 2022-03-24 12:32:29 +01:00
Fabian Witter | 2547055f38 | [RLlib] Add support for complex observations in CQL (#23332) | 2022-03-22 17:04:07 +01:00
Jun Gong | d12977c4fb | [RLlib] TF2 Bandit Agent (#22838) | 2022-03-21 16:55:55 +01:00
Sven Mika | b1cda46681 | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang | 0c74ecad12 | [Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) | 2022-03-15 17:34:21 +01:00
Jeroen Bédorf | bc21a4593d | [RLlib] Fix crash when kl_coeff is set to 0 (#23063). Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>, Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>, Ishant Mrinal <33053278+n30111@users.noreply.github.com> | 2022-03-11 12:24:52 -08:00
simonsays1980 | 8627f44d7f | [RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes (#22152) | 2022-03-09 16:51:42 +01:00
Sven Mika | 3fe6f3b3eb | [RLlib] 2 bug fixes: Bandit registration not working if torch not installed. Env checker for MA envs. (#22821) | 2022-03-04 19:16:30 +01:00
Jun Gong | e765915ded | [RLlib] Make sure SlateQ works with GPU. (#22738) | 2022-03-04 17:49:51 +01:00
Jun Gong | e8be45065e | [RLlib] Restore policies on eval_workers as well. (#22641) | 2022-03-01 08:38:14 +01:00
Sven Mika | 7b687e6cd8 | [RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544) | 2022-02-25 21:58:16 +01:00
Sven Mika | 526fd6b5fb | [RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590) | 2022-02-24 22:05:36 +01:00
Sven Mika | 8e00537b65 | [RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) | 2022-02-23 13:03:45 +01:00
Sven Mika | 6522935291 | [RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) | 2022-02-22 09:36:44 +01:00
Jun Gong | 2b6a0c71d7 | [RLlib] Add a callback for when trainer finishes initialization: on_trainer_init. (#22493) | 2022-02-22 08:18:32 +01:00
Daniel | 308ccfe25c | [RLlib] DD-PPO: Move train_batch_size==-1 check to __init__ (#22521) | 2022-02-21 11:44:12 +01:00
Sven Mika | c58cd90619 | [RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465) | 2022-02-17 22:32:26 +01:00
Avnish Narayan | 740def0a13 | [RLlib] Put env-checker on critical path. (#22191) | 2022-02-17 14:06:14 +01:00
Sven Mika | 5ca6a56e16 | [RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. (#22350) | 2022-02-15 09:32:43 +01:00
Steven Morad | 5d52b599aa | [RLlib] Fix zero gradients for PPO-clipped vf (#22171) | 2022-02-15 08:57:18 +01:00
Sven Mika | 04a5c72ea3 | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) | 2022-02-10 13:44:22 +01:00
Sven Mika | 44d09c2aa5 | [RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246) | 2022-02-10 02:58:43 +01:00
Alex Wu | b122f093c1 | Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250). Reverts ray-project/ray#22126, which broke rllib:tests/test_io. | 2022-02-09 09:26:36 -08:00
Ishant Mrinal | f0d8b6d701 | [RLlib] Fix compute_actions() for Trainer due to missing "if prev_actions/rewards is not None" checks. (#22078) | 2022-02-09 09:05:26 +01:00
Balaji Veeramani | 31ed9e5d02 | [CI] Replace YAPF disables with Black disables (#21982) | 2022-02-08 16:29:25 -08:00
Sven Mika | ac3e6ab411 | [RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) | 2022-02-08 19:04:13 +01:00
Sven Mika | c17a44cdfa | Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) | 2022-02-08 16:43:00 +01:00
Sven Mika | f6617506a2 | [RLlib] Add on_sub_environment_created to DefaultCallbacks class. (#21893) | 2022-02-04 22:22:47 +01:00
Sven Mika | 38d75ce058 | [RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) | 2022-02-04 17:01:12 +01:00
SangBin Cho | a887763b38 | Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105). This reverts commit 3f03ef8ba8. | 2022-02-04 00:54:50 -08:00
Sven Mika | 3f03ef8ba8 | [RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) | 2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano | a258f9c692 | [RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042) | 2022-02-02 17:28:42 +01:00