Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). ( #23488 )
2022-03-25 18:25:51 +01:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent ( #22838 )
2022-03-21 16:55:55 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). ( #23128 )
2022-03-15 17:34:21 +01:00
simonsays1980
568cf28dd4
[RLlib] Example script custom_metrics_and_callbacks.py should work for batch_mode=complete_episodes. ( #22684 )
2022-03-01 09:00:38 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. ( #22544 )
2022-02-25 21:58:16 +01:00
Jun Gong
a385c9b127
[RLlib] Update bandit_envs_recommender_system ( #22421 )
2022-02-24 22:43:41 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Sven Mika
c58cd90619
[RLlib] Enable Bandits to work in batch mode(s) (vector envs + multiple workers + train_batch_size > 1). ( #22465 )
2022-02-17 22:32:26 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Sven Mika
44d09c2aa5
[RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). ( #22246 )
2022-02-10 02:58:43 +01:00
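The clear_buffer() -> reset_buffer() rename above ( #22246 ) is a pure API rename; the following minimal migration sketch assumes ray.rllib.utils.filter.MeanStdFilter and its shape-based constructor, so verify against your installed RLlib version:

```python
# Minimal migration sketch for the clear_buffer() -> reset_buffer() rename.
# Assumes ray.rllib.utils.filter.MeanStdFilter; check your RLlib version.
import numpy as np
from ray.rllib.utils.filter import MeanStdFilter

filt = MeanStdFilter(shape=(4,))          # running mean/std filter for 4-dim observations
filt(np.array([0.1, 0.2, 0.3, 0.4]))      # update the running statistics with one sample

# filt.clear_buffer()                     # deprecated name
filt.reset_buffer()                       # replacement name per #22246
```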
Jun Gong
3207f537cc
[RLlib] RecSim Interest evolution environment should use the custom video sampler IEvVideoSampler, since only one cluster is used. ( #22211 )
2022-02-09 10:29:35 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
Sven Mika
8b678ddd68
[RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. ( #22076 )
2022-02-06 12:35:03 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net ( #21827 )
2022-02-04 17:01:12 +01:00
Avnish Narayan
0d2ba41e41
[RLlib] [CI] Deflake longer-running RLlib learning tests for off-policy algorithms. Fix seeding issue in TransformedAction environments ( #21685 )
2022-02-04 14:59:56 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00
Jun Gong
9c95b9a5fa
[RLlib] Add an env wrapper so RecSim works with our Bandits agent. ( #22028 )
2022-02-02 12:15:38 +01:00
Jun Gong
a55258eb9c
[RLlib] Move bandit example scripts into examples folder. ( #21949 )
2022-02-02 09:20:47 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; make RecSim adapter more accessible. ( #21773 )
2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
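The callable option for policies_to_train above ( #20735 ) can be sketched roughly as follows; the (policy_id, batch) signature and the policy names are illustrative assumptions, not taken from the PR itself:

```python
# Rough multi-agent config sketch: `policies_to_train` given as a callable.
# Assumed callable signature: (policy_id, train_batch=None) -> bool.
# Policy IDs "learning_policy" / "frozen_opponent" are placeholders.
config = {
    "multiagent": {
        "policies": {"learning_policy", "frozen_opponent"},
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: (
            "learning_policy" if agent_id == "agent_0" else "frozen_opponent"
        ),
        # Only update the learning policy; keep the opponent frozen.
        "policies_to_train": lambda pid, batch=None: pid == "learning_policy",
    },
}
```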
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
c288b97e5f
[RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. ( #21670 )
2022-01-24 19:38:21 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. ( #21569 )
2022-01-18 16:34:06 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and MARWIL learning tests. ( #21574 )
2022-01-14 14:35:32 +01:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. ( #21448 )
2022-01-11 19:50:03 +01:00
Avnish Narayan
f7a5fc36eb
[RLlib] Give rnnsac_stateless CartPole a GPU, increase timeout ( #21407 )
Increase test_preprocessors runtimes.
2022-01-06 11:54:19 -08:00
Sven Mika
c01245763e
[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" ( #21339 )
2022-01-04 18:30:26 +01:00
Sven Mika
abd3bef63b
[RLlib] QMIX better defaults + added to CI learning tests ( #21332 )
2022-01-04 08:54:41 +01:00
Kai Fricke
489e6945a6
Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls ( #20113 )" ( #21338 )
This reverts commit 327eb84154.
2022-01-03 10:21:25 +00:00
Benjamin Black
327eb84154
[RLlib] Updated pettingzoo wrappers, env versions, urls ( #20113 )
2022-01-02 21:29:09 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). ( #20984 )
2021-12-21 08:39:05 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. ( #21008 )
2021-12-13 12:04:23 +01:00
Sven Mika
db058d0fb3
[RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. ( #20983 )
2021-12-11 20:33:35 +01:00
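The rename above ( #20983 ) is a pure config-key change; purely as an illustration (the window size of 100 is arbitrary):

```python
# Before #20983 (deprecated key):
config = {"metrics_smoothing_episodes": 100}
# After #20983 (clearer name, same meaning: number of episodes used to smooth reported metrics):
config = {"metrics_num_episodes_for_smoothing": 100}
```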
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. ( #20918 )
2021-12-11 14:57:58 +01:00
kk-55
9acf2f954d
[RLlib] Example containing a proposal for computing an adapted (time-dependent) GAE used by the PPO algorithm (via callback on_postprocess_trajectory) ( #20850 )
2021-12-09 14:48:56 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) ( #19551 )
2021-12-07 13:26:34 +01:00
Sven Mika
60b2219d72
[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. ( #20757 )
2021-12-04 13:26:33 +01:00
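For the evaluation-by-timesteps change above ( #20757 ), a rough config sketch; the evaluation_duration / evaluation_duration_unit keys and the "auto" option are my reading of the knobs this PR added, so double-check the RLlib docs for your version:

```python
# Assumed config keys for timestep-based (rather than episode-based) evaluation
# that runs in parallel, so training never has to wait on long evaluation episodes.
config = {
    "evaluation_interval": 1,                 # evaluate every training iteration
    "evaluation_num_workers": 1,
    "evaluation_parallel_to_training": True,  # overlap evaluation with training
    "evaluation_duration": 1000,              # how much evaluation to run per round...
    "evaluation_duration_unit": "timesteps",  # ...counted in timesteps, not episodes
    # "evaluation_duration": "auto",          # alternative: size eval to the training time
}
```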
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible ( #20746 )
2021-12-02 13:11:26 +01:00
Jun Gong
65bd8e29f8
[RLlib] Update a few things to get rid of the remote_vector_env deprecation warning. ( #20753 )
2021-12-02 13:10:44 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). ( #20571 )
2021-11-23 23:01:05 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) ( #20552 )
2021-11-19 11:57:37 +01:00
Sven Mika
7a585fb275
[RLlib; Documentation] RLlib README overhaul. ( #20249 )
2021-11-18 18:08:40 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. ( #20250 )
2021-11-17 21:40:16 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy
(policy template) for torch only; PPOTorchPolicy ( #20061 ) ( #20399 )" ( #20417 )
...
This reverts commit 90dc5460d4
.
2021-11-16 14:49:41 +01:00
Stefan Schneider
2b3d0c691f
[RLlib] Document and extend action mask example. ( #20390 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 13:20:41 +01:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
." ( #20285 )
...
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055 )" (#20284 )"
This reverts commit 246787cdd9
.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy
(policy template) for torch only; PPOTorchPolicy ( #20061 )" ( #20399 )
...
This reverts commit 5b1c8e46e1
.
2021-11-15 16:11:35 -08:00