Commit | Date | Author | Message
dfb9689701 | 2022-04-21 18:49:52 +02:00 | Grzegorz Rypeść | [RLlib] Issue 21489: Unity3D env lacks group rewards (#24016).
92781c603e | 2022-04-15 18:36:13 +02:00 | Sven Mika | [RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735)
c38a29573f | 2022-04-15 13:51:12 +02:00 | kourosh hakhamaneshi | [RLlib] Removed deprecated code with error=True (#23916)
a8494742a3 | 2022-04-12 07:50:09 +02:00 | Sven Mika | [RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412)
500cf7dcef | 2022-04-11 22:07:07 +02:00 | Jun Gong | [RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787)
00922817b6 | 2022-04-11 08:39:10 +02:00 | Steven Morad | [RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673)
1ff874e8e8 | 2022-04-10 16:12:53 -07:00 | Eric Liang | [spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817)
0b3a79ca41 | 2022-04-07 10:16:22 +02:00 | Sven Mika | [RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740)
434265edd0 | 2022-04-05 16:33:50 +02:00 | Sven Mika | [RLlib] Examples folder: All training_iteration translations. (#23712)
e725472b5b | 2022-04-05 08:36:20 +02:00 | mesjou | [RLlib] Fix bug in prisoners dillemma example. (#23690)
e4c6e9c3d3 | 2022-03-31 09:13:04 +02:00 | simonsays1980 | [RLlib] Changed the if-block in the example callback to become more readable. (#22900)
9a64bd4e9b | 2022-03-29 14:44:40 +02:00 | Artur Niederfahrenhorst | [RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842)
7cb86acce2 | 2022-03-25 18:25:51 +01:00 | Sven Mika | [RLlib] trainer_template.py: hard deprecation (error when used). (#23488)
d12977c4fb | 2022-03-21 16:55:55 +01:00 | Jun Gong | [RLlib] TF2 Bandit Agent (#22838)
0c74ecad12 | 2022-03-15 17:34:21 +01:00 | Siyuan (Ryans) Zhuang | [Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128)
568cf28dd4 | 2022-03-01 09:00:38 +01:00 | simonsays1980 | [RLlib] Example script custom_metrics_and_callbacks.py should work for batch_mode=complete_episodes. (#22684)
7b687e6cd8 | 2022-02-25 21:58:16 +01:00 | Sven Mika | [RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544)
a385c9b127 | 2022-02-24 22:43:41 +01:00 | Jun Gong | [RLlib] Update bandit_envs_recommender_system (#22421)
6522935291 | 2022-02-22 09:36:44 +01:00 | Sven Mika | [RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389)
c58cd90619 | 2022-02-17 22:32:26 +01:00 | Sven Mika | [RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465)
740def0a13 | 2022-02-17 14:06:14 +01:00 | Avnish Narayan | [RLlib] Put env-checker on critical path. (#22191)
04a5c72ea3 | 2022-02-10 13:44:22 +01:00 | Sven Mika | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708)
44d09c2aa5 | 2022-02-10 02:58:43 +01:00 | Sven Mika | [RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246)
3207f537cc | 2022-02-09 10:29:35 +01:00 | Jun Gong | [RLlib] RecSim Interest evolution environment should use custom video sampler: IEvVideoSampler due to only one cluster being used. (#22211)
c17a44cdfa | 2022-02-08 16:43:00 +01:00 | Sven Mika | Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153)
8b678ddd68 | 2022-02-06 12:35:03 +01:00 | Sven Mika | [RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. (#22076)
38d75ce058 | 2022-02-04 17:01:12 +01:00 | Sven Mika | [RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827)
0d2ba41e41 | 2022-02-04 14:59:56 +01:00 | Avnish Narayan | [RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685)
a887763b38 | 2022-02-04 00:54:50 -08:00 | SangBin Cho | Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
    This reverts commit 3f03ef8ba8.
3f03ef8ba8 | 2022-02-03 09:32:09 +01:00 | Sven Mika | [RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356)
9c95b9a5fa | 2022-02-02 12:15:38 +01:00 | Jun Gong | [RLlib] Add an env wrapper so RecSim works with our Bandits agent. (#22028)
a55258eb9c | 2022-02-02 09:20:47 +01:00 | Jun Gong | [RLlib] Move bandit example scripts into examples folder. (#21949)
7f1bacc7dc | 2022-01-29 18:41:57 -08:00 | Balaji Veeramani | [CI] Format Python code with Black (#21975)
    See #21316 and #21311 for the motivation behind these changes.
893536ebd9 | 2022-01-27 13:58:12 +01:00 | Sven Mika | [RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773)
371fbb17e4 | 2022-01-27 12:17:34 +01:00 | Sven Mika | [RLlib] Make policies_to_train more flexible via callable option. (#20735)
d5bfb7b7da | 2022-01-25 14:16:58 +01:00 | Sven Mika | [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652)
c288b97e5f | 2022-01-24 19:38:21 +01:00 | Sven Mika | [RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. (#21670)
12b087acb8 | 2022-01-18 16:34:06 +01:00 | Avnish Narayan | [RLlib] Base env pre-checker. (#21569)
7517aefe05 | 2022-01-14 14:35:32 +01:00 | Jun Gong | [RLlib] Bring back BC and Marwil learning tests. (#21574)
f94bd99ce4 | 2022-01-11 19:50:03 +01:00 | Sven Mika | [RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448)
f7a5fc36eb | 2022-01-06 11:54:19 -08:00 | Avnish Narayan | [rllib] Give rnnsac_stateless cartpole gpu, increase timeout (#21407)
    Increase test_preprocessors runtimes.
c01245763e | 2022-01-04 18:30:26 +01:00 | Sven Mika | [RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339)
abd3bef63b | 2022-01-04 08:54:41 +01:00 | Sven Mika | [RLlib] QMIX better defaults + added to CI learning tests (#21332)
489e6945a6 | 2022-01-03 10:21:25 +00:00 | Kai Fricke | Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113)" (#21338)
    This reverts commit 327eb84154.
327eb84154 | 2022-01-02 21:29:09 +01:00 | Benjamin Black | [RLlib] Updated pettingzoo wrappers, env versions, urls (#20113)
62dbf26394 | 2021-12-21 08:39:05 +01:00 | Sven Mika | [RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984)
daa4304a91 | 2021-12-13 12:04:23 +01:00 | Sven Mika | [RLlib] Switch off preprocessors by default for PGTrainer. (#21008)
db058d0fb3 | 2021-12-11 20:33:35 +01:00 | Sven Mika | [RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. (#20983)
596c8e2772 | 2021-12-11 14:57:58 +01:00 | Sven Mika | [RLlib] Experimental no-flatten option for actions/prev-actions. (#20918)
9acf2f954d | 2021-12-09 14:48:56 +01:00 | kk-55 | [RLlib] Example containing a proposal for computing an adapted (time-dependent) GAE used by the PPO algorithm (via callback on_postprocess_trajectory) (#20850)