Sven Mika | 92781c603e | [RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) | 2022-04-15 18:36:13 +02:00
Sven Mika | a8494742a3 | [RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) | 2022-04-12 07:50:09 +02:00
Jun Gong | 500cf7dcef | [RLlib] Run test_policy_client_server_setup.sh tests on different ports. (#23787) | 2022-04-11 22:07:07 +02:00
Sven Mika | c82f6c62c8 | [RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) | 2022-04-08 15:33:28 +02:00
Sven Mika | 4d285a00a4 | [RLlib] Issue 23689: tf Initializer has hard-coded float32 dtype. (#23741) | 2022-04-07 21:35:02 +02:00
Sven Mika | 0b3a79ca41 | [RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740) | 2022-04-07 10:16:22 +02:00
Sven Mika | e391b624f0 | [RLlib] Re-enable (for CI-testing) our two self_play example scripts. (#23742) | 2022-04-07 08:20:48 +02:00
Sven Mika | 434265edd0 | [RLlib] Examples folder: All training_iteration translations. (#23712) | 2022-04-05 16:33:50 +02:00
Sven Mika | b1cda46681 | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00
Artur Niederfahrenhorst | 37d129a965 | [RLlib] ReplayBuffer API: Test cases. (#22390) | 2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst | c0ade5f0b7 | [RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) | 2022-03-08 14:25:48 +01:00
Jiajun Yao | 4801e57c77 | [Test] Add missing tests to bazel BUILD (#22827) | 2022-03-07 19:54:49 -08:00
Sven Mika | e50bd212a1 | [RLlib] Disable flakey Pendulum-v1 tests (until further investigation). (#22686) | 2022-03-01 16:44:17 +01:00
Sven Mika | 8e00537b65 | [RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) | 2022-02-23 13:03:45 +01:00
Sven Mika | 6522935291 | [RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) | 2022-02-22 09:36:44 +01:00
Sven Mika | c58cd90619 | [RLlib] Enable Bandits to work in batches mode(s) (vector envs + multiple workers + train_batch_sizes > 1). (#22465) | 2022-02-17 22:32:26 +01:00
Sven Mika | 04a5c72ea3 | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) | 2022-02-10 13:44:22 +01:00
Alex Wu | b122f093c1 | Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250) | 2022-02-09 09:26:36 -08:00
    Reverts ray-project/ray#22126
    Breaks rllib:tests/test_io
Sven Mika | ac3e6ab411 | [RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) | 2022-02-08 19:04:13 +01:00
Sven Mika | c17a44cdfa | Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) | 2022-02-08 16:43:00 +01:00
Sven Mika | 8b678ddd68 | [RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. (#22076) | 2022-02-06 12:35:03 +01:00
Sven Mika | f6617506a2 | [RLlib] Add on_sub_environment_created to DefaultCallbacks class. (#21893) | 2022-02-04 22:22:47 +01:00
Sven Mika | 38d75ce058 | [RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) | 2022-02-04 17:01:12 +01:00
Avnish Narayan | 0d2ba41e41 | [RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685) | 2022-02-04 14:59:56 +01:00
SangBin Cho | a887763b38 | Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105) | 2022-02-04 00:54:50 -08:00
    This reverts commit 3f03ef8ba8.
Sven Mika | 3f03ef8ba8 | [RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) | 2022-02-03 09:32:09 +01:00
Jun Gong | 9c95b9a5fa | [RLlib] Add an env wrapper so RecSim works with our Bandits agent. (#22028) | 2022-02-02 12:15:38 +01:00
Jun Gong | a55258eb9c | [RLlib] Move bandit example scripts into examples folder. (#21949) | 2022-02-02 09:20:47 +01:00
Sven Mika | 893536ebd9 | [RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773) | 2022-01-27 13:58:12 +01:00
Sven Mika | d5bfb7b7da | [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2022-01-25 14:16:58 +01:00
Sven Mika | 3ac4daba07 | [RLlib] Discussion 4351: Conv2d default filter tests and add default setting for 96x96 image obs space. (#21560) | 2022-01-13 18:50:42 +01:00
Avnish Narayan | f7a5fc36eb | [rllib] Give rnnsac_stateless cartpole gpu, increase timeout (#21407) | 2022-01-06 11:54:19 -08:00
    Increase test_preprocessors runtimes.
Sven Mika | 9e6b871739 | [RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) | 2022-01-05 11:29:44 +01:00
Sven Mika | abd3bef63b | [RLlib] QMIX better defaults + added to CI learning tests (#21332) | 2022-01-04 08:54:41 +01:00
Sven Mika | daa4304a91 | [RLlib] Switch off preprocessors by default for PGTrainer. (#21008) | 2021-12-13 12:04:23 +01:00
Sven Mika | 596c8e2772 | [RLlib] Experimental no-flatten option for actions/prev-actions. (#20918) | 2021-12-11 14:57:58 +01:00
Eric Liang | 6f93ea437e | Remove the flaky test tag (#21006) | 2021-12-11 01:03:17 -08:00
Avnish Narayan | 6996eaa986 | [RLlib] Add necessary fields to Base Envs, and BaseEnv wrapper classes (#20832) | 2021-12-09 14:40:40 +01:00
Ishant Mrinal | 2868d1a2cf | [RLlib] Support for RE3 exploration algorithm (for tf) (#19551) | 2021-12-07 13:26:34 +01:00
Sven Mika | 60b2219d72 | [RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. (#20757) | 2021-12-04 13:26:33 +01:00
Jun Gong | 65bd8e29f8 | [RLlib] Update a few things to get rid of the remote_vector_env deprecation warning. (#20753) | 2021-12-02 13:10:44 +01:00
mvindiola1 | 8cee0c03bf | [RLlib] Update max_seq_len in pad_batch_to_sequences_of_same_size (#20743) | 2021-11-30 18:00:07 +01:00
Sven Mika | 7a585fb275 | [RLlib; Documentation] RLlib README overhaul. (#20249) | 2021-11-18 18:08:40 +01:00
Sven Mika | 56619b955e | [RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250) | 2021-11-17 21:40:16 +01:00
Avnish Narayan | dc17f0a241 | Add error messages for missing tf and torch imports (#20205) | 2021-11-16 16:30:53 -08:00
    Co-authored-by: Sven Mika <sven@anyscale.io>
    Co-authored-by: sven1977 <svenmika1977@gmail.com>
Sven Mika | f82880eda1 | Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417) | 2021-11-16 14:49:41 +01:00
    This reverts commit 90dc5460d4.
Amog Kamsetty | 90dc5460d4 | Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399) | 2021-11-15 16:11:35 -08:00
    This reverts commit 5b1c8e46e1.
Sven Mika | 5b1c8e46e1 | [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) | 2021-11-15 10:41:54 +01:00
Sven Mika | ebd56b57db | [RLlib; documentation] "RLlib in 60sec" overhaul. (#20215) | 2021-11-10 22:20:06 +01:00
Sven Mika | 143d23a278 | [RLlib] Issue 20062: Action inference examples missing (#20144) | 2021-11-10 18:49:06 +01:00