Commit graph

310 commits

Author | SHA1 | Message | Date
Artur Niederfahrenhorst
32ad6c6ef1
[RLlib] Replay Buffer capacity check (#23523) 2022-03-29 12:06:27 +02:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI (#23418) 2022-03-24 17:04:02 -07:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent (#22838) 2022-03-21 16:55:55 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128) 2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst
37d129a965
[RLlib] ReplayBuffer API: Test cases. (#22390) 2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) 2022-03-08 14:25:48 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. (#22738) 2022-03-04 17:49:51 +01:00
Kai Fricke
84a163a2c4
[RLlib] Remove atari rom install script (#22797) 2022-03-03 16:55:56 +01:00
Sven Mika
0af100ffae
[RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). (#22731) 2022-03-01 21:24:24 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) 2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) 2022-02-22 09:36:44 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. (#22191) 2022-02-17 14:06:14 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) 2022-02-10 13:44:22 +01:00
Sven Mika
637cacedc9
[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245) 2022-02-10 02:58:09 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250)
Reverts ray-project/ray#22126

Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst
dea3574050
[RLlib] Replay Buffer API (#22114) 2022-02-09 15:04:43 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) 2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) 2022-02-08 16:43:00 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) 2022-02-04 17:01:12 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) 2022-02-03 09:32:09 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649) 2022-01-27 22:07:05 +01:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773) 2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735) 2022-01-27 12:17:34 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. (#21569) 2022-01-18 16:34:06 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and Marwil learning tests. (#21574) 2022-01-14 14:35:32 +01:00
Avnish Narayan
c0f1202278
[RLlib] MultiAgentEnv pre-checker (#21476) 2022-01-13 11:31:22 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 (#21421) 2022-01-13 10:52:55 +01:00
Sven Mika
188324c5c7
[RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. (#21553) 2022-01-12 18:56:51 +01:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448) 2022-01-11 19:50:03 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) 2022-01-10 11:22:55 +01:00
Sven Mika
35af30a446
[RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110) 2022-01-10 11:20:37 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) 2022-01-05 11:29:44 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) 2021-12-21 08:39:05 +01:00
brulu
8b77fc0aef
[RLlib] Updating Repeated space. Allowing numpy arrays and adding representation. (#20799) 2021-12-16 08:27:55 +01:00
Sven Mika
e485aa846a
[RLlib; Docs overhaul] Overhaul of auto-API reference pages (via sphinx autoclass/automodule). (#19786) 2021-12-15 22:32:52 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. (#21008) 2021-12-13 12:04:23 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) (#19551) 2021-12-07 13:26:34 +01:00
mvindiola1
8cee0c03bf
[RLlib] Update max_seq_len in pad_batch_to_sequences_of_same_size (#20743) 2021-11-30 18:00:07 +01:00
gjoliver
e7f9e8ceec
[RLlib] Report total_train_steps correctly for offline agents like CQL. (#20541)
* Fix trainer timestep reporting for offline agents like CQL.

* wip.

* extend timesteps_total to 200K for learning_tests_pendulum_cql test

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-22 21:46:45 +01:00
Avnish Narayan
b6077a36d4
[RLlib; Pre-checks/better failure behavior]: Env Checker for Gym Environments (#20481) 2021-11-19 09:41:03 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250) 2021-11-17 21:40:16 +01:00
gjoliver
724a140795
[rllib] Make sure json can serialize result dict (#20439)
We may have fields in the result dict that are None.
Make sure our results are json serializable.
2021-11-17 10:27:00 -08:00
gjoliver
6e787f70e0
[Rllib/release] Disable throughput check (#20387)
Throughput check was enabled by d8a61f801f prematurely.
E.g., see state before the commit:
a931076f59/rllib/utils/test_utils.py (L740-L741)
2021-11-16 11:05:51 -08:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00