Noon van der Silk | 38a028de2d | [RLlib] Don't add elements to _agent_ids during env pre-checking. (#24136) | 2022-04-26 15:55:15 +02:00
Sven Mika | bb4e5cb70a | [RLlib] CQL: training iteration function. (#24166) | 2022-04-26 14:28:39 +02:00
Artur Niederfahrenhorst | f7be409462 | [RLlib] Training Iteration Function for SAC (#24157) | 2022-04-26 12:37:54 +02:00
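The two training-iteration commits above (CQL, SAC) replace RLlib's older declarative execution_plan with an explicit per-iteration method. A rough, framework-free sketch of the pattern; the names and helpers here are illustrative, not RLlib's exact API:

    # Illustrative only: the general shape of a training_iteration-style
    # loop (sample -> store -> update -> report), not RLlib's exact API.
    def training_iteration(workers, replay_buffer, learner, train_batch_size):
        # 1. Collect fresh experience from the rollout workers.
        samples = workers.sample()
        replay_buffer.add(samples)
        # 2. Update the policy from replayed experience.
        batch = replay_buffer.sample(train_batch_size)
        losses = learner.update(batch)
        # 3. Return metrics for this iteration.
        return {"num_env_steps": len(samples), "losses": losses}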
Noon van der Silk | 3589c21924 | [RLlib] Fix some missing f-strings and an f-string-related bug in tf eager policy. (#24148) | 2022-04-25 11:25:28 +02:00
Avnish Narayan | 3bf907bcf8 | [RLlib] Don't modify environments via the env checker utilities. (#24083) | 2022-04-22 18:39:47 +02:00
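The env-checker commits (#24136 and #24083 here, plus the pre-checker PRs further down) all guard one invariant: validation must inspect the env without mutating it. A minimal sketch of such a check, using the 4-tuple gym step API current at the time; check_env here is a hypothetical stand-in, not RLlib's utility:

    import gym

    def check_env(env: gym.Env) -> None:
        # Roll the env once and validate returns against its declared
        # spaces; read-only, so the env is left exactly as we found it.
        obs = env.reset()
        assert env.observation_space.contains(obs), "reset() obs outside observation_space"
        obs, reward, done, info = env.step(env.action_space.sample())
        assert env.observation_space.contains(obs), "step() obs outside observation_space"
        assert isinstance(reward, (int, float)), "reward must be a number"  # real checkers also allow NumPy scalars
        assert isinstance(done, bool), "done must be a bool"
        assert isinstance(info, dict), "info must be a dict"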
Jun Gong | d3c69ebdb6 | [RLlib] Make sure unsquash_action moves user action to proper range (#23941) | 2022-04-18 18:55:57 +02:00
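#23941 concerns translating an agent's normalized action back into the env's native range. The standard linear unsquash from [-1, 1] to [low, high] looks roughly like this (a sketch, not RLlib's exact helper):

    import numpy as np

    def unsquash_action(squashed, low, high):
        # Map a normalized action in [-1, 1] to the env's [low, high].
        # Infinite bounds are passed through untouched, since the linear
        # rescale below would otherwise produce inf/NaN (cf. #21110 below).
        if np.any(np.isinf(low)) or np.any(np.isinf(high)):
            return squashed
        squashed = np.clip(squashed, -1.0, 1.0)
        return low + (squashed + 1.0) * (high - low) / 2.0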
Artur Niederfahrenhorst | e57ce7efd6 | [RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) | 2022-04-18 12:20:12 +02:00
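The Replay Buffer API commits (#23420 here; #22114 and its capacity-check and test follow-ups below) revolve around one simple core structure. A minimal FIFO buffer with uniform sampling, for orientation only:

    import random
    from collections import deque

    class SimpleReplayBuffer:
        """FIFO buffer: old transitions fall out once capacity is hit."""

        def __init__(self, capacity: int):
            self._storage = deque(maxlen=capacity)

        def add(self, transition) -> None:
            self._storage.append(transition)

        def sample(self, batch_size: int):
            # Uniform sampling without replacement within one draw.
            return random.sample(self._storage, batch_size)

        def __len__(self) -> int:
            return len(self._storage)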
kourosh hakhamaneshi | c38a29573f | [RLlib] Removed deprecated code with error=True (#23916) | 2022-04-15 13:51:12 +02:00
Sven Mika | a8494742a3 | [RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) | 2022-04-12 07:50:09 +02:00
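The leak-finding approach in #15412 rests on the standard library's tracemalloc: snapshot allocations before and after a suspect workload and diff them. The bare pattern (the workload name is a placeholder):

    import tracemalloc

    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    run_suspect_workload()  # placeholder, e.g. repeated learn-on-batch calls

    after = tracemalloc.take_snapshot()
    # Top 10 call sites by net allocated memory; a leak shows up as a
    # line whose size keeps growing across repeated iterations.
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)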
Artur Niederfahrenhorst | 02a50f02b7 | [RLlib] ReplayBuffer: _hit_counts working again. (#23586) | 2022-04-07 10:56:25 +02:00
Sven Mika | 2eaa54bd76 | [RLlib] POC: Config objects instead of dicts (PPO only). (#23491) | 2022-03-31 18:26:12 +02:00
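#23491 trades the raw config dict for a typed, chainable config object (PPO first). A hedged sketch of the style it introduces; the exact module path and method names shifted across Ray versions, so treat this as shape, not gospel:

    # Roughly the fluent style introduced for PPO; in later Ray versions
    # the import moved to ray.rllib.algorithms.ppo.
    from ray.rllib.agents.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(env="CartPole-v1")
        .rollouts(num_rollout_workers=2)
        .training(lr=5e-5, train_batch_size=4000)
    )
    trainer = config.build()  # replaces PPOTrainer(config={...})
    print(trainer.train())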
Artur Niederfahrenhorst | 9a64bd4e9b | [RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) | 2022-03-29 14:44:40 +02:00
Artur Niederfahrenhorst | 32ad6c6ef1 | [RLlib] Replay Buffer capacity check (#23523) | 2022-03-29 12:06:27 +02:00
Max Pumperla | 60054995e6 | [docs] fix doctests and activate CI (#23418) | 2022-03-24 17:04:02 -07:00
Jun Gong | d12977c4fb | [RLlib] TF2 Bandit Agent (#22838) | 2022-03-21 16:55:55 +01:00
Sven Mika | b1cda46681 | [RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) | 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang | 0c74ecad12 | [Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) | 2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst | 37d129a965 | [RLlib] ReplayBuffer API: Test cases. (#22390) | 2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst | c0ade5f0b7 | [RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) | 2022-03-08 14:25:48 +01:00
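#22657 fixes MultiAgentBatch.timeslices(), which regroups per-agent rows by the env step they were collected at. Conceptually (a toy illustration, not RLlib's implementation):

    from collections import defaultdict

    def timeslices(rows_per_agent):
        # rows_per_agent: {agent_id: [{"t": env_step, ...}, ...]}
        # Returns one {agent_id: row} dict per env step, in step order.
        by_step = defaultdict(dict)
        for agent_id, rows in rows_per_agent.items():
            for row in rows:
                by_step[row["t"]][agent_id] = row
        return [by_step[t] for t in sorted(by_step)]

    slices = timeslices({
        "a0": [{"t": 0, "obs": 1}, {"t": 1, "obs": 2}],
        "a1": [{"t": 1, "obs": 5}],  # a1 was inactive at step 0
    })
    # -> [{"a0": {t=0 row}}, {"a0": {t=1 row}, "a1": {t=1 row}}]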
Jun Gong | e765915ded | [RLlib] Make sure SlateQ works with GPU. (#22738) | 2022-03-04 17:49:51 +01:00
Kai Fricke | 84a163a2c4 | [RLlib] Remove atari rom install script (#22797) | 2022-03-03 16:55:56 +01:00
Sven Mika | 0af100ffae | [RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). (#22731) | 2022-03-01 21:24:24 +01:00
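The ordering bug in #22731 comes from a dm-tree property that is easy to forget: dict entries are flattened in sorted-key order, not insertion order, so any space-flattening helper must sort the same way to stay aligned:

    import tree  # dm-tree

    # Keys are traversed sorted, so "a" comes out before "b" even though
    # "b" was inserted first. flatten_space(...) must match this order.
    print(tree.flatten({"b": 1, "a": 2}))  # [2, 1]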
Sven Mika | 8e00537b65 | [RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) | 2022-02-23 13:03:45 +01:00
Sven Mika | 6522935291 | [RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) | 2022-02-22 09:36:44 +01:00
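Several commits in this stretch build out SlateQ. Its core idea (Ie et al., 2019) is that a slate's Q-value decomposes into per-item Q-values weighted by a user-choice model. A simplified sketch of that decomposition (the full model also normalizes over a no-click option):

    import numpy as np

    def slate_q_value(slate, item_q, choice_score):
        # Q(s, slate) = sum_i P(user clicks i | s, slate) * Q(s, i),
        # with click probabilities from a conditional choice model.
        scores = np.array([choice_score[i] for i in slate])
        click_probs = scores / scores.sum()
        q_items = np.array([item_q[i] for i in slate])
        return float(np.dot(click_probs, q_items))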
Avnish Narayan | 740def0a13 | [RLlib] Put env-checker on critical path. (#22191) | 2022-02-17 14:06:14 +01:00
Sven Mika | 04a5c72ea3 | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) | 2022-02-10 13:44:22 +01:00
Sven Mika | 637cacedc9 | [RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. (#22245) | 2022-02-10 02:58:09 +01:00
Alex Wu | b122f093c1 | Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250) | 2022-02-09 09:26:36 -08:00
    Reverts ray-project/ray#22126. Breaks rllib:tests/test_io.
Artur Niederfahrenhorst | dea3574050 | [RLlib] Replay Buffer API (#22114) | 2022-02-09 15:04:43 +01:00
Balaji Veeramani | 31ed9e5d02 | [CI] Replace YAPF disables with Black disables (#21982) | 2022-02-08 16:29:25 -08:00
Sven Mika | ac3e6ab411 | [RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) | 2022-02-08 19:04:13 +01:00
Sven Mika | c17a44cdfa | Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) | 2022-02-08 16:43:00 +01:00
Sven Mika | 38d75ce058 | [RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) | 2022-02-04 17:01:12 +01:00
SangBin Cho | a887763b38 | Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105) | 2022-02-04 00:54:50 -08:00
    This reverts commit 3f03ef8ba8.
Sven Mika | 3f03ef8ba8 | [RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) | 2022-02-03 09:32:09 +01:00
Balaji Veeramani | 7f1bacc7dc | [CI] Format Python code with Black (#21975) | 2022-01-29 18:41:57 -08:00
    See #21316 and #21311 for the motivation behind these changes.
Sven Mika | ee41800c16 | [RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649) | 2022-01-27 22:07:05 +01:00
Sven Mika | 893536ebd9 | [RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773) | 2022-01-27 13:58:12 +01:00
Sven Mika | 371fbb17e4 | [RLlib] Make policies_to_train more flexible via callable option. (#20735) | 2022-01-27 12:17:34 +01:00
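#20735 lets "multiagent.policies_to_train" be a callable instead of a fixed list, so trainability can be decided per policy (and per batch) at runtime. A hedged sketch of the config shape in the era's dict-style config; the policy IDs here are made up:

    # Only the "learner" policy trains; the frozen opponent never does.
    # The callable receives the policy ID and (optionally) the current
    # train batch, so the decision can also depend on the data itself.
    config = {
        "multiagent": {
            "policies": {"learner", "frozen_opponent"},
            "policies_to_train": lambda policy_id, batch=None: policy_id == "learner",
        },
    }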
Sven Mika | d5bfb7b7da | [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2022-01-25 14:16:58 +01:00
Avnish Narayan | 12b087acb8 | [RLlib] Base env pre-checker. (#21569) | 2022-01-18 16:34:06 +01:00
Jun Gong | 7517aefe05 | [RLlib] Bring back BC and Marwil learning tests. (#21574) | 2022-01-14 14:35:32 +01:00
Avnish Narayan | c0f1202278 | [RLlib] MultiAgentEnv pre-checker (#21476) | 2022-01-13 11:31:22 +01:00
Sven Mika | 90c6b10498 | [RLlib] Decentralized multi-agent learning; PR #01 (#21421) | 2022-01-13 10:52:55 +01:00
Sven Mika | 188324c5c7 | [RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. (#21553) | 2022-01-12 18:56:51 +01:00
Sven Mika | f94bd99ce4 | [RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448) | 2022-01-11 19:50:03 +01:00
Sven Mika | 92f030331e | [RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) | 2022-01-10 11:22:55 +01:00
Sven Mika | 35af30a446 | [RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110) | 2022-01-10 11:20:37 +01:00
Sven Mika | 853d10871c | [RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) | 2022-01-05 18:22:33 +01:00
Sven Mika | 9e6b871739 | [RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) | 2022-01-05 11:29:44 +01:00
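The flattening utilities in #21330 relate to standard gym space flattening, which turns a structured sample into a single float vector (Discrete parts become one-hots, Dict keys are traversed in sorted order). For orientation, using plain gym of that era:

    import numpy as np
    from gym import spaces

    space = spaces.Dict({
        "id": spaces.Discrete(3),
        "pos": spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
    })
    sample = {"id": 2, "pos": np.array([0.5, -0.5], dtype=np.float32)}

    flat = spaces.flatten(space, sample)
    print(flat)  # length 5: one-hot "id" (3) followed by "pos" (2)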