Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn ( #23454 )
2022-05-05 16:11:08 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API ( #24212 )
2022-05-05 09:41:38 +02:00
Sven Mika
b48f63113b
[RLlib] SlateQ fixes: Release learning tests wrong yaml structure + TD-error torch issue ( #24429 )
2022-05-04 13:37:14 +02:00
Kai Fricke
7a4d58d80f
[rllib] Fix doctest failure ( #24343 )
Lint was still failing (but only caught with doctest):
```
File "../../python/ray/rllib/utils/numpy.py", line ?, in default
Failed example:
    tree.traverse(make_action_immutable, d, top_down=False)
Exception raised:
    Traceback (most recent call last):
      File "/opt/miniconda/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest default[4]>", line 1, in <module>
        tree.traverse(make_action_immutable, d, top_down=False)
    NameError: name 'make_action_immutable' is not defined
```
2022-04-29 19:13:24 +01:00
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. ( #24151 )
2022-04-29 18:38:17 +02:00
Kai Fricke
242706922b
[rllib] Fix linting ( #24335 )
#24262 broke linting. This fixes this.
2022-04-29 15:21:11 +01:00
simonsays1980
ff575eeafc
[RLlib] Make actions sent by RLlib to the env immutable. ( #24262 )
2022-04-29 10:27:06 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy as_eager() or with_tracing more than once. ( #24271 )
2022-04-28 13:43:21 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. ( #24164 )
2022-04-27 14:24:20 +02:00
Noon van der Silk
38a028de2d
[RLlib] Don't add elements to _agent_ids during env pre-checking. ( #24136 )
2022-04-26 15:55:15 +02:00
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. ( #24166 )
2022-04-26 14:28:39 +02:00
Artur Niederfahrenhorst
f7be409462
[RLlib] Training Iteration Function for SAC ( #24157 )
2022-04-26 12:37:54 +02:00
Noon van der Silk
3589c21924
[RLlib] Fix some missing f-strings and a f-string related bug in tf eager policy. ( #24148 )
2022-04-25 11:25:28 +02:00
Avnish Narayan
3bf907bcf8
[RLlib] Don't modify environments via the env checker utilities. ( #24083 )
2022-04-22 18:39:47 +02:00
Jun Gong
d3c69ebdb6
[RLlib] Make sure unsquash_action moves user action to proper range ( #23941 )
2022-04-18 18:55:57 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True ( #23916 )
2022-04-15 13:51:12 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. ( #15412 )
2022-04-12 07:50:09 +02:00
Artur Niederfahrenhorst
02a50f02b7
[RLlib] RepayBuffer: _hit_counts working again. ( #23586 )
2022-04-07 10:56:25 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). ( #23491 )
2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q ( #22842 )
2022-03-29 14:44:40 +02:00
Artur Niederfahrenhorst
32ad6c6ef1
[RLlib] Replay Buffer capacity check ( #23523 )
2022-03-29 12:06:27 +02:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI ( #23418 )
2022-03-24 17:04:02 -07:00
Jun Gong
d12977c4fb
[RLlib] TF2 Bandit Agent ( #22838 )
2022-03-21 16:55:55 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes ( #23276 )
2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). ( #23128 )
2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst
37d129a965
[RLlib] ReplayBuffer API: Test cases. ( #22390 )
2022-03-08 16:54:12 +01:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. ( #22657 )
2022-03-08 14:25:48 +01:00
Jun Gong
e765915ded
[RLlib] Make sure SlateQ works with GPU. ( #22738 )
2022-03-04 17:49:51 +01:00
Kai Fricke
84a163a2c4
[RLlib] Remove atari rom install script ( #22797 )
2022-03-03 16:55:56 +01:00
Sven Mika
0af100ffae
[RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). ( #22731 )
2022-03-01 21:24:24 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update ( #22543 )
2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Sven Mika
637cacedc9
[RLlib] Discussion 4986: OU Exploration (torch) crashes when restoring from checkpoint. ( #22245 )
2022-02-10 02:58:09 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." ( #22250 )
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Artur Niederfahrenhorst
dea3574050
[RLlib] Replay Buffer API ( #22114 )
2022-02-09 15:04:43 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net ( #21827 )
2022-02-04 17:01:12 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02 . ( #21649 )
2022-01-27 22:07:05 +01:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; ( #21773 )
2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. ( #21569 )
2022-01-18 16:34:06 +01:00