Avnish Narayan
393cf4d8f7
[RLlib] Fix action_sampler_fn call in TorchPolicyV2 (obs_batch instead of input_dict arg). ( #25877 )
2022-06-17 08:39:39 +02:00
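To illustrate the fix above: a toy sketch (names and return tuple are assumptions, not TorchPolicyV2's exact internals) of a custom action_sampler_fn that receives the observation tensor (obs_batch) rather than the full input_dict.

```python
import torch


# Hypothetical action_sampler_fn: takes obs_batch, not input_dict.
def action_sampler_fn(policy, model, obs_batch, **kwargs):
    batch_size = obs_batch.shape[0]
    logits = torch.zeros(batch_size, 2)  # toy: uniform over 2 actions
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()
    return actions, dist.log_prob(actions), {}


input_dict = {"obs": torch.randn(4, 8)}
# The fix: pass input_dict["obs"], not input_dict itself.
actions, logp, _ = action_sampler_fn(None, None, obs_batch=input_dict["obs"])
print(actions.shape, logp.shape)  # torch.Size([4]) torch.Size([4])
```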
kourosh hakhamaneshi
f597e21ac8
[RLlib] Fix sample batch concat samples. ( #25572 )
2022-06-14 12:47:29 +02:00
Sven Mika
130b7eeaba
[RLlib] Trainer to Algorithm renaming. ( #25539 )
2022-06-11 15:10:39 +02:00
Artur Niederfahrenhorst
94d6c212df
[RLlib] Replay Buffer API documentation. ( #24683 )
2022-06-10 16:47:51 +02:00
Artur Niederfahrenhorst
7495e9c89c
[RLlib] Dreamer Policy sub-classing schema. ( #25585 )
2022-06-09 17:14:15 +02:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. ( #25288 )
2022-06-06 13:07:47 +02:00
kourosh hakhamaneshi
d49d0efbaf
[RLlib] Bug fix: when on GPU, sample_batch.to_device() only converts the device and does not convert float64 to float32. ( #25460 )
2022-06-06 12:43:11 +02:00
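A minimal sketch of the bug class fixed here (assumed helper, not RLlib's actual sample_batch.to_device()): moving a batch to a device must also downcast float64 arrays to float32, since numpy defaults to float64 while torch models typically expect float32.

```python
import numpy as np
import torch


def to_device(batch: dict, device: str = "cpu") -> dict:
    out = {}
    for key, value in batch.items():
        tensor = torch.from_numpy(value)
        # The fix: cast float64 to float32 in addition to moving devices.
        if tensor.dtype == torch.float64:
            tensor = tensor.float()
        out[key] = tensor.to(device)
    return out


batch = {"obs": np.random.randn(4, 8)}  # numpy defaults to float64
print(to_device(batch)["obs"].dtype)  # torch.float32
```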
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. ( #25366 )
2022-06-04 07:35:24 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms
dir and rename policy and trainer classes. ( #25346 )" ( #25420 )
This reverts commit e4ceae19ef. Reverts #25346.
linux://python/ray/tests:test_client_library_integration never failed before this PR. In the CI of the reverted PR it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128), so the failure is highly likely caused by this PR. The failing test output also looks related (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b).
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. ( #25346 )
2022-06-02 16:47:05 +02:00
Artur Niederfahrenhorst
71a8a443ce
[RLlib] Fix Policy global timesteps being off by init sample batch size. ( #25349 )
2022-06-02 10:19:21 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule ( #25272 )
2022-06-01 11:27:54 -07:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). ( #24967 )
2022-05-28 10:50:03 +02:00
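A hedged sketch of the restart-on-failure idea (toy classes, not RLlib's vectorized-env code): wrap a sub-environment so that an exception during step() recreates it and ends the episode gracefully instead of crashing the whole rollout.

```python
import random


class FlakyEnv:
    """Toy env (old gym-style API) that sometimes crashes in step()."""

    def reset(self):
        return 0.0

    def step(self, action):
        if random.random() < 0.1:
            raise RuntimeError("sub-environment crashed")
        return 0.0, 1.0, False, {}


class RestartingEnv:
    def __init__(self, env_maker):
        self.env_maker = env_maker
        self.env = env_maker()

    def reset(self):
        return self.env.reset()

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            # Restart the failed sub-env and terminate the episode so
            # the vector env keeps running.
            self.env = self.env_maker()
            return self.env.reset(), 0.0, True, {"restarted": True}


env = RestartingEnv(FlakyEnv)
env.reset()
for _ in range(50):
    obs, reward, done, info = env.step(0)
    if info.get("restarted"):
        print("sub-env restarted gracefully")
```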
Sven Mika
ab6c3027e5
[RLlib] A2/3C policy sub-classing schema. ( #25078 )
2022-05-28 09:54:47 +02:00
kourosh hakhamaneshi
9684ea3af6
[RLlib] Fix TorchPolicyV2 bug. ( #25203 )
2022-05-26 20:49:26 +02:00
Jun Gong
eaf9c941ae
[RLlib] Migrate PPO Impala and APPO policies to use sub-classing implementation. ( #25117 )
2022-05-25 14:38:03 +02:00
Eric Liang
4963dfaae0
[api] Add API stability annotations for all RLlib symbols and add to LINT ( #25060 )
2022-05-24 22:14:25 -07:00
Jun Gong
93ff0beb4e
[RLlib] Introduce utils to serialize gym Spaces (and thus ViewRequirements). ( #25007 )
2022-05-24 21:12:20 +02:00
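A sketch of the general idea (assumed helpers; RLlib's actual serialization utils differ): encode a gym Space as a plain dict so it survives a JSON round trip, which is what makes serializing ViewRequirements possible.

```python
import json

import gym
import numpy as np


def box_to_dict(space: gym.spaces.Box) -> dict:
    return {
        "low": space.low.tolist(),
        "high": space.high.tolist(),
        "shape": list(space.shape),
        "dtype": str(space.dtype),
    }


def box_from_dict(d: dict) -> gym.spaces.Box:
    return gym.spaces.Box(
        low=np.array(d["low"], dtype=d["dtype"]),
        high=np.array(d["high"], dtype=d["dtype"]),
        shape=tuple(d["shape"]),
        dtype=np.dtype(d["dtype"]),
    )


space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
restored = box_from_dict(json.loads(json.dumps(box_to_dict(space))))
print(restored == space)  # True
```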
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects ( #25059 )
2022-05-22 19:58:47 +02:00
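A hedged sketch of the config-object pattern these TrainerConfig commits move toward (illustrative names, not the real SAC/RNNSAC/CQL config API): a typed, chainable config object replacing raw config dicts.

```python
class TrainerConfig:
    """Toy stand-in for an algorithm config object."""

    def __init__(self):
        self.lr = 3e-4
        self.train_batch_size = 256

    def training(self, lr=None, train_batch_size=None):
        # Chainable setter, mirroring the fluent style of config objects.
        if lr is not None:
            self.lr = lr
        if train_batch_size is not None:
            self.train_batch_size = train_batch_size
        return self


config = TrainerConfig().training(lr=1e-4, train_batch_size=512)
print(config.lr, config.train_batch_size)  # 0.0001 512
```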
Jun Gong
d5a6d46049
[RLlib] Migrate MAML, MB-MPO, MARWIL, and BC to use Policy sub-classing implementation. ( #24914 )
2022-05-20 14:10:59 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits ( #24896 )
2022-05-19 18:30:42 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. ( #24746 )
2022-05-17 17:16:08 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. ( #24545 )
2022-05-17 10:31:07 +02:00
Jun Gong
bc3a1d35cf
[RLlib] Introduce new policy base classes. ( #24742 )
2022-05-13 21:48:30 +02:00
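A hedged sketch of the sub-classing schema the migration commits above converge on (illustrative names, not the exact base-class API): policies derive from a common base and override well-defined hooks such as loss(), instead of being assembled by factory functions.

```python
import torch


class TorchPolicyBase:
    """Toy stand-in for a policy base class."""

    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError


class MyPGPolicy(TorchPolicyBase):
    def loss(self, model, dist_class, train_batch):
        # Plain policy-gradient loss: -E[logp * advantage].
        logp = train_batch["action_logp"]
        advantages = train_batch["advantages"]
        return -(logp * advantages).mean()


policy = MyPGPolicy()
batch = {"action_logp": torch.randn(4), "advantages": torch.randn(4)}
print(policy.loss(None, None, batch))  # scalar loss tensor
```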
Artur Niederfahrenhorst
bd2fdf4752
[RLlib] Automate sequences in timeslice_along_seq_lens_with_overlap(). ( #24561 )
2022-05-09 11:55:06 +02:00
Daewoo Lee
fee35444ab
[RLlib] Issue 24530: Fix add_time_dimension ( #24531 )
Co-authored-by: Daewoo Lee <dwlee@rtst.co.kr>
2022-05-06 15:21:42 +02:00
Edward Oakes
11954e6798
Issue 24143: Fix a few f-strings missing the f. ( #24232 )
2022-05-02 16:11:33 +02:00
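The bug class fixed here, in two lines: a format string without the f prefix is a plain literal, so the placeholder is never interpolated.

```python
count = 3
print("sampled {count} episodes")   # bug: prints the braces verbatim
print(f"sampled {count} episodes")  # fix: prints "sampled 3 episodes"
```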
Xuehai Pan
377a522ce2
[RLlib] Fix time dimension shaping for PyTorch RNN models. ( #21735 )
2022-04-29 10:39:03 +02:00
Ishant Mrinal
0248c60387
[RLlib] Add additional return values to action_sampler_fn. ( #22721 )
2022-04-29 10:34:48 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding wrapping a TFPolicy as_eager() or with_tracing more than once. ( #24271 )
2022-04-28 13:43:21 +02:00
Xuehai Pan
6087eda91b
[RLlib] Issue 21991: Fix SampleBatch slicing for SampleBatch.INFOS in RNN cases ( #22050 )
2022-04-25 11:40:24 +02:00
Noon van der Silk
3589c21924
[RLlib] Fix some missing f-strings and an f-string-related bug in tf eager policy. ( #24148 )
2022-04-25 11:25:28 +02:00
Jeroen Bédorf
1263015931
[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). ( #24041 )
2022-04-25 11:17:50 +02:00
Sven Mika
9de391b70e
[RLlib] Issue 23897: add_time_dimension() causes returned shape to be completely unknown. ( #24006 )
2022-04-19 17:56:56 +02:00
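Conceptually, add_time_dimension() folds a flat [batch * time, feature] tensor back into [batch, time, feature] for RNN processing; the issue fixed here was the TF variant losing all static shape information after that reshape. A torch sketch of the concept (illustrative, not RLlib's implementation):

```python
import torch


def add_time_dimension(flat: torch.Tensor, seq_lens: torch.Tensor) -> torch.Tensor:
    num_sequences = seq_lens.shape[0]
    max_seq_len = flat.shape[0] // num_sequences
    return flat.reshape(num_sequences, max_seq_len, *flat.shape[1:])


flat = torch.randn(8, 16)        # 2 sequences of length 4, flattened
seq_lens = torch.tensor([4, 4])
print(add_time_dimension(flat, seq_lens).shape)  # torch.Size([2, 4, 16])
```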
Sven Mika
de9e143938
[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). ( #24005 )
2022-04-19 17:55:59 +02:00
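A conceptual sketch of the bug (assumed internals, not RLlib's exact code): SampleBatch hands out cached views of its columns via an intercepted_values dict, so shuffling the underlying arrays without flushing that cache leaves stale, un-shuffled values behind.

```python
import numpy as np


class TinyBatch:
    def __init__(self, data):
        self.data = data
        self.intercepted_values = {}  # cached views handed out earlier

    def get(self, key):
        if key not in self.intercepted_values:
            self.intercepted_values[key] = self.data[key].copy()
        return self.intercepted_values[key]

    def shuffle(self):
        permutation = np.random.permutation(len(self.data["obs"]))
        for key in self.data:
            self.data[key] = self.data[key][permutation]
        # The fix: flush the cache so get() re-reads the shuffled data.
        self.intercepted_values = {}


batch = TinyBatch({"obs": np.arange(5)})
_ = batch.get("obs")     # populates the cache
batch.shuffle()
print(batch.get("obs"))  # shuffled only because the cache was flushed
```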
Kinal Mehta
758e758c32
[RLlib] Fix incorrect sequence length for RNN ( #23830 )
Update the torch policy to find seq_lens using state_batches instead of input_dict. This helps handle complex model inputs when the built-in preprocessing API is disabled.
2022-04-12 21:07:18 +01:00
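A hedged sketch of the idea behind the fix (illustrative names, not the exact TorchPolicy code): derive the number of sequences from the state batches, which carry one state row per sequence, rather than from input_dict.

```python
import torch


def infer_seq_lens(obs_batch: torch.Tensor, state_batches) -> torch.Tensor:
    if not state_batches:
        return torch.tensor([], dtype=torch.long)
    num_sequences = state_batches[0].shape[0]  # one state row per sequence
    max_seq_len = obs_batch.shape[0] // num_sequences
    return torch.full((num_sequences,), max_seq_len, dtype=torch.long)


obs = torch.randn(8, 16)        # 8 timesteps across 2 sequences
states = [torch.zeros(2, 32)]   # RNN state: one row per sequence
print(infer_seq_lens(obs, states))  # tensor([4, 4])
```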
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. ( #15412 )
2022-04-12 07:50:09 +02:00
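The core technique behind such a toolset, in miniature: snapshot heap allocations with tracemalloc before and after a suspect workload and diff the snapshots to surface growing allocation sites (the workload below is a stand-in, not an RLlib test).

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

leaky = []
for _ in range(1000):
    leaky.append([0] * 100)  # stand-in for a leaking rollout/training step

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)  # top allocation-growth sites with file:line and size delta
```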
Sven Mika
0b3a79ca41
[RLlib] Issue 23639: Error in client/server setup when using LSTMs ( #23740 )
2022-04-07 10:16:22 +02:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI ( #23418 )
2022-03-24 17:04:02 -07:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes ( #23276 )
2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). ( #23128 )
2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. ( #22657 )
2022-03-08 14:25:48 +01:00
Xuehai Pan
018ebbf4cb
[RLlib] Issue #21671 : Handle callbacks and model metrics for TorchPolicy
while using multi-GPU optimizers ( #21697 )
2022-02-23 08:30:38 +01:00
Steven Morad
d4571741aa
[RLlib] seq_lens should always be torch tensors. ( #22398 )
2022-02-22 08:15:43 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration
function instead of execution_plan
) and re-instate Pong learning test." ( #22250 )
...
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00