Commit graph

302 commits

Author SHA1 Message Date
Edward Oakes
11954e6798
Issue 24143: Fix a few f-strings missing the f. (#24232) 2022-05-02 16:11:33 +02:00
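For context, the bug class fixed here (and again in 3589c21924 below) looks like this; a minimal illustrative sketch, not the actual diff:

```python
x = 42

# Bug: without the f prefix, the braces are kept literally and
# {x} is never interpolated.
msg = "value is {x}"   # -> "value is {x}"

# Fix: add the f prefix so {x} is evaluated.
msg = f"value is {x}"  # -> "value is 42"
```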
Xuehai Pan
377a522ce2
[RLlib] Fix time dimension shaping for PyTorch RNN models. (#21735) 2022-04-29 10:39:03 +02:00
Ishant Mrinal
0248c60387
[RLlib] Add additional return values to action_sampler_fn. (#22721) 2022-04-29 10:34:48 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups around avoiding to wrap a TFPolicy as_eager() or with_tracing more than once. (#24271) 2022-04-28 13:43:21 +02:00
Xuehai Pan
6087eda91b
[RLlib] Issue 21991: Fix SampleBatch slicing for SampleBatch.INFOS in RNN cases (#22050) 2022-04-25 11:40:24 +02:00
Noon van der Silk
3589c21924
[RLlib] Fix some missing f-strings and a f-string related bug in tf eager policy. (#24148) 2022-04-25 11:25:28 +02:00
Jeroen Bédorf
1263015931
[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041) 2022-04-25 11:17:50 +02:00
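A hedged sketch of the view-requirements mechanism this commit builds on (the helper name below is hypothetical; SampleBatch.INFOS and ViewRequirement are real RLlib identifiers from this era):

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement

def request_infos_in_output(policy):
    # Ask the sample collector to record the env's info dicts so they can be
    # written to output datasets; TorchPolicies include this by default.
    policy.view_requirements[SampleBatch.INFOS] = ViewRequirement(
        used_for_training=False,  # collected for offline output, not the loss
    )
```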
Sven Mika
9de391b70e
[RLlib] Issue 23897: add_time_dimension() causes returned shape to be completely unknown. (#24006) 2022-04-19 17:56:56 +02:00
Sven Mika
de9e143938
[RLlib] Issue 23907: SampleBatch.shuffle does not flush intercepted_values dict (which it should). (#24005) 2022-04-19 17:55:59 +02:00
Kinal Mehta
758e758c32
[RLlib] Fix incorrect sequence length for RNN (#23830)
Update the torch policy to find the seq_lens using state_batches instead of input_dict. This helps handle the complex inputs to the model when the inbuilt preprocessing API is disabled.
2022-04-12 21:07:18 +01:00
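Roughly the idea of that fix, as a hypothetical sketch (not the actual patch): recover sequence lengths from the RNN state batches, whose first dimension is the number of sequences, instead of inspecting a possibly complex input dict:

```python
import torch

def infer_seq_lens(state_batches, total_timesteps):
    # state_batches[0] has shape [B, state_dim]: B sequences in the batch.
    num_sequences = state_batches[0].shape[0]
    # Assumes equally padded sequences, as produced by RLlib's padding step.
    max_seq_len = total_timesteps // num_sequences
    return torch.full((num_sequences,), max_seq_len, dtype=torch.long)
```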
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
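The tracemalloc approach from the standard library, in a minimal self-contained sketch (the loop body is a stand-in for repeated rollout/training calls):

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

leaky = []
for _ in range(100):
    leaky.append([0] * 1_000)  # stand-in for the suspect repeated operation

after = tracemalloc.take_snapshot()
# A leak shows up as steady growth at the same allocation site across runs.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```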
Sven Mika
0b3a79ca41
[RLlib] Issue 23639: Error in client/server setup when using LSTMs (#23740) 2022-04-07 10:16:22 +02:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI (#23418) 2022-03-24 17:04:02 -07:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) 2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) 2022-03-15 17:34:21 +01:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) 2022-03-08 14:25:48 +01:00
Xuehai Pan
018ebbf4cb
[RLlib] Issue #21671: Handle callbacks and model metrics for TorchPolicy while using multi-GPU optimizers (#21697) 2022-02-23 08:30:38 +01:00
Steven Morad
d4571741aa
[RLlib] seq_lens should always be torch tensors. (#22398) 2022-02-22 08:15:43 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) 2022-02-10 13:44:22 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250)
Reverts ray-project/ray#22126

Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) 2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) 2022-02-08 16:43:00 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) 2022-02-03 09:32:09 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649) 2022-01-27 22:07:05 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) 2022-01-10 11:22:55 +01:00
Sven Mika
3a3d0a4a2b
[RLlib] Issue 21340: SampleBatch __init__ docstring wrong. (#21447) 2022-01-07 15:48:14 +01:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) 2022-01-05 11:29:44 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) 2021-12-21 08:39:05 +01:00
brulu
8b77fc0aef
[RLlib] Updating Repeated space. Allowing numpy arrays and adding representation. (#20799) 2021-12-16 08:27:55 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. (#21008) 2021-12-13 12:04:23 +01:00
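In this era the toggle was the experimental config key below; a hedged sketch of how a user would set it (trainer construction omitted):

```python
config = {
    "env": "CartPole-v0",
    # Pass raw, unflattened observations straight to the model instead of
    # running them through RLlib's built-in preprocessors.
    "_disable_preprocessor_api": True,
}
```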
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. (#20918) 2021-12-11 14:57:58 +01:00
Sven Mika
f814c2af89
[RLlib; Docs] Docs API reference pages: rllib/execution, rllib/evaluation, rllib/models, rllib/offline. (#20538) 2021-12-10 09:41:29 +01:00
Carlo Grisetti
a8286c55af
[RLlib] Fix deprecated convert_to_non_torch_type (#20751) 2021-12-09 14:42:12 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) (#19551) 2021-12-07 13:26:34 +01:00
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible (#20746) 2021-12-02 13:11:26 +01:00
mvindiola1
8cee0c03bf
[RLlib] Update max_seq_len in pad_batch_to_sequences_of_same_size (#20743) 2021-11-30 18:00:07 +01:00
mvindiola1
eadc7669c5
[RLlib] SampleBatch.concat_samples fix incorrect max_seq_len calculation (#20704) 2021-11-29 12:01:40 +01:00
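A hypothetical sketch of that bug's surface, assuming the SampleBatch API of this era: the concatenated batch's max_seq_len must be the maximum over all inputs, not inherited from the first batch:

```python
import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

b1 = SampleBatch({"obs": np.zeros((4, 2)), SampleBatch.SEQ_LENS: np.array([2, 2])})
b2 = SampleBatch({"obs": np.zeros((6, 2)), SampleBatch.SEQ_LENS: np.array([3, 3])})

merged = SampleBatch.concat_samples([b1, b2])
print(merged.max_seq_len)  # expected: 3 (the max across both batches)
```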
Sven Mika
e37afe0425
[RLlib; Docs] Auto API reference pages overhaul: rllib/policy and rllib/agents packages. (#20537) 2021-11-25 09:35:19 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." (#20285)
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
2021-11-15 16:11:35 -08:00
Sven Mika
6ff4061f3a
[RLlib] Issue 20269: Offline RL example not working due to new_obs not being written to file. (#20366)
* wip.

* Apply suggestions from code review
2021-11-15 16:41:08 +01:00
Sven Mika
5b1c8e46e1
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) 2021-11-15 10:41:54 +01:00
Sven Mika
70fe25055a
[RLlib] Issue: Get single step input dict incorrect. (#20217) 2021-11-12 08:38:51 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Sven Mika
f3397b6f48
[RLlib] Minor fixes/cleanups; chop_into_sequences now handles nested data. (#19408) 2021-11-05 14:39:28 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0;
however, in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun all of the
pendulum v1 benchmarks, or switch to another
environment. The same applies to frozen lake v0
and frozen lake v1.

Lastly, all of the RLlib tests have been moved
to python 3.7

* Add gym installation based on python version.

Pin python <= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* Migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Adding type conversions if obs/actions don't match space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
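The version-conditional pin described in that message, as a hedged sketch (the real change lives in the requirements files, not Python code):

```python
import sys

# Older interpreters stay on gym 0.19 because of Atari ROM install
# issues with gym 0.20+, per the commit message above.
if sys.version_info[:2] <= (3, 6):
    GYM_REQUIREMENT = "gym==0.19"
else:
    GYM_REQUIREMENT = "gym==0.21"
print(GYM_REQUIREMENT)
```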
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00