401 commits

Author SHA1 Message Date
Rohan Potdar
5b6a58ed28
[RLlib] Add OPE Learning Tests (#27154) 2022-08-02 17:51:38 -07:00
Steven Morad
77318abfaf
[RLlib] Warn on PPO infinite KL loss term. (#26629) 2022-08-01 12:55:26 +02:00
Eric Liang
a4434fac7f
[docs] Fix the remaining style violations in docstrings and add lint rule (#27033) 2022-07-27 22:24:20 -07:00
Jun Gong
acf2bf9b2f
[RLlib] Get rid of all these deprecation warnings. (#27085) 2022-07-27 10:48:54 -07:00
xwjiang2010
fcf897ee72
[air] update rllib example to use Tuner API. (#26987)
update rllib example to use Tuner API.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
2022-07-27 12:12:59 +01:00
kourosh hakhamaneshi
5030a4c1d3
[RLlib] Simplify agent collector (#26803) 2022-07-25 13:17:17 -07:00
Artur Niederfahrenhorst
e9a8f7d9ae
[RLlib] Unify gnorm mixin for tf and torch policies. (#26102) 2022-07-24 15:31:09 +02:00
Ishant Mrinal
b32c784c7f
[RLLib] RE3 exploration algorithm TF2 framework support (#25221) 2022-07-23 18:05:01 -07:00
Rohan Potdar
97bcf38ec0
[RLlib] Fix torch None conversion in torch_utils.py::convert_to_torch_tensor. (#26863) 2022-07-23 13:54:57 +02:00
Steven Morad
259429bdc3
Bump gym dep to 0.24 (#26190)
Co-authored-by: Steven Morad <smorad@anyscale.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2022-07-22 12:37:16 -07:00
Olaf Lipinski
8271406a04
[RLLib] Fix MultiDiscrete not being one-hotted correctly (#26558)
Co-authored-by: Jun Gong <jungong@anyscale.com>
2022-07-20 15:25:53 -07:00
Jun Gong
6b6d3017ba
[RLlib] more connector polishes and fixes. (#26645) 2022-07-19 08:50:28 -07:00
Artur Niederfahrenhorst
0ce3bc5e48
[RLlib] Add/reorder Args of Prioritized/MixIn MultiAgentReplayBuffer. (#26428) 2022-07-18 18:04:03 +02:00
Rohan Potdar
38c9e1d52a
[RLlib]: Fix OPE trainables (#26279)
Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2022-07-17 14:25:53 -07:00
mgerstgrasser
f0e9d1a9bb
[RLlib] In env check, step only expected agents. (#26425) 2022-07-15 09:16:09 +02:00
Sven Mika
4aea24c8a8
[RLlib] restart_failed_sub_environments now works for MA cases and crashes during reset(); +more tests and logging; add eval worker sub-env fault tolerance test. (#26276) 2022-07-15 08:55:14 +02:00
Jun Gong
b383d987d1
[RLlib] Fix a bunch of issues related to connectors. (#26510) 2022-07-13 18:55:20 +02:00
Avnish Narayan
5df66b917d
[Lint Check] Remove broken link (#26505)
The paper is not available anymore.
2022-07-13 10:30:20 +01:00
Jun Gong
0c469e490e
[RLlib] Checkpoint and restore connectors. (#26253) 2022-07-09 01:06:24 -07:00
Jun Gong
d234348bd2
[RLlib] Minor simplification of code. (#26312) 2022-07-08 13:21:54 -07:00
Sven Mika
f8785c49df
[RLlib] Issue 25696: Output writers not working w/ multiple workers. (#25722) 2022-06-30 13:25:56 +02:00
Jun Gong
d83bbda281
[RLlib] Save serialized PolicySpec. Extract num_gpus related logics into a util function. (#25954) 2022-06-30 11:38:21 +02:00
Jun Gong
52bb8e47d4
[RLlib] EnvRunnerV2 and EpisodeV2 that support Connectors. (#25922) 2022-06-30 08:44:10 +02:00
Artur Niederfahrenhorst
64a0eae758
simplexfix (#26122) 2022-06-27 08:25:19 -07:00
Artur Niederfahrenhorst
bed9083f35
[RLlib] Add timeout to filter synchronization. (#25959) 2022-06-24 14:37:43 +02:00
Jun Gong
257e67474c
[RLlib] introduce serialization for our custom gym space types. (#25923) 2022-06-23 22:55:57 -07:00
Jun Gong
8c9cac350d
Fix unit tests test_check_env.py and test_check_multi_agent.py. (#25993) 2022-06-23 22:55:41 -07:00
Artur Niederfahrenhorst
a3f1323457
[RLlib] Make QMix use the ReplayBufferAPI (#25560) 2022-06-23 22:55:22 -07:00
Sven Mika
59a967a3a0
[RLlib] Cleanup some deprecated metric keys and classes. (#26036) 2022-06-23 21:30:01 +02:00
Eric Liang
43aa2299e6
[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695)
Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and associated fixes.
2022-06-21 15:13:29 -07:00
Sven Mika
96693055bd
[RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) 2022-06-20 15:54:00 +02:00
Sven Mika
d90c6cfbd6
[RLlib] SimpleQ PolicyV2 (sub-classing). (#25871) 2022-06-17 20:12:16 +02:00
Artur Niederfahrenhorst
a322cc5765
[RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) 2022-06-17 14:10:36 +02:00
Yi Cheng
7b8b0f8e03
Revert "[RLlib] Remove execution plan code no longer used by RLlib. (#25624)" (#25776)
This reverts commit 804719876b.
2022-06-14 13:59:15 -07:00
Jun Gong
c026374acb
[RLlib] Fix the 2 failing RLlib release tests. (#25603) 2022-06-14 14:51:08 +02:00
Avnish Narayan
804719876b
[RLlib] Remove execution plan code no longer used by RLlib. (#25624) 2022-06-14 10:57:27 +02:00
Sven Mika
130b7eeaba
[RLlib] Trainer to Algorithm renaming. (#25539) 2022-06-11 15:10:39 +02:00
Artur Niederfahrenhorst
94d6c212df
[RLlib] Replay Buffer API documentation. (#24683) 2022-06-10 16:47:51 +02:00
Artur Niederfahrenhorst
9226643433
[RLlib] Issue 4965: Fixes PyTorch grad clipping logic and adds grad clipping to QMIX. (#25584) 2022-06-08 19:40:57 +02:00
Jun Gong
9b65d5535d
[RLlib] Introduce basic connectors library. (#25311) 2022-06-07 19:18:14 +02:00
Artur Niederfahrenhorst
429d0f0eee
[RLlib] Fix multi agent environment checks for observations that contain only some agents' obs each step. (#25506) 2022-06-07 10:33:35 +02:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. (#25288) 2022-06-06 13:07:47 +02:00
Artur Niederfahrenhorst
c4a0e9d0f2
[RLlib] Disambiguate timestep fragment storage unit in replay buffers. (#25242) 2022-06-06 11:35:49 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never failed before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So it is highly likely that the failure is because of this PR.

The test failure output seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule (#25272) 2022-06-01 11:27:54 -07:00
Sven Mika
18c03f8d93
[RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). (#25314) 2022-06-01 09:29:16 +02:00
Sven Mika
c5edd82c63
[RLlib] MB-MPO TrainerConfig objects. (#25278) 2022-05-30 17:33:01 +02:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). (#24967) 2022-05-28 10:50:03 +02:00