Commit graph

61 commits

Author SHA1 Message Date
Jun Gong
9b65d5535d
[RLlib] Introduce basic connectors library. (#25311) 2022-06-07 19:18:14 +02:00
Rohan Potdar
a9d8da0100
[RLlib]: Doubly Robust Off-Policy Evaluation. (#25056) 2022-06-07 12:52:19 +02:00
Jun Gong
1d24d6af98
[RLlib] Fix MARWIL tf policy. (#25384) 2022-06-03 10:50:36 +02:00
Eric Liang
905258dbc1
Clean up docstyle in python modules and add LINT rule (#25272) 2022-06-01 11:27:54 -07:00
Rohan Potdar
ab81c8e9ca
[RLlib]: Rename input_evaluation to off_policy_estimation_methods. (#25107) 2022-05-27 13:14:54 +02:00
Eric Liang
4963dfaae0
[api] Add API stability annotations for all RLlib symbols and add to LINT (#25060) 2022-05-24 22:14:25 -07:00
Eric Liang
55d039af32
Annotate datasources and add API annotation check script (#24999)
Why are these changes needed?
Add API stability annotations for datasource classes, and add a linter to check all data classes have appropriate annotations.
2022-05-21 15:05:07 -07:00
Rohan Potdar
5a70b732e8
[RLlib] MARWIL and BC Config. (#24853) 2022-05-21 12:50:20 +02:00
Sven Mika
7cca7782f1
[RLlib] OPE (off policy estimator) API. (#24384) 2022-05-02 21:15:50 +02:00
Kai Fricke
8c2e471265
[AIR] Add RLTrainer interface, implementation, and examples (#23465)
This PR adds a RLTrainer to Ray AIR. It works for both offline and online use cases. In offline training, it will leverage the datasets key of the Trainer API to specify a dataset reader input, used e.g. in Behavioral Cloning (BC). In online training, it is a wrapper around the rllib trainables making use of the parameter layering enabled by the Trainer API.
2022-04-08 17:16:42 -07:00
Kai Fricke
262d6121bb
[rllib] Fix error messages and example for dataset writer (#23419)
Currently the error message and example refer to a field type that is actually format.
2022-03-28 19:53:12 +01:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI (#23418) 2022-03-24 17:04:02 -07:00
Sven Mika
0af100ffae
[RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). (#22731) 2022-03-01 21:24:24 +01:00
Jun Gong
6f5afcbce9
[RLlib] Docs enhancements: Setup-dev instructions; Ray datasets integration. (#22239) 2022-02-15 09:09:24 +01:00
Jun Gong
87fe033f7b
[RLlib] Request CPU resources in Trainer.default_resource_request() if using dataset input. (#21948) 2022-02-02 10:20:37 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jun Gong
099c170ab4
[RLlib] Dataset Reader/Writer for RLlib (#21808) 2022-01-26 16:00:46 +01:00
xwjiang2010
9af8f11191
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
2022-01-20 15:30:56 -08:00
Max Pumperla
38e46c9fb3
[docs] Clean up doc structure (first part) (#21667) 2022-01-20 16:19:04 +01:00
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. (#20918) 2021-12-11 14:57:58 +01:00
Sven Mika
f814c2af89
[RLlib; Docs] Docs API reference pages: rllib/execution, rllib/evaluation, rllib/models, rllib/offline. (#20538) 2021-12-10 09:41:29 +01:00
Sungho Joo
dc51af798c
[RLlib] Minor fix on json encoding during worker sampling (#20134)
* import custom json encoder from util and improve encoder default function

* linting
2021-11-09 16:46:41 -08:00
Sven Mika
ea2bea7e30
[RLlib; Docs overhaul] Docstring cleanup: Offline. (#19808) 2021-11-01 10:59:53 +01:00
Sven Mika
cabaa3b3c6
[RLlib Testing] Add A3C/APPO/BC/DDPPO/MARWIL/CQL/ES/ARS/TD3 to weekly learning tests. (#18381) 2021-09-07 11:48:41 +02:00
Sven Mika
18d173b172
[RLlib] Implement policy_maps (multi-agent case) in RolloutWorkers as LRU caches. (#17031) 2021-07-19 13:16:03 -04:00
Sven Mika
1fd0eb805e
[RLlib] Redo fix bug normalize vs unsquash actions (original PR made log-likelihood test flakey). (#17014) 2021-07-13 14:01:30 -04:00
Amog Kamsetty
bc33dc7e96
Revert "[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action." (#17002)
This reverts commit 7862dd64ea.
2021-07-12 11:09:14 -07:00
Julius Frost
a88b217d3f
[rllib] Enhancements to Input API for customizing offline datasets (#16957)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-07-10 15:05:25 -07:00
Sven Mika
7862dd64ea
[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action. (#16774) 2021-07-08 17:31:34 +02:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531) 2021-06-30 12:32:11 +02:00
Sven Mika
e2be41b407
[RLlib] MARWIL + BC: Various fixes and enhancements. (#16218) 2021-06-03 22:29:00 +02:00
Sven Mika
c4a3e1589b
[RLlib] CQL: Bug fixes and OPE example added to test and offline_rl.py example. (#15761) 2021-05-13 09:17:23 +02:00
Michael Luo
4cbe13cdfd
[RLlib] CQL loss fn fixes, MuJoCo + Pendulum benchmarks, offline-RL example script w/ json file. (#15603)
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-05-04 19:06:19 +02:00
Sven Mika
8b3554e37e
[RLlib] Remove all (already soft-deprecated) SampleBatch.data from code. (#15335) 2021-04-15 19:19:51 +02:00
Sven Mika
6708211b59
[RLlib] JSONReader: Mix files if > 1 at beginning (each worker should start with different file). (#14865) 2021-03-24 16:07:40 +01:00
Michael Luo
587f207c2f
[RLlib] Support for D4RL + Semi-working CQL Benchmark (#13550) 2021-01-21 16:43:55 +01:00
Sven Mika
391cdfae8c
[RLlib] Trajectory view API docs. (#12718) 2020-12-30 17:32:21 -08:00
Sven Mika
c524f86785
[RLlib] BC/MARWIL/recurrent nets minor cleanups and bug fixes. (#13064) 2020-12-27 09:46:03 -05:00
Felipe Antunes
4c0f0ce3a9
[RLlib] In OffPolicyEstimators (Offline RL): Include last step of trajectory (#12619) 2020-12-08 12:39:40 +01:00
Sven Mika
f6b84cb2f7
[RLlib] Fix offline logp vs prob bug in OffPolicyEstimator class. (#12158) 2020-11-20 08:59:43 +01:00
Eric Liang
6b7a4dfaa0
[rllib] Forgot to pass ioctx to child json readers (#11839)
* fix ioctx

* fix
2020-11-05 22:07:57 -08:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. (#10825) 2020-09-20 11:27:02 +02:00
Sven Mika
4b278c36fc
[RLlib] Behavioral Cloning (from MARWIL). (#10619) 2020-09-09 17:33:21 +02:00
Julius Frost
dc659ae89a
make action probabilities a numpy array (#10122) 2020-08-16 11:25:12 -07:00
Sven Mika
2256047876
[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. (#10114) 2020-08-15 13:24:22 +02:00
Julius Frost
6d9d2b320a
[RLlib] Support windows drives other than C drive for the offline json API (#9909) 2020-08-13 11:57:54 +02:00
Sven Mika
b0b0463161
[RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678) 2020-07-29 21:15:09 +02:00
Michael Luo
b51ab2af66
[RLlib] Offline Type Annotations (#9676)
* Offline Annotations

* Modifications

* Fixed circular dependencies

* Linter fix
2020-07-27 14:01:17 -07:00
Sven Mika
43043ee4d5
[RLlib] Tf2x preparation; part 2 (upgrading try_import_tf()). (#9136)
* WIP.

* Fixes.

* LINT.

* WIP.

* WIP.

* Fixes.

* Fixes.

* Fixes.

* Fixes.

* WIP.

* Fixes.

* Test

* Fix.

* Fixes and LINT.

* Fixes and LINT.

* LINT.
2020-06-30 10:13:20 +02:00
Sven Mika
af1203b9df
[RLlib] Issue 8507 (PyTorch does not support custom loss). (#9142) 2020-06-26 09:52:22 +02:00