Commit graph

336 commits

Author SHA1 Message Date
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649) 2022-01-27 22:07:05 +01:00
Sven Mika
893536ebd9
[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773) 2022-01-27 13:58:12 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735) 2022-01-27 12:17:34 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Avnish Narayan
12b087acb8
[RLlib] Base env pre-checker. (#21569) 2022-01-18 16:34:06 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and Marwil learning tests. (#21574) 2022-01-14 14:35:32 +01:00
Avnish Narayan
c0f1202278
[RLlib] MultiAgentEnv pre-checker (#21476) 2022-01-13 11:31:22 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 (#21421) 2022-01-13 10:52:55 +01:00
Sven Mika
188324c5c7
[RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. (#21553) 2022-01-12 18:56:51 +01:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448) 2022-01-11 19:50:03 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) 2022-01-10 11:22:55 +01:00
Sven Mika
35af30a446
[RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110) 2022-01-10 11:20:37 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) 2022-01-05 11:29:44 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) 2021-12-21 08:39:05 +01:00
brulu
8b77fc0aef
[RLlib] Updating Repeated space. Allowing numpy arrays and adding representation. (#20799) 2021-12-16 08:27:55 +01:00
Sven Mika
e485aa846a
[RLlib; Docs overhaul] Overhaul of auto-API reference pages (via sphinx autoclass/automodule). (#19786) 2021-12-15 22:32:52 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. (#21008) 2021-12-13 12:04:23 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) (#19551) 2021-12-07 13:26:34 +01:00
mvindiola1
8cee0c03bf
[RLlib] Update max_seq_len in pad_batch_to_sequences_of_same_size (#20743) 2021-11-30 18:00:07 +01:00
gjoliver
e7f9e8ceec
[RLlib] Report total_train_steps correctly for offline agents like CQL. (#20541)
* Fix trainer timestep reporting for offline agents like CQL.

* wip.

* extend timesteps_total to 200K for learning_tests_pendulum_cql test

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-22 21:46:45 +01:00
Avnish Narayan
b6077a36d4
[RLlib; Pre-checks/better failure behavior]: Env Checker for Gym Environments (#20481) 2021-11-19 09:41:03 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250) 2021-11-17 21:40:16 +01:00
gjoliver
724a140795
[rllib] Make sure json can serialize result dict (#20439)
We may have fields in the result dict that are or None.
Make sure our results are json serializable.
2021-11-17 10:27:00 -08:00
gjoliver
6e787f70e0
[Rllib/release] Disable throughput check (#20387)
Throughput check was enabled by d8a61f801f prematurely.
E.g., see state before the commit:
a931076f59/rllib/utils/test_utils.py (L740-L741)
2021-11-16 11:05:51 -08:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." (#20285)
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
2021-11-15 16:11:35 -08:00
Sven Mika
5b1c8e46e1
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) 2021-11-15 10:41:54 +01:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. (#20055)" (#20284)
This reverts commit 6f85af435f.
2021-11-12 13:09:43 +00:00
Sven Mika
6f85af435f
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. (#20055) 2021-11-11 12:16:20 +01:00
gjoliver
d8a61f801f
[RLlib] Create a set of performance benchmark tests to run nightly. (#19945)
* Create a core set of algorithms tests to run nightly.

* Run release tests under tf, tf2, and torch frameworks.

* Fix

* Add eager_tracing option for tf2 framework.

* make sure core tests can run in parallel.

* cql

* Report progress while running nightly/weekly tests.

* Innclude SAC in nightly lineup.

* Revert changes to learning_tests

* rebrand to performance test.

* update build_pipeline.py with new performance_tests name.

* Record stats.

* bug fix, need to populate experiments dict.

* Alphabetize yaml files.

* Allow specifying frameworks. And do not run tf2 by default.

* remove some debugging code.

* fix

* Undo testing changes.

* Do not run CQL regression for now.

* LINT.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-08 18:15:13 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Sven Mika
f3397b6f48
[RLlib] Minor fixes/cleanups; chop_into_sequences now handles nested data. (#19408) 2021-11-05 14:39:28 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPA too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and have
been moved to python 3.7

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Fix QMix, SAC, and MADDPA too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and have
been moved to python 3.7
* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

Move atari-py install conditional to req.txt

migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
Carlo Grisetti
5cee8a1985
[release tests] Switch from yaml.load to yaml.safe_load (#19365) 2021-10-13 17:27:25 -07:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
c3e3fc7637
[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. (#19100) 2021-10-07 23:57:53 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Sven Mika
73f5c4039b
[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. (#19035) 2021-10-04 13:23:51 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear (#19053)
* Add flake8-bugbear

* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
Sven Mika
16ad46a654
[RLlib] Fix broken test_r2d2.py. (#19017) 2021-09-30 21:19:37 +02:00
Sven Mika
ac3371a148
[RLlib] Discussion 3644: Fix bug for complex obs spaces containing Box([2D shape]) and discrete component. (#18917) 2021-09-30 16:39:38 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00