272 commits

Author SHA1 Message Date
brulu
8b77fc0aef
[RLlib] Updating Repeated space. Allowing numpy arrays and adding representation. (#20799) 2021-12-16 08:27:55 +01:00
Sven Mika
daa4304a91
[RLlib] Switch off preprocessors by default for PGTrainer. (#21008) 2021-12-13 12:04:23 +01:00
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. (#20918) 2021-12-11 14:57:58 +01:00
Sven Mika
f814c2af89
[RLlib; Docs] Docs API reference pages: rllib/execution, rllib/evaluation, rllib/models, rllib/offline. (#20538) 2021-12-10 09:41:29 +01:00
Carlo Grisetti
a8286c55af
[RLLib] Fix deprecated convert_to_non_torch_type (#20751) 2021-12-09 14:42:12 +01:00
Ishant Mrinal
2868d1a2cf
[RLlib] Support for RE3 exploration algorithm (for tf) (#19551) 2021-12-07 13:26:34 +01:00
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible (#20746) 2021-12-02 13:11:26 +01:00
mvindiola1
8cee0c03bf
[RLlib] Update max_seq_len in pad_batch_to_sequences_of_same_size (#20743) 2021-11-30 18:00:07 +01:00
mvindiola1
eadc7669c5
[RLlib] SampleBatch.concat_samples fix incorrect max_seq_len calculation (#20704) 2021-11-29 12:01:40 +01:00
Sven Mika
e37afe0425
[RLlib; Docs] Auto API reference pages overhaul: rllib/policy and rllib/agents packages. (#20537) 2021-11-25 09:35:19 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." (#20285)
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
2021-11-15 16:11:35 -08:00
Sven Mika
6ff4061f3a
[RLlib] Issue 20269: Offline RL example not working due to new_obs not being written to file. (#20366)
* wip.

* Apply suggestions from code review
2021-11-15 16:41:08 +01:00
Sven Mika
5b1c8e46e1
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) 2021-11-15 10:41:54 +01:00
Sven Mika
70fe25055a
[RLlib] Issue: Get single step input dict incorrect. (#20217) 2021-11-12 08:38:51 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Sven Mika
f3397b6f48
[RLlib] Minor fixes/cleanups; chop_into_sequences now handles nested data. (#19408) 2021-11-05 14:39:28 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun all of the
pendulum v1 benchmarks or use another environment
instead. The same applies to frozen lake v0 and
frozen lake v1.

Lastly, all of the RLlib tests have
been moved to python 3.7.

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
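Commit 026bf01071 above describes installing gym based on the Python version: gym 0.19 for Python <= 3.6, and the upgraded 0.21 otherwise. The selection logic can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper name `gym_requirement` and the exact requirement strings are assumptions.

```python
import sys


def gym_requirement(version_info=sys.version_info):
    """Pick a gym requirement string for the given Python version.

    Hypothetical helper: the name and the exact pin strings are
    assumptions for illustration, not taken from the commit itself.
    """
    major_minor = tuple(version_info[:2])
    if major_minor <= (3, 6):
        # The commit message pins Python <= 3.6 to gym 0.19 because of
        # install issues with atari roms in gym 0.20.
        return "gym==0.19"
    # Newer interpreters get the upgraded version named in the commit title.
    return "gym==0.21"


if __name__ == "__main__":
    print(gym_requirement())
```

Passing the version as a parameter keeps the choice easy to check for interpreters other than the one currently running.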
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun all of the
pendulum v1 benchmarks or use another environment
instead. The same applies to frozen lake v0 and
frozen lake v1.

Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7.

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783) 2021-10-29 12:03:56 +02:00
gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809)
* Fix DDPG, since it is based on GenericOffPolicyTrainer.

* Fix QMix, SAC, and MADDPG too.

* Undo QMix change.

* Fix DQN input batch type. Always use SampleBatch.

* apex ddpg should not use replay_buffer_config yet.

* Make eager tf policy to use SampleBatch.

* lint

* LINT.

* Re-enable RLlib broken tests to make sure things work ok now.

* fixes.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
Sven Mika
f2cb2ed203
[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. (#19759) 2021-10-27 19:14:39 +02:00
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.

* wip
2021-10-25 09:38:56 +02:00
Sven Mika
bd2d2079d2
[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269) 2021-10-10 12:19:47 +02:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Sven Mika
ac3371a148
[RLlib] Discussion 3644: Fix bug for complex obs spaces containing Box([2D shape]) and discrete component. (#18917) 2021-09-30 16:39:38 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
Sven Mika
05a55a9335
[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). (#18942) 2021-09-30 08:30:20 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00
Sven Mika
a96dbd885b
[RLlib] Reinstate trajectory view API tests. (#18809) 2021-09-23 08:31:51 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669) 2021-09-21 22:00:14 +02:00
Sven Mika
08c09737fa
[RLlib] Fix R2D2 (torch) multi-GPU issue. (#18550) 2021-09-14 19:58:10 +02:00
Sven Mika
8a066474d4
[RLlib] No Preprocessors; preparatory PR #1 (#18367) 2021-09-09 08:10:42 +02:00
Sven Mika
cd22a7d1bb
[RLlib] Add locking to PolicyMap in case it is accessed by a RolloutWorker and the same worker's AsyncSampler or the main LearnerThread. (#18444) 2021-09-08 23:32:23 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335) 2021-09-05 15:37:05 +02:00
Sven Mika
9a8ca6a69d
[RLlib] Fix Atari learning test regressions (2 bugs) and 1 minor attention net bug. (#18306) 2021-09-03 13:29:57 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999) 2021-08-31 12:21:49 +02:00
Sven Mika
9883505e84
[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). (#18017) 2021-08-24 21:55:27 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928) 2021-08-21 17:05:48 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778) 2021-08-18 17:21:01 +02:00
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) (#17895)
* wip.
2021-08-17 09:13:35 -07:00
simonsays1980
7b33dc21dc
[RLlib] Fix update model view requirements from init state for bare-metal policies with custom view-reqs. (#17867)
* Changed '_update_model_view_requirements_from_init_state()' to adopt the 'shift' in view_requirements from a user-defined policy that inherits directly from Policy.

* Added a slightly modified version of Sven's suggestion. This way, any user-defined attributes of the state's ViewRequirement are preserved.

* The code in _update_model_view_requirements_from_init_state() had changed and no longer matched my locally installed version. In the new version, the view_requirements from the model and the policy are merged, so a loop runs over the unified list. The code should now run with the current version.

* Apply suggestions from code review
2021-08-17 11:49:24 +02:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640) 2021-08-16 22:01:01 +02:00