Commit graph

265 commits

Author SHA1 Message Date
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0;
however, in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we may have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have
been moved to Python 3.7.

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Adding type conversions if obs/actions don't match space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
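
The version-dependent gym install described in the commit above can be sketched as a small selection rule. This is only an illustration, not the repository's actual requirements file: the function name `gym_requirement` and the exact pin strings are assumptions, and only the 0.19 and 0.21 versions and the Python 3.6 cutoff come from the commit message.

```python
import sys


def gym_requirement(python_version):
    """Return a pip requirement string for gym.

    `python_version` is a (major, minor) tuple. The helper and the pin
    format are hypothetical; the commit only states that Python <= 3.6
    is pinned to gym 0.19 and that gym is otherwise upgraded to 0.21.
    """
    if tuple(python_version) <= (3, 6):
        # Older interpreters stay on 0.19 because of the Atari ROM
        # install issues the commit mentions for gym 0.20.
        return "gym==0.19"
    return "gym==0.21"


# Requirement that would apply to the interpreter running this script.
current_requirement = gym_requirement(sys.version_info[:2])
```

The same rule could probably be written with environment markers in a requirements file, which seems to be what "Move atari-py install conditional to req.txt" refers to.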
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783) 2021-10-29 12:03:56 +02:00
Rohan138
b9c9cc5946
[RLlib] Updated PettingZoo+RLlib tutorial; Removed pettingzoo example script (#19069)
* Updated PettingZoo+RLlib tutorial

Updated the tutorial and added link to the blog post by the PettingZoo team.

* Ran linting

* Converted link to tinyurl for linting

* fixed line lengths

* Decrease num_workers to 1

* Added comments

* Decreased num_workers

* Decreased timesteps

* Increased num_workers

* Update links and remove pettingzoo_env.py

* remove pettingzoo.py script from tests

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-29 10:57:10 +02:00
Sven Mika
902e854af2
[RLlib; Docs overhaul] Docstring cleanup: Environments. (#19784)
* wip.

* Test: Make a change in tune to trigger tune tests, which are not run otherwise, but seem to fail nevertheless with this PR's changes.

* remove bare_metal_policy_with_custom_view_reqs from tests
2021-10-29 10:46:52 +02:00
gjoliver
39b0faa3ec
[RLlib]: bug fix, should be input_dict['is_training'] (#19805) 2021-10-27 23:30:43 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change:
1. Gets rid of the try...except clause when we call execution_plan(),
   and gets rid of the deprecation warning as a result.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents.
   E.g., it allows us to easily create a reservoir sampling replay buffer for
   the APPO agent for Riot in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
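
The commit above mentions a reservoir sampling replay buffer as a future use case. A minimal sketch of that idea, using the classic "Algorithm R", might look as follows. The class name `ReservoirBuffer` and its methods are hypothetical and are not RLlib's replay buffer API; only the `capacity` naming follows the log.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity buffer holding a uniform sample of all items seen.

    Hypothetical illustration of reservoir sampling (Algorithm R).
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self._items = []
        self._num_seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        self._num_seen += 1
        if len(self._items) < self.capacity:
            # Buffer not yet full: always store the item.
            self._items.append(item)
            return
        # Buffer full: keep the new item with probability
        # capacity / num_seen by drawing a slot in [0, num_seen).
        slot = self._rng.randrange(self._num_seen)
        if slot < self.capacity:
            self._items[slot] = item

    def sample(self, num_items):
        # Sample with replacement; raises IndexError on an empty buffer.
        return [self._rng.choice(self._items) for _ in range(num_items)]

    def __len__(self):
        return len(self._items)


buffer = ReservoirBuffer(capacity=3, seed=0)
for step in range(100):
    buffer.add(step)
batch = buffer.sample(5)
```

Unlike a FIFO buffer, every item ever added has the same chance of being in the buffer at any time, so old experience is not systematically evicted first.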
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623) 2021-10-25 09:42:39 +02:00
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.

* wip
2021-10-25 09:38:56 +02:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. (#19133) 2021-10-07 22:39:21 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear (#19053)
* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
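
The unified result layout named in the commit title above can be illustrated with a plain nested dict. This is a sketch under assumptions: the helper `get_learner_stats` is hypothetical, the `"default_policy"` ID is assumed to be the policy ID used in single-agent setups, and the numbers are placeholders rather than real training output.

```python
def get_learner_stats(results, policy_id="default_policy"):
    """Read results[info][learner][policy ID][learner_stats].

    Hypothetical helper; `results` stands in for the dict returned
    by Trainer.train().
    """
    return results["info"]["learner"][policy_id]["learner_stats"]


# Placeholder dict mimicking the structure from the commit title.
results = {
    "info": {
        "learner": {
            "default_policy": {
                "learner_stats": {"total_loss": 0.5},
            },
        },
    },
}

stats = get_learner_stats(results)
```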
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
Sven Mika
05a55a9335
[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). (#18942) 2021-09-30 08:30:20 +02:00
mvindiola1
62f5da0b65
[RLlib] Add unit tests for updating episode data in base_env (#17137) 2021-09-24 16:08:11 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). (#18468) 2021-09-23 12:56:45 +02:00
Sven Mika
a96dbd885b
[RLlib] Reinstate trajectory view API tests. (#18809) 2021-09-23 08:31:51 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669) 2021-09-21 22:00:14 +02:00
Sven Mika
fd13bac9b3
[RLlib] Add worker arg (optional) to policy_mapping_fn. (#18184) 2021-09-17 12:07:11 +02:00
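
A policy mapping function that accepts the optional worker argument from the commit above might be sketched like this. The exact call signature used by RLlib is an assumption here, as are the stand-in `FakeWorker` class, the use of a `worker_index` attribute, and the policy names.

```python
class FakeWorker:
    """Stand-in for a rollout worker (hypothetical, for illustration)."""

    def __init__(self, worker_index):
        self.worker_index = worker_index


def policy_mapping_fn(agent_id, episode, worker=None, **kwargs):
    """Map an agent to a policy ID, optionally depending on the worker.

    Assumes the caller passes the worker as a keyword argument and
    tolerates callers that omit it.
    """
    if worker is None:
        return "policy_a"
    # Example rule: even-indexed workers use one policy, odd the other.
    return "policy_a" if worker.worker_index % 2 == 0 else "policy_b"


mapped_even = policy_mapping_fn("agent_0", episode=None, worker=FakeWorker(2))
mapped_odd = policy_mapping_fn("agent_0", episode=None, worker=FakeWorker(1))
```

Keeping `worker` optional and accepting `**kwargs` lets the same function work with callers that do not pass a worker.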
Sven Mika
8a72824c63
[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591) 2021-09-15 22:16:48 +02:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). (#18494) 2021-09-11 23:08:09 +02:00
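
Action masking, the topic of the example added in the commit above, is commonly done by pushing the logits of disallowed actions to negative infinity before the softmax. The sketch below shows that general technique in plain Python; it is not the code of the RLlib example script, and `masked_softmax` is a hypothetical name.

```python
import math


def masked_softmax(logits, action_mask):
    """Return action probabilities with masked-out actions set to 0.

    `action_mask` holds 1 (allowed) or 0 (disallowed) per action.
    """
    if not any(action_mask):
        raise ValueError("At least one action must be allowed.")
    # Disallowed actions get a logit of -inf, so exp() maps them to 0.
    masked = [
        logit if allowed else -math.inf
        for logit, allowed in zip(logits, action_mask)
    ]
    max_logit = max(masked)  # Subtract the max for numerical stability.
    exps = [math.exp(logit - max_logit) for logit in masked]
    total = sum(exps)
    return [value / total for value in exps]


probs = masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])
```

The disallowed action receives probability zero, and the remaining probabilities are renormalized over the allowed actions.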
Sven Mika
8a066474d4
[RLlib] No Preprocessors; preparatory PR #1 (#18367) 2021-09-09 08:10:42 +02:00
Sven Mika
1520c3d147
[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to Trainer.add_policy() (#18428) 2021-09-09 07:10:06 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. (#18390) 2021-09-08 19:39:01 +02:00
Sven Mika
56f142cac1
[RLlib] Add support for evaluation_num_episodes=auto (run eval for as long as the parallel train step takes). (#18380) 2021-09-07 08:08:37 +02:00
Sven Mika
5292b70fc6
[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368) 2021-09-06 17:48:05 +02:00
Sven Mika
59f796edf3
[RLlib] Fix crash when using StochasticSampling exploration (most PG-style algos) w/ tf and numpy > 1.19.5 (#18366) 2021-09-06 12:14:00 +02:00
Sven Mika
ba58f5edb1
[RLlib] Strictly run evaluation_num_episodes episodes each evaluation run (no matter the other eval config settings). (#18335) 2021-09-05 15:37:05 +02:00
Sven Mika
a772c775cd
[RLlib] Set random seed (if provided) to Trainer process as well. (#18307) 2021-09-04 11:02:30 +02:00
gjoliver
336e79956a
[RLlib] Make MultiAgentEnv inherit gym.Env to avoid direct class type manipulation (#18156) 2021-09-03 08:02:05 +02:00
Sven Mika
2357bbc0c8
[RLlib] Issue 18231: Better (earlier) env validation and error message improvement. (#18249) 2021-09-02 09:28:16 +02:00
Sven Mika
82465f9342
[RLlib] Better PolicyServer example (w/ or w/o tune) and add printing out actual listen port address in log-level=INFO. (#18254) 2021-08-31 22:03:23 +02:00
Joseph Suarez
8136d2912b
[RLlib] Add policies arg to callback: on_episode_step (already exists in all other episode-related callbacks) (#18119) 2021-08-27 16:12:19 +02:00
gjoliver
a8813675f4
[RLlib] Issue 17900: Set seed in single vectorized sub-envs properly, if num_envs_per_worker > 1 (#18110)
* In case a worker runs multiple envs, make sure a different seed can be deterministically set on all of them.

* Revert a couple of whitespace changes.

* Fix a few style errors.

Co-authored-by: Jun Gong <jungong@mbpro.local>
2021-08-26 11:32:58 +02:00
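
One way to give every vectorized sub-env its own deterministic seed, as the commit above describes, is to derive the seed from the base seed, the worker index, and the sub-env index. The formula and the name `sub_env_seed` are assumptions for illustration and may differ from what the actual fix does.

```python
def sub_env_seed(base_seed, worker_index, vector_index, num_envs_per_worker):
    """Derive a distinct, reproducible seed for one sub-env.

    Hypothetical formula: each worker owns a block of
    `num_envs_per_worker` consecutive seeds.
    """
    return base_seed + worker_index * num_envs_per_worker + vector_index


# Seeds for 3 workers with 4 sub-envs each, starting from base seed 42.
seeds = [
    sub_env_seed(42, worker, env, num_envs_per_worker=4)
    for worker in range(3)
    for env in range(4)
]
```

Because the seed depends only on the configuration, rerunning with the same base seed reproduces the same per-env seeds, while no two sub-envs share one.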
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928) 2021-08-21 17:05:48 +02:00
simonsays1980
60aee4a330
[RLlib] Add example script for bare metal Policy with custom view_requirements. (#17896) 2021-08-20 12:17:13 +02:00
Sven Mika
8248ba531b
[RLlib] Redo #17410: Example script: Remote worker envs with inference done on main node. (#17960) 2021-08-20 08:02:18 +02:00
Alex Wu
318ba6fae0
Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. (#17410)" (#17951)
This reverts commit 8fc16b9a18.
2021-08-19 07:55:10 -07:00
Sven Mika
8fc16b9a18
[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. (#17410) 2021-08-19 12:14:50 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778) 2021-08-18 17:21:01 +02:00
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) (#17895)
* wip.
2021-08-17 09:13:35 -07:00
Stefan Schneider
eab9c25856
[RLlib] Better example scripts: Description --no-tune and --local-mode CLI options (autoregressive_action_dist.py) (#17705) 2021-08-16 22:08:13 +02:00
mguarin0
3e010c5760
[rllib] bug fix for rllib pettingzoo pistonball_v4 example (#17701)
* bug fix for rllib pettingzoo pistonball_v4 example

* adding test for PR 17701

* ran scripts/format.sh

* ok

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-12 00:25:00 -07:00
J K Terry
48e32555c8
[rllib] Update PettingZoo dependency versions (#17702)
* update pettingzoo dependency versions

* pettingzoo version

* fix tests
2021-08-11 01:19:19 -07:00
Amog Kamsetty
77f28f1c30
Revert "[RLlib] Fix Trainer.add_policy for num_workers>0 (self play example scripts). (#17566)" (#17709)
This reverts commit 3b447265d8.
2021-08-10 10:50:01 -07:00