Commit graph

901 commits

Author SHA1 Message Date
Sven Mika
4cb23d1c95
[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031) 2021-11-04 10:58:34 +01:00
gjoliver
2c1fa459d4
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807)
* Add an RLlib Tune experiment to UserTest suite.

* Add ray.init()

* Move example script to example/tune/, so it can be imported as module.

* add __init__.py so our new module will get included in python wheel.

* Add block device to RLlib test instances.

* Reduce disk size a little bit.

* Add metrics reporting

* Allow a max of 5 workers to accommodate all the worker tasks.

* revert disk size change.

* Minor updates

* Trigger build

* set max num workers

* Add a compute cfg for autoscaled cpu and gpu nodes.

* use 1gpu instance.

* install tblib for debugging worker crashes.

* Manually upgrade to pytorch 1.9.0

* -y

* torch=1.9.0

* install torch on driver

* bump timeout

* Write a more informative result dict.

* Revert changes to compute config files that are not used.

* add smoke test

* update

* reduce timeout

* Reduce the # of env per worker to 1.

* Small fix for getting trial_states

* Trigger build

* simplify result dict

* lint

* more lint

* fix smoke test

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-03 17:04:27 -07:00
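For reference, a minimal sketch of what such a Tune-driven RLlib experiment can look like (the env, stop criteria, and config values are illustrative and not taken from the actual UserTest script):

```python
import ray
from ray import tune

if __name__ == "__main__":
    ray.init()  # the PR adds an explicit ray.init() to the example script

    tune.run(
        "PPO",
        config={
            "env": "CartPole-v0",       # illustrative environment
            "framework": "torch",
            "num_workers": 5,           # the PR caps workers at 5
            "num_envs_per_worker": 1,   # reduced to 1 env per worker
        },
        stop={"timesteps_total": 100000},
    )
```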
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on Pendulum-v0; however, in gym 0.21,
Pendulum-v0 was deprecated in favor of Pendulum-v1. This may change
reward thresholds, so we may have to rerun all of the Pendulum-v1
benchmarks or switch to another environment. The same applies to
FrozenLake-v0 and FrozenLake-v1.

Lastly, all of the RLlib tests have been moved to Python 3.7.

* Add gym installation based on python version.

Pin python <= 3.6 to gym 0.19 due to install
issues with Atari ROMs in gym 0.20.

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Add type conversions if obs/actions don't match the space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
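The "utils to make elements match gym space dtypes" mentioned above boil down to casting obs/actions to the dtype of their space; a hedged sketch (the actual RLlib helper's name and coverage of nested spaces may differ):

```python
import numpy as np
import gym.spaces


def match_space_dtype(element, space):
    """Cast an obs/action so its dtype matches the given gym space.

    Illustrative only; covers just Box and Discrete here.
    """
    if isinstance(space, gym.spaces.Box):
        return np.asarray(element, dtype=space.dtype)
    if isinstance(space, gym.spaces.Discrete):
        return int(element)
    return element
```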
Sven Mika
e6ae08f416
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601) 2021-11-03 10:01:34 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
gjoliver
9385b6c1be
[RLlib] Make a few LRSchedule and EntropyCoeffSchedule tests more reliable. (#19934) 2021-11-02 16:52:56 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on Pendulum-v0; however, in gym 0.21,
Pendulum-v0 was deprecated in favor of Pendulum-v1. This may change
reward thresholds, so we may have to rerun all of the Pendulum-v1
benchmarks or switch to another environment. The same applies to
FrozenLake-v0 and FrozenLake-v1.

Lastly, all of the RLlib tests and Tune tests have been moved to
Python 3.7.

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
Sven Mika
bab9c0f670
[RLlib; Docs overhaul] Redo: Docstring cleanup: Trainer, trainer_template, Callbacks. (#19830) 2021-11-01 21:45:11 +01:00
Sven Mika
ea2bea7e30
[RLlib; Docs overhaul] Docstring cleanup: Offline. (#19808) 2021-11-01 10:59:53 +01:00
Sven Mika
4d945fe651
[RLlib] Issue 19878: Re-instate bare_metal_policy example script (#19881) 2021-10-30 12:50:39 -07:00
Sven Mika
9c73871da0
[RLlib; Docs overhaul] Docstring cleanup: Evaluation (#19783) 2021-10-29 12:03:56 +02:00
Rohan138
b9c9cc5946
[RLlib] Updated PettingZoo+RLlib tutorial; Removed pettingzoo example script (#19069)
* Updated PettingZoo+RLlib tutorial

Updated the tutorial and added a link to the blog post by the PettingZoo team.

* Ran linting

* Converted link to tinyurl for linting

* fixed line lengths

* Decrease num_workers to 1

* Added comments

* Decreased num_workers

* Decreased timesteps

* Increased num_workers

* Update links and remove pettingzoo_env.py

* remove pettingzoo.py script from tests

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-29 10:57:10 +02:00
Sven Mika
902e854af2
[RLlib; Docs overhaul] Docstring cleanup: Environments. (#19784)
* wip.

* Test: Make a change in tune to trigger tune tests, which are not run otherwise, but seem to fail nevertheless with this PR's changes.

* remove bare_metal_policy_with_custom_view_reqs from tests
2021-10-29 10:46:52 +02:00
gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809)
* Fix DDPG, since it is based on GenericOffPolicyTrainer.

* Fix QMix, SAC, and MADDPG too.

* Undo QMix change.

* Fix DQN input batch type. Always use SampleBatch.

* apex ddpg should not use replay_buffer_config yet.

* Make eager tf policy to use SampleBatch.

* lint

* LINT.

* Re-enable RLlib broken tests to make sure things work ok now.

* fixes.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
Simon Mo
5e927b01ad
Revert "[CI] Remove config that disables Bazel test result cache" (#19818)
* Revert "[CI] Remove config that disables Bazel test result cache (#18701)"

This reverts commit 098ff36faa.

* Remove all RLlib tests from BUILD that currently fail.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 15:54:53 +02:00
gjoliver
39b0faa3ec
[RLlib]: bug fix, should be input_dict['is_training'] (#19805) 2021-10-27 23:30:43 +02:00
Sven Mika
4a82d3ea6c
Revert "[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758)" (#19806)
This reverts commit 80eeb13175.
2021-10-27 23:30:07 +02:00
Sven Mika
80eeb13175
[RLlib; Docs overhaul] Docstring cleanup: Trainer, trainer_template, Callbacks. (#19758) 2021-10-27 19:15:35 +02:00
Sven Mika
f2cb2ed203
[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. (#19759) 2021-10-27 19:14:39 +02:00
gjoliver
99a0088233
[RLlib] Unify the way we create local replay buffer for all agents (#19627)
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.

This change:
1. Gets rid of the try...except clause when we call execution_plan(),
   and with it the deprecation warning.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
   of local replay buffers for all our agents, e.g., allowing us to easily
   create a reservoir-sampling replay buffer for the APPO agent for Riot
   in the near future.
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* actually deprecate buffer_size field.
2021-10-26 20:56:02 +02:00
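With buffer construction now driven by configuration (and buffer_size later renamed to capacity, see the cleanup commit below), a trainer config could carry something along these lines — a sketch assuming a dict-style replay_buffer_config; the exact keys and type strings are not taken from the PR:

```python
# Hypothetical shape of an explicit replay buffer configuration.
config = {
    "replay_buffer_config": {
        "type": "LocalReplayBuffer",   # or e.g. a reservoir-sampling buffer
        "capacity": 50000,             # replaces the old buffer_size field
    },
}
```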
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (buffer_size -> capacity and others). (#19623) 2021-10-25 09:42:39 +02:00
roireshef
9b0352f363
[RLlib] Added LearningRateSchedule and EntropyCoeffSchedule to TF and Torch versions of A3C and PPO (#19276) 2021-10-25 09:39:35 +02:00
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.

* wip
2021-10-25 09:38:56 +02:00
xwjiang2010
a632cb439f
[Tune] Remove queue_trials. (#19472) 2021-10-22 09:24:54 +01:00
gjoliver
44a4e42172
[rllib] Add entropy_coeff_schedule support for APPO. (#19544)
* Add entropy_coeff_schedule support for APPO.

* lint
2021-10-20 14:18:01 -07:00
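RLlib schedules of this kind are usually given as piecewise [timestep, value] pairs; an illustrative APPO config using the new option (values are made up):

```python
config = {
    # Decay the entropy bonus from 0.01 to 0 over the first 1M timesteps.
    "entropy_coeff_schedule": [
        [0, 0.01],
        [1000000, 0.0],
    ],
}
```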
Carlo Grisetti
5cee8a1985
[release tests] Switch from yaml.load to yaml.safe_load (#19365) 2021-10-13 17:27:25 -07:00
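The switch is the standard PyYAML hardening: yaml.load can construct arbitrary Python objects, while yaml.safe_load only builds plain types. A minimal before/after (the file name is illustrative):

```python
import yaml

with open("release_test.yaml") as f:
    # Before: cfg = yaml.load(f)   # deprecated without an explicit Loader
    cfg = yaml.safe_load(f)        # only constructs plain Python objects
```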
Antoine Galataud
edb338ff7c
[RLlib] Check training_enabled on PolicyServer (#19007) 2021-10-12 16:21:02 +02:00
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264)
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore
total timesteps trained.

* Trigger Build

* lint
2021-10-12 16:03:41 +02:00
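Concretely, the per-iteration result dict handed back to Tune now includes a timesteps_this_iter entry, which Tune accumulates for tracking/checkpoint/restore. A hedged sketch of the idea (values are illustrative):

```python
from ray import tune


class MyTrainable(tune.Trainable):
    def step(self):
        # ... one training iteration would run here ...
        return {
            "timesteps_this_iter": 4000,    # lets Tune track total timesteps
            "episode_reward_mean": 123.4,
        }
```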
Sven Mika
bd2d2079d2
[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269) 2021-10-10 12:19:47 +02:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Sven Mika
c3e3fc7637
[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. (#19100) 2021-10-07 23:57:53 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. (#19133) 2021-10-07 22:39:21 +02:00
Sven Mika
1f0646f658
[RLlib] Issue 18418: SAC w/ dict space not working. (#19101) 2021-10-06 09:05:50 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Sven Mika
73f5c4039b
[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. (#19035) 2021-10-04 13:23:51 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear (#19053)
* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
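flake8-bugbear adds the B-prefixed checks on top of flake8; a classic example of what it flags is B006, the mutable default argument:

```python
# B006 (flake8-bugbear): mutable default arguments are shared across calls.
def add_tag(tag, tags=[]):          # flagged by bugbear
    tags.append(tag)
    return tags


# The usual fix:
def add_tag_fixed(tag, tags=None):
    tags = [] if tags is None else tags
    tags.append(tag)
    return tags
```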
Sven Mika
16ad46a654
[RLlib] Fix broken test_r2d2.py. (#19017) 2021-09-30 21:19:37 +02:00
Sven Mika
ac3371a148
[RLlib] Discussion 3644: Fix bug for complex obs spaces containing Box([2D shape]) and discrete component. (#18917) 2021-09-30 16:39:38 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO (#18959)
* [RLlib] Raise error for KL penalty in DDPPO

DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
give up its decentralization, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty).

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
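The guard described above amounts to a config validation check; a hedged sketch of the idea (the real check lives in rllib/agents/ppo/ddppo.py and its exact key names and wording may differ):

```python
def validate_config(config):
    # DDPPO has no central coordination, so a KL penalty cannot be applied;
    # point users at the entropy coefficient instead.
    if config.get("kl_coeff", 0.0) != 0.0:
        raise ValueError(
            "DDPPO does not support a KL penalty (set kl_coeff=0.0); "
            "tune entropy_coeff to regulate policy entropy instead."
        )
```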
Sven Mika
05a55a9335
[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). (#18942) 2021-09-30 08:30:20 +02:00
Sven Mika
9c9b482661
[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
Sven Mika
b99943806e
[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971) 2021-09-29 21:30:04 +02:00
mvindiola1
62f5da0b65
[RLlib] Add unit tests for updating episode data in base_env (#17137) 2021-09-24 16:08:11 +02:00
Julius Frost
8b8b447fd7
[RLlib] Fix train.py input config patching (#18747) 2021-09-24 14:41:33 +02:00
o0olele
ff6730f903
[RLlib] Attention Nets + MultiDiscrete spaces: Fix range() takes no keyword args error! (#17502) 2021-09-24 13:43:58 +02:00