110 commits

Author SHA1 Message Date
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 (#21421) 2022-01-13 10:52:55 +01:00
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452) 2022-01-10 11:19:40 +01:00
Sven Mika
60b2219d72
[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. (#20757) 2021-12-04 13:26:33 +01:00
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible (#20746) 2021-12-02 13:11:26 +01:00
Sven Mika
9e38f6f613
[RLlib] Trainer sub-class DDPG/TD3/APEX-DDPG (instead of build_trainer). (#20636) 2021-12-01 10:52:12 +01:00
Sven Mika
3d2e27485b
[RLlib] Trainer sub-class DQN/SimpleQ/APEX-DQN/R2D2 (instead of using build_trainer). (#20633) 2021-11-30 18:05:44 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) (#20552) 2021-11-19 11:57:37 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests have
been moved to python 3.7

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Adding type conversions if obs/actions don't match space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00
gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809)
* Fix DDPG, since it is based on GenericOffPolicyTrainer.

* Fix QMix, SAC, and MADDPG too.

* Undo QMix change.

* Fix DQN input batch type. Always use SampleBatch.

* apex ddpg should not use replay_buffer_config yet.

* Make eager tf policy to use SampleBatch.

* lint

* LINT.

* Re-enable RLlib broken tests to make sure things work ok now.

* fixes.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
Sven Mika
b4300dd532
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937) 2021-10-04 13:29:00 +02:00
Jiajun Yao
7588bfd315
[Lint] Add flake8-bugbear (#19053)
* Add flake8-bugbear
2021-10-03 23:24:11 -07:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999) 2021-08-31 12:21:49 +02:00
Sven Mika
b6aa8223bc
[RLlib] Fix final_scale's default value to 0.02 (see OrnsteinUhlenbeck exploration). (#18070) 2021-08-25 14:22:09 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. (#17778) 2021-08-18 17:21:01 +02:00
Thomas Lecat
c02f91fa2d
[RLlib] Ape-X doesn't take the value of prioritized_replay into account (#17541) 2021-08-16 22:18:08 +02:00
Sven Mika
c2ea2c01bb
[RLlib] Redo: Add support for multi-GPU to DDPG. (#17789)
* wip.
2021-08-13 18:01:24 -07:00
Amog Kamsetty
0b8489dcc6
Revert "[RLlib] Add support for multi-GPU to DDPG. (#17586)" (#17707)
This reverts commit 0eb0e0ff58.
2021-08-10 10:50:21 -07:00
Sven Mika
0eb0e0ff58
[RLlib] Add support for multi-GPU to DDPG. (#17586) 2021-08-05 11:39:51 -04:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371) 2021-08-03 11:35:49 -04:00
Sven Mika
90b21ce27e
[RLlib] De-flake 3 test cases; Fix config.simple_optimizer and SampleBatch.is_training warnings. (#17321) 2021-07-27 14:39:06 -04:00
Sven Mika
1fd0eb805e
[RLlib] Redo fix bug normalize vs unsquash actions (original PR made log-likelihood test flakey). (#17014) 2021-07-13 14:01:30 -04:00
Amog Kamsetty
bc33dc7e96
Revert "[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action." (#17002)
This reverts commit 7862dd64ea.
2021-07-12 11:09:14 -07:00
Sven Mika
7862dd64ea
[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action, not normalize_action. (#16774) 2021-07-08 17:31:34 +02:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes (#16531) 2021-06-30 12:32:11 +02:00
Sven Mika
be6db06485
[RLlib] Re-do: Trainer: Support add and delete Policies. (#16569) 2021-06-21 13:46:01 +02:00
Sven Mika
d0014cd351
[RLlib] Policies get/set_state fixes and enhancements. (#16354) 2021-06-15 13:08:43 +02:00
Sven Mika
5fe34862ce
[RLlib] DDPG torch GPU bug. (#16133) 2021-05-28 22:09:25 +02:00
Michael Luo
474f04e322
[RLlib] DDPG/TD3 + A3C/A2C + MARWIL/BC Annotation/Comments/Code Cleanup (#14707) 2021-05-19 16:32:29 +02:00
Sven Mika
839fc59224
[RLlib] CQL TensorFlow support (#15841) 2021-05-18 11:10:46 +02:00
Sven Mika
bc09e75b78
[RLlib] Fix 3 flakey test cases. (#15785) 2021-05-16 12:20:33 +02:00
SebastianBo1995
f5be8d8f74
[RLlib] Offline Learning Bug, different shapes (#15132) 2021-04-27 17:18:17 +02:00
Sven Mika
bb8a286cbc
[RLlib] Support native tf.keras.Model (milestone toward obsoleting ModelV2 class). (#14684) 2021-04-27 10:44:54 +02:00
Sven Mika
4f66309e19
[RLlib] Redo issue 14533 tf enable eager exec (#14984) 2021-03-29 20:07:44 +02:00
SangBin Cho
fa5f961d5e
Revert "[RLlib] Issue 14533: tf.enable_eager_execution() must be called at beginning. (#14737)" (#14918)
This reverts commit 3e389d5812.
2021-03-25 00:42:01 -07:00
mvindiola1
5e350ceaa2
[RLlib] Issue 14119: Fix TD3 policy delay for torch. (#14840) 2021-03-24 16:26:22 +01:00
Sven Mika
3e389d5812
[RLlib] Issue 14533: tf.enable_eager_execution() must be called at beginning. (#14737) 2021-03-24 12:54:27 +01:00
Sven Mika
04bc0a9828
[RLlib] Remove all non-trajectory view API code. (#14860) 2021-03-23 09:50:18 -07:00
Sven Mika
69202c6a7d
[RLlib] Obsolete usage tracking dict via sample batch. (#13065) 2021-03-17 08:18:15 +01:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. (#13393) 2021-03-08 15:41:27 +01:00
Sven Mika
37c7daa3c0
[RLlib] DDPG: Support simplex action space. (#14011) 2021-02-10 15:10:01 +01:00
Sven Mika
2e3655e8a9
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238) 2021-01-19 14:22:36 +01:00
Sven Mika
56878221ed
[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363) 2021-01-14 14:44:33 +01:00
Kai Fricke
25f10a947a
Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339)" (#13361)
This reverts commit e2b2abb88b.
2021-01-12 12:33:57 +01:00