Commit graph

201 commits

Author SHA1 Message Date
Artur Niederfahrenhorst
d76ef9add5
[RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. (#24923) 2022-05-24 14:39:43 +02:00
Sven Mika
ec89fe5203
[RLlib] APEX-DQN and R2D2 config objects. (#25067) 2022-05-23 12:15:45 +02:00
Kai Fricke
3e053c85ee
[RLlib] Fix broken links from agent -> algo conversion. (#25014) 2022-05-20 11:37:11 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896) 2022-05-19 18:30:42 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. (#24746) 2022-05-17 17:16:08 +02:00
Artur Niederfahrenhorst
c2a1e5abd1
[RLlib] Prioritized Replay (if required) in SimpleQ and DDPG. (#24866) 2022-05-17 13:53:07 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. (#24506) 2022-05-17 13:43:49 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. (#24545) 2022-05-17 10:31:07 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms, with proper doc changes this time. (#24797) 2022-05-16 09:45:32 +02:00
Steven Morad
5c96e7223b
[RLlib] SimpleQ (minor cleanups) and DQN TrainerConfig objects. (#24584) 2022-05-15 16:14:43 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master (#24794) 2022-05-13 15:05:05 -07:00
Sven Mika
8fe3fd8f7b
[RLlib] QMix TrainerConfig objects. (#24775) 2022-05-13 18:50:28 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. (#24339) 2022-05-12 16:12:42 +02:00
Artur Niederfahrenhorst
95d4a83a87
[RLlib] R2D2 Replay Buffer API integration. (#24473) 2022-05-10 20:36:14 +02:00
Sven Mika
44a51610c2
[RLlib] SlateQ config objects. (#24577) 2022-05-10 20:07:18 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. (#24156) 2022-05-09 14:33:02 +02:00
Steven Morad
b76273357b
[RLlib] APEX-DQN replay buffer config validation fix. (#24588) 2022-05-09 09:59:04 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. (#24501) 2022-05-06 12:29:53 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. (#24165) 2022-05-03 07:59:26 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372) 2022-05-02 12:51:14 +02:00
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. (#24151) 2022-04-29 18:38:17 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. (#24164) 2022-04-27 14:24:20 +02:00
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. (#24166) 2022-04-26 14:28:39 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion (#22937) 2022-04-20 17:56:18 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) 2022-04-18 12:20:12 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) 2022-04-11 08:39:10 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491) 2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) 2022-03-29 14:44:40 +02:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) 2022-03-15 17:34:21 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735) 2022-01-27 12:17:34 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 (#21421) 2022-01-13 10:52:55 +01:00
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452) 2022-01-10 11:19:40 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
63db0e3a7c
[RLlib] Fix SAC learning test flakiness introduced in PR: "Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer." (#20985) 2021-12-09 14:24:27 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer. (#20725) 2021-12-04 22:05:26 +01:00
Sven Mika
0de41e4a6b
[RLlib] Trainer sub-class QMIX/MAML/MB-MPO (instead of build_trainer). (#20639) 2021-12-02 13:17:10 +01:00
Jun Gong
2317c693cf
[RLlib] Use SampleBatch instead of input dict whenever possible (#20746) 2021-12-02 13:11:26 +01:00
Sven Mika
9e38f6f613
[RLlib] Trainer sub-class DDPG/TD3/APEX-DDPG (instead of build_trainer). (#20636) 2021-12-01 10:52:12 +01:00
Sven Mika
3d2e27485b
[RLlib] Trainer sub-class DQN/SimpleQ/APEX-DQN/R2D2 (instead of using build_trainer). (#20633) 2021-11-30 18:05:44 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). (#20571) 2021-11-23 23:01:05 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) (#20552) 2021-11-19 11:57:37 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we will potentially have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have
been moved to python 3.7.

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
gjoliver
9385b6c1be
[RLlib] Make a few LRSchedule and EntropyCoeffSchedule tests more reliable. (#19934) 2021-11-02 16:52:56 +01:00
Sven Mika
2d24ef0d32
[RLlib] Add all simple learning tests as framework=tf2. (#19273)
* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so we will potentially have to rerun
all of the pendulum v1 benchmarks, or use another
environment instead. The same applies to frozen
lake v0 and frozen lake v1.

Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7

* fix tune test_sampler::testSampleBoundsAx

* fix re-install ray for py3.7 tests

Co-authored-by: avnishn <avnishn@uw.edu>
2021-11-02 12:10:17 +01:00