Commit graph

176 commits

Author SHA1 Message Date
Sven Mika
44773e810b
[RLlib] DD-PPO Config objects. (#25028) 2022-05-22 13:05:24 +02:00
Jun Gong
d5a6d46049
[RLlib] Migrate MAML, MB-MPO, MARWIL, and BC to use Policy sub-classing implementation. (#24914) 2022-05-20 14:10:59 +02:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896) 2022-05-19 18:30:42 +02:00
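For orientation, a minimal sketch of what this move means for user code, assuming a Ray version that already contains the migrated layout; the `DQNConfig` name and setter signatures are assumptions based on the config-object work in the surrounding commits:

```python
# Sketch only: DQN now lives under ray.rllib.algorithms (formerly
# ray.rllib.agents). Names/signatures are assumptions, not exact code.
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .training(lr=1e-3, train_batch_size=32)
)
algo = config.build()
print(algo.train())
```
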
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. (#24507) 2022-05-18 09:58:25 +02:00
Jun Gong
dea134a472
[RLlib] Clean up Policy mixins. (#24746) 2022-05-17 17:16:08 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. (#24545) 2022-05-17 10:31:07 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. (#24339) 2022-05-12 16:12:42 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. (#24501) 2022-05-06 12:29:53 +02:00
Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn (#23454) 2022-05-05 16:11:08 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. (#24165) 2022-05-03 07:59:26 +02:00
Sven Mika
f53ca1cacb
[RLlib] ES + ARS TrainerConfig objects. (#24374) 2022-05-02 16:55:28 +02:00
Sven Mika
026849cd27
[RLlib] APPO TrainerConfig objects. (#24376) 2022-05-02 15:06:23 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372) 2022-05-02 12:51:14 +02:00
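The rename is purely a config-key change. A tiny before/after sketch, with key names taken from the commit title (their exact reporting semantics are an assumption):

```python
# Deprecated key (pre-#24372):
old_config = {"timesteps_per_iteration": 1000}

# Replacement keys (names from the commit title; semantics assumed):
new_config = {
    # Don't finish/report an iteration before this many env steps were sampled ...
    "min_sample_timesteps_per_reporting": 1000,
    # ... and this many timesteps were trained on.
    "min_train_timesteps_per_reporting": 1000,
}
```
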
Sven Mika
b2b1c95aa5
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). (#24332) 2022-04-30 09:51:09 +02:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). (#24295) 2022-04-28 22:25:16 +02:00
Sven Mika
c95dd79953
[RLlib] APPO eager fix (APPOTFPolicy gets wrapped as_eager() twice by mistake). (#24268) 2022-04-27 21:27:34 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030, and #23096.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103)
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035) 2022-04-21 17:37:49 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030)
The DDPPO LR scheduler test is broken because the learner_info_dictionary returned by the training iteration function does not consistently contain learner info for every training iteration, but the test expects it to.

We'll need to fix the test then re-merge

Reverts #23906
2022-04-19 16:43:57 -07:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. (#23798) 2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) 2022-04-11 08:39:10 +02:00
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function (#23671) 2022-04-05 08:40:29 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). (#23491) 2022-03-31 18:26:12 +02:00
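This POC replaces raw config dicts with a chainable config class. A best-effort sketch of the style it introduced; the import path matches the pre-"agents to algos" layout, and the setter names (`training`, `rollouts`, `resources`) are taken from the RLlib of this era:

```python
# Best-effort sketch of the fluent PPOConfig API (POC from #23491).
from ray.rllib.agents.ppo import PPOConfig

config = (
    PPOConfig()
    .training(gamma=0.9, lr=5e-5, kl_coeff=0.2)  # algorithm hyperparameters
    .rollouts(num_rollout_workers=2)             # sampling setup
    .resources(num_gpus=0)                       # hardware setup
)
print(config.to_dict())                  # still convertible to a legacy dict
trainer = config.build(env="CartPole-v1")
print(trainer.train())
```
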
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). (#23488) 2022-03-25 18:25:51 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128) 2022-03-15 17:34:21 +01:00
Jeroen Bédorf
bc21a4593d
[RLlib] Fix crash when kl_coeff is set to 0 (#23063)
Co-authored-by: Jeroen Bédorf <jeroen@minds.ai>
Co-authored-by: Ishant Mrinal Haloi <mrinal.haloi11@gmail.com>
Co-authored-by: Ishant Mrinal <33053278+n30111@users.noreply.github.com>
2022-03-11 12:24:52 -08:00
Sven Mika
526fd6b5fb
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590) 2022-02-24 22:05:36 +01:00
Daniel
308ccfe25c
[RLlib] DD-PPO move train_batch_size==-1 check to __init__ (#22521) 2022-02-21 11:44:12 +01:00
Steven Morad
5d52b599aa
[RLlib] Fix zero gradients for ppo-clipped vf (#22171) 2022-02-15 08:57:18 +01:00
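For context on the bug class this fixes: clamping the value-function loss itself (rather than the value estimate) yields exactly zero gradient whenever the loss exceeds the clip bound. A schematic reproduction, not RLlib's exact code:

```python
import torch

value_out = torch.tensor(10.0, requires_grad=True)
value_target = torch.tensor(0.0)
vf_clip_param = 10.0

vf_loss = (value_out - value_target) ** 2           # 100.0, far above the clip
clipped = torch.clamp(vf_loss, 0.0, vf_clip_param)  # saturates at 10.0
clipped.backward()
print(value_out.grad)  # tensor(0.) -> no learning signal at all
```
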
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables (#21982) 2022-02-08 16:29:25 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jun Gong
8ebc50f844
[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855) 2022-01-27 20:08:58 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. (#20735) 2022-01-27 12:17:34 +01:00
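A sketch of the option this adds: `policies_to_train` can now be a callable that decides per policy (and optionally per batch) whether it should be trained, instead of a static list of IDs. Key names follow the multiagent config of this era; the exact callable signature is an assumption from the PR title:

```python
config = {
    "multiagent": {
        "policies": {"learner", "frozen_opponent"},
        # Old, static style:
        # "policies_to_train": ["learner"],
        # New, flexible style: (policy_id, optional SampleBatch) -> bool.
        "policies_to_train": lambda pid, batch=None: pid == "learner",
    },
}
```
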
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) 2022-01-10 11:22:55 +01:00
Sven Mika
4eaf70942d
[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456) 2022-01-10 11:22:40 +01:00
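The failure mode here is that multiplying a zero coefficient by a NaN KL value still yields NaN. A schematic of the guard, not RLlib's exact code:

```python
import torch

kl_coeff = 0.0
action_kl = torch.tensor(float("nan"))  # KL can be NaN when some action probs are 0.0
total_loss = torch.tensor(1.0)

# Skip the term entirely instead of adding 0.0 * NaN (which is NaN):
if kl_coeff > 0.0:
    total_loss = total_loss + kl_coeff * action_kl
print(total_loss)  # tensor(1.) instead of tensor(nan)
```
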
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). (#20571) 2021-11-23 23:01:05 +01:00
Sven Mika
9d2fe5756c
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()). (#20424) 2021-11-22 22:14:21 +01:00
Sven Mika
f82880eda1
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
2021-11-16 14:49:41 +01:00
Amog Kamsetty
90dc5460d4
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
2021-11-15 16:11:35 -08:00
Sven Mika
5b1c8e46e1
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) 2021-11-15 10:41:54 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. (#19981) 2021-11-05 16:10:00 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPG too.

* Unpin gym and deprecate pendulum v0

Many tests in RLlib depended on pendulum v0; however, in gym 0.21 pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so we will potentially have to rerun all of the pendulum v1 benchmarks, or use another environment instead. The same applies to frozen lake v0 and frozen lake v1.

Lastly, all of the RLlib tests have been moved to Python 3.7.

* Add gym installation based on python version.

Pin python <= 3.6 to gym 0.19 due to install issues with Atari ROMs in gym 0.20.

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* Migrate to new ale install method

* Make parametric_actions_cartpole return float32 actions/obs

* Adding type conversions if obs/actions don't match space

* Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
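The python-version-conditional gym pin from the commit message above, sketched as Python logic (the actual PR expresses this in requirements files; exact version strings beyond those named in the message are assumptions):

```python
import sys

# gym 0.20 broke Atari ROM installs on Python <= 3.6, so pin gym 0.19 there.
GYM_REQUIREMENT = "gym==0.19" if sys.version_info < (3, 7) else "gym>=0.21"
print(GYM_REQUIREMENT)
```
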
Sven Mika
e6ae08f416
[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601) 2021-11-03 10:01:34 +01:00
Sven Mika
cf21c634a3
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982) 2021-11-03 10:00:46 +01:00
gjoliver
9385b6c1be
[RLlib] Make a few LRSchedule and EntropyCoeffSchedule tests more reliable. (#19934) 2021-11-02 16:52:56 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829) 2021-11-01 21:46:02 +01:00