Commit graph

150 commits

Author SHA1 Message Date
Sven Mika
1499af945b
[RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) 2022-06-20 19:53:47 +02:00
Artur Niederfahrenhorst
a322cc5765
[RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) 2022-06-17 14:10:36 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) 2022-06-10 17:09:18 +02:00
kourosh hakhamaneshi
4cdd508f70
[RLlib] Added CRR implementation. (#25499) 2022-06-08 11:42:02 +02:00
Artur Niederfahrenhorst
35bd397181
[RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. (#25510) 2022-06-07 10:29:56 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Sven Mika
6c7f781d8e
[RLlib] Unflake some CI-tests. (#25313) 2022-06-03 14:51:50 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never fail before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So high likely it's because of this PR.

And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Sven Mika
18c03f8d93
[RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). (#25314) 2022-06-01 09:29:16 +02:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). (#24967) 2022-05-28 10:50:03 +02:00
Artur Niederfahrenhorst
d76ef9add5
[RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. (#24923) 2022-05-24 14:39:43 +02:00
Sven Mika
e73c37cc17
[RLlib] MADDPG: Move into main algorithms folder and add proper unit and learning tests. (#24579) 2022-05-24 12:53:53 +02:00
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059) 2022-05-22 19:58:47 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. (#24507) 2022-05-18 09:58:25 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. (#24506) 2022-05-17 13:43:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. (#24797) 2022-05-16 09:45:32 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master (#24794) 2022-05-13 15:05:05 -07:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents to algorithms - CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. (#24739) 2022-05-13 18:43:36 +02:00
Artur Niederfahrenhorst
95d4a83a87
[RLlib] R2D2 Replay Buffer API integration. (#24473) 2022-05-10 20:36:14 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. (#24156) 2022-05-09 14:33:02 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. (#24502) 2022-05-06 12:35:21 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. (#24372) 2022-05-02 12:51:14 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). (#24345)
Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead).

CQL does not perform sampling timesteps and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but should be translated (only for CQL and other purely offline RL algos) into min_train_timesteps_per_reporting.

If timesteps_per_iteration, CQL lever leaves the first iteration as it thinks it's not done yet (sample timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103)
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035) 2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion (#22937) 2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030)
The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does.

We'll need to fix the test then re-merge

Reverts #23906
2022-04-19 16:43:57 -07:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) 2022-04-15 18:36:13 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) 2022-04-12 07:50:09 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842) 2022-03-29 14:44:40 +02:00
Kai Fricke
262d6121bb
[rllib] Fix error messages and example for dataset writer (#23419)
Currently the error message and example refer to a field type that is actually format.
2022-03-28 19:53:12 +01:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276) 2022-03-18 13:45:16 +01:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. (#22544) 2022-02-25 21:58:16 +01:00
Sven Mika
8e00537b65
[RLlib] SlateQ: framework=tf fixes and SlateQ documentation update (#22543) 2022-02-23 13:03:45 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) 2022-02-22 09:36:44 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153) 2022-02-08 16:43:00 +01:00
Sven Mika
38d75ce058
[RLlib] Cleanup SlateQ algo; add test + add target Q-net (#21827) 2022-02-04 17:01:12 +01:00
Avnish Narayan
0d2ba41e41
[RLlib] [CI] Deflake longer running RLlib learning tests for off policy algorithms. Fix seeding issue in TransformedAction Environments (#21685) 2022-02-04 14:59:56 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105)
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356) 2022-02-03 09:32:09 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Jun Gong
099c170ab4
[RLlib] Dataset Reader/Writer for RLlib (#21808) 2022-01-26 16:00:46 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) 2022-01-25 14:16:58 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Sven Mika
abd3bef63b
[RLlib] QMIX better defaults + added to CI learning tests (#21332) 2022-01-04 08:54:41 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) 2021-12-21 08:39:05 +01:00