Charles Sun
edde905741
[RLlib] Add Decision Transformer (DT) ( #27890 )
2022-08-17 13:49:13 -07:00
Sven Mika
436c89ba1a
[RLlib] Eval workers use async req manager. ( #27390 )
2022-08-16 12:05:55 +02:00
Artur Niederfahrenhorst
0dceddb912
[RLlib] Move learning_starts logic from buffers into training_step(). ( #26032 )
2022-08-11 13:07:30 +02:00
Charles Sun
c358305ca6
[RLlib] DatasetReader action normalization. ( #27356 )
2022-08-09 16:54:03 +02:00
Jun Gong
61add8ede6
[RLlib] Fix the last cartpole-crashing premerge test. ( #27315 )
2022-08-02 20:08:33 +02:00
Jun Gong
e6e10ce4cf
[RLlib] Revert 41c9ef70. ( #27243 )
In addition to the revert:
Add validation to make sure multi-GPU and micro-batching are not used together (see the illustrative sketch after this entry).
Update the A2C learning test to hit the micro-batching branch.
Minor comment updates.
2022-07-29 11:05:15 -07:00
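The validation mentioned in the entry above boils down to rejecting configs that combine multi-GPU training with micro-batching. Below is a minimal, self-contained sketch of such a check; the helper function and plain-dict config are hypothetical, with num_gpus and microbatch_size standing in for the corresponding A2C settings.

```python
# Hypothetical illustration of the validation described in the commit above:
# reject configs that enable both multi-GPU training and micro-batching.

def validate_a2c_config(config: dict) -> None:
    """Raise if multi-GPU and micro-batching are combined (unsupported)."""
    num_gpus = config.get("num_gpus", 0)
    microbatch_size = config.get("microbatch_size")  # None -> micro-batching off
    if microbatch_size is not None and num_gpus > 1:
        raise ValueError(
            "Micro-batching (`microbatch_size`) is not compatible with "
            f"multi-GPU training (`num_gpus`={num_gpus}). Use one or the other."
        )

# Example: this passes (single GPU, micro-batching on) ...
validate_a2c_config({"num_gpus": 1, "microbatch_size": 64})
# ... while {"num_gpus": 2, "microbatch_size": 64} would raise a ValueError.
```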
Jun Gong
e1cf0cc982
[RLlib] Deflake cartpole crashing tests. ( #27097 )
Make sure cartpole crashing tests are not flaky.
2022-07-27 12:50:34 -07:00
Jun Gong
a22457b548
[RLlib] Small bug fix ( #27003 )
2022-07-27 00:02:18 -07:00
kourosh hakhamaneshi
8ddcf89096
[RLlib] Implemented ViewRequirementConnector ( #26998 )
2022-07-26 21:52:14 -07:00
Avnish Narayan
41c9ef709a
[RLlib] Using PG when not doing microbatching kills A2C performance. ( #26844 )
2022-07-25 15:11:26 +02:00
Avnish Narayan
2a0ef663c9
[rllib] Use compress observations where replay buffers and image obs are used in tuned examples ( #26735 )
2022-07-22 10:10:51 -07:00
kourosh hakhamaneshi
aec79afda1
[RLlib] Fixes CRR flakiness ( #26770 )
2022-07-20 12:08:57 -07:00
Sven Mika
4aea24c8a8
[RLlib] restart_failed_sub_environments now works for MA cases and crashes during reset(); +more tests and logging; add eval worker sub-env fault tolerance test. ( #26276 )
2022-07-15 08:55:14 +02:00
Avnish Narayan
a322ac463c
[RLlib] Make JSONReader default, users will have to use the DatasetReader for any speedups. ( #26541 )
2022-07-14 17:19:38 +02:00
Avnish Narayan
1243ed62bf
[RLlib] Make Dataset reader default reader and enable CRR to use dataset ( #26304 )
Co-authored-by: avnish <avnish@avnishs-MBP.local.meter>
2022-07-08 12:43:35 -07:00
Sven Mika
2b43713785
[RLlib] Move IMPALA and APPO back to exec plan (for now; due to unresolved learning/performance issues). ( #25851 )
2022-06-29 08:41:47 +02:00
kourosh hakhamaneshi
f421730b47
[RLlib] Added expectation advantage_type option to CRR. ( #26142 )
2022-06-28 15:40:09 +02:00
Sven Mika
be1042429d
[RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. ( #26000 )
2022-06-23 19:11:29 +02:00
Sven Mika
1499af945b
[RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. ( #25924 )
2022-06-20 19:53:47 +02:00
Artur Niederfahrenhorst
a322cc5765
[RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). ( #25848 )
2022-06-17 14:10:36 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. ( #25076 )
2022-06-10 17:09:18 +02:00
kourosh hakhamaneshi
4cdd508f70
[RLlib] Added CRR implementation. ( #25499 )
2022-06-08 11:42:02 +02:00
Artur Niederfahrenhorst
35bd397181
[RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. ( #25510 )
2022-06-07 10:29:56 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. ( #25366 )
2022-06-04 07:35:24 +02:00
Sven Mika
6c7f781d8e
[RLlib] Unflake some CI-tests. ( #25313 )
2022-06-03 14:51:50 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms
dir and rename policy and trainer classes. ( #25346 )" ( #25420 )
This reverts commit e4ceae19ef.
Reverts #25346
linux://python/ray/tests:test_client_library_integration never failed before this PR.
In the CI of the reverted PR it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128 ), so it is highly likely that this PR is the cause.
The test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b ).
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. ( #25346 )
2022-06-02 16:47:05 +02:00
Sven Mika
18c03f8d93
[RLlib] A2C + A3C move to algorithms folder and re-name into A2C/A3C (from ...Trainer). ( #25314 )
2022-06-01 09:29:16 +02:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). ( #24967 )
2022-05-28 10:50:03 +02:00
Artur Niederfahrenhorst
d76ef9add5
[RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. ( #24923 )
2022-05-24 14:39:43 +02:00
Sven Mika
e73c37cc17
[RLlib] MADDPG: Move into main algorithms folder and add proper unit and learning tests. ( #24579 )
2022-05-24 12:53:53 +02:00
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects ( #25059 )
2022-05-22 19:58:47 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. ( #24507 )
2022-05-18 09:58:25 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. ( #24797 )
2022-05-16 09:45:32 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master ( #24794 )
2022-05-13 15:05:05 -07:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents to algorithms - CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. ( #24739 )
2022-05-13 18:43:36 +02:00
Artur Niederfahrenhorst
95d4a83a87
[RLlib] R2D2 Replay Buffer API integration. ( #24473 )
2022-05-10 20:36:14 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. ( #24156 )
2022-05-09 14:33:02 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. ( #24502 )
2022-05-06 12:35:21 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API ( #24212 )
2022-05-05 09:41:38 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). ( #24372 )
2022-05-02 12:51:14 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). ( #24345 )
CQL does not perform sampling timesteps, and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting; for CQL and other purely offline RL algos it should instead be translated into min_train_timesteps_per_reporting (see the sketch after this entry).
If timesteps_per_iteration is set, CQL never leaves the first iteration, because it thinks it is not done yet (sampled timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
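The fix described above amounts to routing the deprecated key to the train-timestep reporting criterion for purely offline algorithms instead of the sample-timestep one. Below is a minimal sketch of that translation logic, assuming a plain-dict config and a hypothetical is_offline_algo flag; it is illustrative only, not RLlib's actual implementation.

```python
# Illustrative only: translate the deprecated `timesteps_per_iteration` key.
# For purely offline algos (e.g. CQL), it must map to the *train* timestep
# criterion; otherwise the sample-timestep criterion never advances past 0.

def translate_deprecated_timesteps(config: dict, is_offline_algo: bool) -> dict:
    config = dict(config)  # don't mutate the caller's dict
    old_val = config.pop("timesteps_per_iteration", None)
    if old_val is not None:
        if is_offline_algo:
            # Offline algos do no sampling, so gate reporting on trained steps.
            config.setdefault("min_train_timesteps_per_reporting", old_val)
        else:
            config.setdefault("min_sample_timesteps_per_reporting", old_val)
    return config

# CQL-style usage: the deprecated key becomes a train-timestep threshold,
# so iterations can complete even though no sampling happens.
print(translate_deprecated_timesteps({"timesteps_per_iteration": 1000}, True))
```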
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. ( #24118 )
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030, and #23096.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )" ( #24103 )
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )
2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion ( #22937 )
2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn ( #23906 )" ( #24030 )
The DDPPO LR scheduler test is broken because the learner_info dictionary returned by the training iteration function does not consistently contain learner info for every training iteration, but the test expects that it does (see the sketch after this entry).
We'll need to fix the test and then re-merge.
Reverts #23906
2022-04-19 16:43:57 -07:00
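The breakage described above comes down to a consumer that assumes learner info is present in every training result. A hedged sketch of the defensive pattern a test or caller could use instead follows; the result and key names here are placeholders, not necessarily RLlib's exact reporting structure.

```python
# Illustrative pattern: only inspect learner stats when the training result
# actually contains them, since a training iteration may skip reporting them
# on some iterations (the exact result keys here are placeholders).

def maybe_get_learner_stat(result: dict, policy_id: str, key: str):
    """Return result["info"]["learner"][policy_id][key], or None if absent."""
    learner_info = result.get("info", {}).get("learner", {})
    return learner_info.get(policy_id, {}).get(key)

# Example: the first result has no learner info yet, the second one does.
results = [
    {"info": {}},
    {"info": {"learner": {"default_policy": {"cur_lr": 0.0005}}}},
]
observed_lrs = [maybe_get_learner_stat(r, "default_policy", "cur_lr") for r in results]
print(observed_lrs)  # -> [None, 0.0005]
```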
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn ( #23906 )
2022-04-19 17:55:26 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) ( #23735 )
2022-04-15 18:36:13 +02:00