Commit graph

168 commits

Author · SHA1 · Message · Date
Charles Sun
edde905741
[RLlib] Add Decision Transformer (DT) (#27890) 2022-08-17 13:49:13 -07:00
Sven Mika
436c89ba1a
[RLlib] Eval workers use async req manager. (#27390) 2022-08-16 12:05:55 +02:00
Artur Niederfahrenhorst
0dceddb912
[RLlib] Move learning_starts logic from buffers into training_step(). (#26032) 2022-08-11 13:07:30 +02:00
Charles Sun
c358305ca6
[RLlib] DatasetReader action normalization. (#27356) 2022-08-09 16:54:03 +02:00
Jun Gong
61add8ede6
[RLlib] Fix the last cartpole-crashing premerge test. (#27315) 2022-08-02 20:08:33 +02:00
Jun Gong
e6e10ce4cf
[RLlib] Revert 41c9ef70. (#27243)
Also:
Add validation to make sure multi-GPU and micro-batching are not used together (see the sketch after this entry).
Update the A2C learning test to hit the microbatching branch.
Minor comment updates.
2022-07-29 11:05:15 -07:00
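A minimal sketch of the kind of validation this commit describes, assuming the Ray 2.x-era A2C config keys num_gpus and microbatch_size; the helper itself is illustrative, not RLlib source.

```python
# Illustrative only: reject configs that combine multi-GPU training with
# A2C micro-batching, as the validation described above is meant to do.
# `num_gpus` and `microbatch_size` are assumed Ray 2.x-era config keys.
def validate_a2c_config(config: dict) -> None:
    if config.get("num_gpus", 0) > 1 and config.get("microbatch_size"):
        raise ValueError(
            "A2C micro-batching (microbatch_size) cannot be combined "
            "with multi-GPU training (num_gpus > 1)."
        )

# This config would be rejected:
# validate_a2c_config({"num_gpus": 2, "microbatch_size": 64})
```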
Jun Gong
e1cf0cc982
[RLlib] Deflake cartpole crashing tests. (#27097)
Make sure cartpole crashing tests are not flaky.
2022-07-27 12:50:34 -07:00
Jun Gong
a22457b548
[RLlib] Small bug fix (#27003) 2022-07-27 00:02:18 -07:00
kourosh hakhamaneshi
8ddcf89096
[RLlib] Implemented ViewRequirementConnector (#26998) 2022-07-26 21:52:14 -07:00
Avnish Narayan
41c9ef709a
[RLlib] Using PG when not doing microbatching kills A2C performance. (#26844) 2022-07-25 15:11:26 +02:00
Avnish Narayan
2a0ef663c9
[RLlib] Use compress_observations where replay buffers and image obs are used in tuned examples (#26735) 2022-07-22 10:10:51 -07:00
kourosh hakhamaneshi
aec79afda1
[RLlib] Fixes CRR flakiness (#26770) 2022-07-20 12:08:57 -07:00
Sven Mika
4aea24c8a8
[RLlib] restart_failed_sub_environments now works for multi-agent cases and for crashes during reset(); more tests and logging; add eval worker sub-env fault tolerance test. (#26276) 2022-07-15 08:55:14 +02:00
Avnish Narayan
a322ac463c
[RLlib] Make JSONReader the default; users will have to use the DatasetReader for any speedups. (#26541) 2022-07-14 17:19:38 +02:00
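A hedged sketch of how a user would opt into the DatasetReader after this change, assuming the Ray 2.x-era offline-input keys input="dataset" and input_config; the data path is a placeholder.

```python
# Hedged sketch (not from this commit): opting into the dataset-backed
# offline reader instead of the default JSONReader. The keys `input` and
# `input_config` are assumed Ray 2.x-era options; the path is a placeholder.
config = {
    "input": "dataset",
    "input_config": {
        "format": "json",                  # offline data stored as JSON files
        "paths": ["/tmp/offline-data/"],   # placeholder path
    },
}
```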
Avnish Narayan
1243ed62bf
[RLlib] Make the Dataset reader the default reader and enable CRR to use datasets (#26304)
Co-authored-by: avnish <avnish@avnishs-MBP.local.meter>
2022-07-08 12:43:35 -07:00
Sven Mika
2b43713785
[RLlib] Move IMPALA and APPO back to exec plan (for now; due to unresolved learning/performance issues). (#25851) 2022-06-29 08:41:47 +02:00
kourosh hakhamaneshi
f421730b47
[RLlib] Added expectation advantage_type option to CRR. (#26142) 2022-06-28 15:40:09 +02:00
Sven Mika
be1042429d
[RLlib] Deprecation: Replace remaining evaluation_num_episodes with evaluation_duration. (#26000) 2022-06-23 19:11:29 +02:00
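A minimal sketch of the replacement named in this deprecation, assuming the evaluation_duration and evaluation_duration_unit keys that superseded evaluation_num_episodes.

```python
# Sketch of the deprecation above: the episode count moves from the
# deprecated `evaluation_num_episodes` to `evaluation_duration` plus
# `evaluation_duration_unit` (assumed Ray 2.x-era key names).
config = {
    # Old (deprecated): "evaluation_num_episodes": 10,
    "evaluation_duration": 10,
    "evaluation_duration_unit": "episodes",  # or "timesteps"
}
```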
Sven Mika
1499af945b
[RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) 2022-06-20 19:53:47 +02:00
Artur Niederfahrenhorst
a322cc5765
[RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) 2022-06-17 14:10:36 +02:00
Sven Mika
7c39aa5fac
[RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) 2022-06-10 17:09:18 +02:00
kourosh hakhamaneshi
4cdd508f70
[RLlib] Added CRR implementation. (#25499) 2022-06-08 11:42:02 +02:00
Artur Niederfahrenhorst
35bd397181
[RLlib] Better default values for training_intensity and target_network_update_freq for R2D2. (#25510) 2022-06-07 10:29:56 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Sven Mika
6c7f781d8e
[RLlib] Unflake some CI-tests. (#25313) 2022-06-03 14:51:50 +02:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never failed before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128), so it is highly likely that this PR is the cause.

The test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b).
2022-06-02 20:38:44 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Sven Mika
18c03f8d93
[RLlib] A2C + A3C: move to algorithms folder and rename to A2C/A3C (from ...Trainer). (#25314) 2022-06-01 09:29:16 +02:00
Sven Mika
d95009a3ac
[RLlib] Vectorized envs: Gracefully handle sub-environments failing by restarting them (if configured so). (#24967) 2022-05-28 10:50:03 +02:00
Artur Niederfahrenhorst
d76ef9add5
[RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q-learning algos. (#24923) 2022-05-24 14:39:43 +02:00
Sven Mika
e73c37cc17
[RLlib] MADDPG: Move into main algorithms folder and add proper unit and learning tests. (#24579) 2022-05-24 12:53:53 +02:00
Steven Morad
501d932449
[RLlib] SAC, RNNSAC, and CQL TrainerConfig objects (#25059) 2022-05-22 19:58:47 +02:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. (#24507) 2022-05-18 09:58:25 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. (#24506) 2022-05-17 13:43:49 +02:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms, with proper doc changes this time. (#24797) 2022-05-16 09:45:32 +02:00
Simon Mo
9f23affdc0
[Hotfix] Unbreak lint in master (#24794) 2022-05-13 15:05:05 -07:00
kourosh hakhamaneshi
ffcbb30552
[RLlib] Move from agents to algorithms - CQL, MARWIL, AlphaStar, MAML, Dreamer, MBMPO. (#24739) 2022-05-13 18:43:36 +02:00
Artur Niederfahrenhorst
95d4a83a87
[RLlib] R2D2 Replay Buffer API integration. (#24473) 2022-05-10 20:36:14 +02:00
Artur Niederfahrenhorst
8d906f9bf8
[RLlib] SAC with new Replay Buffer API. (#24156) 2022-05-09 14:33:02 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. (#24502) 2022-05-06 12:35:21 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) 2022-05-05 09:41:38 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate the timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372) 2022-05-02 12:51:14 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). (#24345)
Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead).

CQL does not collect sample timesteps, and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting, but it should be translated (only for CQL and other purely offline RL algos) into min_train_timesteps_per_reporting (see the config sketch after this entry).

If timesteps_per_iteration is used, CQL never leaves the first iteration, as it thinks it is not done yet (sample timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
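A minimal config sketch of the fix's intent for a purely offline algorithm such as CQL, using the key names from the commit message; the threshold value is illustrative.

```python
# Illustrative config for a purely offline algorithm such as CQL: the
# reporting threshold must be expressed in *train* timesteps, because no
# sampling timesteps ever accumulate. Key names are taken from the commit
# message above; the value 1000 is arbitrary.
config = {
    # Deprecated key that used to be translated to the *sample* threshold
    # and therefore hung CQL:
    # "timesteps_per_iteration": 1000,
    "min_train_timesteps_per_reporting": 1000,
}
```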
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. (#24118)
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103)
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035) 2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion (#22937) 2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030)
The DDPPO LR scheduler test is broken because the training iteration function does not consistently return a learner info dictionary for every training iteration, but the test expects that it does (see the sketch after this entry).

We'll need to fix the test and then re-merge.

Reverts #23906
2022-04-19 16:43:57 -07:00
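A hedged sketch of the expectation the broken test encodes, assuming the result layout RLlib used in this era (result["info"]["learner"]); the helper is illustrative and algo stands for any trained Algorithm/Trainer instance supplied by the caller.

```python
# Illustrative helper mirroring the test expectation described above:
# every train() result should carry a non-empty learner-info dict.
# The result layout (result["info"]["learner"]) is assumed from this
# era of RLlib; `algo` is any Algorithm/Trainer instance.
def assert_learner_info_each_iteration(algo, num_iters: int = 3) -> None:
    for _ in range(num_iters):
        result = algo.train()
        learner_info = result["info"]["learner"]
        assert learner_info, "expected learner info on every training iteration"
```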
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn (#23906) 2022-04-19 17:55:26 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) 2022-04-15 18:36:13 +02:00