hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	55d039af32	Annotate datasources and add API annotation check script (#24999) Why are these changes needed? Add API stability annotations for datasource classes, and add a linter to check all data classes have appropriate annotations.	2022-05-21 15:05:07 -07:00
kourosh hakhamaneshi	3815e52a61	[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896)	2022-05-19 18:30:42 +02:00
Artur Niederfahrenhorst	fb2915d26a	[RLlib] Replay Buffer API and Ape-X. (#24506)	2022-05-17 13:43:49 +02:00
Max Pumperla	6a6c58b5b4	[RLlib] Config objects for DDPG and SimpleQ. (#24339)	2022-05-12 16:12:42 +02:00
Sven Mika	f54557073e	[RLlib] Remove `execution_plan` API code no longer needed. (#24501)	2022-05-06 12:29:53 +02:00
Sven Mika	5b61a00792	[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). (#24433)	2022-05-04 16:28:12 +02:00
Sven Mika	1bc6419e0e	[RLlib] R2D2 training iteration fn AND switch off `execution_plan` API by default. (#24165)	2022-05-03 07:59:26 +02:00
Sven Mika	7cca7782f1	[RLlib] OPE (off policy estimator) API. (#24384)	2022-05-02 21:15:50 +02:00
Sven Mika	f066180ed5	[RLlib] Deprecate `timesteps_per_iteration` config key (in favor of `min_[sample|train]_timesteps_per_reporting`). (#24372)	2022-05-02 12:51:14 +02:00
Sven Mika	b2b1c95aa5	[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). (#24332)	2022-04-30 09:51:09 +02:00
Sven Mika	627b9f2e88	[RLlib] QMIX training iteration function and new replay buffer API. (#24164)	2022-04-27 14:24:20 +02:00
Avnish Narayan	6e68b6bef9	[RLlib] DD-PPO training iteration fn. (#24118) We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2022-04-22 15:22:14 -07:00
Kai Fricke	9f7170e444	Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)" (#24103) This reverts commit `a337fd994e`.	2022-04-22 09:58:58 +01:00
Avnish Narayan	a337fd994e	Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. (#24035)	2022-04-21 17:37:49 +02:00
Avnish Narayan	477b9d22d2	[RLlib][Training iteration fn] APEX conversion (#22937)	2022-04-20 17:56:18 +02:00
Avnish Narayan	0ddbce6518	Revert "[RLlib] DD-PPO training iteration fn (#23906)" (#24030) The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does. We'll need to fix the test then re-merge. Reverts #23906	2022-04-19 16:43:57 -07:00
Sven Mika	eb54236d13	[RLlib] DD-PPO training iteration fn (#23906)	2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst	e57ce7efd6	[RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420)	2022-04-18 12:20:12 +02:00
kourosh hakhamaneshi	c38a29573f	[RLlib] Removed deprecated code with error=True (#23916)	2022-04-15 13:51:12 +02:00
Steven Morad	00922817b6	[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673)	2022-04-11 08:39:10 +02:00
Sven Mika	c82f6c62c8	[RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739)	2022-04-08 15:33:28 +02:00
Sven Mika	2eaa54bd76	[RLlib] POC: Config objects instead of dicts (PPO only). (#23491)	2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst	9a64bd4e9b	[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q (#22842)	2022-03-29 14:44:40 +02:00
Sven Mika	7cb86acce2	[RLlib] trainer_template.py: hard deprecation (error when used). (#23488)	2022-03-25 18:25:51 +01:00
Max Pumperla	60054995e6	[docs] fix doctests and activate CI (#23418)	2022-03-24 17:04:02 -07:00
Sven Mika	b1cda46681	[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes (#23276)	2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang	0c74ecad12	[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). (#23128)	2022-03-15 17:34:21 +01:00
simonsays1980	8627f44d7f	[RLlib] Remove duplicate code block: Config deprecation check for `metrics_smoothing_episodes` (#22152)	2022-03-09 16:51:42 +01:00
Jun Gong	e8be45065e	[RLlib] Restore policies on `eval_workers` as well. (#22641)	2022-03-01 08:38:14 +01:00
Jun Gong	2b6a0c71d7	[RLlib] Add a callback for when trainer finishes initialization: `on_trainer_init`. (#22493)	2022-02-22 08:18:32 +01:00
Avnish Narayan	740def0a13	[RLlib] Put env-checker on critical path. (#22191)	2022-02-17 14:06:14 +01:00
Sven Mika	5ca6a56e16	[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. (#22350)	2022-02-15 09:32:43 +01:00
Sven Mika	04a5c72ea3	Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708)	2022-02-10 13:44:22 +01:00
Alex Wu	b122f093c1	Revert "[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test." (#22250) Reverts ray-project/ray#22126. Breaks rllib:tests/test_io	2022-02-09 09:26:36 -08:00
Ishant Mrinal	f0d8b6d701	[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. (#22078)	2022-02-09 09:05:26 +01:00
Balaji Veeramani	31ed9e5d02	[CI] Replace YAPF disables with Black disables (#21982)	2022-02-08 16:29:25 -08:00
Sven Mika	ac3e6ab411	[RLlib] Speedup A3C up to 3x (new `training_iteration` function instead of `execution_plan`) and re-instate Pong learning test. (#22126)	2022-02-08 19:04:13 +01:00
Sven Mika	c17a44cdfa	Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" (#22153)	2022-02-08 16:43:00 +01:00
SangBin Cho	a887763b38	Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… (#22105) This reverts commit `3f03ef8ba8`.	2022-02-04 00:54:50 -08:00
Sven Mika	3f03ef8ba8	[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. (#21356)	2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano	a258f9c692	[RLlib] Neural-MMO `keep_per_episode_custom_metrics` patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042)	2022-02-02 17:28:42 +01:00
Jun Gong	87fe033f7b	[RLlib] Request CPU resources in `Trainer.default_resource_request()` if using dataset input. (#21948)	2022-02-02 10:20:37 +01:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Sven Mika	371fbb17e4	[RLlib] Make `policies_to_train` more flexible via callable option. (#20735)	2022-01-27 12:17:34 +01:00
Jun Gong	099c170ab4	[RLlib] Dataset Reader/Writer for RLlib (#21808)	2022-01-26 16:00:46 +01:00
Sven Mika	d5bfb7b7da	[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652)	2022-01-25 14:16:58 +01:00
Sven Mika	c4636c7c05	[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. (#21665)	2022-01-20 11:46:25 +01:00
Jun Gong	7517aefe05	[RLlib] Bring back BC and Marwil learning tests. (#21574)	2022-01-14 14:35:32 +01:00
Sven Mika	90c6b10498	[RLlib] Decentralized multi-agent learning; PR #01 (#21421)	2022-01-13 10:52:55 +01:00
Sven Mika	188324c5c7	[RLlib] Issue 21552: `unsquash_action` and `clip_action` (when None) cause wrong actions computed by `Trainer.compute_single_action`. (#21553)	2022-01-12 18:56:51 +01:00

239 commits