Rolling out the next deprecation cycle:
- DeprecationWarnings that were previously emitted via `warnings.warn` or `logger.warn` are now raised as errors
- Deprecation warnings that were already raised as errors are now removed entirely
- Notably, this involves deprecating the TrialCheckpoint functionality and its associated cloud tests
- Added annotations to each deprecation warning indicating when it should be fully removed (see the sketch after this list)
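As a minimal sketch of the pattern, assuming a hypothetical helper (the function name, message wording, and `remove_in` annotation are illustrative, not Ray's actual code):

```python
# Hypothetical sketch of the deprecation pattern described above.
def raise_deprecation_error(old: str, new: str, remove_in: str):
    # Previously this would have been `warnings.warn(...)` or
    # `logger.warn(...)`; it is now a hard error. `remove_in` annotates
    # when the shim itself should be deleted entirely.
    raise DeprecationWarning(
        f"`{old}` is deprecated, use `{new}` instead. "
        f"`{old}` will be removed entirely in {remove_in}."
    )
```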
This is a down-scoped change. For the full picture of the Tune control loop, see [`Tune control loop refactoring`](https://docs.google.com/document/d/1RDsW7SVzwMPZfA0WLOPA4YTqbRyXIHGYmBenJk33HaE/edit#heading=h.2za3bbxbs5gn)
1. Previously, there were separate waits on placement group readiness and other events. As a result, there were quite a few timing tweaks that were inefficient, hard to understand, and hard to unit test. This PR consolidates them into a single wait that is handled by TrialRunner in each step.
- A few event types are introduced, along with their mapping to scenarios (a rough sketch of the consolidated wait follows this list):
* PG_READY --> A trial should be placed onto the ready placement group. If there happens to be no trial to place, the PG will be put into `_ready` momentarily. This is because resources have historically been conceptualized as a pull-based model.
* NO_RUNNING_TRIALS_TIME_OUT --> likely an insufficient-resources case
* TRAINING_RESULT
* SAVING_RESULT
* RESTORING_RESULT
* YIELD --> Training is simply taking a long time. We need to punt back to the main loop to print out status info, etc.
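Roughly, the consolidated wait-and-dispatch might look like this (the helper names and event objects are illustrative, not the actual TrialRunner code):

```python
# Illustrative only: one wait multiplexes PG readiness and trainable
# futures, and each step dispatches on the resulting event type.
def step(runner):
    event = runner.wait_for_next_event()  # hypothetical helper wrapping a single ray.wait()
    if event.type == "PG_READY":
        # Place a trial onto the ready placement group (or mark it _ready).
        runner.place_trial_on(event.placement_group)
    elif event.type == "NO_RUNNING_TRIALS_TIME_OUT":
        runner.warn_insufficient_resources()
    elif event.type in ("TRAINING_RESULT", "SAVING_RESULT", "RESTORING_RESULT"):
        runner.process_result(event.trial, event.result)
    elif event.type == "YIELD":
        pass  # return to the main loop so it can print status info, etc.
```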
2. Previously, TrialCleanup was not very efficient and could race between `Trainable.stop()` and `return_placement_group`. This PR streamlines the trial cleanup process by explicitly letting `Trainable.stop()` finish before calling `return_placement_group(pg)`. Note that graceful shutdown is needed in cases like `pause_trial`, where checkpointing to memory must be given time to happen before the actor is gone.
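In rough pseudocode, the streamlined ordering looks like this (the function is a sketch; `return_placement_group` is the call named above, passed in here just to keep the snippet self-contained):

```python
import ray

# Illustrative ordering only (not the actual Tune cleanup code):
# let the actor finish Trainable.stop() before returning the PG,
# so e.g. an in-memory checkpoint during `pause_trial` can complete.
def cleanup_trial(trainable_actor, pg, return_placement_group):
    stop_future = trainable_actor.stop.remote()  # Trainable.stop() on the actor
    ray.wait([stop_future])                      # wait for graceful shutdown
    return_placement_group(pg)                   # only then free the placement group
```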
3. Quite a few environment variables (timing tweaks) are removed; I consider it OK to proceed without a deprecation cycle for these.
Continuing the docs overhaul, Tune now has:
- [x] better landing page
- [x] a getting started guide
- [x] the user guide was cut down, partially merged with the FAQ, and partially integrated with tutorials
- [x] the new user guide contains guides to tune features and practical integrations
- [x] we rewrote some of the feature guides for clarity
- [x] we got rid of sphinx-gallery for this sub-project (only data and core are left), as it looks bad and is unnecessarily complicated anyway (plus, it makes the build slower)
- [x] sphinx-gallery examples have been moved to markdown notebooks, as started in #22030.
- [x] Examples are tested in the new framework, of course.
There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following ways:
- [x] we reverted the renaming of the existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd`, and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign.
- [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: a Sphinx dependency (`sphinx-tabs`) needs to be replaced by a newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow up in the next PR (hoping this one doesn't get re-re-re-re-reverted).
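The dependency swap boils down to something like this in the Sphinx `conf.py` (an illustrative excerpt, not the exact diff):

```python
# conf.py (illustrative excerpt)
extensions = [
    # "sphinx_tabs.tabs",   # removed: prevents loading of algolia.js
    "sphinx_panels",        # newer replacement
    # ... other extensions ...
]
```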
This PR introduces a TrialCheckpoint class, which is returned e.g. by ExperimentAnalysis.best_checkpoint. The class enables easy access to cloud storage locations (rather than just local directories, as before). It also comes with utilities to download, upload, and save trial checkpoints to local and cloud targets.
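A rough sketch of the class shape (attribute names and method signatures are approximations, not the exact API):

```python
# Approximate shape of a TrialCheckpoint-style class; the real Ray
# class differs in details. Illustrative only.
class TrialCheckpoint:
    def __init__(self, local_path=None, cloud_path=None):
        self.local_path = local_path   # e.g. ~/ray_results/exp/trial/checkpoint_000008
        self.cloud_path = cloud_path   # e.g. s3://bucket/exp/trial/checkpoint_000008

    def download(self, cloud_path=None, local_path=None):
        """Fetch the checkpoint from cloud storage to a local directory."""

    def upload(self, cloud_path=None, local_path=None):
        """Push a local checkpoint directory to a cloud storage target."""

    def save(self, path=None):
        """Copy the checkpoint to a local or cloud target."""
```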
Adds a new page and table documenting current scalability thresholds in Ray Tune.
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* [tune] make `tune.with_parameters()` work with the class API (usage sketch below)
* Update python/ray/tune/utils/trainable.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
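For reference, a minimal usage sketch with the class API (the trainable and data here are made up):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    # Keyword arguments given to tune.with_parameters() are forwarded
    # to setup() alongside the trial config.
    def setup(self, config, data=None):
        self.data = data

    def step(self):
        return {"num_rows": len(self.data)}

data = list(range(100_000))  # a large object we don't want to put in the config
tune.run(
    tune.with_parameters(MyTrainable, data=data),
    stop={"training_iteration": 1},
)
```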
* Refactor placement group factory object to accept placement_group arguments instead of callables (see the usage sketch after this commit list)
* Convert resources to pgf
* Enable placement groups per default
* Fix tests WIP
* Fix stop/resume with placement groups
* Fix progress reporter test
* Fix trial executor tests
* Check resource for trial, not resource object
* Move ENV vars into class
* Fix tests
* Sphinx
* Wait for trial start in PBT
* Revert merge errors
* Support trial reuse with placement groups
* Better check for just staged trials
* Fix trial queuing
* Wait for pg after trial termination
* Clean up PGs before tune run
* No PG settings in pbt scheduler
* Fix buffering tests
* Skip test if ray reports erroneous available resources
* Disable PG for cluster resource counting test
* Debug output for tests
* Output in-use resources for placement groups
* Don't start new trial on trial start failure
* Add docs
* Cleanup PGs once futures returned
* Fix placement group shutdown
* Use updated_queue flag
* Apply suggestions from code review
* Apply suggestions from code review
* Update docs
* Reuse placement groups independently from actors
* Do not remove placement groups for paused trials
* Only continue enqueueing trials if it didn't fail the first time
* Rename parameter
* Fix pause trial
* Code review + try_recover
* Update python/ray/tune/utils/placement_groups.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Move placement group lifecycle management
* Move total used resources to pg manager
* Update FAQ example
* Requeue trial if start was unsuccessful
* Do not cleanup pgs at start of run
* Revert "Do not cleanup pgs at start of run"
This reverts commit 933d9c4c
* Delayed PG removal
* Fix trial requeue test
* Trigger pg cleanup on status update
* Fix tests
* Fix docs
* fix-test
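As a reference for the placement group factory refactor at the top of this commit list, a minimal usage sketch (the bundle shapes and the trainable are illustrative):

```python
from ray import tune
from ray.tune.utils.placement_groups import PlacementGroupFactory

# The factory now takes placement group arguments (bundles, strategy)
# directly instead of a callable. The first bundle reserves resources
# for the trainable itself; extra bundles are for workers it spawns.
pgf = PlacementGroupFactory([{"CPU": 1}, {"CPU": 2}], strategy="PACK")

def train_fn(config):
    tune.report(score=0.0)  # placeholder trainable

tune.run(train_fn, resources_per_trial=pgf)
```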
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>