206 commits

Author SHA1 Message Date
Philipp Moritz
886cc4d674
Fix broken links in documentation and put linkcheck linter in place on CI (#23340) 2022-03-18 21:02:52 -07:00
Archit Kulkarni
76bb5396c7
[Doc] [jobs] Add links to Job Submission and improve doc (#23209)
- Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested.
- Adds docstrings for the Jobs SDK, which automatically show up in the API reference
- Improves the Job Submission main page
- Adds a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2022-03-18 12:52:13 -05:00
Eric Liang
c8f207f746
[docs] Core docs refactor (#23216)
This PR makes a number of major overhauls to the Ray core docs:

Add a key-concepts section for {Tasks, Actors, Objects, Placement Groups, Env Deps}.
Re-org the user guide to align with key concepts.
Rewrite the walkthrough to link to mini-walkthroughs in the key concept sections.
Minor tweaks and additional transition material.
2022-03-17 11:26:17 -07:00
Max Pumperla
11c40e363d
[docs] external promo content (#22823) 2022-03-10 11:39:44 -08:00
Max Pumperla
7d4296c72f
run code in browser (#22727)
Example for running notebooks on our docs directly in the browser by connecting to a binder instance launched on demand.
If this seems useful we can extend this to other examples gradually.

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-03-02 10:27:00 +01:00
Max Pumperla
372c620f58
[docs] Tune overhaul part II (#22656)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-02-26 23:07:34 -08:00
Antoni Baum
d5284a740c
[tune] Remove Trainable.update_resources (#22471) 2022-02-25 08:38:34 -08:00
mwtian
9a157dfe82
[GCS-Ray] update doc and error message for GCS-Ray (#22528)
Update documentation to reflect that Ray no longer starts Redis by default.
2022-02-22 17:56:30 -08:00
Antoni Baum
4a15c6f8f3
[tune] Preparation for deadline schedulers (#22006) 2022-02-22 11:05:28 -08:00
Max Pumperla
29d94a2211
[docs] sphinx gallery removal, migrate to ipynb (#22467) 2022-02-19 01:19:07 -08:00
Simon Mo
495221e7d2
[Doc] Update Serve logo for tune user guide (#22369)
We have deprecated the old logo.
2022-02-15 12:10:08 -06:00
Max Pumperla
d594b668bb
[docs] [tune] hyperopt notebook (#22315) 2022-02-12 02:46:03 -08:00
xwjiang2010
323511b716
[tune] Single wait refactor. (#21852)
This is a down-scoped change. For the full picture of the Tune control loop, see [`Tune control loop refactoring`](https://docs.google.com/document/d/1RDsW7SVzwMPZfA0WLOPA4YTqbRyXIHGYmBenJk33HaE/edit#heading=h.2za3bbxbs5gn)

1. Previously there were separate waits on pg ready and other events. As a result, there were quite a few timing tweaks that were inefficient, hard to understand, and hard to unit test. This PR consolidates them into a single wait that is handled by TrialRunner in each step.
- A few event types are introduced, each mapping to a scenario:
  * PG_READY --> Should place a trial onto it. If somehow there is no trial to be placed there, the pg will be put in _ready momentarily. This is because resources have historically been conceptualized as a pull-based model.
  * NO_RUNNING_TRIALS_TIME_OUT --> possibly the insufficient-resources case
  * TRAINING_RESULT
  * SAVING_RESULT
  * RESTORING_RESULT
  * YIELD --> This just means that training is simply taking very long. We need to punt back to the main loop to print out status info etc.

2. Previously TrialCleanup was not very efficient and could race between Trainable.stop() and `return_placement_group`. This PR streamlines the trial cleanup process by explicitly letting Trainable.stop() finish, followed by `return_placement_group(pg)`. Note that graceful shutdown is needed in cases like `pause_trial`, where checkpointing to memory needs to be given time to happen before the actor is gone.

3. Quite a few env variables (timing tweaks) are removed, which I consider OK to do without a deprecation cycle.
2022-02-09 15:31:17 +00:00
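The single-wait idea described in the "Single wait refactor" commit (#21852) can be illustrated with a minimal sketch. This is not Tune's implementation: it uses the standard library's `concurrent.futures` instead of Ray, and `wait_for_next_event` and the `pending` mapping are hypothetical names chosen for illustration. Only the event type names are taken from the commit message.

```python
import concurrent.futures as cf
import enum


class EventType(enum.Enum):
    # Names copied from the commit message; their handling below is a sketch.
    PG_READY = "PG_READY"
    NO_RUNNING_TRIALS_TIME_OUT = "NO_RUNNING_TRIALS_TIME_OUT"
    TRAINING_RESULT = "TRAINING_RESULT"
    SAVING_RESULT = "SAVING_RESULT"
    RESTORING_RESULT = "RESTORING_RESULT"
    YIELD = "YIELD"


def wait_for_next_event(pending, timeout_s):
    """Block in exactly one place until any tracked future finishes.

    `pending` (hypothetical) maps Future -> EventType. If nothing finishes
    within `timeout_s`, return YIELD so the caller's main loop can print
    status information and try again.
    """
    done, _ = cf.wait(list(pending), timeout=timeout_s,
                      return_when=cf.FIRST_COMPLETED)
    if not done:
        return EventType.YIELD, None
    future = next(iter(done))
    # Remove the finished future so it is handled only once.
    event_type = pending.pop(future)
    return event_type, future.result()
```

A caller would invoke `wait_for_next_event` once per step and dispatch on the returned event type, rather than polling each kind of event with its own timeout.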
Max Pumperla
5cc9355303
[Docs ] Tune docs overhaul (first part) (#22112)
Continuing docs overhaul, tune now has:

- [x] better landing page
- [x] a getting started guide
- [x] user guide was cut down, partially merged with FAQ, and partially integrated with tutorials
- [x] the new user guide contains guides to tune features and practical integrations
- [x] we rewrote some of the feature guides for clarity 
- [x] we got rid of sphinx-gallery for this sub-project (only data and core left), as it looks bad and is unnecessarily complicated anyway (plus, makes the build slower)
- [x] sphinx-gallery examples are now moved to markdown notebooks, as started in #22030.
- [x] Examples are tested in the new framework, of course.

There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-02-07 15:47:03 +00:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Max Pumperla
b34099e764
[docs] landing page (fixes #21750) (#21859)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-01-26 17:14:25 -08:00
Dhruv Nair
3d79815cd0
Comet Integration (#20766)
This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/).

Co-authored-by: Michael Cullan <mjcullan@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-01-25 11:42:00 -08:00
Max Pumperla
7953c9ca57
[docs] integrate algolia docsearch, move to sphinx panels (#21814) 2022-01-24 17:00:41 -08:00
Max Pumperla
f9b71a8bf6
[docs] new structure (#21776)
This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way:

- [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign.
- [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).
2022-01-21 15:42:05 -08:00
Adam Golinski
2954bf9a48
[docs][tune] Fix typo in schedulers.rst (#21777)
Fix typo in schedulers.rst
2022-01-21 13:21:01 -08:00
xwjiang2010
9af8f11191
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
2022-01-20 15:30:56 -08:00
Max Pumperla
38e46c9fb3
[docs] Clean up doc structure (first part) (#21667) 2022-01-20 16:19:04 +01:00
Max Pumperla
703c161034
[doc] Fix sklearn doc error, introduce MyST markdown parser (#21527) 2022-01-12 15:17:28 -08:00
Jules S. Damji
064f976eb4
Added hyperparameters to the concepts section (#21024)
Added hyperparameters to the concepts section, since it's important to explain what they are, and added diagrams to help the reader visualize the difference between model parameters and hyperparameters

Signed-off-by: Jules S.Damji <jules@anyscale.com>
Co-authored-by: Jules S.Damji <jules@anyscale.com>
2021-12-13 12:21:39 +00:00
xwjiang2010
368da1742b
[tune] Enforce one future at a time for any given trial at any given time. (#20783)
Also force-disable buffered training when checkpoint_at_end is used, instead of allowing the user to override this.
2021-12-03 08:14:12 -08:00
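The "one future at a time for any given trial" rule from #20783 can be sketched as a small bookkeeping helper. This is an assumption-laden illustration, not Tune's code: `OneFuturePerTrial` is a hypothetical class, and a standard-library executor stands in for Ray remote calls.

```python
import concurrent.futures as cf


class OneFuturePerTrial:
    """Hypothetical helper: allow at most one in-flight future per trial id."""

    def __init__(self, executor):
        self._executor = executor
        self._in_flight = {}  # trial_id -> most recently submitted Future

    def submit(self, trial_id, fn, *args):
        current = self._in_flight.get(trial_id)
        # Refuse to schedule new work while the previous future is unresolved.
        if current is not None and not current.done():
            raise RuntimeError(
                f"trial {trial_id!r} already has a pending future")
        future = self._executor.submit(fn, *args)
        self._in_flight[trial_id] = future
        return future
```

Different trials can still run concurrently; only a second submission for the same trial is rejected until the first one completes.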
Kai Fricke
236951ee4c
[tune] Introduce TrialCheckpoint class, making checkpoint down/upload easie (#20585)
This PR introduces a TrialCheckpoint class, which is returned e.g. by ExperimentAnalysis.best_checkpoint. The class enables easy access to cloud storage locations (previously only local directories were accessible). It also comes with utilities to download, upload, and save trial checkpoints to local and cloud targets.
2021-11-22 14:16:26 +00:00
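To make the idea in #20585 concrete, here is a rough sketch of a checkpoint object that knows both a local and a remote location. It is explicitly not Ray's `TrialCheckpoint`: the class name `SketchCheckpoint`, its fields, and its method signatures are all hypothetical, and an ordinary directory stands in for cloud storage so the example runs with the standard library alone.

```python
import shutil
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchCheckpoint:
    """Hypothetical stand-in for a checkpoint with local and remote copies."""

    local_path: Optional[str] = None
    # Assumption: a plain directory plays the role of a cloud bucket here.
    remote_path: Optional[str] = None

    def upload(self) -> str:
        """Copy the local checkpoint directory to the remote location."""
        if not self.local_path or not self.remote_path:
            raise ValueError("both local_path and remote_path are required")
        shutil.copytree(self.local_path, self.remote_path, dirs_exist_ok=True)
        return self.remote_path

    def download(self, target: Optional[str] = None) -> str:
        """Copy the remote checkpoint into `target` (or a new temp dir)."""
        if not self.remote_path:
            raise ValueError("remote_path is required")
        target = target or self.local_path or tempfile.mkdtemp()
        shutil.copytree(self.remote_path, target, dirs_exist_ok=True)
        self.local_path = target
        return target
```

The point of the design is that callers hold one object and ask it to move data, instead of passing bare directory strings around and handling each storage target separately.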
Antoni Baum
0b14f38ac7
[tune] Multi-objective support for Optuna (#20489)
This PR adds multi-objective support for Optuna searchers, including a test and example.

Co-authored-by: gjoliver <jungong@anyscale.com>
2021-11-18 18:47:29 +00:00
Antoni Baum
3f9ded55f7
[tune] Merge Analysis into ExperimentAnalysis (#20197)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-16 16:47:12 +00:00
Will Drevo
fa878e2d4d
Added example to user guide for cloud checkpointing (#20045)
Co-authored-by: will <will@anyscale.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-15 15:43:06 +00:00
xwjiang2010
cdf70c2900
[Tune] Remove legacy resources implementations in Runner and Executor. (#19773) 2021-11-12 12:33:39 -08:00
Kai Fricke
d88fdd6e38
[tune] refactor SyncConfig (#20155) 2021-11-12 09:36:15 +00:00
Jules S. Damji
71a162d8ab
Fixed code snippet to include config parameter and a minor typo (#20193)
Signed-off-by: Jules S.Damji <jules@anyscale.com>

Co-authored-by: Jules S.Damji <jules@anyscale.com>
2021-11-11 18:37:03 +00:00
Edward Oakes
082a4af3e6
[serve] Remove lingering backend/endpoint wording in docs (#20229) 2021-11-10 16:49:29 -08:00
matthewdeng
790e22f9ad
[tune] move force_on_current_node to ml_utils (#20211) 2021-11-10 10:21:24 -08:00
Kai Fricke
9c2b8c8501
[tune] Deprecate DurableTrainable (#19880) 2021-11-08 20:56:07 +00:00
Amog Kamsetty
b1f24768a1
[Tune] More fixes to PTL Tutorial (#20065)
* ptl-fix-2

* improve

* fix
2021-11-08 09:13:44 -08:00
Philipp Moritz
a64e32c53b
[docs] Fix broken links in documentation and add linkcheck to documentation (#20030)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-11-04 13:19:43 -07:00
Amog Kamsetty
f67b526b7a
[Tune] Fix PTL tutorial docs (#19999) 2021-11-04 09:21:28 -07:00
Philipp Moritz
0a5942d8b0
[Documentation] Fix quotes for windows installations (#19859)
* [Documentation] Fix quotes for windows installations

* update

* formatting
2021-10-29 10:54:38 -07:00
Antoni Baum
f2773267c7
[docs] Tune doc fixes (#19791) 2021-10-29 11:45:29 +02:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Antoni Baum
c7d6f838f6
[tune] Optional forcible trial cleanup, return default autofilled metrics even if Trainable doesn't report at least once (#19144) 2021-10-08 18:16:26 +01:00
xwjiang2010
7ffd9cbed1
[Tune] Fix column width in doc. (#19159) 2021-10-07 18:16:21 +01:00
Antoni Baum
27b8633198
[docs] Remove outdated note in Tune docs (#19110) 2021-10-07 15:42:11 +01:00
Antoni Baum
cc3199b814
[docs] Provide information about resource deadlocks, early stopping in Tune docs (#18947)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-10-01 13:52:47 +01:00
Richard Liaw
227aa9e89b
[tune] change delimiter for results (#16573)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-09-28 10:03:00 +01:00
Kai Fricke
9b0d804eed
[tune] Add documentation for reproducible runs (setting seeds) (#18849)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2021-09-24 10:57:31 +01:00
xwjiang2010
5551cdac19
[Tune] Break from loop after warning msg is logged. (#18720) 2021-09-18 16:33:44 -07:00
Kai Fricke
395976c8a1
[tune] Never block for results (#18391)
* [tune] Never block for results

* Fix tests

* Block in tests

* Add comment to test
2021-09-09 12:08:00 -07:00
Richard Liaw
0594deafdf
[tune] allow users to configure bootstrap for docker syncer (#17786) 2021-09-05 22:04:31 -07:00