263 commits

Author SHA1 Message Date
Eric Liang
52f7b89865
[docs] Editing pass on clusters docs, removing legacy material and fixing style issues (#27816) 2022-08-12 00:15:03 -07:00
Richard Liaw
93a3cc222b
[docs/air] remove xgboost/lightgbm references and move AIR toc (#27687) 2022-08-09 12:49:44 -07:00
xwjiang2010
9c7fc5ccdd
[tune/doc] fix emphasized line number. (#27648) 2022-08-08 16:37:47 -07:00
Richard Liaw
4629a3a649
[air/docs] Update Trainer documentation (#27481)
Co-authored-by: xwjiang2010 <xwjiang2010@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-08-05 11:21:19 -07:00
xwjiang2010
ff2b728e9a
[air] add tuner user guide (#26837)
Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-03 09:43:42 -07:00
xwjiang2010
36cf1baa82
[air doc] checkpoint_freq --> checkpoint_frequency (#27325)
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
2022-08-02 11:34:10 +01:00
Kai Fricke
1f097e9d12
[tune/docs] Update custom syncer example (#27252)
There is a small bug in the docs example for custom command-based syncers. This PR fixes it and adds a test covering the change.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-29 16:09:19 +01:00
xwjiang2010
d331489a9d
[ air ] clean up some more tune.run (#27117)
More replacements of tune.run() in examples/docstrings for Tuner.fit()

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-29 10:43:45 +01:00
xwjiang2010
eb69c1ca28
[air] Add annotation for Tune module. (#27060)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 13:53:46 -07:00
Amog Kamsetty
862d10c162
[AIR] Remove ML code from ray.util (#27005)
Removes all ML related code from `ray.util`

Removes:
- `ray.util.xgboost`
- `ray.util.lightgbm`
- `ray.util.horovod`
- `ray.util.ray_lightning`

Moves `ray.util.ml_utils` to other locations

Closes #23900

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 14:24:19 +01:00
Jiao
5315f1e643
[AIR] Enable other notebooks previously marked with # REGRESSION (#26896)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-25 13:40:21 -07:00
Kai Fricke
803c094534
[air/tuner/docs] Update docs for Tuner() API 2b: Tune examples (ipynb) (#26884)
This PR updates the Ray AIR/Tune ipynb examples to use the Tuner() API instead of tune.run().

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Signed-off-by: Kai Fricke <coding@kaifricke.com>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2022-07-24 18:53:57 +01:00
Kai Fricke
8fe439998e
[air/tuner/docs] Update docs for Tuner() API 1: RSTs, docs, move reuse_actors (#26930)
Signed-off-by: Kai Fricke <coding@kaifricke.com>

Why are these changes needed?
Splitting up #26884: This PR includes changes to use Tuner() instead of tune.run() for most docs files (rst and py), and a change to move reuse_actors to the TuneConfig
2022-07-24 07:45:24 -07:00
Kai Fricke
77ba30d34e
[tune] Docs for custom command based syncer (awscli / gsutil) (#26879)
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-07-22 15:28:53 -07:00
Sumanth Ratna
759966781f
[air] Allow users to use instances of ScalingConfig (#25712)
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-18 15:46:58 -07:00
Antoni Baum
cc7115f6a2
[Tune/CI] Fix tune-sklearn notebook example (#26470)
Fixes the tune-sklearn notebook example as found in #26410

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 18:14:36 +01:00
Antoni Baum
ddb5572040
[Tune/CI] Fix Hyperopt notebook example (#26469)
Fixes failing hyperopt notebook in CI (as found in #26410). The cause was a mismatch between keys in points to evaluate and the search space - now, an informative exception will be raised.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 16:50:11 +01:00
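The fix described above, raising an informative exception when the keys of the points to evaluate do not match the search space, could look roughly like the following. This is a minimal sketch in plain Python, not the code from the PR: the function name `validate_points_to_evaluate` and the dictionary-based search space are assumptions made for illustration.

```python
def validate_points_to_evaluate(search_space, points_to_evaluate):
    """Raise ValueError if any point's keys differ from the search space keys.

    Hypothetical helper for illustration; not part of any library.
    """
    expected = set(search_space)
    for index, point in enumerate(points_to_evaluate):
        actual = set(point)
        if actual != expected:
            missing = sorted(expected - actual)
            unexpected = sorted(actual - expected)
            raise ValueError(
                f"points_to_evaluate[{index}] does not match the search space: "
                f"missing keys {missing}, unexpected keys {unexpected}"
            )


# Illustrative search space: parameter name -> (lower bound, upper bound).
space = {"lr": (1e-4, 1e-1), "momentum": (0.1, 0.9)}

# Keys match the search space, so this call returns without raising.
validate_points_to_evaluate(space, [{"lr": 0.01, "momentum": 0.5}])
```

Naming both the missing and the unexpected keys in the message makes a misspelled parameter name easy to spot.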
Antoni Baum
9b2cd29511
[CI] Install Horovod in doc tests to fix notebook (#26476)
Fixes the Horovod notebook example as found in #26410 by installing Horovod in doc tests jobs.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 16:27:20 +01:00
Antoni Baum
67a7ffa6b4
[Tune/CI] Fix BOHB notebook example (#26473)
Fixes the BOHB notebook example as found in #26410

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 10:35:38 +01:00
Antoni Baum
e48d381926
[Tune/CI] Fix Tune-Pytorch-CIFAR notebook example (#26474)
Fixes the Tune-Pytorch-CIFAR notebook example as found in #26410

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-07-13 10:28:30 +01:00
Antoni Baum
65ea710e30
[Docs] Update Train user guide to use the new APIs (#26091) 2022-07-11 15:10:10 -07:00
Kai Fricke
753f5feaf4
[tune] Remove TrialCheckpoint class (#25406)
The old user-facing TrialCheckpoint class has been deprecated in favor of `ray.ml.Checkpoint` and will be removed with this PR.

The main change in this PR is to delete the old `TrialCheckpoint` class and replace remaining API calls (e.g. `checkpoint.local_path`) with the correct AIR equivalents.

One issue that comes up is that with Ray client usage, checkpoint directories are not available on the local node (the client). Thus, we can't construct `Checkpoint` objects easily. (Previously, the TrialCheckpoint object held a reference to the location, even if it is not locally available). There are ongoing discussions on how to resolve this in the future. For now, we print an error when such a checkpoint is requested.

Depends on #25805

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-11 20:08:10 +01:00
xwjiang2010
c97d65e64f
[tune] fix hebo_example. (#26439)
Fixes a bug in the IPython notebook.
2022-07-11 17:12:10 +01:00
Richard Liaw
5892a76a44
[air/tune] Documentation testing fixes (#26409) 2022-07-09 19:47:21 -07:00
Antoni Baum
ea94cda1f3
[AIR] Replace train. with session. (#26303)
This PR replaces legacy API calls to `train.` with AIR `session.` in Train code, examples and docs.

Depends on https://github.com/ray-project/ray/pull/25735
2022-07-07 16:29:04 -07:00
xwjiang2010
ac831fded4
[air] update documentation to use session.report (#26051)
Update documentation to use `session.report`.

Next steps:
1. Update our internal caller to use `session.report`. Most importantly, CheckpointManager and DataParallelTrainer.
2. Update `get_trial_resources` to use PGF notions to incorporate the requirement of ResourceChangingScheduler. @Yard1 
3. After 2 is done, change all `tune.get_trial_resources` to `session.get_trial_resources`
4. [internal implementation] remove special checkpoint handling logic from huggingface trainer. Optimize the flow for checkpoint conversion with `session.report`.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-30 10:37:31 -07:00
Antoni Baum
128f9e5664
[AIR] Move integration logging callbacks to AIR (#26126)
As the integration logging callbacks are commonly used with AIR Trainers, they should be moved from the tune package to the air package. The old imports will still work, but raise a deprecation warning.
2022-06-28 17:25:19 -07:00
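The pattern mentioned above, where an old import location keeps working but raises a deprecation warning, can be sketched in plain Python as follows. The names `new_location_callback`, `old_pkg.callback` and `new_pkg.callback` are placeholders chosen for illustration; they are not the actual Ray module paths, and this is not the code from the PR.

```python
import functools
import warnings


def new_location_callback():
    """Stand-in for a callback that now lives in its new package."""
    return "logged"


def deprecated_alias(new_func, old_path, new_path):
    """Return a wrapper that forwards to new_func and warns about old_path."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_path} is deprecated; use {new_path} instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not the wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper


# The old name still works, but every call emits a DeprecationWarning.
old_location_callback = deprecated_alias(
    new_location_callback, "old_pkg.callback", "new_pkg.callback"
)
```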
Kai Fricke
75d08b0632
[tune/structure] Refactor suggest into search package (#26074)
This PR renames the `suggest` package to `search` and alters the layout slightly.

In the new package, the higher-level abstractions are on the top level and the search algorithms have their own subdirectories.

In a future refactor, we can turn algorithms such as PBT into actual `SearchAlgorithm` classes and move them into the `search` package. 

The main reason to keep algorithms and searchers in the same directory is to avoid user confusion - for a user, `Bayesopt` is as much a search algorithm as e.g. `PBT`, so it doesn't make sense to split them up.
2022-06-25 14:55:30 +01:00
Kai Fricke
012306da68
[hotfix] Fix linkcheck (#26070) 2022-06-24 13:38:01 +01:00
Kai Fricke
b21314fac2
[tune/structure] Introduce trainable package (#26046)
Introduce a `trainable` package to house Trainable, FunctionTrainable (renamed), Session, and utilities.
2022-06-23 21:50:55 +01:00
Kai Fricke
0959f44b6f
[tune/structure] Introduce execution package (#26015)
Execution-specific packages are moved to tune.execution.

Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2022-06-23 11:13:19 +01:00
Kai Fricke
6313ddc47c
[tune] Refactor Syncer / deprecate Sync client (#25655)
This PR includes / depends on #25709

The two concepts of Syncer and SyncClient are confusing, as is the current API for passing custom sync functions.

This PR refactors Tune's syncing behavior. The Sync client concept is hard deprecated. Instead, we offer a well-defined Syncer API that can be extended to provide custom syncing functionality. However, the default will be to use Ray AIR's file transfer utilities.

New API:
- Users can pass `syncer=CustomSyncer` which implements the `Syncer` API
- Otherwise our off-the-shelf syncing is used
- As before, syncing to cloud disables syncing to driver

Changes:
- Sync client is removed
- Syncer interface introduced
- _DefaultSyncer is a wrapper around the URI upload/download API from Ray AIR
- SyncerCallback only uses remote tasks to synchronize data
- Rsync syncing is fully deprecated and removed
- Docker and kubernetes-specific syncing is fully deprecated and removed
- Testing is improved to use `file://` URIs instead of mock sync clients
2022-06-14 14:46:30 +02:00
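To illustrate the idea of an extensible Syncer interface, here is a minimal sketch in plain Python. The abstract `Syncer` class below is a hypothetical stand-in written for this example, and its method names and signatures are assumptions; it is not Ray's actual class. The concrete `LocalDirSyncer` treats a second local directory as the "remote" store.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Syncer(ABC):
    """Hypothetical stand-in for a syncer interface; not Ray's actual class."""

    @abstractmethod
    def sync_up(self, local_dir: str, remote_dir: str) -> bool: ...

    @abstractmethod
    def sync_down(self, remote_dir: str, local_dir: str) -> bool: ...

    @abstractmethod
    def delete(self, remote_dir: str) -> bool: ...


class LocalDirSyncer(Syncer):
    """Custom syncer that copies between two local directories."""

    def sync_up(self, local_dir, remote_dir):
        shutil.copytree(local_dir, remote_dir, dirs_exist_ok=True)
        return True

    def sync_down(self, remote_dir, local_dir):
        shutil.copytree(remote_dir, local_dir, dirs_exist_ok=True)
        return True

    def delete(self, remote_dir):
        shutil.rmtree(remote_dir, ignore_errors=True)
        return True


# Round trip: upload a result file, then download it to a fresh directory.
with tempfile.TemporaryDirectory() as root:
    local, remote, restored = (
        Path(root) / name for name in ("local", "remote", "restored")
    )
    local.mkdir()
    (local / "result.json").write_text('{"score": 1}')

    syncer = LocalDirSyncer()
    syncer.sync_up(str(local), str(remote))
    syncer.sync_down(str(remote), str(restored))
    round_trip = (restored / "result.json").read_text()
```

Using plain directories as the remote target keeps such an implementation testable without mock clients, in the same spirit as the `file://` URIs mentioned in the change list.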
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train (#25570) 2022-06-08 21:34:18 -07:00
Kai Fricke
8affbc7be6
[tune/train] Consolidate checkpoint manager 3: Ray Tune (#24430)
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.

1. Part 1 adds the common checkpoint management class #24771 
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.

Old PR description:

This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.

- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather than at runtime for Ray Train)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 12:05:34 +01:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
Kai Fricke
313e8730a2
[tune/docs] Trial executor doc fix (#25440) 2022-06-03 16:25:38 +01:00
Kai Fricke
2e058380d7
[tune] Remove TrialExecutor base class (#25404)
The TrialExecutor base class was a stub and was deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.
2022-06-03 10:16:47 +01:00
Kai Fricke
f0fa8e54f8
[tune] Remove DurableTrainable class (#25405)
The DurableTrainable is deprecated (every trainable is a durable trainable). This PR removes it from the Tune library and a related example.
2022-06-03 10:16:02 +01:00
Kai Fricke
67cd984b92
[tune] Add annotations/set scope for Tune classes (#25077)
This PR adds API annotations or changes the scope of several Ray Tune library classes.
2022-05-25 15:21:28 +02:00
Nintorac
81c0b24164
[tune/docs] fix typo (#25109) 2022-05-24 18:20:10 +01:00
Eric Liang
437df9431c
[docs] Remove bad suggestions to use local_mode or num_cpus in init (#24827) 2022-05-17 12:55:04 -07:00
Antoni Baum
c74886a55e
[CI] Run doc notebooks in CI (#24816)
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to search recursively for the target file, which handles mismatches between the `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files

This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
2022-05-17 09:50:42 +01:00
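The value of failing on an empty match, as with the `allow_empty=False` flag above, can be illustrated with a plain Python analogy. This is not Starlark and does not reproduce Bazel's package-boundary rules; the helper `find_notebooks` and its parameters are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path


def find_notebooks(root, recursive, allow_empty=True):
    """Collect *.ipynb files under root; optionally fail on an empty match.

    Hypothetical helper for illustration only.
    """
    pattern = "**/*.ipynb" if recursive else "*.ipynb"
    matches = sorted(Path(root).glob(pattern))
    if not matches and not allow_empty:
        raise ValueError(f"no notebooks matched {pattern!r} under {root}")
    return matches


with tempfile.TemporaryDirectory() as root:
    # The only notebook lives in a subdirectory, not at the top level.
    sub = Path(root) / "examples"
    sub.mkdir()
    (sub / "demo.ipynb").write_text("{}")

    # A top-level search quietly returns nothing, so no notebook would run.
    top_level = find_notebooks(root, recursive=False)

    # Searching the subdirectories finds the notebook.
    nested = [path.name for path in find_notebooks(root, recursive=True)]

    # With allow_empty=False the empty result becomes a visible failure.
    try:
        find_notebooks(root, recursive=False, allow_empty=False)
        failed_loudly = False
    except ValueError:
        failed_loudly = True
```

An empty list of test targets is indistinguishable from a passing suite unless the empty match itself is treated as an error.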
Kai Fricke
06ef672699
[ci/docs] Fix broken linkcheck URL (#24777)
The hyperband blogpost URL is broken, link to other blog post
2022-05-13 15:58:36 +01:00
Max Pumperla
cd5218f831
[docs] Tune examples better navigation, minor fixes (#24733)
Replaces #24225 and adds example navigation

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-13 14:39:18 +01:00
Edward Oakes
fb71743935
[serve] Convert "End-to-end Tutorial" to "Getting Started" (#24690) 2022-05-12 08:44:43 -07:00
Amog Kamsetty
a36e2a8f51
[Tune] Deprecate DistributedTrainableCreator (#24453)
Fully deprecate DistributedTrainableCreator for Ray 2.0

Closes #24453
2022-05-10 11:06:43 -07:00
Kai Fricke
61a9de732f
[docs/tune] Small fixes to tune-distributed for new restore modes (#24220)
We've updated restore modes, so we should reflect that in the docs.
2022-04-26 22:19:49 +01:00
Kai Fricke
c0ec20dc3a
[tune] Next deprecation cycle (#24076)
Rolling out next deprecation cycle:

- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
2022-04-26 09:30:15 +01:00
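The escalation described above, where a deprecation first warns and in the next cycle raises, can be sketched with a small helper. The function `deprecated` and its `stage` argument are hypothetical names introduced for this example; this is not the helper used in the PR.

```python
import warnings


def deprecated(message, stage):
    """Signal a deprecation at the given stage of the cycle.

    "warn" emits a DeprecationWarning and lets execution continue;
    "error" raises it, so callers must migrate before the code is removed.
    """
    if stage == "warn":
        warnings.warn(message, DeprecationWarning, stacklevel=2)
    elif stage == "error":
        raise DeprecationWarning(message)
    else:
        raise ValueError(f"unknown stage: {stage!r}")


def old_api(stage):
    """Stand-in for a deprecated entry point."""
    deprecated("old_api is deprecated; use new_api instead.", stage)
    return "still works"
```

Moving an API from one stage to the next is then a one-argument change, and the final step of the cycle is deleting the function altogether.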
Amog Kamsetty
ae9c68e75f
[Train] Fully deprecate Ray SGD v1 (#24038)
Ray SGD v1 has been marked as a deprecated API for a while. This PR fully deprecates Ray SGD v1. An error will be raised on any attempt to import the ray.util.sgd package.

Closes #16435
2022-04-25 16:12:57 -07:00
Brett Göhre
9e0a59d94a
[docs] search algorithm notebook examples (#23924)
Co-authored-by: brettskymind <brett@pathmind.com>
Co-authored-by: Max Pumperla <max.pumperla@googlemail.com>
2022-04-25 11:10:58 -07:00