There is a small bug in the docs example for custom command based syncers. This PR fixes them and adds a test to test these changes.
Signed-off-by: Kai Fricke <kai@anyscale.com>
More replacements of tune.run() in examples/docstrings for Tuner.fit()
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
Removes all ML related code from `ray.util`
Removes:
- `ray.util.xgboost`
- `ray.util.lightgbm`
- `ray.util.horovod`
- `ray.util.ray_lightning`
Moves `ray.util.ml_utils` to other locations
Closes#23900
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
This PR updates the Ray AIR/Tune ipynb examples to use the Tuner() API instead of tune.run().
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Signed-off-by: Kai Fricke <coding@kaifricke.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Signed-off-by: Kai Fricke coding@kaifricke.com
Why are these changes needed?
Splitting up #26884: This PR includes changes to use Tuner() instead of tune.run() for most docs files (rst and py), and a change to move reuse_actors to the TuneConfig
Fixes failing hyperopt notebook in CI (as found in #26410). The cause was a mismatch between keys in points to evaluate and the search space - now, an informative exception will be raised.
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
The old user-facing TrialCheckpoint class has been deprecated in favor of `ray.ml.Checkpoint` and will be removed with this PR.
The main change in this PR is to delete the old `TrialCheckpoint` class and replace remaining API calls (e.g. `checkpoint.local_path`) with the correct AIR equivalents.
One issue that comes up is that with Ray client usage, checkpoint directories are not available on the local node (the client). Thus, we can't construct `Checkpoint` objects easily. (Previously, the TrialCheckpoint object held a reference to the location, even if it is not locally available). There are ongoing discussions on how to resolve this in the future. For now, we print an error when such a checkpoint is requested.
Depends on #25805
Signed-off-by: Kai Fricke <kai@anyscale.com>
Update documentation to use `session.report`.
Next steps:
1. Update our internal caller to use `session.report`. Most importantly, CheckpointManager and DataParallelTrainer.
2. Update `get_trial_resources` to use PGF notions to incorporate the requirement of ResourceChangingScheduler. @Yard1
3. After 2 is done, change all `tune.get_trial_resources` to `session.get_trial_resources`
4. [internal implementation] remove special checkpoint handling logic from huggingface trainer. Optimize the flow for checkpoint conversion with `session.report`.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
As the integration logging callbacks are commonly used with AIR Trainers, they should be moved from the tune package to the air package. The old imports will still work, but raise a deprecation warning.
This PR renames the `suggest` package to `search` and alters the layout slightly.
In the new package, the higher-level abstractions are on the top level and the search algorithms have their own subdirectories.
In a future refactor, we can turn algorithms such as PBT into actual `SearchAlgorithm` classes and move them into the `search` package.
The main reason to keep algorithms and searchers in the same directory is to avoid user confusion - for a user, `Bayesopt` is as much a search algorithm as e.g. `PBT`, so it doesn't make sense to split them up.
This PR includes / depends on #25709
The two concepts of Syncer and SyncClient are confusing, as is the current API for passing custom sync functions.
This PR refactors Tune's syncing behavior. The Sync client concept is hard deprecated. Instead, we offer a well defined Syncer API that can be extended to provide own syncing functionality. However, the default will be to use Ray AIRs file transfer utilities.
New API:
- Users can pass `syncer=CustomSyncer` which implements the `Syncer` API
- Otherwise our off-the-shelf syncing is used
- As before, syncing to cloud disables syncing to driver
Changes:
- Sync client is removed
- Syncer interface introduced
- _DefaultSyncer is a wrapper around the URI upload/download API from Ray AIR
- SyncerCallback only uses remote tasks to synchronize data
- Rsync syncing is fully depracated and removed
- Docker and kubernetes-specific syncing is fully deprecated and removed
- Testing is improved to use `file://` URIs instead of mock sync clients
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.
1. Part 1 adds the common checkpoint management class #24771
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.
Old PR description:
This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.
- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather on runtime for Ray Train)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
The package "ml" should be renamed to "air".
Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
The TrialExecutor base class was a stub and has been deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to allow for recursive search for the target file, allowing to deal with mismatches between `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files
This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
Rolling out next deprecation cycle:
- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
Ray SGD v1 has been denoted as a deprecated API for a while. This PR fully deprecates Ray SGD v1. An error will be raised if ray.util.sgd package is attempted to be imported.
Closes#16435