More replacements of tune.run() with Tuner.fit() in examples/docstrings
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <coding@kaifricke.com>
Why are these changes needed?
Splitting up #26884: This PR switches most docs files (rst and py) from `tune.run()` to `Tuner()`, and moves `reuse_actors` into the `TuneConfig`.
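For reference, a minimal sketch of the migration applied throughout the docs (the trainable and search space here are toy placeholders):

```python
from ray import tune
from ray.tune import Tuner, TuneConfig

def trainable(config):  # toy trainable; returns a final result dict
    return {"score": config["x"] ** 2}

# Old API:
# tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])}, reuse_actors=True)

# New API: reuse_actors now lives in TuneConfig.
tuner = Tuner(
    trainable,
    param_space={"x": tune.grid_search([1, 2, 3])},
    tune_config=TuneConfig(reuse_actors=True),
)
results = tuner.fit()
```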
Update documentation to use `session.report` (a short sketch follows the next-steps list below).
Next steps:
1. Update our internal callers to use `session.report`, most importantly CheckpointManager and DataParallelTrainer.
2. Update `get_trial_resources` to use PGF notions to incorporate the requirement of ResourceChangingScheduler. @Yard1
3. After 2 is done, change all `tune.get_trial_resources` to `session.get_trial_resources`
4. [internal implementation] remove special checkpoint handling logic from huggingface trainer. Optimize the flow for checkpoint conversion with `session.report`.
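A minimal sketch of the new reporting call, assuming the Ray AIR `session` API (the metric names and values are placeholders):

```python
from ray.air import session
from ray.air.checkpoint import Checkpoint

def train_fn(config):
    for step in range(5):
        loss = 1.0 / (step + 1)  # placeholder computation
        # session.report() takes a metrics dict and, optionally, a checkpoint,
        # replacing the separate tune.report() / tune.checkpoint_dir() calls.
        session.report(
            {"loss": loss, "step": step},
            checkpoint=Checkpoint.from_dict({"step": step}),
        )
```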
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
**Update**: This PR is now part 3 of a three-PR group to consolidate the checkpoints.
1. Part 1 adds the common checkpoint management class #24771
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 (and therefore includes its changes); it moves the Ray Tune integration to use the new common checkpoint manager class.
Old PR description:
This PR consolidates the Ray Train and Tune checkpoint managers. These classes previously implemented very similar logic in separate modules. To simplify future maintenance, we've consolidated the common core.
- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather than at runtime for Ray Train)
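A minimal sketch of the consolidation shape described above (all names here are hypothetical illustrations, not the actual Ray classes):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class TrackedCheckpoint:  # hypothetical: one checkpoint plus its metrics
    value: Any
    metrics: Dict[str, float] = field(default_factory=dict)

class CommonCheckpointManager:
    """Hypothetical common core: bookkeeping shared by Train and Tune."""

    def __init__(self, keep_checkpoints_num: int = 1):
        self._keep = keep_checkpoints_num
        self._checkpoints: List[TrackedCheckpoint] = []

    def register_checkpoint(self, checkpoint: TrackedCheckpoint) -> None:
        self._checkpoints.append(checkpoint)
        # Retention policy: keep only the most recent N checkpoints.
        self._checkpoints = self._checkpoints[-self._keep:]

class TrainCheckpointManager(CommonCheckpointManager):
    """Train-specific adjustments live in the subclass for now."""

class TuneCheckpointManager(CommonCheckpointManager):
    """Tune-specific adjustments live in the subclass for now."""
```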
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Continuing docs overhaul, tune now has:
- [x] better landing page
- [x] a getting started guide
- [x] user guide was cut down, partially merged with FAQ, and partially integrated with tutorials
- [x] the new user guide contains guides to tune features and practical integrations
- [x] we rewrote some of the feature guides for clarity
- [x] we got rid of sphinx-gallery for this sub-project (only data and core are left), as it looks bad, is unnecessarily complicated, and slows down the build
- [x] sphinx-gallery examples are now moved to markdown notebooks, as started in #22030.
- [x] Examples are tested in the new framework, of course.
There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
* [tune] make `tune.with_parameters()` work with the class API (sketched below)
* Update python/ray/tune/utils/trainable.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
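A minimal sketch of what this enables, assuming `tune.with_parameters()` forwards the extra keyword arguments to a Trainable's `setup()` (the trainable and data here are toy placeholders):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    def setup(self, config, data=None):
        # `data` is injected by tune.with_parameters() via the object store,
        # so the large object is not serialized into each trial's config.
        self.data = data

    def step(self):
        return {"num_rows": len(self.data)}

data = list(range(100_000))  # stand-in for a large dataset
tune.run(
    tune.with_parameters(MyTrainable, data=data),
    stop={"training_iteration": 1},
)
```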
* Refactor placement group factory object to accept placement_group arguments instead of callables
* Convert resources to pgf
* Enable placement groups per default
* Fix tests WIP
* Fix stop/resume with placement groups
* Fix progress reporter test
* Fix trial executor tests
* Check resource for trial, not resource object
* Move ENV vars into class
* Fix tests
* Sphinx
* Wait for trial start in PBT
* Revert merge errors
* Support trial reuse with placement groups
* Better check for just staged trials
* Fix trial queuing
* Wait for pg after trial termination
* Clean up PGs before tune run
* No PG settings in pbt scheduler
* Fix buffering tests
* Skip test if ray reports erroneous available resources
* Disable PG for cluster resource counting test
* Debug output for tests
* Output in-use resources for placement groups
* Don't start new trial on trial start failure
* Add docs
* Clean up PGs once futures have returned
* Fix placement group shutdown
* Use updated_queue flag
* Apply suggestions from code review
* Apply suggestions from code review
* Update docs
* Reuse placement groups independently from actors
* Do not remove placement groups for paused trials
* Only continue enqueueing trials if the first attempt didn't fail
* Rename parameter
* Fix pause trial
* Code review + try_recover
* Update python/ray/tune/utils/placement_groups.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Move placement group lifecycle management
* Move total used resources to pg manager
* Update FAQ example
* Requeue trial if start was unsuccessful
* Do not cleanup pgs at start of run
* Revert "Do not cleanup pgs at start of run"
This reverts commit 933d9c4c
* Delayed PG removal
* Fix trial requeue test
* Trigger pg cleanup on status update
* Fix tests
* Fix docs
* fix-test
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
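A minimal sketch of the refactored factory, assuming `PlacementGroupFactory` now accepts placement group bundle arguments directly instead of a callable (resource numbers and the trainable are placeholders):

```python
from ray import tune
from ray.tune.utils.placement_groups import PlacementGroupFactory

def trainable(config):  # toy trainable
    tune.report(score=1.0)

# Bundles are passed directly, mirroring ray.util.placement_group(): the first
# bundle reserves resources for the trainable itself, further bundles for its
# child tasks and actors.
pgf = PlacementGroupFactory([{"CPU": 1}, {"CPU": 2}], strategy="PACK")

tune.run(trainable, resources_per_trial=pgf)
```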
* Support `yield` and `return` statements in Tune trainable functions
* Support anonymous metric with ``tune.report(value)``
* Raise on invalid return/yield value
* Fix end to end reporter test
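A minimal sketch of the reporting styles this adds, per the description above (metric names and values are placeholders):

```python
from ray import tune

def step_trainable(config):
    # Each yielded dict is treated like a tune.report() call.
    for step in range(10):
        yield {"mean_accuracy": step / 10}

def final_trainable(config):
    # A returned dict reports a single final result.
    return {"mean_accuracy": 0.9}

def anonymous_trainable(config):
    # Anonymous metric: a bare value is reported under a default metric key.
    tune.report(0.9)

tune.run(step_trainable)
```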
* create guide gallery for Tune
* mods
* ok
* fix
* fix_up_gallery
* ok
* Apply suggestions from code review
Co-authored-by: Sven Mika <sven@anyscale.io>
* Apply suggestions from code review
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: Sven Mika <sven@anyscale.io>