
148 commits

Author SHA1 Message Date
Qingyun Wu
7678503d84
[Tune][docs] Correct reference name to CFO example (#17503) 2021-08-02 14:46:10 +01:00
amavilla
f2d9b1f2b9
[docs] Link broken in Tune's page (#17394) (#17407) 2021-07-28 09:27:54 -07:00
Antoni Baum
b500a651b7
[docs] Add LightGBM Tune integration to docs (#17304)
* Add LightGBM integration to docs

* Fix
2021-07-23 21:21:13 -07:00
Antoni Baum
2e37826458
[tune] Function API support for ResourceChangingScheduler (#17150)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-07-21 14:14:12 -07:00
Antoni Baum
f20311f194
[tune] ResourceChangingScheduler improvements (#17082) 2021-07-15 15:03:27 +01:00
Antoni Baum
6e780ebf07
[tune] ResourceChangingScheduler dynamic resource allocation during tuning (#16787) 2021-07-14 10:45:13 +01:00
Kai Fricke
fce8fa2668
[tune] use bayesopt for quick start example (which actually converges) (#16997) 2021-07-12 14:50:32 +01:00
Antoni Baum
0935ec30d0
[tune] Add information about environment variables to tune.run docstring (#16980) 2021-07-11 17:20:17 -07:00
Amog Kamsetty
33d798f8fc
[Docs] Add e2e guide on using Pytorch Lightning with Ray (#16484)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-19 10:04:58 -07:00
Kai Fricke
172d33be02
[tune] Use unbuffered training when checkpoint_at_end is used. (#16504) 2021-06-18 14:19:14 +01:00
Antoni Baum
d71ec6e874
[docs] Add examples of new features to contribute (#16477) 2021-06-18 00:07:03 -07:00
Qingyun Wu
dae3ac1def
[Tune] Add new searchers from FLAML (#16329) 2021-06-12 02:10:51 -07:00
Kai Fricke
e8f8e9f328
[tune] Adjust searcher sample bounds to match Tune API (#15899) 2021-06-11 14:31:08 +01:00
Amog Kamsetty
04863d158a
[Tune] MLflow with Ray Client (#16029)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 09:50:44 -07:00
Amog Kamsetty
38b657cb65
[Tune] Place remote tune.run on node running the client server (#16034)
* force placement on persistent node

* address comments

* doc
2021-05-28 18:32:57 -07:00
Edward Oakes
82410f20b2
[serve] Add warning + docstring for anonymous namespaces (#15921) 2021-05-20 22:27:15 -05:00
Tom Dörr
3c99f1db4c
[Docs] Tune Contributors fix (#15719) 2021-05-10 12:22:47 -07:00
Tom Dörr
b5c03b6458
Fix Link (#15722) 2021-05-10 12:19:32 -07:00
Kai Fricke
16381625db
[tune] Reduce default number of maximum pending trials to max(16, cluster_cpus) (#15628) 2021-05-05 15:54:27 +01:00
Edward Oakes
c9550a86dc
[serve] Update docs for v2 Deployments API (#15582) 2021-05-03 13:19:34 -05:00
Richard Liaw
f4b2dd94b2
[tune] Cache MNIST and restore MNIST tests (#15260) 2021-04-13 14:20:26 -07:00
Kai Fricke
d33b0e4bc3
[tune] Reconcile placement groups every N seconds to avoid bottlenecks when running many short trials (#15011)
Closes a release blocking issue
2021-04-01 17:04:44 +02:00
Kai Fricke
84b3c3376b
[tune] document scalability best practices (k8s, scalability thresholds) (#14566)
Adds a new page and table to document current scalability thresholds in Ray Tune to the documentation.

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-25 09:54:14 +01:00
Kai Fricke
898243d538
[tune] Limit maximum number of pending trials. Add convergence test. (#14835) 2021-03-23 18:19:41 -07:00
Amog Kamsetty
7ee2e4185b
[Tune] PTL Fractional GPUs (#14781)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 17:07:51 -07:00
Kai Fricke
43e098402a
[tune] make tune.with_parameters() work with the class API (#14532)
* [tune] make `tune.with_parameters()` work with the class API

* Update python/ray/tune/utils/trainable.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-09 09:36:17 +01:00
Kai Fricke
b0bf44b154
[tune/docs] Add high level trial runner flow to documentation (#14468)
* [tune/docs] Add high level trial runner flow to documentation

* Apply suggestions from code review
2021-03-08 10:35:54 +01:00
Kai Fricke
4014168928
[tune] Introduce durable() wrapper to convert trainables into durable trainables (#14306)
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables

* Fix wrong check

* Improve docs, add FAQ for tackling overhead

* Fix bugs in `tune.with_parameters`

* Update doc/source/tune/api_docs/trainable.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/_tutorials/_faq.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
Kai Fricke
757866ec01
[tune] enable placement groups per default (#13906)
* Refactor placement group factory object to accept placement_group arguments instead of callables

* Convert resources to pgf

* Enable placement groups per default

* Fix tests WIP

* Fix stop/resume with placement groups

* Fix progress reporter test

* Fix trial executor tests

* Check resource for trial, not resource object

* Move ENV vars into class

* Fix tests

* Sphinx

* Wait for trial start in PBT

* Revert merge errors

* Support trial reuse with placement groups

* Better check for just staged trials

* Fix trial queuing

* Wait for pg after trial termination

* Clean up PGs before tune run

* No PG settings in pbt scheduler

* Fix buffering tests

* Skip test if ray reports erroneous available resources

* Disable PG for cluster resource counting test

* Debug output for tests

* Output in-use resources for placement groups

* Don't start new trial on trial start failure

* Add docs

* Cleanup PGs once futures returned

* Fix placement group shutdown

* Use updated_queue flag

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs

* Reuse placement groups independently from actors

* Do not remove placement groups for paused trials

* Only continue enqueueing trials if it didn't fail the first time

* Rename parameter

* Fix pause trial

* Code review + try_recover

* Update python/ray/tune/utils/placement_groups.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Move placement group lifecycle management

* Move total used resources to pg manager

* Update FAQ example

* Requeue trial if start was unsuccessful

* Do not cleanup pgs at start of run

* Revert "Do not cleanup pgs at start of run"

This reverts commit 933d9c4c

* Delayed PG removal

* Fix trial requeue test

* Trigger pg cleanup on status update

* Fix tests

* Fix docs

* fix-test

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-23 18:46:02 +01:00
Antoni Baum
58d7398246
[Tune] Add HEBOSearch Searcher (#13863)
* HEBO first pass

* Fix bad quotes

* Fixes

* Reproducibility

* Update python/ray/tune/suggest/hebo.py

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Add hebo_example.py to BUILD

* Nit

* Update to pypi package

* Alphabetical HEBO requirement

* Fix syntax error

* Fix wrong space in hebo example

* Move validate_warmstart to utils

* Space assertion in HEBO

* Comment

* Apply suggestions from code review

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Formatting

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-02-17 22:53:10 +01:00
javi-redondo
b8b2d6410d
[docs] new Ray Cluster documentation (#13839)
Co-authored-by: Javier Redondo <javier@anyscale.com>
Co-authored-by: AmeerHajAli <ameerh@berkeley.edu>
2021-02-15 00:47:14 -08:00
Richard Liaw
6c77aeb98a
[docs] ray slack remove banners (#13898)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-04 01:14:34 -08:00
Kai Fricke
d29fcfb45c
[tune] catch SIGINT signal and trigger experiment checkpoint (#13767)
* [tune] catch SIGINT signal and trigger experiment checkpoint

* Apply suggestions from code review

* Fix user guide docs

* Update doc/source/tune/user-guide.rst
2021-02-02 14:52:09 +01:00
architkulkarni
28cf5f91e3
[docs] change MLFlow to MLflow in docs (#13739) 2021-01-27 16:53:15 -08:00
Amog Kamsetty
20016c983f
[Tune] MLflow Credentials (#13533) 2021-01-19 11:55:13 -08:00
Kai Fricke
dc42abb2f5
[tune] placement group support (#13370) 2021-01-18 11:58:57 -08:00
Richard Liaw
86387504ee
[tune] fix small docs typo (#13355)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-16 00:49:17 -08:00
Kai Fricke
518427627b
[tune] buffer trainable results (#13236)
* Working prototype

* Pass buffer length, fix tests

* Don't buffer per default

* Dispatch and process save in one go, added tests

* Fix tests

* Pass adaptive seconds to train_buffered, stop result processing after STOP decision

* Fix tests, add release test

* Update tests

* Added detailed logs for slow operations

* Update python/ray/tune/trial_runner.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Revert tests and go back to old tuning loop

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-12 18:52:47 +01:00
Edwin Goh
a5ddc27bab
Fix typo in Tune Docs (Checkpointing) (#13348)
See issue #13299
2021-01-11 20:27:18 -08:00
Amog Kamsetty
0452a3a435
[Tune] Rename MLFlow to MLflow (#13301) 2021-01-11 17:36:55 -08:00
Kai Fricke
97211a6170
[Tune] Fix tune serve integration example (#13233) 2021-01-06 17:02:04 +01:00
Lavanya Shukla
350917958c
[docs] fix wandb url (#13094) 2020-12-28 17:19:17 -08:00
Antoni Baum
a4f2dd2138
[Tune] Add integer loguniform support (#12994)
* Add integer quantization and loguniform support

* Fix hyperopt qloguniform not being np.log'd first

* Add tests, __init__

* Try to fix tests, better exceptions

* Tweak docstrings

* Type checks in SearchSpaceTest

* Update docs

* Lint, tests

* Update doc/source/tune/api_docs/search_space.rst

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2020-12-23 09:27:16 -08:00
Amog Kamsetty
5d3c9c8861
[Tune] Mlflow Integration (#12840)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-19 00:40:02 -08:00
Kai Fricke
3d72000826
[tune] Add points_to_evaluate to BasicVariantGenerator (#12916)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 19:16:03 -08:00
Kai Fricke
5f04ade6ef
[tune] add more stoppers and stopper documentation (#12750)
* Add new stoppers & docs

* Add tests for maximum iteration stopper and trial plateau stopper

* Update python/ray/tune/stopper.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/api_docs/stoppers.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/api_docs/stoppers.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Apply suggestions from code review

* Update python/ray/tune/stopper.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-12 01:47:19 -08:00
Richard Liaw
9ce7ad17fd
[tune] remove some bottlenecks in trialrunner (#12476) 2020-11-30 14:54:25 -08:00
Richard Liaw
7c009d22cf
[docs] Add xgboost_ray to docs (#12184)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2020-11-27 11:36:56 -08:00
Richard Liaw
e59fe65d3d
[tune] Fix logging for dockersyncer (#12196) 2020-11-23 14:29:41 -08:00
Kai Fricke
9f5986ee58
[tune] logger migration to ExperimentLogger classes (#11984) 2020-11-16 15:08:37 -08:00