Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests ( #16355 )
2021-06-16 21:37:17 +01:00
Kai Fricke
153a8b8fec
[release] convert tune release tests ( #15913 )
2021-06-01 11:19:15 -07:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates ( #15366 )
...
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Kai Fricke
4014168928
[tune] Introduce durable()
wrapper to convert trainables into durable trainables ( #14306 )
...
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables
* Fix wrong check
* Improve docs, add FAQ for tackling overhead
* Fix bugs in `tune.with_parameters`
* Update doc/source/tune/api_docs/trainable.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/tune/_tutorials/_faq.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
Kai Fricke
1ef2a6790c
[tune] add scalability release tests ( #13986 )
...
* Add scalability tests
* Network overhead cluster
* Update xgboost tests
* Document release tests
* Don't raise on failed trial
* Update to multi node yamls
* Update yamls
* Revert xgboost test changes
* Fix import
* Update release/tune_tests/scalability_tests/workloads/test_bookkeeping_overhead.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Pass aws credentials (WIP)
* Update durable trainable example
* Update xgboost sweep
* Change xgboost scope, fix durable trainable stop condition
* Fix max depth to limit total test length
* Add cluster information to test descriptions. Update release checklist/process docs
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-10 17:16:31 +01:00
Kai Fricke
518427627b
[tune] buffer trainable results ( #13236 )
...
* Working prototype
* Pass buffer length, fix tests
* Don't buffer per default
* Dispatch and process save in one go, added tests
* Fix tests
* Pass adaptive seconds to train_buffered, stop result processing after STOP decision
* Fix tests, add release test
* Update tests
* Added detailed logs for slow operations
* Update python/ray/tune/trial_runner.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Apply suggestions from code review
* Revert tests and go back to old tuning loop
* nit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-12 18:52:47 +01:00