Commit graph

17 commits

Author SHA1 Message Date
Kai Fricke
43e098402a
[tune] make tune.with_parameters() work with the class API (#14532)
* [tune] make `tune.with_parameters()` work with the class API

* Update python/ray/tune/utils/trainable.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-09 09:36:17 +01:00
Kai Fricke
4014168928
[tune] Introduce durable() wrapper to convert trainables into durable trainables (#14306)
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables

* Fix wrong check

* Improve docs, add FAQ for tackling overhead

* Fix bugs in `tune.with_parameters`

* Update doc/source/tune/api_docs/trainable.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/_tutorials/_faq.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
Kai Fricke
757866ec01
[tune] enable placement groups per default (#13906)
* Refactor placement group factory object to accept placement_group arguments instead of callables

* Convert resources to pgf

* Enable placement groups per default

* Fix tests WIP

* Fix stop/resume with placement groups

* Fix progress reporter test

* Fix trial executor tests

* Check resource for trial, not resource object

* Move ENV vars into class

* Fix tests

* Sphinx

* Wait for trial start in PBT

* Revert merge errors

* Support trial reuse with placement groups

* Better check for just staged trials

* Fix trial queuing

* Wait for pg after trial termination

* Clean up PGs before tune run

* No PG settings in pbt scheduler

* Fix buffering tests

* Skip test if ray reports erroneous available resources

* Disable PG for cluster resource counting test

* Debug output for tests

* Output in-use resources for placement groups

* Don't start new trial on trial start failure

* Add docs

* Cleanup PGs once futures returned

* Fix placement group shutdown

* Use updated_queue flag

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs

* Reuse placement groups independently from actors

* Do not remove placement groups for paused trials

* Only continue enqueueing trials if it didn't fail the first time

* Rename parameter

* Fix pause trial

* Code review + try_recover

* Update python/ray/tune/utils/placement_groups.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Move placement group lifecycle management

* Move total used resources to pg manager

* Update FAQ example

* Requeue trial if start was unsuccessful

* Do not cleanup pgs at start of run

* Revert "Do not cleanup pgs at start of run"

This reverts commit 933d9c4c

* Delayed PG removal

* Fix trial requeue test

* Trigger pg cleanup on status update

* Fix tests

* Fix docs

* fix-test

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-23 18:46:02 +01:00
Keqiu Hu
0c1bdaef59
[tune] TensorFlow Distributed Trainable (#11876)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:59:08 -08:00
Richard Liaw
b02e61f672
[minor] fix up docs (#11596)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-26 12:19:03 -07:00
Richard Liaw
56f858ed1a
[tune][docs/util] gputil check, docs (#11260)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-10-10 00:54:31 -07:00
Kai Fricke
508cfa3540
[tune] Support yield and return statements (#10857)
* Support `yield` and `return` statements in Tune trainable functions

* Support anonymous metric with ``tune.report(value)``

* Raise on invalid return/yield value

* Fix end to end reporter test
2020-09-17 20:18:35 -07:00
krfricke
f3f698816d
[tune] Added PyTorch Lightning callbacks to integrations (#10220)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-31 15:30:48 -07:00
Richard Liaw
0c3b9ebeef
[tune/sgd] Document func_trainable and add checkpoint context (#9739)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-07-30 09:46:37 -07:00
Richard Liaw
b71c912da7
[tune] Fix up examples (#9201) 2020-07-05 01:16:20 -07:00
Richard Liaw
d35f0e40d0
[tune] Use public methods for trainable (#9184) 2020-07-01 11:00:00 -07:00
Richard Liaw
6c49c01837
[tune] Function API checkpointing (#8471)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-06-15 10:42:54 -07:00
Richard Liaw
67c01455fe
[tune] tune.track -> tune.report (#8388) 2020-05-16 12:55:08 -07:00
Richard Liaw
be5235d982
[tune] Clarify Intro Tune Documentation (#8201) 2020-04-27 18:01:00 -07:00
Richard Liaw
b506f87117
[tune] New Doc edits, add Concepts page (#8083)
Co-Authored-By: Sven Mika <sven@anyscale.io>
2020-04-25 18:25:56 -07:00
Richard Liaw
a67edc4051
[tune] Improve user guides and API docs (#7716)
* create guide gallery for Tune

* mods

* ok

* fix

* fix_up_gallery

* ok

* Apply suggestions from code review

Co-Authored-By: Sven Mika <sven@anyscale.io>

* Apply suggestions from code review

Co-Authored-By: Sven Mika <sven@anyscale.io>

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-04-06 12:16:35 -07:00
Richard Liaw
e311013afd
[tune] Reformat Sections of API Reference (#7706)
* moveit

* moveit

* docstrings to ref

* Update tune-usage.rst

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-03-23 12:23:21 -07:00