* Refactor placement group factory object to accept placement_group arguments instead of callables
* Convert resources to pgf
* Enable placement groups by default
* Fix tests WIP
* Fix stop/resume with placement groups
* Fix progress reporter test
* Fix trial executor tests
* Check resource for trial, not resource object
* Move ENV vars into class
* Fix tests
* Sphinx
* Wait for trial start in PBT
* Revert merge errors
* Support trial reuse with placement groups
* Better check for just staged trials
* Fix trial queuing
* Wait for pg after trial termination
* Clean up PGs before tune run
* No PG settings in pbt scheduler
* Fix buffering tests
* Skip test if ray reports erroneous available resources
* Disable PG for cluster resource counting test
* Debug output for tests
* Output in-use resources for placement groups
* Don't start new trial on trial start failure
* Add docs
* Cleanup PGs once futures returned
* Fix placement group shutdown
* Use updated_queue flag
* Apply suggestions from code review
* Apply suggestions from code review
* Update docs
* Reuse placement groups independently from actors
* Do not remove placement groups for paused trials
* Only continue enqueueing trials if it didn't fail the first time
* Rename parameter
* Fix pause trial
* Code review + try_recover
* Update python/ray/tune/utils/placement_groups.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Move placement group lifecycle management
* Move total used resources to pg manager
* Update FAQ example
* Requeue trial if start was unsuccessful
* Do not cleanup pgs at start of run
* Revert "Do not cleanup pgs at start of run"
This reverts commit 933d9c4c
* Delayed PG removal
* Fix trial requeue test
* Trigger pg cleanup on status update
* Fix tests
* Fix docs
* fix-test
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Small improvements to the Ray Cluster docs
* Update quickstart.rst
Changed title for quick start
Co-authored-by: Javier Redondo <javier@Anyscale-MacBook-Pro.local>
* random doc typo
* max-worker-default-inf
* fix
* -1 means infinity
* doc
* comment tweak
* fix random typo
* Cluster max-worker default
* fix
* typo
* test
* Git add the test
* doc-tweak
* rest of the test logistics
* periods in doc
* Address comments
* docstring
* Add better Dask-on-Ray example, and detail custom shuffle optimization.
* Misc. updates and feedback.
* Update doc/source/dask-on-ray.rst
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
* Set max_branch to infinity in shuffle optimization example.
* Feedback
* Apply suggestions from code review
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* 80 col width
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Revert "Revert "Enable Ray client server by default (#13350)" (#13429)"
This reverts commit 560299972c.
* fix job id collision with ray client server