* Refactor placement group factory object to accept placement_group arguments instead of callables
* Convert resources to placement group factory (pgf)
* Enable placement groups by default
* Fix tests WIP
* Fix stop/resume with placement groups
* Fix progress reporter test
* Fix trial executor tests
* Check resource for trial, not resource object
* Move ENV vars into class
* Fix tests
* Sphinx
* Wait for trial start in PBT
* Revert merge errors
* Support trial reuse with placement groups
* Better check for just staged trials
* Fix trial queuing
* Wait for pg after trial termination
* Clean up PGs before tune run
* No PG settings in pbt scheduler
* Fix buffering tests
* Skip test if ray reports erroneous available resources
* Disable PG for cluster resource counting test
* Debug output for tests
* Output in-use resources for placement groups
* Don't start new trial on trial start failure
* Add docs
* Cleanup PGs once futures returned
* Fix placement group shutdown
* Use updated_queue flag
* Apply suggestions from code review
* Apply suggestions from code review
* Update docs
* Reuse placement groups independently from actors
* Do not remove placement groups for paused trials
* Only continue enqueueing trials if it didn't fail the first time
* Rename parameter
* Fix pause trial
* Code review + try_recover
* Update python/ray/tune/utils/placement_groups.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Move placement group lifecycle management
* Move total used resources to pg manager
* Update FAQ example
* Requeue trial if start was unsuccessful
* Do not cleanup pgs at start of run
* Revert "Do not cleanup pgs at start of run"
This reverts commit 933d9c4c
* Delayed PG removal
* Fix trial requeue test
* Trigger pg cleanup on status update
* Fix tests
* Fix docs
* fix-test
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Small improvements to the Ray Cluster docs
* Update quickstart.rst
Changed title for quick start
Co-authored-by: Javier Redondo <javier@Anyscale-MacBook-Pro.local>
* Add `ray get-logs` CLI command to fetch logs and state from nodes in a cluster
* Add dataclasses for py < 3.7
* Remove dataclasses dependency in setup.py
* Rename command, print what is collected
* Remove dataclass dependency
* Typo
* Lint
* Apply suggestions from code review
* random doc typo
* max-worker-default-inf
* fix
* -1 means infinity
* doc
* comment tweak
* fix random typo
* Cluster max-worker default
* fix
* typo
* test
* Git add the test
* doc-tweak
* rest of the test logistics
* periods in doc
* Address comments
* docstring
* initial-commit-to-support
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* basic-test
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* ok
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* smoke-test
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* Track which pull bundle requests are ready to run
* Regression test
* Reset retry timer on pull activation, don't count created objects towards memory usage, abort objects on pull deactivation
* Revert "Track which pull bundle requests are ready to run"
This reverts commit b5d0714783fa2fc842bdd4e2d2802228e25f03c2.
* Check object active before receiving chunk
* lint
* debug, unit test, fix race condition
* lint
* update
* lint
* fix
* fix build
* fix test
* remove print
* Fix bug in bytes accounting
* Split