mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00
![]() This PR introduces cluster-level fault tolerance for Tune by checkpointing global state. This occurs with relatively high frequency and allows users to easily resume experiments when the cluster crashes. Note that this PR may affect automated workflows due to auto-prompting, but this is resolvable. |
||
---|---|---|
.. | ||
benchmarks | ||
ray | ||
asv.conf.json | ||
build-wheel-macos.sh | ||
build-wheel-manylinux1.sh | ||
README-benchmarks.rst | ||
README-building-wheels.md | ||
setup.py |