Mirror of https://github.com/vale981/ray, synced 2025-03-06 02:21:39 -05:00.
Adds a working failure test for streaming and non-streaming shuffle, without lineage reconstruction. This does a few things.

Test improvements:
- Modifies AutoscalingCluster to allow passing an idle node timeout (the default is very low).
- Makes some small improvements to the NodeKiller actor to hopefully reduce flakiness.

Shuffle fixes:
- Modifies the shuffle tracker to wait on futures instead of having tasks signal. During failures, tasks may never signal the tracker, so we can't rely on those signals to track progress.

Core fixes:
- The raylet will exit immediately if it receives the Shutdown RPC with graceful=False.
  - There was a bug here: the raylet is supposed to exit after replying to the client, but the gRPC server goes down for an unknown reason and the client reply is never sent.
- On reference deletion, the owner now publishes an additional message to subscribers that the object has been deleted. Previously, this was causing a hang in streaming shuffle because the raylets pulling an object subscribed after the object was already deleted, so they never received the error signal.

Name | ||
---|---|---|
ray | ||
requirements | ||
asv.conf.json | ||
build-wheel-macos-arm64.sh | ||
build-wheel-macos.sh | ||
build-wheel-manylinux2014.sh | ||
build-wheel-windows.sh | ||
MANIFEST.in | ||
README-building-wheels.md | ||
requirements.txt | ||
requirements_linters.txt | ||
requirements_ml_docker.txt | ||
setup.py |
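
The shuffle-tracker change described in the commit message above (waiting on futures rather than relying on tasks to signal) can be illustrated with a minimal sketch. This is not Ray's shuffle code: it uses the standard library's `concurrent.futures` as a stand-in, and `run_map_task` and `track_progress` are hypothetical names chosen for illustration. In Ray, the analogous primitive would be `ray.wait` on object refs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_map_task(task_id, fail):
    """Hypothetical stand-in for a shuffle map task."""
    if fail:
        # A task that dies here would never reach a "signal the tracker"
        # call, which is why signal-based tracking can hang on failures.
        raise RuntimeError(f"task {task_id} died")
    return task_id


def track_progress(futures):
    """Count finished and failed tasks by waiting on the futures themselves."""
    pending = set(futures)
    finished, failed = 0, 0
    while pending:
        # Returns as soon as at least one future completes, whether it
        # succeeded or raised, so failures are always observed.
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for fut in done:
            if fut.exception() is not None:
                failed += 1
            else:
                finished += 1
    return finished, failed


with ThreadPoolExecutor(max_workers=4) as pool:
    # Task 2 is made to fail to mimic a node being killed mid-shuffle.
    futures = [pool.submit(run_map_task, i, fail=(i == 2)) for i in range(5)]
    finished, failed = track_progress(futures)

print(finished, failed)
```

Because the tracker observes completion through the future, a task that crashes still resolves (with an exception) and is counted, whereas a tracker that waits for an explicit signal from the task would wait forever.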