mirror of
https://github.com/vale981/ray
synced 2025-03-06 02:21:39 -05:00
Dask defaults to a disk-based shuffle even though we're using a distributed scheduler, which appears to result in dropped data because the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, which the Dask-on-Ray scheduler should probably do as well. This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler and changes the default shuffle to a task-based shuffle. Users can still override the shuffle method by manually specifying `df.set_index(shuffle="disk")`.
| Name |
|---|
| _build |
| azure |
| examples |
| kubernetes |
| site |
| source |
| tools |
| yarn |
| .gitignore |
| BUILD |
| make.bat |
| Makefile |
| README.md |
| requirements-doc.txt |
| requirements-rtd.txt |
# Ray Documentation
To compile the documentation, run the following commands from this directory. Note that Ray must be installed first.
    pip install -r requirements-doc.txt
    pip install -U -r requirements-rtd.txt  # important for reproducing the deployment environment
    make html
    open _build/html/index.html
To check the documentation for build errors, run the following.

    sphinx-build -b html -d _build/doctrees source _build/html
To check for broken links, run the following (this is currently not run in CI because it produces false positives).

    make linkcheck
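
The install, build, and link-check commands above can also be chained from a small script, for example to run them in one step locally. The following is only a minimal sketch and is not part of the Ray repository: the names `docs_commands` and `run_docs_checks` and the `include_linkcheck` parameter are hypothetical, and the sketch assumes it is run from this directory with `pip`, `sphinx-build`, and `make` on the `PATH`. The commands themselves are the ones listed in this README.

```python
import subprocess
from typing import List


def docs_commands(include_linkcheck: bool = False) -> List[List[str]]:
    """Return the README's documentation commands as argument lists.

    Hypothetical helper: the link check is opt-in because, per the README,
    it can report false positives.
    """
    commands = [
        ["pip", "install", "-r", "requirements-doc.txt"],
        ["pip", "install", "-U", "-r", "requirements-rtd.txt"],
        ["sphinx-build", "-b", "html", "-d", "_build/doctrees",
         "source", "_build/html"],
    ]
    if include_linkcheck:
        commands.append(["make", "linkcheck"])
    return commands


def run_docs_checks(include_linkcheck: bool = False, cwd: str = ".") -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in docs_commands(include_linkcheck):
        print("$", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_docs_checks()` from this directory would install the requirements and then run the Sphinx build; passing `include_linkcheck=True` appends `make linkcheck` as a final step.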