ray/python at c95dd799535771df2ac789f5db17a24f8cfcf1c0 - hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Stephanie Wang c1054a0baa [Datasets] Implement push-based shuffle (#23758)	2022-04-27 11:59:41 -07:00

The simple shuffle currently implemented in Datasets does not reliably scale past 1000 partitions because of metadata and I/O overhead. This PR adds an experimental "push-based shuffle" implementation, as described in this paper draft. This algorithm should perform better at larger data scales.

The algorithm works by merging intermediate map outputs on the reducer side while other map tasks are still executing. A final reduce task then merges these merged outputs.

Currently, the PR exposes this option through the DatasetContext. It can also be set through a hidden OS environment variable (RAY_DATASET_PUSH_BASED_SHUFFLE). Once we have more comprehensive benchmarks, we can better document this option and allow the algorithm to be chosen at run time.

Related issue number: Closes #23758.
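
The map, merge, and final reduce flow described in the commit message can be illustrated with a toy, single-process sketch. This is not Ray's implementation: the function names (`map_task`, `merge_task`, `reduce_task`, `push_based_shuffle`) and the `maps_per_round` parameter are hypothetical, and the rounds run sequentially here, whereas the commit describes merging while other map tasks are still executing.

```python
import random


def map_task(block, num_reducers, seed):
    """Split one input block into num_reducers partitions at random."""
    rng = random.Random(seed)
    parts = [[] for _ in range(num_reducers)]
    for row in block:
        parts[rng.randrange(num_reducers)].append(row)
    return parts


def merge_task(map_outputs, num_reducers):
    """Merge the outputs of one round of map tasks, partition by partition."""
    merged = [[] for _ in range(num_reducers)]
    for parts in map_outputs:
        for r, part in enumerate(parts):
            merged[r].extend(part)
    return merged


def reduce_task(merged_parts, seed):
    """Combine the already-merged pieces for one reducer and shuffle them."""
    rng = random.Random(seed)
    out = [row for part in merged_parts for row in part]
    rng.shuffle(out)
    return out


def push_based_shuffle(blocks, num_reducers, maps_per_round=2, seed=0):
    """Toy shuffle: map in rounds, merge each round, then do a final reduce."""
    merged_rounds = []
    for start in range(0, len(blocks), maps_per_round):
        round_blocks = blocks[start:start + maps_per_round]
        map_outputs = [
            map_task(block, num_reducers, seed + start + i)
            for i, block in enumerate(round_blocks)
        ]
        # Each reducer now holds one merged piece per round rather than
        # one piece per map task, so the final reduce has fewer inputs.
        merged_rounds.append(merge_task(map_outputs, num_reducers))
    return [
        reduce_task([merged[r] for merged in merged_rounds], seed + r)
        for r in range(num_reducers)
    ]


if __name__ == "__main__":
    blocks = [list(range(i * 10, (i + 1) * 10)) for i in range(5)]
    for partition in push_based_shuffle(blocks, num_reducers=3):
        print(partition)
```

With 5 input blocks and `maps_per_round=2`, each final reduce task reads 3 merged pieces instead of 5 map outputs. Reducing the number of intermediate objects in this way is the idea the commit message attributes to the push-based approach.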
ray	[Datasets] Implement push-based shuffle (#23758)	2022-04-27 11:59:41 -07:00
requirements	[AIR] Add distributed `torch_geometric` example (#23580)	2022-04-21 09:48:43 -07:00
asv.conf.json	[docs] Move all /latest links to /master (#11897)	2020-11-10 10:53:28 -08:00
build-wheel-macos-arm64.sh	[ci] Clean up ci/ directory (refactor ci/travis) (#23866)	2022-04-13 18:11:30 +01:00
build-wheel-macos.sh	[ci] Clean up ci/ directory (refactor ci/travis) (#23866)	2022-04-13 18:11:30 +01:00
build-wheel-manylinux2014.sh	[ci] Clean up ci/ directory (refactor ci/travis) (#23866)	2022-04-13 18:11:30 +01:00
build-wheel-windows.sh	[ci] Clean up ci/ directory (refactor ci/travis) (#23866)	2022-04-13 18:11:30 +01:00
MANIFEST.in	Includes .pyi files in package data. (#21247)	2021-12-27 11:50:02 -08:00
README-building-wheels.md	[build] Build wheels with manylinux2014 (#11621)	2020-11-03 19:36:32 -08:00
requirements.txt	[core] Fix internal storage S3 bugs (#24167)	2022-04-27 09:57:14 -07:00
requirements_linters.txt	Remove yapf dependency (#23656)	2022-04-04 21:50:04 -07:00
requirements_ml_docker.txt	[AIR] Add distributed `torch_geometric` example (#23580)	2022-04-21 09:48:43 -07:00
setup.py	[air] Move storage handling to pyarrow.fs.FileSystem (#23370)	2022-04-13 14:31:30 -07:00