ray/doc/source/data at 489e6945a671b7af2cf1f047b2ad2c879286ec73 - hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Clark Zinzow c3d68fa0c1 [Dask-on-Ray] Add Dask config helper, set task-based shuffle by default. (#21114) Dask defaults to a disk-based shuffle even though we're using a distributed scheduler, which appears to result in dropped data since the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, which the Dask-on-Ray scheduler should probably do as well. This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler and changes the default shuffle to a task-based shuffle. The user can still override the shuffle method by manually specifying `df.set_index(shuffle="disk")`.		2021-12-17 13:16:37 -08:00
..
_examples	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487)	2021-11-17 18:05:51 -08:00
modin	[client][docs] update docs for new client support in init (#17333)	2021-08-04 05:31:44 +03:00
.gitignore	[Core][Dataset] adding example for large scale data ingestion (#18998)	2021-10-11 15:37:09 -07:00
big_data_ingestion.yaml	[Core][Dataset] adding example for large scale data ingestion (#18998)	2021-10-11 15:37:09 -07:00
dask-on-ray.rst	[Dask-on-Ray] Add Dask config helper, set task-based shuffle by default. (#21114)	2021-12-17 13:16:37 -08:00
dataset-arch.svg	[data] Cleanup Block type by dropping Generic[T] (#17276)	2021-07-23 09:23:06 -07:00
dataset-compute-1.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-execution-model.rst	Initial stats framework for datasets (#20867)	2021-12-08 16:13:57 -08:00
dataset-loading-1.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-loading-2.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-map.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-ml-preprocessing.rst	[Datasets] Last-mile preprocessing docs. (#20712)	2021-11-29 23:23:27 -08:00
dataset-pipeline-1.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline-2.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline-3.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline.rst	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00
dataset-read.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-repeat-1.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-repeat-2.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-shuffle.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-spill.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-tensor-support.rst	[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992)	2021-10-01 13:08:25 -07:00
dataset.rst	[Datasets] Last-mile preprocessing docs. (#20712)	2021-11-29 23:23:27 -08:00
dataset.svg	[data] Cleanup Block type by dropping Generic[T] (#17276)	2021-07-23 09:23:06 -07:00
mars-on-ray.rst	First cut at dataset documentation (#16956)	2021-07-14 23:27:13 -07:00
package-ref.rst	Simple block dataset groupBy (#19435)	2021-10-19 19:53:13 -07:00
raydp.rst	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00