ray/doc/source/data at f814c2af89555d01c1c319ef8feb4cf8058d393a - hiro/ray


mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00


Eric Liang 22ccc6b300 Initial stats framework for datasets (#20867)		2021-12-08 16:13:57 -08:00

This adds an initial Dataset.stats() framework for debugging dataset performance. At a high level, execution stats for tasks (e.g., CPU time) are attached to block metadata objects. Datasets have stats objects that hold references to these stats and parent dataset stats (this avoids stats holding references to parent datasets, allowing them to be gc'ed). Similarly, DatasetPipelines hold stats from recently computed datasets.

Currently only basic ops like map / map_batches are instrumented. TODO placeholders are left for future PRs.
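
The design described in the commit message above can be illustrated with a small, self-contained sketch. This is not Ray's implementation: `BlockExecStats`, `BlockMetadata`, `DatasetStats`, and `ToyDataset` are hypothetical names chosen for illustration, and the timing logic is an assumption about what "execution stats for tasks" might record. The point it tries to show is that a stats object links to its parent's stats rather than to the parent dataset, so the parent dataset can be garbage collected while the stats history stays readable.

```python
import gc
import time
import weakref
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BlockExecStats:
    """Hypothetical per-task execution stats (not Ray's real class)."""
    wall_time_s: float
    cpu_time_s: float


@dataclass
class BlockMetadata:
    """Hypothetical block metadata; the exec stats ride along with it."""
    num_rows: int
    exec_stats: BlockExecStats


@dataclass
class DatasetStats:
    """Stats for one stage, linked to the parent's *stats*, not the parent dataset."""
    stage: str
    blocks: List[BlockMetadata]
    parent: Optional["DatasetStats"] = None

    def summary(self) -> str:
        # Walk the chain of stats objects from this stage back to the source.
        lines = []
        node: Optional[DatasetStats] = self
        while node is not None:
            cpu = sum(m.exec_stats.cpu_time_s for m in node.blocks)
            lines.append(f"{node.stage}: {len(node.blocks)} blocks, cpu={cpu:.4f}s")
            node = node.parent
        return "\n".join(reversed(lines))


class ToyDataset:
    """Toy stand-in for a dataset: a list of blocks plus a stats object."""

    def __init__(self, blocks: List[list], stats: DatasetStats):
        self.blocks = blocks
        self.stats = stats

    def map(self, fn: Callable) -> "ToyDataset":
        new_blocks, metadata = [], []
        for block in self.blocks:
            wall0, cpu0 = time.perf_counter(), time.process_time()
            out = [fn(x) for x in block]
            exec_stats = BlockExecStats(
                wall_time_s=time.perf_counter() - wall0,
                cpu_time_s=time.process_time() - cpu0,
            )
            new_blocks.append(out)
            metadata.append(BlockMetadata(num_rows=len(out), exec_stats=exec_stats))
        # Only self.stats is captured here, never self.
        return ToyDataset(new_blocks, DatasetStats("map", metadata, parent=self.stats))


source = ToyDataset([[1, 2], [3, 4]], DatasetStats("read", []))
mapped = source.map(lambda x: x * 2)

# Drop the parent dataset; its stats stay reachable through mapped.stats.parent.
source_ref = weakref.ref(source)
del source
gc.collect()

print(mapped.stats.summary())
print("parent dataset collected:", source_ref() is None)
```

Keeping the parent link on the stats object rather than on the dataset is what lets the parent's blocks be released in this sketch while the per-stage history remains available.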
_examples	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487)	2021-11-17 18:05:51 -08:00
modin	[client][docs] update docs for new client support in init (#17333)	2021-08-04 05:31:44 +03:00
.gitignore	[Core][Dataset] adding example for large scale data ingestion (#18998)	2021-10-11 15:37:09 -07:00
big_data_ingestion.yaml	[Core][Dataset] adding example for large scale data ingestion (#18998)	2021-10-11 15:37:09 -07:00
dask-on-ray.rst	[docs] add dask compatibility for 1.9.0 (#20707)	2021-11-24 15:00:17 -08:00
dataset-arch.svg	[data] Cleanup Block type by dropping Generic[T] (#17276)	2021-07-23 09:23:06 -07:00
dataset-compute-1.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-execution-model.rst	Initial stats framework for datasets (#20867)	2021-12-08 16:13:57 -08:00
dataset-loading-1.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-loading-2.png	Dataset doc updates (#19815)	2021-11-04 18:13:40 -07:00
dataset-map.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-ml-preprocessing.rst	[Datasets] Last-mile preprocessing docs. (#20712)	2021-11-29 23:23:27 -08:00
dataset-pipeline-1.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline-2.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline-3.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-pipeline.rst	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00
dataset-read.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-repeat-1.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-repeat-2.svg	Initial implementation of Dataset pipelining and docs (#17309)	2021-07-28 21:12:01 -07:00
dataset-shuffle.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-spill.svg	Split blocks automatically into 500MB chunks on file read and transformation (#20235)	2021-11-15 22:25:11 -08:00
dataset-tensor-support.rst	[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992)	2021-10-01 13:08:25 -07:00
dataset.rst	[Datasets] Last-mile preprocessing docs. (#20712)	2021-11-29 23:23:27 -08:00
dataset.svg	[data] Cleanup Block type by dropping Generic[T] (#17276)	2021-07-23 09:23:06 -07:00
mars-on-ray.rst	First cut at dataset documentation (#16956)	2021-07-14 23:27:13 -07:00
package-ref.rst	Simple block dataset groupBy (#19435)	2021-10-19 19:53:13 -07:00
raydp.rst	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00