ray/doc/source/data at 1ad019aac3480e9d707b0fe6ade632a969c5b78a - hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Eric Liang 015181ab9a Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00

This PR adds experimental support for random access to datasets. A Dataset can be enabled for random access by calling `ds.to_random_access_dataset(key, num_workers=N)`, which creates a RandomAccessDataset.

RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset.

Performance-wise, you can expect each worker to provide ~3000 records / second via `get_async()`, and ~10000 records / second via `multiget()`. Since Ray actor calls go directly from worker to worker, throughput scales linearly with the number of workers.
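
The lookup scheme described in this commit message (records sorted by key, split into blocks, then located by binary search) can be illustrated with a small pure-Python sketch. This is not Ray's implementation and does not use Ray at all: the class name `SortedBlockIndex`, its methods, and the `num_blocks` parameter are hypothetical names chosen for illustration, the blocks live in one process rather than in worker actors, and keys are assumed to be unique.

```python
import bisect
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class SortedBlockIndex:
    """Toy, single-process stand-in for a key-sorted, block-partitioned dataset.

    Hypothetical illustration only; it mirrors the idea in the commit
    message, not the actual RandomAccessDataset code.
    """

    def __init__(self, records: List[Record], key: str, num_blocks: int) -> None:
        ordered = sorted(records, key=lambda r: r[key])
        # Ceiling division so that at most num_blocks blocks are produced.
        size = max(1, -(-len(ordered) // num_blocks))
        self._blocks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
        # Keys of each block, kept separately so bisect can search them.
        self._block_keys = [[r[key] for r in block] for block in self._blocks]
        # First key of each block, used to route a lookup to one block.
        self._lower_bounds = [keys[0] for keys in self._block_keys]

    def get(self, k: Any) -> Optional[Record]:
        """Return the record with key k, or None if it is absent."""
        # Step 1: binary search over block boundaries to pick the block.
        b = bisect.bisect_right(self._lower_bounds, k) - 1
        if b < 0:
            return None
        # Step 2: binary search inside the chosen block.
        keys = self._block_keys[b]
        i = bisect.bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            return self._blocks[b][i]
        return None

    def multiget(self, ks: List[Any]) -> List[Optional[Record]]:
        """Look up several keys in one call, preserving input order."""
        return [self.get(k) for k in ks]
```

In the real feature, each block lookup would be served by a worker actor, which is why the commit message expects throughput to grow with the number of workers. The sketch only shows the two-level binary search.
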
doc_code	[docs] sphinx gallery removal, migrate to ipynb (#22467)	2022-02-19 01:19:07 -08:00
examples	Make a pass fixing Dataset API issues (#22886)	2022-03-08 13:07:55 -08:00
images	Remove beta label from Datasets (#23220)	2022-03-15 23:05:59 -07:00
modin	[Docs] Ray Data docs target state (#21931)	2022-01-27 13:14:36 -08:00
advanced-pipelines.rst	Undo revert of windowing dataset by bytes (#22735)	2022-03-01 12:24:04 -08:00
big_data_ingestion.yaml	Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)	2022-01-20 15:30:56 -08:00
custom-data.rst	[Docs] Ray Data docs target state (#21931)	2022-01-27 13:14:36 -08:00
dask-on-ray.rst	Update dask version for Ray 1.12.0 (#23197)	2022-03-15 19:22:19 -07:00
dataset-ml-preprocessing.rst	[Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)	2022-02-05 16:59:34 -08:00
dataset-tensor-support.rst	Make a pass fixing Dataset API issues (#22886)	2022-03-08 13:07:55 -08:00
dataset.rst	Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00
getting-started.rst	Improve actor pool support in Datasets (#22574)	2022-02-24 12:01:36 -08:00
integrations.rst	Move the third-party data integrations (non-Dataset stuff) out of the user guides which is for Dataset (#23162)	2022-03-17 11:27:40 -07:00
key-concepts.rst	Document Dataset pipeline stage fusion (#22737)	2022-03-01 14:38:09 -08:00
mars-on-ray.rst	[Docs] Ray Data docs target state (#21931)	2022-01-27 13:14:36 -08:00
package-ref.rst	Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00
performance-tips.rst	[Docs] Ray Data docs target state (#21931)	2022-01-27 13:14:36 -08:00
random-access.rst	Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00
raydp.rst	[Docs] Ray Data docs target state (#21931)	2022-01-27 13:14:36 -08:00
user-guide.rst	Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00