ray/release/nightly_tests/dataset at bb4ff42eeca50a5b91dd569caafdd71bf771b66e - hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-10 05:16:49 -04:00

Eric Liang 015181ab9a	2022-03-17 15:01:12 -07:00
Add random access support for Datasets (experimental feature) (#22749)

This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset.

RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset.

Performance-wise, you can expect each worker to provide ~3000 records / second via `get_async()`, and ~10000 records / second via `multiget()`. Since Ray actor calls go direct from worker->worker, throughput scales linearly with the number of workers.
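
The partition-by-sort-key and binary-search idea in the commit message above can be illustrated with a minimal single-process sketch. This is not Ray's implementation and does not use the Ray API: the class `SortedPartitionIndex` and its `get` method are hypothetical names introduced here for illustration, plain Python lists stand in for the sorted data blocks, and there are no worker actors. Only the `multiget` name is borrowed from the commit message.

```python
from bisect import bisect_left
from typing import Any, Dict, List, Optional


class SortedPartitionIndex:
    """Hypothetical stand-in: sorted blocks plus binary search, in one process."""

    def __init__(self, records: List[Dict[str, Any]], key: str, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        # Sort all records by the key, then cut them into contiguous blocks.
        ordered = sorted(records, key=lambda r: r[key])
        size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
        self.blocks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
        # Keep the keys of each block, and each block's largest key, for lookups.
        self.block_keys = [[r[key] for r in block] for block in self.blocks]
        self.upper_bounds = [keys[-1] for keys in self.block_keys]

    def get(self, value: Any) -> Optional[Dict[str, Any]]:
        # First binary search: find the block whose key range can hold the value.
        b = bisect_left(self.upper_bounds, value)
        if b == len(self.blocks):
            return None
        # Second binary search: find the record inside that block.
        keys = self.block_keys[b]
        i = bisect_left(keys, value)
        if i < len(keys) and keys[i] == value:
            return self.blocks[b][i]
        return None

    def multiget(self, values: List[Any]) -> List[Optional[Dict[str, Any]]]:
        # Batched lookup; in this sketch it is simply a loop over get().
        return [self.get(v) for v in values]


records = [{"id": i, "square": i * i} for i in (7, 2, 9, 0, 5, 3, 8, 1, 6, 4)]
index = SortedPartitionIndex(records, key="id", num_partitions=3)
found = index.get(7)
batch = index.multiget([0, 9, 42])
```

In the feature the commit describes, each block would be served by a worker actor rather than held in a local list, so the first lookup step would route the request to a worker instead of indexing into `self.blocks`.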
app_config.yaml	[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests (#22277)	2022-02-10 14:22:41 -08:00
dataset_ingest_400G_compute.yaml	[dataset][cuj2] add another single node ingestion example (#20754)	2021-12-07 02:50:17 -08:00
dataset_random_access.py	Add random access support for Datasets (experimental feature) (#22749)	2022-03-17 15:01:12 -07:00
dataset_shuffle_data_loader.py	[CI] Format Python code with Black (#21975)	2022-01-29 18:41:57 -08:00
inference.py	[CI] Format Python code with Black (#21975)	2022-01-29 18:41:57 -08:00
inference.yaml	[Dataset] imagenet nightly test (#17069)	2021-07-16 14:15:49 -07:00
parquet_metadata_resolution.py	[Datasets] Patch Parquet file fragment serialization to prevent metadata fetching. (#22665)	2022-02-28 15:15:30 -08:00
pipelined_ingestion_app.yaml	[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests (#22277)	2022-02-10 14:22:41 -08:00
pipelined_ingestion_compute.yaml	Don't advertise cpus on gpu nodes for pipelined ingestion tests (#21899)	2022-01-27 09:17:01 -08:00
pipelined_training.py	Enable stage fusion by default for dataset pipelines (#22476)	2022-02-23 17:34:05 -08:00
pipelined_training_app.yaml	[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests (#22277)	2022-02-10 14:22:41 -08:00
pipelined_training_compute.yaml	[Dataset][nighlyt-test] spend less money #19488	2021-10-18 18:53:50 -07:00
ray_sgd_runner.py	Round robin during spread scheduling (#21303)	2022-02-18 15:05:35 -08:00
ray_sgd_training.py	Round robin during spread scheduling (#21303)	2022-02-18 15:05:35 -08:00
ray_sgd_training_app.yaml	[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests (#22277)	2022-02-10 14:22:41 -08:00
ray_sgd_training_compute.yaml	[CUJ2] add nightly tests for running 500GB ray train (#20195)	2021-11-21 20:04:45 -08:00
ray_sgd_training_compute_no_gpu.yaml	[nighly-test] update cuj2 to reflect latest change #20889	2021-12-06 09:59:21 -08:00
ray_sgd_training_smoke_compute.yaml	[Dataset][nighlytest] use latest ray for running test #21148	2021-12-17 23:48:44 -08:00
shuffle_app_config.yaml	[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests (#22277)	2022-02-10 14:22:41 -08:00
shuffle_compute.yaml	add dataset shuffle data loader (#17917)	2021-08-20 11:26:01 -07:00