ray/release/nightly_tests
Eric Liang 015181ab9a
Add random access support for Datasets (experimental feature) (#22749)
This PR adds experimental support for random access to Datasets. A Dataset can be made random-access by calling `ds.to_random_access_dataset(key, num_workers=N)`, which returns a RandomAccessDataset.

RandomAccessDataset sorts the dataset by the given key and partitions it across the cluster, so individual records can be located efficiently via binary search. It creates `num_workers` worker actors, each of which has zero-copy access to the underlying sorted data blocks of the Dataset.
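
The lookup path described above can be illustrated with a minimal, single-process sketch. It assumes records are `(key, payload)` tuples that get sorted by key and cut into contiguous blocks, one per simulated worker; a key is routed to its owner by binary-searching the blocks' lower bounds, then found by a second binary search inside that block. `SortedPartitionIndex`, `owner()` and `get()` are hypothetical names, not Ray's API, and the sketch does not model actors, the object store, or zero-copy reads.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Optional, Sequence, Tuple

Record = Tuple[int, Any]  # (sort key, payload); an assumed record shape


class SortedPartitionIndex:
    """Hypothetical sketch: range-partition sorted records across N workers."""

    def __init__(self, records: Sequence[Record], num_workers: int) -> None:
        ordered = sorted(records, key=lambda r: r[0])
        size = max(1, -(-len(ordered) // num_workers))  # ceil(len / workers)
        # Each simulated worker owns one contiguous block of sorted records.
        self.blocks: List[List[Record]] = [
            ordered[i : i + size] for i in range(0, len(ordered), size)
        ]
        # Per-block key lists, precomputed so each lookup is a pure binary search.
        self.block_keys: List[List[int]] = [[k for k, _ in b] for b in self.blocks]
        # Smallest key of each block, used to route a key to its owning worker.
        self.bounds: List[int] = [keys[0] for keys in self.block_keys]

    def owner(self, key: int) -> int:
        """Binary-search the block bounds for the worker that may hold `key`."""
        return max(0, bisect_right(self.bounds, key) - 1)

    def get(self, key: int) -> Optional[Any]:
        """Return the payload stored under `key`, or None if it is absent."""
        if not self.blocks:
            return None
        w = self.owner(key)
        keys = self.block_keys[w]
        i = bisect_left(keys, key)  # binary search inside the owner's block
        if i < len(keys) and keys[i] == key:
            return self.blocks[w][i][1]
        return None


idx = SortedPartitionIndex([(k, k * k) for k in range(100)], num_workers=4)
print(idx.owner(60), idx.get(60), idx.get(1000))  # prints: 2 3600 None
```

Because each block is sorted and the blocks are ordered, a lookup costs two logarithmic searches and touches exactly one worker's data.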

Performance-wise, you can expect each worker to serve roughly 3,000 records/second via ``get_async()`` and roughly 10,000 records/second via ``multiget()``.

Since Ray actor calls go directly from worker to worker, with no central component on the request path, throughput scales linearly with the number of workers.
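
One plausible reason `multiget()` outpaces `get_async()` is batching: keys can be grouped by the worker that owns them, so each worker receives one call per batch rather than one call per key. The sketch below assumes that grouping strategy as an illustration only, not as a description of Ray's internals, and also turns the per-worker rates quoted above into a back-of-the-envelope aggregate under the linear-scaling claim. `group_keys_by_worker`, `estimated_throughput`, and the 8-worker figure are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence


def group_keys_by_worker(keys: Sequence[int], bounds: Sequence[int]) -> Dict[int, List[int]]:
    """Hypothetical helper: map each worker index to the keys it owns.

    `bounds` holds the smallest key of each worker's sorted block, so a binary
    search over it routes a key to its owner. One remote call per entry of the
    result would replace one remote call per key.
    """
    batches: Dict[int, List[int]] = defaultdict(list)
    for key in keys:
        owner = max(0, bisect_right(bounds, key) - 1)
        batches[owner].append(key)
    return dict(batches)


def estimated_throughput(num_workers: int, per_worker_rate: float) -> float:
    """Aggregate records/second, assuming throughput scales linearly with workers."""
    return float(num_workers * per_worker_rate)


print(group_keys_by_worker([3, 60, 61, 90, 12], bounds=[0, 25, 50, 75]))
# prints: {0: [3, 12], 2: [60, 61], 3: [90]}
print(estimated_throughput(8, 3_000), estimated_throughput(8, 10_000))
# prints: 24000.0 80000.0
```

These figures are only what linear scaling would imply for a hypothetical 8-worker setup, not measurements.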
2022-03-17 15:01:12 -07:00
chaos_test [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
dask_on_ray [ci/release] Always use full cluster address (#23067) 2022-03-11 16:31:21 +00:00
dataset Add random access support for Datasets (experimental feature) (#22749) 2022-03-17 15:01:12 -07:00
decision_tree [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
many_nodes_tests [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
placement_group_tests [placement group] fix pg benchmark regression #22441 2022-02-16 16:24:51 -08:00
shuffle [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
stress_tests [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
setup_chaos.py [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00