Mirror of https://github.com/vale981/ray (synced 2025-03-06 02:21:39 -05:00)
This PR adds experimental support for random access to datasets. A Dataset can be enabled for random access by calling `ds.to_random_access_dataset(key, num_workers=N)`, which creates a RandomAccessDataset. A RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset. Performance-wise, you can expect each worker to provide ~3000 records / second via `get_async()`, and ~10000 records / second via `multiget()`. Since Ray actor calls go directly from worker to worker, throughput scales linearly with the number of workers. |
||
---|---|---|
alerts | ||
benchmarks | ||
golden_notebook_tests | ||
horovod_tests | ||
kubernetes_manual_tests | ||
lightgbm_tests | ||
long_running_distributed_tests | ||
long_running_tests | ||
microbenchmark | ||
ml_user_tests | ||
nightly_tests | ||
ray_release | ||
release_logs | ||
rllib_tests | ||
runtime_env_tests | ||
serve_tests | ||
sgd_tests/sgd_gpu | ||
tune_tests | ||
util | ||
xgboost_tests | ||
__init__.py | ||
BUILD | ||
README.md | ||
release_tests.yaml | ||
requirements.txt | ||
requirements_buildkite.txt | ||
run_release_test.sh | ||
setup.py |
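
The commit description at the top of this listing says that a RandomAccessDataset serves lookups by binary search over blocks sorted by key. The following is a minimal pure-Python sketch of that idea only, not Ray's implementation: the class name `SortedBlockIndex` and its methods are hypothetical, everything runs in a single process without actors, and `num_blocks` is assumed to be at least 1.

```python
from bisect import bisect_left


class SortedBlockIndex:
    """Hypothetical stand-in for a dataset partitioned into sorted blocks."""

    def __init__(self, records, key, num_blocks):
        # Sort all records by the key, then cut them into contiguous blocks.
        rows = sorted(records, key=lambda r: r[key])
        size = max(1, -(-len(rows) // num_blocks))  # ceiling division
        self.blocks = [rows[i:i + size] for i in range(0, len(rows), size)]
        # Per-block key columns so bisect can search them directly.
        self.block_keys = [[r[key] for r in b] for b in self.blocks]
        # Largest key in each block; used to route a lookup to one block.
        self.upper_bounds = [ks[-1] for ks in self.block_keys]

    def get(self, k):
        # First binary search: the first block whose largest key is >= k.
        b = bisect_left(self.upper_bounds, k)
        if b == len(self.blocks):
            return None
        # Second binary search: the position of k inside that block.
        i = bisect_left(self.block_keys[b], k)
        if i < len(self.blocks[b]) and self.block_keys[b][i] == k:
            return self.blocks[b][i]
        return None

    def multiget(self, keys):
        # Batched lookup; here it simply loops over get().
        return [self.get(k) for k in keys]


# Ten records split into three sorted blocks.
index = SortedBlockIndex(
    [{"id": i, "value": i * i} for i in range(10)], key="id", num_blocks=3
)
hit = index.get(7)
batch = index.multiget([0, 4, 42])
```

In this sketch `hit` is the record with `id` 7, and the last entry of `batch` is `None` because key 42 is absent. In the design the commit describes, each block would instead live behind a worker actor, so the first search would pick which worker to call.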
Release Tests
While the exact release process relies on Anyscale internal tooling, the tests we run during releases are located at https://github.com/ray-project/ray/blob/master/release/.buildkite/build_pipeline.py