This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset.
RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset.
Performance-wise, you can expect each worker to provide ~3000 records / second via ``get_async()``, and ~10000 records / second via ``multiget()``.
Since Ray actor calls go direct from worker->worker, throughput scales linearly with the number of workers.
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)
The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have
- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention in `Ray Data` whenever is is referred to the project.
This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.