hiro/ray - Forgejo

mirror of https://github.com/vale981/ray synced 2025-03-07 02:51:39 -05:00

Author	SHA1	Message	Date
Clark Zinzow	399334d53c	[Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) This PR overhauls the "Accessing Datasets" feature guide, adding proper coverage of each of the data-consuming methods, including the ML framework exchange APIs (to_torch() and to_tf()).	2022-05-19 12:50:00 -07:00
Clark Zinzow	ef870e936c	[Datasets] Change `range_arrow()` API to `range_table()` (#24704) This PR renames ray.data.range_arrow() to ray.data.range_table(), making the Arrow representation an implementation detail.	2022-05-17 01:09:45 -07:00
Eric Liang	858d607b19	[data] Fix small doc issues (#23813)	2022-04-09 12:09:08 -07:00
Eric Liang	08dc31e747	[minor] Fix incorrect link to ray core user guide (#23316)	2022-03-17 20:58:56 -07:00
Eric Liang	015181ab9a	Add random access support for Datasets (experimental feature) (#22749) This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset. RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset. Performance-wise, you can expect each worker to provide ~3000 records / second via `get_async()`, and ~10000 records / second via `multiget()`. Since Ray actor calls go directly from worker to worker, throughput scales linearly with the number of workers.	2022-03-17 15:01:12 -07:00

5 commits
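The random access commit (#22749) describes partitioning a dataset by a sort key and serving lookups via binary search, with `multiget()` batching requests. Below is a minimal single-process sketch of that idea, assuming records are dicts with a sortable key. `RangePartitionedIndex`, `_route`, and `_lookup` are hypothetical names; this is not Ray's `RandomAccessDataset` implementation, and plain Python lists stand in for the worker actors and their sorted blocks.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Any]


class RangePartitionedIndex:
    """Hypothetical sketch: sort records by `key`, split them into
    contiguous key ranges (one per simulated worker), and answer
    lookups by binary search. No Ray APIs are used."""

    def __init__(self, records: Sequence[Record], key: str, num_workers: int):
        if num_workers < 1:
            raise ValueError("num_workers must be >= 1")
        self.key = key
        # Sort once by the key, as the dataset is sorted before partitioning.
        ordered = sorted(records, key=lambda r: r[key])
        # Ceiling division so at most `num_workers` partitions are created.
        size = -(-len(ordered) // num_workers) or 1
        self.partitions: List[List[Record]] = [
            ordered[i:i + size] for i in range(0, len(ordered), size)
        ]
        # Per-partition sorted key lists, searched with bisect.
        self.partition_keys = [[r[key] for r in p] for p in self.partitions]
        # Lowest key of each partition, used to route a lookup.
        self.bounds = [keys[0] for keys in self.partition_keys]

    def _route(self, k: Any) -> Optional[int]:
        # The owning partition is the last one whose lower bound is <= k.
        p = bisect_right(self.bounds, k) - 1
        return p if p >= 0 else None

    def _lookup(self, p: int, k: Any) -> Optional[Record]:
        # Binary search within the partition's sorted keys.
        keys = self.partition_keys[p]
        i = bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            return self.partitions[p][i]
        return None

    def get(self, k: Any) -> Optional[Record]:
        p = self._route(k)
        return None if p is None else self._lookup(p, k)

    def multiget(self, ks: Sequence[Any]) -> List[Optional[Record]]:
        # Group requested keys by owning partition, mimicking one batched
        # request per worker, then place results back in request order.
        results: List[Optional[Record]] = [None] * len(ks)
        by_partition: Dict[int, List[Tuple[int, Any]]] = {}
        for pos, k in enumerate(ks):
            p = self._route(k)
            if p is not None:
                by_partition.setdefault(p, []).append((pos, k))
        for p, items in by_partition.items():
            for pos, k in items:
                results[pos] = self._lookup(p, k)
        return results


if __name__ == "__main__":
    records = [{"id": i, "value": i * i} for i in range(100)]
    index = RangePartitionedIndex(records, key="id", num_workers=4)
    print(index.get(42))
    print(index.multiget([3, 99, -1]))
```

Grouping keys by partition in `multiget()` reflects why batched lookups can outperform one-at-a-time calls: each simulated worker is visited once per batch rather than once per key. In a real cluster, the per-partition groups would be sent as separate remote calls, which this sketch does not model.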