ray/doc/source/data
matthewdeng ba0a2a022a
[datasets] add Dataset.randomize_block_order (#25568)
This exposes a low-cost way to perform a pseudo-global shuffle: the order of the blocks is randomized, but the rows within each block are not moved.

For extremely large datasets that span multiple nodes, contiguous blocks will often be colocated on the same node. This leads to hot spots during iteration of the dataset, in which a single node (1) must send a lot of data over the network, and (2) must perform a lot of disk reads if the dataset is spilled to disk.

Randomizing the block order allows the workload to be spread across the nodes on which the dataset blocks reside.
2022-06-08 18:39:15 -07:00
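
The idea in the commit message above can be sketched in plain Python. This is a minimal illustration, not Ray's implementation: the function name `shuffle_block_order`, the list-of-lists block layout, and the block sizes are all hypothetical stand-ins for what `Dataset.randomize_block_order` does to real dataset blocks. It assumes only that a dataset is an ordered sequence of blocks and that the operation permutes the blocks without touching the rows inside them.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_block_order(
    blocks: Sequence[List[T]], seed: Optional[int] = None
) -> List[List[T]]:
    """Return the blocks in a random order (hypothetical helper).

    Only the order of the blocks changes. Rows inside each block keep
    their original order, which is why this is cheaper than a full
    global shuffle: no row ever has to move between blocks.
    """
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)
    return [blocks[i] for i in order]


# Hypothetical layout: 6 blocks of 4 rows each. Before shuffling,
# contiguous blocks (and therefore contiguous rows) sit next to each other.
blocks = [list(range(i * 4, (i + 1) * 4)) for i in range(6)]

shuffled = shuffle_block_order(blocks, seed=0)

# Iterating the shuffled blocks visits every row exactly once, but
# consecutive reads now jump between blocks instead of walking them in order.
rows = [row for block in shuffled for row in block]
```

Because consecutive blocks in the shuffled order are unlikely to be neighbours in the original order, reads during iteration are less likely to hit the same node back to back. The trade-off is that rows within a block stay adjacent, so this is only an approximation of a true global shuffle.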
..
doc_code Fix hyperlink in rst doc (#25427) 2022-06-08 13:46:23 -07:00
examples [Datasets] Add basic e2e Datasets example on NYC taxi dataset (#24874) 2022-05-19 12:54:25 -07:00
images [minor] Fix incorrect link to ray core user guide (#23316) 2022-03-17 20:58:56 -07:00
modin Fix broken links in documentation and put linkcheck linter in place on CI (#23340) 2022-03-18 21:02:52 -07:00
accessing-datasets.rst [Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) 2022-05-19 12:50:00 -07:00
advanced-pipelines.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
big_data_ingestion.yaml Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763) 2022-01-20 15:30:56 -08:00
creating-datasets.rst Fix range_arrow(), which is replaced by range_table() (#25036) 2022-05-20 19:24:49 -07:00
custom-data.rst [Datasets] Overhaul of "Creating Datasets" feature guide. (#24831) 2022-05-17 16:23:42 -07:00
dask-on-ray.rst Update dask version for Ray 1.12.0 (#23197) 2022-03-15 19:22:19 -07:00
dataset-ml-preprocessing.rst [datasets] add Dataset.randomize_block_order (#25568) 2022-06-08 18:39:15 -07:00
dataset-tensor-support.rst [Datasets] Unrevert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031)" (#25531) 2022-06-08 10:33:25 -07:00
dataset.rst [Docs] Add "Examples" block to Ray Data landing page, and consistently use bold font (#24994) 2022-05-23 21:22:00 -07:00
faq.rst Proofread the some datasets docs (#25068) 2022-05-22 12:11:51 -07:00
getting-started.rst Revamp the Getting Started page for Dataset (#24860) 2022-05-18 13:46:23 -07:00
integrations.rst Revamp the Getting Started page for Dataset (#24860) 2022-05-18 13:46:23 -07:00
key-concepts.rst [Datasets] [Docs] Add docs about fault tolerance in Datasets (#25371) 2022-06-02 15:53:50 -07:00
mars-on-ray.rst [Datasets] Integrate Mars-on-Ray with Datasets; improve docs and add tests (#23402) 2022-04-29 09:43:52 -07:00
memory-management.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
package-ref.rst [Data] Add partitioning classes to Data API reference (#24203) 2022-05-23 09:34:41 -07:00
performance-tips.rst Proofread the some datasets docs (#25068) 2022-05-22 12:11:51 -07:00
pipelining-compute.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00
random-access.rst [Datasets] Overhaul "Accessing Datasets" feature guide. (#24963) 2022-05-19 12:50:00 -07:00
raydp.rst [Docs] Ray Data docs target state (#21931) 2022-01-27 13:14:36 -08:00
saving-datasets.rst Revamp the Transforming Datasets user guide (#25033) 2022-05-20 19:25:06 -07:00
transforming-datasets.rst Fix hyperlink in rst doc (#25427) 2022-06-08 13:46:23 -07:00
user-guide.rst [data] [docs] Doc audit-- rebalance basic vs advanced materials (#25262) 2022-06-01 13:50:46 -07:00