From a1a78077ca4dca949c2f10f57f2a07ec0848fbdc Mon Sep 17 00:00:00 2001
From: Myeongju Kim
Date: Mon, 20 Jun 2022 13:17:46 -0700
Subject: [PATCH] Fix a broken link in Ray Dataset doc (#25927)

Co-authored-by: Myeong Kim
---
 doc/source/data/key-concepts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/data/key-concepts.rst b/doc/source/data/key-concepts.rst
index b7b842d79..92651eaf5 100644
--- a/doc/source/data/key-concepts.rst
+++ b/doc/source/data/key-concepts.rst
@@ -46,7 +46,7 @@
 Dataset Pipelines
 -----------------
 
-Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline `__.
+Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline `__.
 
 A DatasetPipeline is an unified iterator over a (potentially infinite) sequence of Ray Datasets, each of which represents a *window* over the original data. Conceptually it is similar to a `Spark DStream `__, but manages execution over a bounded amount of source data instead of an unbounded stream.
 Ray computes each dataset window on-demand and stitches their output together into a single logical data iterator. DatasetPipeline implements most of the same transformation and output methods as Datasets (e.g., map, filter, split, iter_rows, to_torch, etc.).
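The doc text being patched describes a DatasetPipeline as a unified iterator over a sequence of windows, each computed on demand and stitched into one logical stream. That stitching idea can be sketched in plain Python; this is an illustrative model only, not Ray's actual implementation, and the helper names `windows` and `pipeline` are invented for the sketch:

```python
from itertools import chain, islice
from typing import Iterable, Iterator, List

def windows(source: Iterable[int], window_size: int) -> Iterator[List[int]]:
    """Split a source into consecutive fixed-size windows (last may be short)."""
    it = iter(source)
    while True:
        window = list(islice(it, window_size))
        if not window:
            return
        yield window

def pipeline(source: Iterable[int], window_size: int) -> Iterator[int]:
    """Compute each window lazily and stitch the outputs into one iterator."""
    # chain.from_iterable pulls one window at a time, mimicking how a
    # DatasetPipeline materializes windows on demand rather than all at once.
    return chain.from_iterable(windows(source, window_size))

rows = list(pipeline(range(7), window_size=3))
# rows == [0, 1, 2, 3, 4, 5, 6], produced from windows [0,1,2], [3,4,5], [6]
```

Because each window is consumed before the next one is produced, downstream iteration can overlap with upstream computation, which is the overlap the patched paragraph refers to.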