Fix a broken link in Ray Dataset doc (#25927)

Co-authored-by: Myeong Kim <myeongki@amazon.com>
Myeongju Kim 2022-06-20 13:17:46 -07:00 committed by GitHub
parent 1499af945b
commit a1a78077ca


@@ -46,7 +46,7 @@ Dataset Pipelines
-----------------
-Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline <data-pipelines-quick-start>`__.
+Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline <package-ref.html#datasetpipeline-api>`__.
A DatasetPipeline is a unified iterator over a (potentially infinite) sequence of Ray Datasets, each of which represents a *window* over the original data. Conceptually, it is similar to a `Spark DStream <https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams>`__, but manages execution over a bounded amount of source data instead of an unbounded stream. Ray computes each dataset window on-demand and stitches their output together into a single logical data iterator. DatasetPipeline implements most of the same transformation and output methods as Datasets (e.g., map, filter, split, iter_rows, to_torch, etc.).
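
For context on the windowed execution described in the doc text above, here is a minimal sketch of creating and consuming a DatasetPipeline. It assumes the Ray Datasets API of this doc's era; the specific calls (ray.data.range, Dataset.window, DatasetPipeline.iter_rows) are assumptions and are not part of this diff.

    import ray

    # Create a source Dataset. (ray.data.range and the windowing API below
    # are assumed from the Ray Datasets API of this doc's era.)
    ds = ray.data.range(10000)

    # Turn the Dataset into a DatasetPipeline: each window of 10 blocks is
    # computed on demand as the pipeline is consumed, so dataset computation
    # can overlap with downstream output.
    pipe = ds.window(blocks_per_window=10)

    # DatasetPipeline supports most of the same transformations as Dataset.
    pipe = pipe.map(lambda x: x * 2)

    # Consume rows; windows are materialized one at a time.
    for row in pipe.iter_rows():
        pass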