Mirror of https://github.com/vale981/ray (synced 2025-03-06 10:31:39 -05:00)
Fix a broken link in Ray Dataset doc (#25927)
Co-authored-by: Myeong Kim <myeongki@amazon.com>
parent 1499af945b
commit a1a78077ca
1 changed file with 1 addition and 1 deletion
@@ -46,7 +46,7 @@ Dataset Pipelines
 -----------------


-Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline <data-pipelines-quick-start>`__.
+Datasets execute their transformations synchronously in blocking calls. However, it can be useful to overlap dataset computations with output. This can be done with a `DatasetPipeline <package-ref.html#datasetpipeline-api>`__.

 A DatasetPipeline is an unified iterator over a (potentially infinite) sequence of Ray Datasets, each of which represents a *window* over the original data. Conceptually it is similar to a `Spark DStream <https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams>`__, but manages execution over a bounded amount of source data instead of an unbounded stream. Ray computes each dataset window on-demand and stitches their output together into a single logical data iterator. DatasetPipeline implements most of the same transformation and output methods as Datasets (e.g., map, filter, split, iter_rows, to_torch, etc.).

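Note on the context paragraph above: the windowing idea it describes (compute each window on demand, then stitch the outputs into one logical iterator) can be illustrated with a small pure-Python sketch. This is not Ray's implementation and does not call Ray. The names `windows` and `ToyPipeline` are hypothetical, chosen only for illustration, and only `map`, `filter`, and `iter_rows` are mimicked.

```python
from typing import Any, Callable, Iterable, Iterator, List


def windows(source: Iterable[Any], window_size: int) -> Iterator[List[Any]]:
    """Lazily yield consecutive windows of at most `window_size` items."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    window: List[Any] = []
    for item in source:
        window.append(item)
        if len(window) == window_size:
            yield window
            window = []
    if window:  # emit the final, possibly shorter, window
        yield window


class ToyPipeline:
    """Hypothetical stand-in for a windowed pipeline (assumption, not Ray's API)."""

    def __init__(self, window_iter: Iterator[List[Any]]) -> None:
        self._windows = window_iter

    def map(self, fn: Callable[[Any], Any]) -> "ToyPipeline":
        # The transformation is applied per window, only when that window is pulled.
        return ToyPipeline([fn(x) for x in w] for w in self._windows)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyPipeline":
        return ToyPipeline([x for x in w if pred(x)] for w in self._windows)

    def iter_rows(self) -> Iterator[Any]:
        # Stitch the per-window outputs into a single logical iterator.
        for w in self._windows:
            yield from w


# Usage: record which source items have been read to show on-demand evaluation.
loaded: List[int] = []


def load(i: int) -> int:
    loaded.append(i)
    return i


pipe = (
    ToyPipeline(windows((load(i) for i in range(10)), window_size=4))
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
)
rows = pipe.iter_rows()
first_row = next(rows)           # pulls only the first window of four items
loaded_after_first = list(loaded)
remaining_rows = list(rows)      # pulls the remaining windows
```

Because each stage is a generator over windows, requesting the first row reads only the first window from the source; later windows are read as the consumer advances. The real DatasetPipeline additionally overlaps computation with output, which this single-threaded sketch does not attempt to model.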