mirror of
https://github.com/vale981/ray
synced 2025-03-05 18:11:42 -05:00
[minor] Fix incorrect link to ray core user guide (#23316)
This commit is contained in:
parent 1a293a1187
commit 08dc31e747
4 changed files with 13 additions and 7 deletions
@@ -138,7 +138,7 @@ Pre-repeat vs post-repeat transforms

 Transformations made to the Dataset prior to the call to ``.repeat()`` are executed once. Transformations made to the DatasetPipeline after the repeat will be executed once for each repetition of the Dataset.

-For example, in the following pipeline, the datasource read only occurs once. However, the random shuffle is applied to each repetition in the pipeline.
+For example, in the following pipeline, the ``map(func)`` transformation only occurs once. However, the random shuffle is applied to each repetition in the pipeline.

 **Code**:
@@ -147,6 +147,7 @@ For example, in the following pipeline, the datasource read only occurs once. Ho

     # Create a pipeline that loops over its source dataset indefinitely.
     pipe: DatasetPipeline = ray.data \
         .read_datasource(...) \
+        .map(func) \
         .repeat() \
         .random_shuffle_each_window()
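
The execution semantics described above (pre-repeat stages run once; post-repeat stages run once per repetition) can be sketched as a toy model in plain Python. This is not Ray's implementation; the stage functions and counters here are illustrative stand-ins:

```python
import random

# Toy model of pre- vs post-repeat execution counts (not Ray's code).
counts = {"read": 0, "map": 0, "shuffle": 0}

def read_datasource():
    counts["read"] += 1
    return list(range(8))

def map_func(block):
    counts["map"] += 1
    return [x * 2 for x in block]

def pipeline(repeats):
    # Stages before .repeat() execute once, up front.
    block = map_func(read_datasource())
    for _ in range(repeats):
        # Stages after .repeat() execute once per repetition (window).
        counts["shuffle"] += 1
        yield random.sample(block, len(block))

for _window in pipeline(repeats=3):
    pass
print(counts)  # read and map ran once; shuffle ran once per repetition
```

Running the model shows the read and map counters at 1 while the shuffle counter matches the number of repetitions, mirroring the behavior of the pipeline above.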
@@ -164,6 +165,10 @@ For example, in the following pipeline, the datasource read only occurs once. Ho

 .. image:: images/dataset-repeat-1.svg

 .. important::

     Result caching only applies if there are *transformation* stages prior to the pipelining operation. If you ``repeat()`` or ``window()`` a Dataset right after the read call (e.g., ``ray.data.read_parquet(...).repeat()``), then the read will still be re-executed on each repetition. This optimization saves memory, at the cost of repeated reads from the datasource.
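
This caveat can likewise be sketched as a toy model (again, not Ray's code; `read_parquet_model` is a hypothetical stand-in for a datasource read): with no transformation stage between the read and ``repeat()``, nothing is cached and the read re-executes on every repetition:

```python
reads = {"n": 0}

def read_parquet_model():
    # Hypothetical stand-in for a datasource read (not Ray's API).
    reads["n"] += 1
    return [1, 2, 3]

def repeat_after_read(times):
    # No transformation stage precedes the repeat, so nothing is cached:
    # the read re-executes once per repetition, saving memory at the
    # cost of repeated reads from the datasource.
    for _ in range(times):
        yield read_parquet_model()

for _window in repeat_after_read(3):
    pass
print(reads["n"])  # the read executed once per repetition
```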

Splitting pipelines for distributed ingest
==========================================
File diff suppressed because one or more lines are too long
Before Width: | Height: | Size: 670 KiB  After Width: | Height: | Size: 670 KiB
@@ -35,9 +35,10 @@ Architecture

 RandomAccessDataset spreads its workers evenly across the cluster. Each worker fetches and pins in shared memory all blocks of the sorted source dataset found on its node. In addition, each block is guaranteed to be assigned to at least one worker. A central index of block-to-key-range assignments is computed, which is used to serve lookups.

 Lookups occur as follows:

-- First, the id of the block that contains the given key is located via binary search on the central index.
-- Second, an actor that has the block pinned is selected (this is done randomly).
-- A method call is sent to the actor, which then performs binary search to locate the record for the key.
+* First, the id of the block that contains the given key is located via binary search on the central index.
+* Second, an actor that has the block pinned is selected (this is done randomly).
+* A method call is sent to the actor, which then performs binary search to locate the record for the key.

 This means that each random lookup costs ~1 network RTT as well as a small amount of computation on both the client and server side.
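
The two binary searches in the lookup path can be sketched with Python's `bisect` module. This is a simplified, self-contained model, not Ray's implementation: the block layout and names are illustrative, and the actor-selection step and network hop are elided:

```python
import bisect

# Simplified model of the two-level lookup (not Ray's implementation).
# The sorted dataset is split into blocks; here key == record for brevity.
blocks = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
# Central index: the first key of each block, i.e. its key-range start.
index = [b[0] for b in blocks]

def lookup(key):
    # Step 1: binary search the central index for the owning block id.
    block_id = bisect.bisect_right(index, key) - 1
    if block_id < 0:
        return None  # key precedes every block's key range
    # Step 2 (actor selection) is elided: any actor pinning the block works.
    block = blocks[block_id]
    # Step 3: binary search within the block for the record itself.
    i = bisect.bisect_left(block, key)
    if i < len(block) and block[i] == key:
        return block[i]
    return None  # key falls inside the block's range but is absent

print(lookup(9), lookup(4))
```

Both searches are O(log n), which is why the dominant cost per lookup is the single network round trip rather than the computation on either side.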
@ -138,8 +138,8 @@ Our user guides provide you with in-depth information about how to use Ray's lib

 You will learn about the key concepts and features of Ray and how to use them in practice.

 +++

-{link-badge}`ray-core/using-ray.html,"Core",cls=badge-light`
-{link-badge}`data/user-guide.html,"Core",cls=badge-light`
+{link-badge}`ray-core/user-guide.html,"Core",cls=badge-light`
+{link-badge}`data/user-guide.html,"Data",cls=badge-light`
 {link-badge}`train/user_guide.html,"Train",cls=badge-light`
 {link-badge}`tune/user-guide.html,"Tune",cls=badge-light`
 {link-badge}`serve/tutorial.html,"Serve",cls=badge-light`