[docs] Improve the "Why Ray" and "Why AIR" sections of the docs (#27480)

2025-03-04 17:41:43 -05:00 · 2022-08-05 18:42:45 -07:00 · 2022-08-05 18:42:45 -07:00 · 9b467e3954
commit 9b467e3954
parent a6b9019d38
12 changed files with 95 additions and 75 deletions
--- a/README.rst
+++ b/README.rst
@ -35,7 +35,7 @@ Or more about `Ray Core`_ and its key abstractions:
 - `Actors`_: Stateful worker processes created in the cluster.
 - `Objects`_: Immutable values accessible across the cluster.

-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 `ecosystem of community integrations`_.

 Install Ray with: ``pip install ray``. For nightly wheels, see the
@ -49,6 +49,16 @@ Install Ray with: ``pip install ray``. For nightly wheels, see the
 .. _`RLlib`: https://docs.ray.io/en/latest/rllib/index.html
 .. _`ecosystem of community integrations`: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html

+
+Why Ray?
+--------
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can efficiently run any kind of workload. If your application is written in Python, you can scale it with Ray; no other infrastructure is required.
+
 More Information
 ----------------

--- a/doc/source/data/dataset-tensor-support.rst
+++ b/doc/source/data/dataset-tensor-support.rst
@ -194,7 +194,7 @@ Because Tensor datasets rely on Datasets-specific extension types, they can only
 .. _disable_tensor_extension_casting:

 Disabling Tensor Extension Casting
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+----------------------------------

 To disable automatic casting of Pandas and Arrow arrays to
 :class:`TensorArray <ray.data.extensions.tensor_extension.TensorArray>`, run the code
--- a/doc/source/index.md
+++ b/doc/source/index.md
@ -103,9 +103,17 @@ Or more about [Ray Core](ray-core/walkthrough) and its key abstractions:
 - [Actors](ray-core/actors): Stateful worker processes created in the cluster.
 - [Objects](ray-core/objects): Immutable values accessible across the cluster.

-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 [ecosystem of community integrations](ray-overview/ray-libraries).

+## Why Ray?
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can efficiently run any kind of workload. If your application is written in Python, you can scale it with Ray; no other infrastructure is required.
+
 ## How to get involved?

 Ray is more than a framework for distributed applications; it is also an active community of developers, researchers, and folks who love machine learning.
--- a/doc/source/ray-air/examples/pytorch_tabular_starter.py
+++ b/doc/source/ray-air/examples/pytorch_tabular_starter.py
@ -3,7 +3,6 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler

 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@ -15,21 +14,16 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

 # __air_pytorch_preprocess_start__
 import numpy as np
-import pandas as pd

-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler

-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
    Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_pytorch_preprocess_end__
@ -161,4 +155,5 @@ predicted_probabilities = batch_predictor.predict(test_dataset)
 predicted_probabilities.show()
 # {'predictions': array([1.], dtype=float32)}
 # {'predictions': array([0.], dtype=float32)}
+# ...
 # __air_pytorch_batchpred_end__
--- a/doc/source/ray-air/examples/tf_tabular_starter.py
+++ b/doc/source/ray-air/examples/tf_tabular_starter.py
@ -3,10 +3,8 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
-from ray.air.config import ScalingConfig
-

+# Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

 # Split data into train and validation.
@ -16,21 +14,16 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

 # __air_tf_preprocess_start__
 import numpy as np
-import pandas as pd

-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler

-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
    Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_tf_preprocess_end__
--- a/doc/source/ray-air/examples/xgboost_starter.py
+++ b/doc/source/ray-air/examples/xgboost_starter.py
@ -3,8 +3,6 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
-from ray.air.config import ScalingConfig

 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@ -16,16 +14,18 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

+# __air_xgb_preprocess_start__
+# Create a preprocessor to scale some columns.
+from ray.data.preprocessors import StandardScaler
+
+preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
+# __air_xgb_preprocess_end__

 # __air_xgb_train_start__
-from ray.train.xgboost import XGBoostTrainer
 from ray.air.config import ScalingConfig
+from ray.train.xgboost import XGBoostTrainer

 trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
@ -84,4 +84,5 @@ predicted_probabilities.show()
 # {'predictions': 0.9970690608024597}
 # {'predictions': 0.9943051934242249}
 # {'predictions': 0.00334902573376894}
+# ...
 # __air_xgb_batchpred_end__
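
Note on the preprocessor change in the three starter scripts above: the PyTorch and TensorFlow examples now build a single ``Chain`` of ``StandardScaler`` followed by ``Concatenator``. The following is a minimal pure-Python sketch of what such a chain computes conceptually. It does not use Ray; the helper names (``fit_standard_scaler``, ``scale``, ``concatenate``), the ``features`` output column, the use of population standard deviation, and the two sample rows are all assumptions made for illustration and are not taken from the Ray implementation.

```python
import math


def fit_standard_scaler(rows, columns):
    """Compute (mean, std) per column. Population std (ddof=0) is an assumption."""
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[col] = (mean, math.sqrt(variance))
    return stats


def scale(rows, stats):
    """Standardize the fitted columns; leave every other column untouched."""
    scaled = []
    for row in rows:
        new_row = dict(row)
        for col, (mean, std) in stats.items():
            new_row[col] = (row[col] - mean) / std if std else 0.0
        scaled.append(new_row)
    return scaled


def concatenate(rows, exclude, output_column="features"):
    """Merge all non-excluded columns into one list-valued column."""
    merged = []
    for row in rows:
        features = [float(v) for k, v in row.items() if k not in exclude]
        new_row = {k: v for k, v in row.items() if k in exclude}
        new_row[output_column] = features
        merged.append(new_row)
    return merged


# Hypothetical two-row sample using the column names from the examples.
rows = [
    {"mean radius": 10.0, "mean texture": 20.0, "target": 0},
    {"mean radius": 14.0, "mean texture": 24.0, "target": 1},
]

# "Chain" here simply means: fit and apply the scaler, then concatenate.
stats = fit_standard_scaler(rows, ["mean radius", "mean texture"])
result = concatenate(scale(rows, stats), exclude=["target"])
print(result)
```

The order matters: scaling runs first so that the concatenated feature vector holds standardized values, while the excluded ``target`` column passes through unchanged.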
--- a/doc/source/ray-air/getting-started.rst
+++ b/doc/source/ray-air/getting-started.rst
@ -9,13 +9,54 @@ Ray AI Runtime (AIR)

 Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

+..
+  https://docs.google.com/drawings/d/1atB1dLjZIi8ibJ2-CoHdd3Zzyl_hDRWyK2CJAVBBLdU/edit
+
 .. image:: images/ray-air.svg

 AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.

-Ray AIR focuses on the compute aspects of ML:
- * It provides scalability by leveraging Ray’s distributed compute layer for ML workloads.
- * It is designed to interoperate with other systems for storage and metadata needs.
+Why AIR?
+--------
+
+Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:
+
+.. image:: images/why-air-2.svg
+
+..
+  https://docs.google.com/drawings/d/1oi_JwNHXVgtR_9iTdbecquesUd4hOk0dWgHaTaFj6gk/edit
+
+**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
+
+**2. Unified ML API**: AIR's unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.
+
+**3. Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.
+
+When to use AIR?
+----------------
+
+AIR is for both data scientists and ML engineers.
+
+.. image:: images/when-air.svg
+
+..
+  https://docs.google.com/drawings/d/1Qw_h457v921jWQkx63tmKAsOsJ-qemhwhCZvhkxWrWo/edit
+
+For data scientists, AIR can be used to scale individual workloads as well as end-to-end ML applications. For ML engineers, AIR provides scalable platform abstractions that can be used to easily onboard and integrate tooling from the broader ML ecosystem.
+
+Quick Start
+-----------
+
+Below, we walk through how AIR's unified ML API enables scaling of end-to-end ML workflows, focusing on
+a few of the popular frameworks AIR integrates with (XGBoost, PyTorch, and TensorFlow). The ML workflow we're going to build is summarized by the following diagram:
+
+..
+  https://docs.google.com/drawings/d/1z0r_Yc7-0NAPVsP2jWUkLV2jHVHdcJHdt9uN1GDANSY/edit
+
+.. figure:: images/why-air.svg
+
+  AIR provides a unified API for the ML ecosystem.
+  This diagram shows how AIR enables an ecosystem of libraries to be run at scale in just a few lines of code.

 Get started by installing Ray AIR:

@ -31,30 +72,24 @@ Get started by installing Ray AIR:
    pip install -U tensorflow>=2.6.2
    pip install -U pyarrow>=6.0.1

-Quick Start
-----------
-
-Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows, focusing on
-a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow):
-
 Preprocessing
 ~~~~~~~~~~~~~

-Below, let's start by preprocessing your data with Ray AIR's ``Preprocessors``:
+First, let's start by loading a dataset from storage:

 .. literalinclude:: examples/xgboost_starter.py
    :language: python
    :start-after: __air_generic_preprocess_start__
    :end-before: __air_generic_preprocess_end__

-If using Tensorflow or Pytorch, format your data for use with your training framework:
+Then, we define a ``Preprocessor`` pipeline for our task:

 .. tabbed:: XGBoost

-    .. code-block:: python
-        
-        # No extra preprocessing is required for XGBoost.
-        # The data is already in the correct format.
+    .. literalinclude:: examples/xgboost_starter.py
+        :language: python
+        :start-after: __air_xgb_preprocess_start__
+        :end-before: __air_xgb_preprocess_end__

 .. tabbed:: Pytorch

@ -155,38 +190,13 @@ Use the trained model for scalable batch prediction with a ``BatchPredictor``.
        :start-after: __air_tf_batchpred_start__
        :end-before: __air_tf_batchpred_end__

-Why Ray AIR?
------------

-Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):
+Project Status
+--------------

-**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
+AIR is currently in **beta**. If you have questions for the team or are interested in getting involved in the development process, fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__.

-**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.
-
-**3. Open and Evolvable**: Ray core and libraries are fully open-source and can run on any cluster, cloud, or Kubernetes, reducing the costs of platform lock-in. Want to go out of the box? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.
-
-.. figure:: images/why-air.png
-
-  AIR enables a single-system / single-script approach to scaling ML. Ray's
-  distributed Python APIs enable scaling of ML workloads without the burden of
-  setting up or orchestrating separate distributed systems.
-
-AIR is for both data scientists and ML engineers. Consider using AIR when you want to:
- * Scale a single workload.
- * Scale end-to-end ML applications.
- * Build a custom ML platform for your organization.
-
-AIR Ecosystem
-------------
-
-AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, ecosystem integrations, and their readiness.
-AIR's developer APIs also enable *custom integrations* to be easily created.
-
-..
-  https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit
-
-.. image:: images/air-ecosystem.svg
+For an overview of the AIR libraries, ecosystem integrations, and their readiness, check out the latest `AIR ecosystem map <https://docs.ray.io/en/master/_images/air-ecosystem.svg>`_.

 Next Steps
 ----------
--- a/doc/source/ray-air/images/ray-air.svg
+++ b/doc/source/ray-air/images/ray-air.svg
--- a/doc/source/ray-air/images/when-air.svg
+++ b/doc/source/ray-air/images/when-air.svg
--- a/doc/source/ray-air/images/why-air-2.svg
+++ b/doc/source/ray-air/images/why-air-2.svg
--- a/doc/source/ray-air/images/why-air.png
+++ b/doc/source/ray-air/images/why-air.png
--- a/doc/source/ray-air/images/why-air.svg
+++ b/doc/source/ray-air/images/why-air.svg
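
Note on the marker comments: the new ``__air_xgb_preprocess_start__`` / ``__air_xgb_preprocess_end__`` comments in ``xgboost_starter.py`` are what the ``literalinclude`` directive in ``getting-started.rst`` selects with ``:start-after:`` and ``:end-before:``. The sketch below approximates that selection in plain Python so the pairing is easy to check by hand. The function name ``extract_between`` is hypothetical, the sample text is copied from the hunk above, and this is not Sphinx's actual implementation.

```python
def extract_between(source, start_after, end_before):
    """Return the lines strictly between two marker lines.

    Approximates literalinclude's :start-after: / :end-before: options:
    skip everything up to and including the first line containing
    ``start_after``, then stop before the next line containing ``end_before``.
    """
    selected = []
    started = False
    for line in source.splitlines():
        if not started:
            if start_after in line:
                started = True
            continue
        if end_before in line:
            break
        selected.append(line)
    return "\n".join(selected)


# Sample text mirroring the region added to xgboost_starter.py in this commit.
SAMPLE = """# __air_generic_preprocess_end__

# __air_xgb_preprocess_start__
# Create a preprocessor to scale some columns.
from ray.data.preprocessors import StandardScaler

preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
# __air_xgb_preprocess_end__
"""

snippet = extract_between(
    SAMPLE, "__air_xgb_preprocess_start__", "__air_xgb_preprocess_end__"
)
print(snippet)
```

Because the marker lines themselves are excluded, the rendered docs show only the code between them, which is why each marker pair must wrap a self-contained snippet (including the imports it needs).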