mirror of
https://github.com/vale981/ray
synced 2025-03-04 17:41:43 -05:00
[docs] Improve the "Why Ray" and "Why AIR" sections of the docs (#27480)
This commit is contained in:
parent a6b9019d38
commit 9b467e3954
12 changed files with 95 additions and 75 deletions
README.rst (12 changes)
@@ -35,7 +35,7 @@ Or more about `Ray Core`_ and its key abstractions:
 - `Actors`_: Stateful worker processes created in the cluster.
 - `Objects`_: Immutable values accessible across the cluster.

-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 `ecosystem of community integrations`_.

 Install Ray with: ``pip install ray``. For nightly wheels, see the
@@ -49,6 +49,16 @@ Install Ray with: ``pip install ray``. For nightly wheels, see the
 .. _`RLlib`: https://docs.ray.io/en/latest/rllib/index.html
 .. _`ecosystem of community integrations`: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html

+Why Ray?
+--------
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
+
 More Information
 ----------------
@@ -194,7 +194,7 @@ Because Tensor datasets rely on Datasets-specific extension types, they can only

 .. _disable_tensor_extension_casting:

 Disabling Tensor Extension Casting
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+----------------------------------

 To disable automatic casting of Pandas and Arrow arrays to
 :class:`TensorArray <ray.data.extensions.tensor_extension.TensorArray>`, run the code
@@ -103,9 +103,17 @@ Or more about [Ray Core](ray-core/walkthrough) and its key abstractions:
 - [Actors](ray-core/actors): Stateful worker processes created in the cluster.
 - [Objects](ray-core/objects): Immutable values accessible across the cluster.

-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 [ecosystem of community integrations](ray-overview/ray-libraries).

+## Why Ray?
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
+
 ## How to get involved?

 Ray is not only a framework for distributed applications but also an active community of developers, researchers, and folks who love machine learning.
@@ -3,7 +3,6 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler

 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@@ -15,21 +14,16 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )

-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

 # __air_pytorch_preprocess_start__
 import numpy as np
 import pandas as pd

-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler

-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
     Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_pytorch_preprocess_end__
@@ -161,4 +155,5 @@ predicted_probabilities = batch_predictor.predict(test_dataset)
 predicted_probabilities.show()
 # {'predictions': array([1.], dtype=float32)}
+# {'predictions': array([0.], dtype=float32)}
 # ...
 # __air_pytorch_batchpred_end__
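For readers skimming the diff, this is what the chained ``StandardScaler`` + ``Concatenator`` preprocessor above computes, illustrated with plain NumPy on toy values (a conceptual sketch, not Ray's implementation; the column values are made up):

```python
# Conceptual illustration of StandardScaler + Concatenator using
# plain NumPy (not Ray's actual preprocessor implementation).
import numpy as np


def standard_scale(col):
    # z-score: subtract the column mean, divide by the standard deviation.
    return (col - col.mean()) / col.std()


# Toy stand-ins for the "mean radius" and "mean texture" columns.
mean_radius = np.array([10.0, 12.0, 14.0])
mean_texture = np.array([1.0, 2.0, 3.0])

# StandardScaler(columns=[...]) scales only the named columns.
scaled = np.stack(
    [standard_scale(mean_radius), standard_scale(mean_texture)], axis=1
)

# Concatenator(exclude=["target"], dtype=np.float32) then packs the
# remaining feature columns into one float32 matrix for the trainer.
features = scaled.astype(np.float32)
print(features.shape)  # (3, 2)
```

Each scaled column ends up with mean 0 and unit variance, which is what makes the downstream PyTorch/TensorFlow training numerically well-behaved.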
@@ -3,10 +3,8 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
 from ray.air.config import ScalingConfig

-
 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

 # Split data into train and validation.
@@ -16,21 +14,16 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )

-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

 # __air_tf_preprocess_start__
 import numpy as np
 import pandas as pd

-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler

-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
     Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_tf_preprocess_end__
@@ -3,8 +3,6 @@

 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
-from ray.air.config import ScalingConfig

 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

 # Split data into train and validation.
@@ -16,16 +14,18 @@ train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )

-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__

+# __air_xgb_preprocess_start__
+# Create a preprocessor to scale some columns.
+from ray.data.preprocessors import StandardScaler
+
+preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
+# __air_xgb_preprocess_end__
+
 # __air_xgb_train_start__
-from ray.train.xgboost import XGBoostTrainer
 from ray.air.config import ScalingConfig
+from ray.train.xgboost import XGBoostTrainer

 trainer = XGBoostTrainer(
     scaling_config=ScalingConfig(
@@ -84,4 +84,5 @@ predicted_probabilities.show()
 # {'predictions': 0.9970690608024597}
+# {'predictions': 0.9943051934242249}
 # {'predictions': 0.00334902573376894}
 # ...
 # __air_xgb_batchpred_end__
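Each starter script above holds out 30% of the rows via ``dataset.train_test_split(test_size=0.3)``. In plain Python the split boils down to the following (a toy illustration of the arithmetic, not Ray's Dataset API):

```python
# Toy illustration of a 70/30 train/validation split, mirroring what
# dataset.train_test_split(test_size=0.3) does at the row level.
rows = list(range(10))  # stand-in for ten dataset rows
test_size = 0.3

# First (1 - test_size) of the rows go to training, the rest to validation.
split = int(len(rows) * (1 - test_size))
train_rows, valid_rows = rows[:split], rows[split:]
print(len(train_rows), len(valid_rows))  # 7 3
```

The scripts then drop the ``target`` column from the validation split to build an unlabeled test set for batch prediction.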
@@ -9,13 +9,54 @@ Ray AI Runtime (AIR)

 Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

 ..
   https://docs.google.com/drawings/d/1atB1dLjZIi8ibJ2-CoHdd3Zzyl_hDRWyK2CJAVBBLdU/edit

 .. image:: images/ray-air.svg

 AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.

+Ray AIR focuses on the compute aspects of ML:
+
+* It provides scalability by leveraging Ray's distributed compute layer for ML workloads.
+* It is designed to interoperate with other systems for storage and metadata needs.
+
+Why AIR?
+--------
+
+Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:
+
+.. image:: images/why-air-2.svg
+
+..
+  https://docs.google.com/drawings/d/1oi_JwNHXVgtR_9iTdbecquesUd4hOk0dWgHaTaFj6gk/edit
+
+**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
+
+**2. Unified ML API**: AIR's unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.
+
+**3. Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.
+
+When to use AIR?
+----------------
+
+AIR is for data scientists and ML engineers alike.
+
+.. image:: images/when-air.svg
+
+..
+  https://docs.google.com/drawings/d/1Qw_h457v921jWQkx63tmKAsOsJ-qemhwhCZvhkxWrWo/edit
+
+For data scientists, AIR can be used to scale individual workloads as well as end-to-end ML applications. For ML engineers, AIR provides scalable platform abstractions that can be used to easily onboard and integrate tooling from the broader ML ecosystem.
+
+Quick Start
+-----------
+
+Below, we walk through how AIR's unified ML API enables scaling of end-to-end ML workflows, focusing on
+a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow). The ML workflow we're going to build is summarized by the following diagram:
+
+..
+  https://docs.google.com/drawings/d/1z0r_Yc7-0NAPVsP2jWUkLV2jHVHdcJHdt9uN1GDANSY/edit
+
+.. figure:: images/why-air.svg
+
+  AIR provides a unified API for the ML ecosystem.
+  This diagram shows how AIR enables an ecosystem of libraries to be run at scale in just a few lines of code.
+
 Get started by installing Ray AIR:
@@ -31,30 +72,24 @@ Get started by installing Ray AIR:
     pip install -U tensorflow>=2.6.2
     pip install -U pyarrow>=6.0.1

-Quick Start
------------
-
-Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows, focusing on
-a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow):
-
 Preprocessing
 ~~~~~~~~~~~~~

-Below, let's start by preprocessing your data with Ray AIR's ``Preprocessors``:
+First, let's start by loading a dataset from storage:

 .. literalinclude:: examples/xgboost_starter.py
     :language: python
     :start-after: __air_generic_preprocess_start__
     :end-before: __air_generic_preprocess_end__

-If using Tensorflow or Pytorch, format your data for use with your training framework:
+Then, we define a ``Preprocessor`` pipeline for our task:

 .. tabbed:: XGBoost

-    .. code-block:: python
-
-        # No extra preprocessing is required for XGBoost.
-        # The data is already in the correct format.
+    .. literalinclude:: examples/xgboost_starter.py
+        :language: python
+        :start-after: __air_xgb_preprocess_start__
+        :end-before: __air_xgb_preprocess_end__

 .. tabbed:: Pytorch

@@ -155,38 +190,13 @@ Use the trained model for scalable batch prediction with a ``BatchPredictor``.
     :start-after: __air_tf_batchpred_start__
     :end-before: __air_tf_batchpred_end__

-Why Ray AIR?
-------------
+Project Status
+--------------

-Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):
+AIR is currently in **beta**. If you have questions for the team or are interested in getting involved in the development process, fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__.

-**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
-
-**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.
-
-**3. Open and Evolvable**: Ray core and libraries are fully open-source and can run on any cluster, cloud, or Kubernetes, reducing the costs of platform lock-in. Want to go out of the box? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.
-
-.. figure:: images/why-air.png
-
-  AIR enables a single-system / single-script approach to scaling ML. Ray's
-  distributed Python APIs enable scaling of ML workloads without the burden of
-  setting up or orchestrating separate distributed systems.
-
-AIR is for both data scientists and ML engineers. Consider using AIR when you want to:
-
-* Scale a single workload.
-* Scale end-to-end ML applications.
-* Build a custom ML platform for your organization.
-
 AIR Ecosystem
 -------------

-AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, ecosystem integrations, and their readiness.
-AIR's developer APIs also enable *custom integrations* to be easily created.
-
-..
-  https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit
-
-.. image:: images/air-ecosystem.svg
+For an overview of the AIR libraries, ecosystem integrations, and their readiness, check out the latest `AIR ecosystem map <https://docs.ray.io/en/master/_images/air-ecosystem.svg>`_.

 Next Steps
 ----------
File diff suppressed because one or more lines are too long. Before: 685 KiB; after: 685 KiB.
doc/source/ray-air/images/when-air.svg (new file, 214 KiB; diff suppressed because one or more lines are too long)
doc/source/ray-air/images/why-air-2.svg (new file, 32 KiB; diff suppressed because one or more lines are too long)
Binary file not shown (removed, 175 KiB).
doc/source/ray-air/images/why-air.svg (new file, 1.8 MiB; diff suppressed because one or more lines are too long)