.. _air:

Ray AI Runtime (AIR)
====================

.. tip::

    AIR is currently in **beta**. Fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__ to get involved. We'll be holding office hours, development sprints, and other activities as we get closer to the GA release. Join us!

Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

..
  https://docs.google.com/drawings/d/1atB1dLjZIi8ibJ2-CoHdd3Zzyl_hDRWyK2CJAVBBLdU/edit

.. image:: images/ray-air.svg

AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.

Why AIR?
--------

Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:

.. image:: images/why-air-2.svg

..
  https://docs.google.com/drawings/d/1oi_JwNHXVgtR_9iTdbecquesUd4hOk0dWgHaTaFj6gk/edit

**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

**2. Unified ML API**: AIR's unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.

**3. Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.

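To make the "single class change" idea concrete, here is a minimal, framework-free sketch of the pattern. The ``MeanTrainer`` and ``MedianTrainer`` classes and the ``run_workflow`` helper are hypothetical names invented for this illustration; they are not AIR classes, and real AIR trainers take far richer configuration. The point is only that the workflow code depends on a shared ``fit`` method, so switching frameworks means switching one class.

```python
from statistics import mean, median


class MeanTrainer:
    """Hypothetical trainer: 'learns' the mean of the labels."""

    def fit(self, labels):
        return mean(labels)


class MedianTrainer:
    """Hypothetical trainer: 'learns' the median of the labels."""

    def fit(self, labels):
        return median(labels)


def run_workflow(trainer_cls, labels):
    # The workflow only relies on the shared `fit` method,
    # so any class that provides it can be dropped in.
    trainer = trainer_cls()
    return trainer.fit(labels)


labels = [1.0, 2.0, 6.0]
mean_model = run_workflow(MeanTrainer, labels)
# Swapping the "framework" is a single class change:
median_model = run_workflow(MedianTrainer, labels)
```

The rest of the workflow (data loading, evaluation, and so on) would stay untouched when the class is swapped.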
When to use AIR?
----------------

AIR is for both data scientists and ML engineers.

.. image:: images/when-air.svg

..
  https://docs.google.com/drawings/d/1Qw_h457v921jWQkx63tmKAsOsJ-qemhwhCZvhkxWrWo/edit

For data scientists, AIR can be used to scale individual workloads as well as end-to-end ML applications. For ML engineers, AIR provides scalable platform abstractions that can be used to easily onboard and integrate tooling from the broader ML ecosystem.

Quick Start
-----------

Below, we walk through how AIR's unified ML API enables scaling of end-to-end ML workflows, focusing on
a few of the popular frameworks AIR integrates with (XGBoost, PyTorch, and TensorFlow). The ML workflow we're going to build is summarized by the following diagram:

..
  https://docs.google.com/drawings/d/1z0r_Yc7-0NAPVsP2jWUkLV2jHVHdcJHdt9uN1GDANSY/edit

.. figure:: images/why-air.svg

    AIR provides a unified API for the ML ecosystem.
    This diagram shows how AIR enables an ecosystem of libraries to be run at scale in just a few lines of code.

Get started by installing Ray AIR:

.. code:: bash

    pip install -U "ray[air]"

    # The Ray AIR tutorial below was written with the following libraries.
    # Consider running the following to ensure that the code below runs properly.
    # The quotes keep the shell from treating ">=" as an output redirection.
    pip install -U "pandas>=1.3.5"
    pip install -U "torch>=1.12"
    pip install -U "numpy>=1.19.5"
    pip install -U "tensorflow>=2.6.2"
    pip install -U "pyarrow>=6.0.1"

Preprocessing
~~~~~~~~~~~~~

First, let's start by loading a dataset from storage:

.. literalinclude:: examples/xgboost_starter.py
    :language: python
    :start-after: __air_generic_preprocess_start__
    :end-before: __air_generic_preprocess_end__

Then, we define a ``Preprocessor`` pipeline for our task:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_preprocess_start__
        :end-before: __air_xgb_preprocess_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_preprocess_start__
        :end-before: __air_pytorch_preprocess_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_preprocess_start__
        :end-before: __air_tf_preprocess_end__

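If the fit-then-transform idea behind a preprocessor is new to you, the following plain-Python sketch may help. It is not AIR's implementation: ``SimpleStandardScaler`` is a hypothetical class written for this page, it works on a list of dictionaries rather than a Ray dataset, and it uses the population standard deviation as a simplifying assumption. It only shows the general pattern of learning statistics from training data and then applying them to rows.

```python
from statistics import mean, pstdev


class SimpleStandardScaler:
    """Hypothetical stand-in for a preprocessor; not an AIR class."""

    def __init__(self, columns):
        self.columns = columns
        self.stats = {}

    def fit(self, rows):
        # Learn per-column mean and (population) standard deviation.
        for col in self.columns:
            values = [row[col] for row in rows]
            self.stats[col] = (mean(values), pstdev(values))
        return self

    def transform(self, rows):
        # Apply the learned statistics; untouched columns pass through.
        out = []
        for row in rows:
            new_row = dict(row)
            for col in self.columns:
                mu, sigma = self.stats[col]
                new_row[col] = (row[col] - mu) / sigma if sigma else 0.0
            out.append(new_row)
        return out


train_rows = [{"x": 1.0, "target": 0}, {"x": 3.0, "target": 1}]
scaler = SimpleStandardScaler(columns=["x"]).fit(train_rows)
scaled = scaler.transform(train_rows)
```

Fitting on the training split and reusing the same statistics at prediction time keeps training and inference inputs on the same scale.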
Training
~~~~~~~~

Train a model using a ``Trainer`` for one of the common ML frameworks:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_train_start__
        :end-before: __air_xgb_train_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_train_start__
        :end-before: __air_pytorch_train_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_train_start__
        :end-before: __air_tf_train_end__

Hyperparameter Tuning
~~~~~~~~~~~~~~~~~~~~~

You can specify a hyperparameter space to search over for each trainer:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_tuner_start__
        :end-before: __air_xgb_tuner_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_tuner_start__
        :end-before: __air_pytorch_tuner_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_tuner_start__
        :end-before: __air_tf_tuner_end__

Then use the ``Tuner`` to run the search:

.. literalinclude:: examples/pytorch_tabular_starter.py
    :language: python
    :start-after: __air_tune_generic_start__
    :end-before: __air_tune_generic_end__

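Conceptually, searching a hyperparameter space means sampling configurations, scoring each one, and keeping the best. The sketch below shows that loop as a plain random search in standard-library Python. It is an illustration only, not how ``Tuner`` is implemented: ``random_search``, ``sample_config``, and ``toy_loss`` are hypothetical names, the objective is a made-up function rather than a real training run, and trials run sequentially here rather than in parallel.

```python
import random


def sample_config(param_space, rng):
    # Each entry in the space is a function that draws one value.
    return {name: sampler(rng) for name, sampler in param_space.items()}


def random_search(objective, param_space, num_samples, seed=0):
    """Sample `num_samples` configs and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(num_samples):
        config = sample_config(param_space, rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_loss(config):
    # Made-up objective standing in for a validation loss.
    return abs(config["max_depth"] - 4)


param_space = {"max_depth": lambda rng: rng.randint(1, 8)}
best_config, best_score = random_search(toy_loss, param_space, num_samples=20)
```

In a real search, each call to the objective would be a full training run, which is why running trials in parallel across a cluster matters.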
Batch Inference
~~~~~~~~~~~~~~~

Use the trained model for scalable batch prediction with a ``BatchPredictor``:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_batchpred_start__
        :end-before: __air_xgb_batchpred_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_batchpred_start__
        :end-before: __air_pytorch_batchpred_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_batchpred_start__
        :end-before: __air_tf_batchpred_end__

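The idea behind batch prediction can be sketched without any framework: split the data into fixed-size batches and apply the model to each batch. The helpers below (``iter_batches``, ``batch_predict``, and ``threshold_model``) are hypothetical and exist only for this illustration; they run on a single process over a Python list, whereas a ``BatchPredictor`` is meant to distribute this work over a dataset.

```python
def iter_batches(rows, batch_size):
    # Yield consecutive slices of at most `batch_size` rows.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def batch_predict(predict_fn, rows, batch_size):
    """Apply `predict_fn` to each batch and concatenate the results."""
    predictions = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(predict_fn(batch))
    return predictions


def threshold_model(batch):
    # Toy "model": predicts 1 when the feature exceeds 0.5.
    return [1 if x > 0.5 else 0 for x in batch]


features = [0.1, 0.9, 0.7, 0.2, 0.6]
predictions = batch_predict(threshold_model, features, batch_size=2)
```

Because each batch is processed independently, the same loop can in principle be spread across many workers.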
Project Status
--------------

AIR is currently in **beta**. If you have questions for the team or are interested in getting involved in the development process, fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__.

For an overview of the AIR libraries, ecosystem integrations, and their readiness, check out the latest :ref:`AIR ecosystem map <air-ecosystem-map>`.

Next Steps
----------

- :ref:`air-key-concepts`
- `Examples <https://github.com/ray-project/ray/tree/master/python/ray/air/examples>`__
- :ref:`Deployment Guide <air-deployment>`
- :ref:`API reference <air-api-ref>`