.. _air:

Ray AI Runtime (AIR)
====================

.. tip::

    AIR is currently in **beta**. Fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__ to get involved. We'll be holding office hours, development sprints, and other activities as we get closer to the GA release. Join us!
Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

.. image:: images/ray-air.svg

AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.

Ray AIR focuses on the compute aspects of ML:

* It provides scalability by leveraging Ray's distributed compute layer for ML workloads.
* It is designed to interoperate with other systems for storage and metadata needs.

Get started by installing Ray AIR:

.. code-block:: bash

    pip install -U "ray[air]"

    # The examples below were written with the following library versions.
    # Consider installing them to ensure that the code runs as shown:
    pip install -U "pandas>=1.3.5"
    pip install -U "torch>=1.12"
    pip install -U "numpy>=1.19.5"
    pip install -U "tensorflow>=2.6.2"
    pip install -U "pyarrow>=6.0.1"

Quick Start
-----------

Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows,
focusing on a few of the popular frameworks AIR integrates with (XGBoost, PyTorch, and TensorFlow):

Preprocessing
~~~~~~~~~~~~~

First, let's preprocess your data with Ray AIR's ``Preprocessors``:

.. literalinclude:: examples/xgboost_starter.py
    :language: python
    :start-after: __air_generic_preprocess_start__
    :end-before: __air_generic_preprocess_end__

If using TensorFlow or PyTorch, format your data for use with your training framework:


.. tabbed:: XGBoost

    .. code-block:: python

        # No extra preprocessing is required for XGBoost.
        # The data is already in the correct format.

.. tabbed:: Pytorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_preprocess_start__
        :end-before: __air_pytorch_preprocess_end__

.. tabbed:: Tensorflow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_preprocess_start__
        :end-before: __air_tf_preprocess_end__
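
Whichever framework you use, the ``Preprocessor`` interface itself is small: fit it on a ``Dataset``, then transform that dataset (or new data) with it. As a minimal sketch, assuming ``ray[air]`` is installed and using a made-up single-column dataset:

.. code-block:: python

    import ray
    from ray.data.preprocessors import StandardScaler

    # A tiny in-memory Dataset, purely for illustration.
    ds = ray.data.from_items([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}])

    # Fit the scaler's statistics on the dataset, then apply them.
    preprocessor = StandardScaler(columns=["x"])
    transformed = preprocessor.fit_transform(ds)

    print(transformed.take(3))

The same fitted preprocessor can later be reused at training and inference time, which keeps feature handling consistent across the workflow.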

Training
~~~~~~~~

Train a model with a ``Trainer`` using common ML frameworks:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_train_start__
        :end-before: __air_xgb_train_end__

.. tabbed:: Pytorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_train_start__
        :end-before: __air_pytorch_train_end__

.. tabbed:: Tensorflow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_train_start__
        :end-before: __air_tf_train_end__
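
To give a feel for the ``Trainer`` API itself, here is a rough sketch using ``XGBoostTrainer`` on a made-up toy dataset (this assumes ``xgboost`` is installed alongside ``ray[air]``; a real workload would load data from storage and use more workers):

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.xgboost import XGBoostTrainer

    # A made-up binary classification dataset, for illustration only.
    train_ds = ray.data.from_items(
        [{"x": float(i), "y": i % 2} for i in range(32)]
    )

    trainer = XGBoostTrainer(
        # One worker keeps the sketch small; raise num_workers to scale out.
        scaling_config=ScalingConfig(num_workers=1),
        label_column="y",
        params={"objective": "binary:logistic"},
        datasets={"train": train_ds},
    )
    result = trainer.fit()
    print(result.metrics)

``trainer.fit()`` returns a ``Result`` holding the reported metrics and a checkpoint of the trained model.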

Hyperparameter Tuning
~~~~~~~~~~~~~~~~~~~~~

You can specify a hyperparameter space to search over for each trainer:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_tuner_start__
        :end-before: __air_xgb_tuner_end__

.. tabbed:: Pytorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_tuner_start__
        :end-before: __air_pytorch_tuner_end__

.. tabbed:: Tensorflow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_tuner_start__
        :end-before: __air_tf_tuner_end__

Then use the ``Tuner`` to run the search:

.. literalinclude:: examples/pytorch_tabular_starter.py
    :language: python
    :start-after: __air_tune_generic_start__
    :end-before: __air_tune_generic_end__
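
The ``Tuner`` works the same way regardless of what it is tuning. As a rough, self-contained sketch, here is one with a made-up objective function in place of the trainers above, so it runs without any training framework:

.. code-block:: python

    from ray import tune
    from ray.air import session
    from ray.tune import Tuner, TuneConfig

    # A stand-in objective; in the Quick Start, a Trainer goes here instead.
    def objective(config):
        session.report({"score": config["x"] ** 2})

    tuner = Tuner(
        objective,
        param_space={"x": tune.grid_search([1, 2, 3])},
        tune_config=TuneConfig(metric="score", mode="min"),
    )
    results = tuner.fit()
    print(results.get_best_result().config)

Each trial reports its metric, and ``results.get_best_result()`` surfaces the best configuration found.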

Batch Inference
~~~~~~~~~~~~~~~

Use the trained model for scalable batch prediction with a ``BatchPredictor``.

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_batchpred_start__
        :end-before: __air_xgb_batchpred_end__

.. tabbed:: Pytorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_batchpred_start__
        :end-before: __air_pytorch_batchpred_end__

.. tabbed:: Tensorflow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_batchpred_start__
        :end-before: __air_tf_batchpred_end__
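
Putting the pieces together, a ``BatchPredictor`` wraps the checkpoint produced by a ``Trainer`` and applies it to a dataset in parallel. A rough end-to-end sketch with ``XGBoostTrainer``, again on a made-up toy dataset and assuming ``xgboost`` is installed:

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.batch_predictor import BatchPredictor
    from ray.train.xgboost import XGBoostPredictor, XGBoostTrainer

    # Train a small model to obtain a checkpoint (illustrative data only).
    train_ds = ray.data.from_items(
        [{"x": float(i), "y": i % 2} for i in range(32)]
    )
    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=1),
        label_column="y",
        params={"objective": "binary:logistic"},
        datasets={"train": train_ds},
    )
    result = trainer.fit()

    # Feed unlabeled data through the trained model in parallel.
    batch_predictor = BatchPredictor.from_checkpoint(
        result.checkpoint, XGBoostPredictor
    )
    predictions = batch_predictor.predict(
        ray.data.from_items([{"x": 10.0}, {"x": 11.0}])
    )
    print(predictions.take(2))

The prediction output is itself a ``Dataset``, so it can be written out or post-processed with the same data APIs.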

Why Ray AIR?
------------

Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):

**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.

**3. Open and Evolvable**: Ray core and libraries are fully open source and can run on any cluster, cloud, or Kubernetes, reducing the cost of platform lock-in. Want to go beyond what's built in? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.

.. figure:: images/why-air.png

    AIR enables a single-system / single-script approach to scaling ML. Ray's
    distributed Python APIs enable scaling of ML workloads without the burden of
    setting up or orchestrating separate distributed systems.

AIR is for both data scientists and ML engineers. Consider using AIR when you want to:

* Scale a single workload.
* Scale end-to-end ML applications.
* Build a custom ML platform for your organization.

AIR Ecosystem
-------------

AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, their ecosystem integrations, and their readiness. AIR's developer APIs also make it easy to create *custom integrations*.

..
    https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit

.. image:: images/air-ecosystem.svg

Next Steps
----------

- :ref:`air-key-concepts`
- `Examples <https://github.com/ray-project/ray/tree/master/python/ray/air/examples>`__
- :ref:`Deployment Guide <air-deployment>`
- :ref:`API reference <air-api-ref>`