
.. _air:

Ray AI Runtime (AIR)
====================

.. tip::

    AIR is currently in **beta**. Fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__ to get involved. We'll be holding office hours, development sprints, and other activities as we get closer to the GA release. Join us!
Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

.. image:: images/ray-air.svg

AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.
Ray AIR focuses on the compute aspects of ML:

* It provides scalability by leveraging Ray's distributed compute layer for ML workloads.
* It is designed to interoperate with other systems for storage and metadata needs.
Get started by installing Ray AIR:

.. code:: bash

    pip install -U "ray[air]"

    # The below Ray AIR tutorial was written with the following libraries.
    # Consider running the following to ensure that the code below runs properly
    # (the specifiers are quoted so the shell doesn't treat ">" as a redirect):
    pip install -U "pandas>=1.3.5"
    pip install -U "torch>=1.12"
    pip install -U "numpy>=1.19.5"
    pip install -U "tensorflow>=2.6.2"
    pip install -U "pyarrow>=6.0.1"
Quick Start
-----------

Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows, focusing on
a few of the popular frameworks AIR integrates with (XGBoost, PyTorch, and TensorFlow):
Preprocessing
~~~~~~~~~~~~~

First, let's preprocess your data with Ray AIR's ``Preprocessors``:

.. literalinclude:: examples/xgboost_starter.py
    :language: python
    :start-after: __air_generic_preprocess_start__
    :end-before: __air_generic_preprocess_end__
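
Under the hood, a ``Preprocessor`` follows a familiar fit/transform pattern: statistics are computed once over the dataset, then applied to every row. As a rough illustration of that pattern only (plain Python for clarity, not the actual ``ray.data`` API), a hypothetical standard scaler might look like:

.. code-block:: python

    # Illustrative sketch of the fit/transform pattern behind Preprocessors.
    # This is plain Python, not the Ray AIR API.
    class SimpleStandardScaler:
        """Fit a column's mean/std on a dataset, then transform rows."""

        def fit(self, rows, column):
            values = [row[column] for row in rows]
            self.mean = sum(values) / len(values)
            variance = sum((v - self.mean) ** 2 for v in values) / len(values)
            self.std = variance ** 0.5 or 1.0  # avoid dividing by zero
            return self

        def transform(self, rows, column):
            return [
                {**row, column: (row[column] - self.mean) / self.std}
                for row in rows
            ]

    rows = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
    scaler = SimpleStandardScaler().fit(rows, "x")
    scaled = scaler.transform(rows, "x")

The key property, which AIR's ``Preprocessors`` share, is that the fitted statistics can be reused later, e.g. to transform batches at inference time exactly as during training.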
If using TensorFlow or PyTorch, format your data for use with your training framework:

.. tabbed:: XGBoost

    .. code-block:: python

        # No extra preprocessing is required for XGBoost.
        # The data is already in the correct format.

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_preprocess_start__
        :end-before: __air_pytorch_preprocess_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_preprocess_start__
        :end-before: __air_tf_preprocess_end__
Training
~~~~~~~~

Train a model with a ``Trainer``, using common ML frameworks:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_train_start__
        :end-before: __air_xgb_train_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_train_start__
        :end-before: __air_pytorch_train_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_train_start__
        :end-before: __air_tf_train_end__
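
Conceptually, a data-parallel ``Trainer`` shards the dataset across workers, has each worker compute an update on its shard, and combines the results. A minimal single-process sketch of that idea, using plain Python with a toy linear model (the Ray Train API runs this across real distributed workers):

.. code-block:: python

    # Toy dataset following y = 2x; two simulated "workers", each with a shard.
    data = [(float(x), 2.0 * x) for x in range(8)]
    shards = [data[0::2], data[1::2]]

    def shard_gradient(w, shard):
        # Gradient of mean squared error for the model y_hat = w * x on one shard.
        return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

    w = 0.0
    for _ in range(50):
        grads = [shard_gradient(w, shard) for shard in shards]  # per-worker step
        w -= 0.01 * sum(grads) / len(grads)                     # averaged update

    # w converges toward the true slope of 2.0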
Hyperparameter Tuning
~~~~~~~~~~~~~~~~~~~~~

You can specify a hyperparameter space to search over for each trainer:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_tuner_start__
        :end-before: __air_xgb_tuner_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_tuner_start__
        :end-before: __air_pytorch_tuner_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_tuner_start__
        :end-before: __air_tf_tuner_end__

Then use the ``Tuner`` to run the search:

.. literalinclude:: examples/pytorch_tabular_starter.py
    :language: python
    :start-after: __air_tune_generic_start__
    :end-before: __air_tune_generic_end__
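
At its core, what the ``Tuner`` automates is evaluating trial configurations drawn from the search space and keeping the best result (along with scheduling, early stopping, and distributed execution, which this sketch omits). A stripped-down sequential grid search, with a hypothetical objective standing in for a full training run:

.. code-block:: python

    import itertools

    # Hypothetical objective: stands in for a trial's validation loss.
    def objective(config):
        return (config["lr"] - 0.01) ** 2 + 1e-4 * (config["depth"] - 6) ** 2

    # A small grid over the hyperparameter space.
    param_space = {"lr": [0.001, 0.01, 0.1], "depth": [4, 6, 8]}
    trials = [
        dict(zip(param_space, values))
        for values in itertools.product(*param_space.values())
    ]

    best = min(trials, key=objective)  # the "best result" a Tuner would report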
Batch Inference
~~~~~~~~~~~~~~~

Use the trained model for scalable batch prediction with a ``BatchPredictor``:

.. tabbed:: XGBoost

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_batchpred_start__
        :end-before: __air_xgb_batchpred_end__

.. tabbed:: PyTorch

    .. literalinclude:: examples/pytorch_tabular_starter.py
        :language: python
        :start-after: __air_pytorch_batchpred_start__
        :end-before: __air_pytorch_batchpred_end__

.. tabbed:: TensorFlow

    .. literalinclude:: examples/tf_tabular_starter.py
        :language: python
        :start-after: __air_tf_batchpred_start__
        :end-before: __air_tf_batchpred_end__
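
The core idea behind batch prediction is simple: split the dataset into batches and map the model over each batch; AIR then distributes that mapping across the cluster. A sequential, plain-Python sketch of the pattern, with a toy model standing in for one restored from a training checkpoint:

.. code-block:: python

    # Toy "model" standing in for one restored from a checkpoint.
    def model(x):
        return 2.0 * x + 1.0

    def predict_batch(batch):
        # In AIR this per-batch work is distributed; here it runs sequentially.
        return [model(x) for x in batch]

    data = [float(x) for x in range(10)]
    batch_size = 4
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    predictions = [pred for batch in batches for pred in predict_batch(batch)]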
Why Ray AIR?
------------

Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):

**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.

**3. Open and Evolvable**: Ray core and libraries are fully open-source and can run on any cluster, cloud, or Kubernetes, reducing the costs of platform lock-in. Want to go out of the box? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.
.. figure:: images/why-air.png

    AIR enables a single-system / single-script approach to scaling ML. Ray's
    distributed Python APIs enable scaling of ML workloads without the burden of
    setting up or orchestrating separate distributed systems.

AIR is for both data scientists and ML engineers. Consider using AIR when you want to:

* Scale a single workload.
* Scale end-to-end ML applications.
* Build a custom ML platform for your organization.
AIR Ecosystem
-------------

AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, ecosystem integrations, and their readiness.
AIR's developer APIs also enable *custom integrations* to be easily created.

..
    https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit

.. image:: images/air-ecosystem.svg
Next Steps
----------

- :ref:`air-key-concepts`
- `Examples <https://github.com/ray-project/ray/tree/master/python/ray/air/examples>`__
- :ref:`Deployment Guide <air-deployment>`
- :ref:`API reference <air-api-ref>`