[air] Remove checkpoint user guide and update key concepts and docstring (#27455)

Richard Liaw 2022-08-04 08:55:26 -07:00 committed by GitHub
parent 8d5c07b781
commit b2cd34cc5c
8 changed files with 125 additions and 308 deletions


@@ -15,7 +15,6 @@ parts:
- file: ray-air/user-guides
  sections:
    - file: ray-air/preprocessors
    - file: ray-air/checkpoints
    - file: ray-air/check-ingest
    - file: ray-air/trainer
    - file: ray-air/tuner


@@ -1,89 +0,0 @@
.. _air-checkpoints-doc:
Using Checkpoints
=================
The AIR trainers, tuners, and custom pretrained models generate Checkpoints. An AIR Checkpoint is a common format for models that
are used across different components of the Ray AI Runtime. This common format allows easy interoperability among AIR components
and seamless integration with supported external machine learning frameworks.
.. image:: images/checkpoints.jpg
What is a checkpoint?
---------------------
A Checkpoint object is a serializable reference to a model. A model can be represented in one of three ways:
- as a directory on local (on-disk) storage
- as a directory on an external storage (e.g., cloud storage)
- as an in-memory dictionary
Because of these different model storage representations, Checkpoints provide useful flexibility in
distributed environments, where you want to recreate an instance of the same model on multiple nodes or
across different Ray clusters.
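For illustration, here is a minimal sketch that moves a single checkpoint through all three representations (the S3 URI is a hypothetical bucket and requires configured credentials):

.. code-block:: python

    from ray.air.checkpoint import Checkpoint

    # In-memory dictionary representation.
    checkpoint = Checkpoint.from_dict({"model_weights": [1, 2, 3]})

    # Local on-disk representation.
    path = checkpoint.to_directory()
    checkpoint = Checkpoint.from_directory(path)

    # External storage representation (hypothetical bucket):
    # uri = checkpoint.to_uri("s3://my-bucket/my-checkpoint")
    # checkpoint = Checkpoint.from_uri(uri)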
How to create a checkpoint?
---------------------------
There are two ways to generate a checkpoint.
The first way is to generate it from a pretrained model. Each AIR-supported machine learning (ML) framework has
a ``Checkpoint`` class that can be used to generate an AIR checkpoint:

.. literalinclude:: doc_code/checkpoint_usage.py
    :language: python
    :start-after: __checkpoint_quick_start__
    :end-before: __checkpoint_quick_end__
Another way is to retrieve it from the result object returned by a Trainer or Tuner.

.. literalinclude:: doc_code/checkpoint_usage.py
    :language: python
    :start-after: __use_trainer_checkpoint_start__
    :end-before: __use_trainer_checkpoint_end__
How to use a checkpoint?
------------------------
Checkpoints can be used to instantiate a :class:`Predictor`, :class:`BatchPredictor`, or :class:`PredictorDeployment` class.
The resulting instance holds the model in memory and can be used for inference.
For instance, the code example below shows how a checkpoint is used with :class:`BatchPredictor` for scalable batch inference:

.. literalinclude:: doc_code/checkpoint_usage.py
    :language: python
    :start-after: __batch_pred_start__
    :end-before: __batch_pred_end__
Another example below demonstrates how to use a checkpoint for online inference via :class:`PredictorDeployment`:

.. literalinclude:: doc_code/checkpoint_usage.py
    :language: python
    :start-after: __online_inference_start__
    :end-before: __online_inference_end__
Furthermore, a Checkpoint object has methods to translate between different checkpoint storage locations.
With this flexibility, Checkpoint objects can be serialized and used in different contexts
(e.g., on a different process or a different machine):

.. literalinclude:: doc_code/checkpoint_usage.py
    :language: python
    :start-after: __basic_checkpoint_start__
    :end-before: __basic_checkpoint_end__
Example: Using Checkpoints with MLflow
--------------------------------------
`MLflow <https://mlflow.org/>`__ has its own `checkpoint format <https://www.mlflow.org/docs/latest/models.html>`__ called
the "MLflow Model," a standard format for packaging machine learning models that can be used in a variety of downstream tools.
Below is an example of using an MLflow model as a Ray AIR Checkpoint.

.. literalinclude:: doc_code/checkpoint_mlflow.py
    :language: python
    :start-after: __mlflow_checkpoint_start__
    :end-before: __mlflow_checkpoint_end__


@@ -65,6 +65,37 @@ best_result = result_grid.get_best_result()
print(best_result)
# __air_tuner_end__
# __air_checkpoints_start__
checkpoint = result.checkpoint
print(checkpoint)
# Checkpoint(local_path=..../checkpoint_000005)
tuned_checkpoint = result_grid.get_best_result().checkpoint
print(tuned_checkpoint)
# Checkpoint(local_path=..../checkpoint_000005)
# __air_checkpoints_end__
# __checkpoint_adhoc_start__
from ray.train.tensorflow import TensorflowCheckpoint
import tensorflow as tf
# This can be a trained model.
def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential(
        [
            tf.keras.layers.InputLayer(input_shape=(1,)),
            tf.keras.layers.Dense(1),
        ]
    )
    return model
model = build_model()
checkpoint = TensorflowCheckpoint.from_model(model)
# __checkpoint_adhoc_end__
# __air_batch_predictor_start__
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor


@@ -1,122 +0,0 @@
# flake8: noqa
# isort: skip_file
# __checkpoint_quick_start__
from ray.train.tensorflow import TensorflowCheckpoint
import tensorflow as tf
# This can be a trained model.
def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential(
        [
            tf.keras.layers.InputLayer(input_shape=(1,)),
            tf.keras.layers.Dense(1),
        ]
    )
    return model
model = build_model()
checkpoint = TensorflowCheckpoint.from_model(model)
# __checkpoint_quick_end__
# __use_trainer_checkpoint_start__
import ray
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=2),
    label_column="target",
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset},
    num_boost_round=5,
)
result = trainer.fit()
checkpoint = result.checkpoint
# __use_trainer_checkpoint_end__
# __batch_pred_start__
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor
# Create a test dataset by dropping the target column.
test_dataset = valid_dataset.drop_columns(["target"])
batch_predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
# Bulk batch prediction.
batch_predictor.predict(test_dataset)
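# The call above returns a Ray Dataset of predictions, which can be
# consumed directly or written out, e.g. with Dataset.write_csv().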
# __batch_pred_end__
# __online_inference_start__
import requests
from fastapi import Request
import pandas as pd
from ray import serve
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import json_request
async def adapter(request: Request):
    content = await request.json()
    print(content)
    return pd.DataFrame.from_dict(content)
serve.start(detached=True)
deployment = PredictorDeployment.options(name="XGBoostService")
deployment.deploy(
    XGBoostPredictor, checkpoint, batching_params=False, http_adapter=adapter
)
print(deployment.url)
sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])
output = requests.post(deployment.url, json=[sample_input]).json()
print(output)
# __online_inference_end__
# __basic_checkpoint_start__
from ray.air.checkpoint import Checkpoint
# Create checkpoint data dict
checkpoint_data = {"data": 123}
# Create checkpoint object from data
checkpoint = Checkpoint.from_dict(checkpoint_data)
# Save checkpoint to a directory on the file system.
path = checkpoint.to_directory()
# This path can then be passed around,
# e.g. to a different function or a different script.
# You can also use `checkpoint.to_uri/from_uri` to
# read from/write to cloud storage
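# For example (hypothetical bucket; requires configured cloud credentials):
# uri = checkpoint.to_uri("s3://my-bucket/checkpoint-dir")
# checkpoint = Checkpoint.from_uri(uri)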
# In another function or script, recover Checkpoint object from path
checkpoint = Checkpoint.from_directory(path)
# Convert into dictionary again
recovered_data = checkpoint.to_dict()
# It is guaranteed that the original data has been recovered
assert recovered_data == checkpoint_data
# __basic_checkpoint_end__

Binary file images/checkpoints.jpg (22 KiB) not shown.


@@ -43,9 +43,8 @@ See the documentation on :ref:`Trainers <air-trainers>`.
    :start-after: __air_trainer_start__
    :end-before: __air_trainer_end__
Trainer objects will produce a :ref:`Result <air-results-ref>` object after calling ``.fit()``. These objects will contain training metrics as long as checkpoints to retrieve the best model.
Trainer objects produce a :ref:`Result <air-results-ref>` object after calling ``.fit()``.
These objects contain training metrics as well as checkpoints to retrieve the best model.
.. literalinclude:: doc_code/air_key_concepts.py
    :language: python
@@ -65,11 +64,40 @@ Tuners can work seamlessly with any Trainer but also can support arbitrary train
    :start-after: __air_tuner_start__
    :end-before: __air_tuner_end__
.. _air-checkpoints-doc:
Checkpoints
-----------
The AIR trainers, tuners, and custom pretrained models generate a framework-specific :class:`Checkpoint <ray.air.Checkpoint>` object.
Checkpoints are a common interface for models that are used across different AIR components and libraries.
There are two main ways to generate a checkpoint.
First, a Checkpoint object can be retrieved from the ``Result`` object returned by a Trainer or Tuner ``.fit()`` call.

.. literalinclude:: doc_code/air_key_concepts.py
    :language: python
    :start-after: __air_checkpoints_start__
    :end-before: __air_checkpoints_end__
You can also generate a checkpoint from a pretrained model. Each AIR-supported machine learning (ML) framework has
a ``Checkpoint`` object that can be used to generate an AIR checkpoint:

.. literalinclude:: doc_code/air_key_concepts.py
    :language: python
    :start-after: __checkpoint_adhoc_start__
    :end-before: __checkpoint_adhoc_end__
Checkpoints can be used to instantiate the :class:`Predictor`, :class:`BatchPredictor`, or :class:`PredictorDeployment` classes,
as seen below.
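For instance, here is a minimal ``Predictor`` sketch, assuming the ``TensorflowCheckpoint`` and ``build_model`` from the snippet above (the input batch is illustrative):

.. code-block:: python

    import numpy as np

    from ray.train.tensorflow import TensorflowPredictor

    # Recreate the Keras model from the checkpoint and run small,
    # in-process inference on a NumPy batch.
    predictor = TensorflowPredictor.from_checkpoint(
        checkpoint, model_definition=build_model
    )
    predictions = predictor.predict(np.array([[1.0], [2.0]]))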
Batch Predictor
---------------
You can take a trained model and do batch inference using the BatchPredictor object.
You can take a checkpoint and do batch inference using the BatchPredictor object.
.. literalinclude:: doc_code/air_key_concepts.py
    :language: python


@@ -23,17 +23,6 @@ AIR User Guides
    :text: Using Preprocessors
    :classes: btn-link btn-block stretched-link
---
:img-top: /ray-overview/images/ray_svg_logo.svg
+++
.. link-button:: /ray-air/checkpoints
    :type: ref
    :text: Using Checkpoints
    :classes: btn-link btn-block stretched-link
---
:img-top: /ray-overview/images/ray_svg_logo.svg


@@ -42,34 +42,20 @@ logger = logging.getLogger(__name__)
class Checkpoint:
"""Ray AIR Checkpoint.
This implementation provides methods to translate between
different checkpoint storage locations: Local storage, external storage
(e.g. cloud storage), and data dict representations.
An AIR Checkpoint is a common interface for accessing models across
different AIR components and libraries. A Checkpoint can have its data
represented in one of three ways:
The constructor is a private API; use the ``from_`` methods to create
checkpoint objects instead (e.g. ``Checkpoint.from_directory()``).
- as a directory on local (on-disk) storage
- as a directory on an external storage (e.g., cloud storage)
- as an in-memory dictionary
When converting between different checkpoint formats, it is guaranteed
that a full round trip of conversions (e.g. directory --> dict -->
obj ref --> directory) will recover the original checkpoint data.
There are no guarantees made about compatibility of intermediate
representations.
The Checkpoint object also has methods to translate between different checkpoint
storage locations. These storage representations provide flexibility in
distributed environments, where you may want to recreate an instance of
the same model on multiple nodes or across different Ray clusters.
New data can be added to a Checkpoint during conversion. Consider the
following conversion: directory --> dict (adding dict["foo"] = "bar")
--> directory --> dict (expect to see dict["foo"] = "bar"). Note that
the second directory will contain pickle files with the serialized additional
field data in them.
Similarly with a dict as a source: dict --> directory (add file "foo.txt")
--> dict --> directory (will have "foo.txt" in it again). Note that the second
dict representation will contain an extra field with the serialized additional
files in it.
Examples:
Example for an arbitrary data checkpoint:
Example:
.. code-block:: python
@@ -81,12 +67,15 @@ class Checkpoint:
# Create checkpoint object from data
checkpoint = Checkpoint.from_dict(checkpoint_data)
# Save checkpoint to temporary location
# Save checkpoint to a directory on the file system.
path = checkpoint.to_directory()
# This path can then be passed around, e.g. to a different function
# This path can then be passed around,
# e.g. to a different function or a different script.
# You can also use `checkpoint.to_uri/from_uri` to
# read from/write to cloud storage
# At some other location, recover Checkpoint object from path
# In another function or script, recover Checkpoint object from path
checkpoint = Checkpoint.from_directory(path)
# Convert into dictionary again
@@ -95,39 +84,31 @@ class Checkpoint:
# It is guaranteed that the original data has been recovered
assert recovered_data == checkpoint_data
Example using MLflow for saving and loading a classifier:
Checkpoints can be used to instantiate a :class:`Predictor`,
:class:`BatchPredictor`, or :class:`PredictorDeployment` class.
.. code-block:: python
The constructor is a private API; use the ``from_`` methods to create
checkpoint objects instead (e.g. ``Checkpoint.from_directory()``).
from ray.air.checkpoint import Checkpoint
from sklearn.ensemble import RandomForestClassifier
import mlflow.sklearn
*Other implementation notes:*
When converting between different checkpoint formats, it is guaranteed
that a full round trip of conversions (e.g. directory --> dict -->
obj ref --> directory) will recover the original checkpoint data.
There are no guarantees made about compatibility of intermediate
representations.
# Create an sklearn classifier
clf = RandomForestClassifier(max_depth=7, random_state=0)
# ... e.g. train model with clf.fit()
# Save model using MLflow
mlflow.sklearn.save_model(clf, "model_directory")
New data can be added to a Checkpoint
during conversion. Consider the following conversion:
directory --> dict (adding dict["foo"] = "bar")
--> directory --> dict (expect to see dict["foo"] = "bar"). Note that
the second directory will contain pickle files with the serialized additional
field data in them.
# Create checkpoint object from path
checkpoint = Checkpoint.from_directory("model_directory")
# Convert into dictionary
checkpoint_dict = checkpoint.to_dict()
# This dict can then be passed around, e.g. to a different function
# At some other location, recover checkpoint object from dict
checkpoint = Checkpoint.from_dict(checkpoint_dict)
# Convert into a directory again
checkpoint.to_directory("other_directory")
# We can now use MLflow to re-load the model
clf = mlflow.sklearn.load_model("other_directory")
# It is guaranteed that the original data was recovered
assert isinstance(clf, RandomForestClassifier)
Similarly with a dict as a source: dict --> directory (add file "foo.txt")
--> dict --> directory (will have "foo.txt" in it again). Note that the second
dict representation will contain an extra field with the serialized additional
files in it.
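For example, a minimal sketch of the conversions described above (the
key and value are illustrative):

.. code-block:: python

    from ray.air.checkpoint import Checkpoint

    # directory --> dict, adding a new field along the way.
    path = Checkpoint.from_dict({"data": 123}).to_directory()
    data = Checkpoint.from_directory(path).to_dict()
    data["foo"] = "bar"

    # dict --> directory --> dict: the added field survives.
    path2 = Checkpoint.from_dict(data).to_directory()
    assert Checkpoint.from_directory(path2).to_dict()["foo"] == "bar"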
Checkpoints can be pickled and sent to remote processes.
Please note that checkpoints pointing to local directories will be