mirror of
https://github.com/vale981/ray
synced 2025-03-04 17:41:43 -05:00
[Docs] [Train] Update Train API reference and docs (#28192)
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to the Ray Train docs. Also updates the user guide with better cross-references.
This commit is contained in:
parent 118b76218a
commit b83f10dbde
4 changed files with 182 additions and 53 deletions
@@ -122,6 +122,8 @@ Training Result

.. automodule:: ray.air.result
    :members:

.. _air-session-ref:

Training Session
################
@@ -199,14 +201,17 @@ XGBoost

.. autoclass:: ray.train.xgboost.XGBoostTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.xgboost
    :members:
    :exclude-members: XGBoostTrainer
    :show-inheritance:
    :noindex:

LightGBM
########
@@ -214,14 +219,17 @@ LightGBM

.. autoclass:: ray.train.lightgbm.LightGBMTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.lightgbm
    :members:
    :exclude-members: LightGBMTrainer
    :show-inheritance:
    :noindex:

TensorFlow
##########
@@ -229,14 +237,17 @@ TensorFlow

.. autoclass:: ray.train.tensorflow.TensorflowTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.tensorflow
    :members:
    :exclude-members: TensorflowTrainer
    :show-inheritance:
    :noindex:

.. _air-pytorch-ref:
@@ -246,14 +257,17 @@ PyTorch

.. autoclass:: ray.train.torch.TorchTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.torch
    :members:
    :exclude-members: TorchTrainer
    :show-inheritance:
    :noindex:

Horovod
#######
@@ -261,14 +275,17 @@ Horovod

.. autoclass:: ray.train.horovod.HorovodTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.horovod
    :members:
    :exclude-members: HorovodTrainer
    :show-inheritance:
    :noindex:

HuggingFace
###########
@@ -276,14 +293,17 @@ HuggingFace

.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.huggingface
    :members:
    :exclude-members: HuggingFaceTrainer
    :show-inheritance:
    :noindex:

Scikit-Learn
############
@@ -291,14 +311,17 @@ Scikit-Learn

.. autoclass:: ray.train.sklearn.SklearnTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.sklearn
    :members:
    :exclude-members: SklearnTrainer
    :show-inheritance:
    :noindex:

Reinforcement Learning (RLlib)
@@ -307,6 +330,7 @@ Reinforcement Learning (RLlib)

.. automodule:: ray.train.rl
    :members:
    :show-inheritance:
    :noindex:

.. _air-builtin-callbacks:
@@ -333,5 +357,3 @@ Weights and Biases

##################

.. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback

.. _air-session-ref:
@@ -2,54 +2,159 @@

Ray Train API
=============

This page covers framework specific integrations with Ray Train and Ray Train Developer APIs.
This page covers advanced configurations for specific frameworks using Train.
For core Ray AIR APIs, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
For different high level trainers and their usage, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.

.. _train-integration-api:
.. _train-api-backend-config:

Trainer and Predictor Integrations
----------------------------------
Backend Configurations
----------------------

XGBoost
~~~~~~~

.. _train-api-torch-config:
.. autoclass:: ray.train.xgboost.XGBoostTrainer
:members:
:show-inheritance:

TorchConfig
.. automethod:: __init__

.. automodule:: ray.train.xgboost
:members:
:exclude-members: XGBoostTrainer
:show-inheritance:

LightGBM
~~~~~~~~

.. autoclass:: ray.train.lightgbm.LightGBMTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.lightgbm
:members:
:exclude-members: LightGBMTrainer
:show-inheritance:

TensorFlow
~~~~~~~~~~

.. autoclass:: ray.train.tensorflow.TensorflowTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.tensorflow
:members:
:exclude-members: TensorflowTrainer
:show-inheritance:

PyTorch
~~~~~~~

.. autoclass:: ray.train.torch.TorchTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.torch
:members:
:exclude-members: TorchTrainer
:show-inheritance:

Horovod
~~~~~~~

.. autoclass:: ray.train.horovod.HorovodTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.horovod
:members:
:exclude-members: HorovodTrainer
:show-inheritance:

HuggingFace
~~~~~~~~~~~

.. autoclass:: ray.train.torch.TorchConfig
.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.huggingface
:members:
:exclude-members: HuggingFaceTrainer
:show-inheritance:

Scikit-Learn
~~~~~~~~~~~~

.. autoclass:: ray.train.sklearn.SklearnTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.sklearn
:members:
:exclude-members: SklearnTrainer
:show-inheritance:

Reinforcement Learning (RLlib)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: ray.train.rl
:members:
:show-inheritance:

Base Classes (Developer APIs)
-----------------------------

.. autoclass:: ray.train.trainer.BaseTrainer
:members:
:noindex:

.. _train-api-tensorflow-config:

TensorflowConfig
~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.tensorflow.TensorflowConfig
.. automethod:: __init__
:noindex:

.. _train-api-horovod-config:

HorovodConfig
~~~~~~~~~~~~~

.. autoclass:: ray.train.horovod.HorovodConfig
.. autoclass:: ray.train.data_parallel_trainer.DataParallelTrainer
:members:
:show-inheritance:
:noindex:

.. _train-api-backend-interfaces:
.. automethod:: __init__
:noindex:

Backend interfaces (for developers only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.gbdt_trainer.GBDTTrainer
:members:
:show-inheritance:
:noindex:

Backend
+++++++

.. automethod:: __init__
:noindex:

.. autoclass:: ray.train.backend.Backend

BackendConfig
+++++++++++++
:members:

.. autoclass:: ray.train.backend.BackendConfig
:members:

Deprecated APIs
@@ -57,10 +57,11 @@ training.

to automatically prepare your model and data for distributed training.

.. note::
   Ray Train will still work even if you don't use the ``prepare_model`` and ``prepare_data_loader`` utilities below,
   Ray Train will still work even if you don't use the :func:`ray.train.torch.prepare_model`
   and :func:`ray.train.torch.prepare_data_loader` utilities below,
   and instead handle the logic directly inside your training function.

First, use the ``prepare_model`` function to automatically move your model to the right device and wrap it in
First, use the :func:`~ray.train.torch.prepare_model` function to automatically move your model to the right device and wrap it in
``DistributedDataParallel``:

.. code-block:: diff
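A rough sketch of the idea in plain code, with a placeholder model (only the training function is shown; the trainer wiring is omitted):

.. code-block:: python

    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        model = nn.Linear(4, 1)
        # Moves the model to the right device and wraps it in
        # DistributedDataParallel when training with multiple workers.
        model = train.torch.prepare_model(model)
        ...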
@@ -89,7 +90,8 @@ training.

Then, use the ``prepare_data_loader`` function to automatically add a ``DistributedSampler`` to your ``DataLoader``
and move the batches to the right device.
and move the batches to the right device. This step is not necessary if you are passing in Ray Datasets to your Trainer
(see :ref:`train-datasets`).

.. code-block:: diff
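A minimal sketch of the same step (the dataset and batch size are made up for illustration):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from ray import train

    def train_loop_per_worker(config):
        dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
        data_loader = DataLoader(dataset, batch_size=32)
        # Adds a DistributedSampler and moves each batch to the right device.
        data_loader = train.torch.prepare_data_loader(data_loader)
        for X, y in data_loader:
            ...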
@@ -216,7 +218,7 @@ with one of the following:

        scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2)
    )

To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
To customize the backend setup, you can use the :ref:`framework-specific config objects <train-integration-api>`.

.. tabbed:: PyTorch
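For instance, a hedged sketch of passing a framework-specific config object to the PyTorch trainer (the ``gloo`` backend choice and the training function are placeholders):

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer, TorchConfig

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        # Use the Gloo process group instead of the default backend selection.
        torch_config=TorchConfig(backend="gloo"),
        scaling_config=ScalingConfig(num_workers=2),
    )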
@@ -258,7 +260,7 @@ To customize the backend setup, you can use a :ref:`train-api-backend-config` ob

        scaling_config=ScalingConfig(num_workers=2),
    )

For more configurability, please reference the :class:`BaseTrainer` API.
For more configurability, please reference the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.
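A hedged sketch of using ``DataParallelTrainer`` directly (the worker function and metric are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    def train_loop_per_worker(config):
        session.report({"loss": 1.0})

    trainer = DataParallelTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()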
Run training function
~~~~~~~~~~~~~~~~~~~~~
@@ -327,7 +329,7 @@ Accessing Training Results

.. TODO(ml-team) Flesh this section out.

The return of a ``Trainer.fit`` is a :class:`Result` object, containing
The return of a ``Trainer.fit`` is a :py:class:`~ray.air.result.Result` object, containing
information about the training run. You can access it to obtain saved checkpoints,
metrics and other relevant data.
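For example, a short sketch of inspecting the returned object (attribute names follow ``ray.air.result.Result``; ``trainer`` is assumed to be a configured Trainer):

.. code-block:: python

    result = trainer.fit()

    # Last reported metrics and the checkpoint attached to them, if any.
    print(result.metrics)
    print(result.checkpoint)

    # Full history of reported metrics as a pandas DataFrame.
    print(result.metrics_dataframe)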
@@ -370,7 +372,7 @@ For example, you can:

      print(result.metrics_dataframe)

* Obtain the :class:`Checkpoint`, used for resuming training, prediction and serving.
* Obtain the :py:class:`~ray.air.checkpoint.Checkpoint`, used for resuming training, prediction and serving.

.. code-block:: python
@@ -385,7 +387,7 @@ Log Directory Structure

Each ``Trainer`` will have a local directory created for logs and checkpoints.

You can obtain the path to the directory by accessing the ``log_dir`` attribute
of the :class:`Result` object returned by ``Trainer.fit``.
of the :py:class:`~ray.air.result.Result` object returned by ``Trainer.fit()``.

.. code-block:: python
@@ -497,7 +499,7 @@ training function. This will cause the checkpoint state from the distributed

workers to be saved on the ``Trainer`` (where your python script is executed).

The latest saved checkpoint can be accessed through the ``checkpoint`` attribute of
the :class:`Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
the :py:class:`~ray.air.result.Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
attribute.

Concrete examples are provided to demonstrate how checkpoints (model weights but not models) are saved
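A minimal sketch of reporting a checkpoint from the training function (the metric and checkpoint contents are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint

    def train_loop_per_worker(config):
        for epoch in range(3):
            # The reported checkpoint is stored by the Trainer and later
            # exposed through Result.checkpoint / Result.best_checkpoints.
            session.report(
                {"loss": 1.0 / (epoch + 1)},
                checkpoint=Checkpoint.from_dict({"epoch": epoch}),
            )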
@@ -619,7 +621,7 @@ Configuring checkpoints

+++++++++++++++++++++++

For more configurability of checkpointing behavior (specifically saving
checkpoints to disk), a :class:`CheckpointConfig` can be passed into
checkpoints to disk), a :py:class:`~ray.air.config.CheckpointConfig` can be passed into
``Trainer``.
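For instance, a hedged sketch of keeping only the best checkpoint according to a reported metric (the training function is assumed to report a ``loss`` metric):

.. code-block:: python

    from ray.air.config import CheckpointConfig, RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        run_config=RunConfig(
            checkpoint_config=CheckpointConfig(
                num_to_keep=1,
                checkpoint_score_attribute="loss",
                checkpoint_score_order="min",
            )
        ),
        scaling_config=ScalingConfig(num_workers=2),
    )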
As an example, to completely disable writing checkpoints to disk:
@@ -684,11 +686,11 @@ Loading checkpoints

Checkpoints can be loaded into the training function in 2 steps:

1. From the training function, ``session.get_checkpoint`` can be used to access
   the most recently saved :class:`Checkpoint`. This is useful to continue training even
1. From the training function, :func:`ray.air.session.get_checkpoint` can be used to access
   the most recently saved :py:class:`~ray.air.checkpoint.Checkpoint`. This is useful to continue training even
   if there's a worker failure.
2. The checkpoint to start training with can be bootstrapped by passing in a
   :class:`Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
   :py:class:`~ray.air.checkpoint.Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
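A minimal sketch combining both steps (the checkpoint contents and epoch counts are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint
    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_loop_per_worker(config):
        start_epoch = 0
        checkpoint = session.get_checkpoint()  # Step 1: resume if a checkpoint exists.
        if checkpoint:
            start_epoch = checkpoint.to_dict()["epoch"] + 1
        for epoch in range(start_epoch, 5):
            session.report({"epoch": epoch}, checkpoint=Checkpoint.from_dict({"epoch": epoch}))

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
        # Step 2: bootstrap training from a previously saved checkpoint.
        resume_from_checkpoint=Checkpoint.from_dict({"epoch": 1}),
    )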
.. tabbed:: PyTorch
@@ -835,7 +837,7 @@ Callbacks

You may want to plug in your training code with your favorite experiment management framework.
Ray AIR provides an interface to fetch intermediate results and callbacks to process/log your intermediate results
(the values passed into ``session.report(...)``).
(the values passed into :func:`ray.air.session.report`).

Ray AIR contains :ref:`built-in callbacks <air-builtin-callbacks>` for popular tracking frameworks, or you can implement your own callback via the :ref:`Callback <tune-callbacks-docs>` interface.
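For instance, a hedged sketch of attaching one of the built-in callbacks through ``RunConfig`` (the Weights & Biases callback and project name are just one example):

.. code-block:: python

    from ray.air.callbacks.wandb import WandbLoggerCallback
    from ray.air.config import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        run_config=RunConfig(callbacks=[WandbLoggerCallback(project="my_project")]),
        scaling_config=ScalingConfig(num_workers=2),
    )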
@@ -860,7 +862,7 @@ Custom Callbacks

++++++++++++++++

If the provided callbacks do not cover your desired integrations or use-cases,
you may always implement a custom callback by subclassing ``Callback``. If
you may always implement a custom callback by subclassing :py:class:`~ray.tune.logger.LoggerCallback`. If
the callback is general enough, please feel welcome to :ref:`add it <getting-involved>`
to the ``ray`` `repository <https://github.com/ray-project/ray>`_.
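A minimal sketch of such a subclass (the hook shown is one of several that ``LoggerCallback`` exposes; what it prints is purely illustrative):

.. code-block:: python

    from ray.tune.logger import LoggerCallback

    class PrintingCallback(LoggerCallback):
        def log_trial_result(self, iteration, trial, result):
            # Called each time a trial reports an intermediate result.
            print(f"Trial {trial} reported: {result}")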
@@ -1034,7 +1036,7 @@ Hyperparameter tuning (Ray Tune)

Hyperparameter tuning with :ref:`Ray Tune <tune-main>` is natively supported
with Ray Train. Specifically, you can take an existing ``Trainer`` and simply
pass it into a :class:`Tuner`.
pass it into a :py:class:`~ray.tune.tuner.Tuner`.

.. code-block:: python
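A hedged sketch of that wiring (the search space and values are placeholders, and ``trainer`` is any configured Trainer):

.. code-block:: python

    from ray import tune
    from ray.tune.tuner import Tuner

    tuner = Tuner(
        trainer,
        # Hyperparameters under train_loop_config are forwarded to the training function.
        param_space={"train_loop_config": {"lr": tune.grid_search([0.001, 0.01])}},
    )
    results = tuner.fit()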
@@ -1076,9 +1078,9 @@ precision datatype for operations like linear layers and convolutions.

You can train your Torch model with AMP by:

1. Adding ``train.torch.accelerate(amp=True)`` to the top of your training function.
2. Wrapping your optimizer with ``train.torch.prepare_optimizer``.
3. Replacing your backward call with ``train.torch.backward``.
1. Adding :func:`ray.train.torch.accelerate` with ``amp=True`` to the top of your training function.
2. Wrapping your optimizer with :func:`ray.train.torch.prepare_optimizer`.
3. Replacing your backward call with :func:`ray.train.torch.backward`.

.. code-block:: diff
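Put together, a hedged sketch of those three steps (the model, optimizer, and loss are placeholders):

.. code-block:: python

    import torch
    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        train.torch.accelerate(amp=True)  # Step 1: enable AMP before creating the model.

        model = train.torch.prepare_model(nn.Linear(4, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        optimizer = train.torch.prepare_optimizer(optimizer)  # Step 2

        loss = model(torch.randn(8, 4)).sum()
        train.torch.backward(loss)  # Step 3: replaces loss.backward().
        optimizer.step()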
@@ -1120,7 +1122,7 @@ Reproducibility

.. tabbed:: PyTorch

    To limit sources of nondeterministic behavior, add
    ``train.torch.enable_reproducibility()`` to the top of your training
    :func:`ray.train.torch.enable_reproducibility` to the top of your training
    function.

    .. code-block:: diff
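A tiny sketch, assuming the default integer seed argument:

.. code-block:: python

    from ray import train

    def train_loop_per_worker(config):
        # Seeds RNGs and enables deterministic Torch settings where possible.
        train.torch.enable_reproducibility(seed=0)
        ...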
@@ -1133,7 +1135,7 @@ Reproducibility

        ...

.. warning:: ``train.torch.enable_reproducibility`` can't guarantee
.. warning:: :func:`ray.train.torch.enable_reproducibility` can't guarantee
   completely reproducible results across executions. To learn more, read
   the `PyTorch notes on randomness <https://pytorch.org/docs/stable/notes/randomness.html>`_.
@@ -143,8 +143,8 @@ class DataParallelTrainer(BaseTrainer):

    - **Use Case 1:** You want to do data parallel training, but want to have
      a predefined ``training_loop_per_worker``.

    - **Use Case 2:** You want to implement a custom :ref:`Training backend
      <train-api-backend-interfaces>` that automatically handles
    - **Use Case 2:** You want to implement a custom
      :py:class:`~ray.train.backend.Backend` that automatically handles
      additional setup or teardown logic on each actor, so that the users of this
      new trainer do not have to implement this logic. For example, a
      ``TensorflowTrainer`` can be built on top of ``DataParallelTrainer``
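A hedged sketch of that second use case; the ``backend_cls`` property and the ``on_start`` hook are assumptions about the developer API and should be checked against :py:class:`~ray.train.backend.Backend`:

.. code-block:: python

    from ray.train.backend import Backend, BackendConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    class MyBackend(Backend):
        def on_start(self, worker_group, backend_config):
            # Framework-specific setup that runs before the training loop starts.
            print("Setting up training workers")

    class MyBackendConfig(BackendConfig):
        @property
        def backend_cls(self):
            return MyBackend

    class MyTrainer(DataParallelTrainer):
        def __init__(self, train_loop_per_worker, **kwargs):
            super().__init__(
                train_loop_per_worker,
                backend_config=MyBackendConfig(),
                **kwargs,
            )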