[Docs] [Train] Update Train API reference and docs (#28192)

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to Ray Train docs.

Also updates the user guide to use proper API cross-references.
Amog Kamsetty 2022-09-01 17:47:42 -07:00 committed by GitHub
parent 118b76218a
commit b83f10dbde
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 182 additions and 53 deletions

View file

@@ -122,6 +122,8 @@ Training Result
 .. automodule:: ray.air.result
     :members:
 
+.. _air-session-ref:
+
 Training Session
 ################
@@ -199,14 +201,17 @@ XGBoost
 .. autoclass:: ray.train.xgboost.XGBoostTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.xgboost
     :members:
     :exclude-members: XGBoostTrainer
     :show-inheritance:
+    :noindex:
 
 LightGBM
 ########
@@ -214,14 +219,17 @@ LightGBM
 .. autoclass:: ray.train.lightgbm.LightGBMTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.lightgbm
     :members:
     :exclude-members: LightGBMTrainer
     :show-inheritance:
+    :noindex:
 
 TensorFlow
 ##########
@@ -229,14 +237,17 @@ TensorFlow
 .. autoclass:: ray.train.tensorflow.TensorflowTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.tensorflow
     :members:
     :exclude-members: TensorflowTrainer
     :show-inheritance:
+    :noindex:
 
 .. _air-pytorch-ref:
@@ -246,14 +257,17 @@ PyTorch
 .. autoclass:: ray.train.torch.TorchTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.torch
     :members:
     :exclude-members: TorchTrainer
     :show-inheritance:
+    :noindex:
 
 Horovod
 #######
@@ -261,14 +275,17 @@ Horovod
 .. autoclass:: ray.train.horovod.HorovodTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.horovod
     :members:
     :exclude-members: HorovodTrainer
     :show-inheritance:
+    :noindex:
 
 HuggingFace
 ###########
@@ -276,14 +293,17 @@ HuggingFace
 .. autoclass:: ray.train.huggingface.HuggingFaceTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.huggingface
     :members:
     :exclude-members: HuggingFaceTrainer
     :show-inheritance:
+    :noindex:
 
 Scikit-Learn
 ############
@@ -291,14 +311,17 @@ Scikit-Learn
 .. autoclass:: ray.train.sklearn.SklearnTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.sklearn
     :members:
     :exclude-members: SklearnTrainer
     :show-inheritance:
+    :noindex:
 
 Reinforcement Learning (RLlib)
@@ -307,6 +330,7 @@ Reinforcement Learning (RLlib)
 .. automodule:: ray.train.rl
     :members:
     :show-inheritance:
+    :noindex:
 
 .. _air-builtin-callbacks:
@@ -333,5 +357,3 @@ Weights and Biases
 ##################
 
 .. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback
-
-.. _air-session-ref:

View file

@@ -2,54 +2,159 @@
 Ray Train API
 =============
 
-This page covers advanced configurations for specific frameworks using Train. For core Ray AIR APIs, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
+This page covers framework specific integrations with Ray Train and Ray Train Developer APIs.
 
 For different high level trainers and their usage, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
 
-.. _train-api-backend-config:
+.. _train-integration-api:
 
-Backend Configurations
-----------------------
+Trainer and Predictor Integrations
+----------------------------------
 
-.. _train-api-torch-config:
+XGBoost
+~~~~~~~
 
-TorchConfig
-~~~~~~~~~~~
+.. autoclass:: ray.train.xgboost.XGBoostTrainer
+    :members:
+    :show-inheritance:
 
-.. autoclass:: ray.train.torch.TorchConfig
-    :noindex:
+    .. automethod:: __init__
 
-.. _train-api-tensorflow-config:
+.. automodule:: ray.train.xgboost
+    :members:
+    :exclude-members: XGBoostTrainer
+    :show-inheritance:
 
-TensorflowConfig
-~~~~~~~~~~~~~~~~
+LightGBM
+~~~~~~~~
 
-.. autoclass:: ray.train.tensorflow.TensorflowConfig
-    :noindex:
+.. autoclass:: ray.train.lightgbm.LightGBMTrainer
+    :members:
+    :show-inheritance:
 
-.. _train-api-horovod-config:
+    .. automethod:: __init__
 
-HorovodConfig
-~~~~~~~~~~~~~
+.. automodule:: ray.train.lightgbm
+    :members:
+    :exclude-members: LightGBMTrainer
+    :show-inheritance:
 
-.. autoclass:: ray.train.horovod.HorovodConfig
-    :noindex:
+TensorFlow
+~~~~~~~~~~
 
-.. _train-api-backend-interfaces:
+.. autoclass:: ray.train.tensorflow.TensorflowTrainer
+    :members:
+    :show-inheritance:
 
-Backend interfaces (for developers only)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    .. automethod:: __init__
 
-Backend
-+++++++
+.. automodule:: ray.train.tensorflow
+    :members:
+    :exclude-members: TensorflowTrainer
+    :show-inheritance:
+
+PyTorch
+~~~~~~~
+
+.. autoclass:: ray.train.torch.TorchTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.torch
+    :members:
+    :exclude-members: TorchTrainer
+    :show-inheritance:
+
+Horovod
+~~~~~~~
+
+.. autoclass:: ray.train.horovod.HorovodTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.horovod
+    :members:
+    :exclude-members: HorovodTrainer
+    :show-inheritance:
+
+HuggingFace
+~~~~~~~~~~~
+
+.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.huggingface
+    :members:
+    :exclude-members: HuggingFaceTrainer
+    :show-inheritance:
+
+Scikit-Learn
+~~~~~~~~~~~~
+
+.. autoclass:: ray.train.sklearn.SklearnTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.sklearn
+    :members:
+    :exclude-members: SklearnTrainer
+    :show-inheritance:
+
+Reinforcement Learning (RLlib)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: ray.train.rl
+    :members:
+    :show-inheritance:
+
+Base Classes (Developer APIs)
+-----------------------------
+
+.. autoclass:: ray.train.trainer.BaseTrainer
+    :members:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
+
+.. autoclass:: ray.train.data_parallel_trainer.DataParallelTrainer
+    :members:
+    :show-inheritance:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
+
+.. autoclass:: ray.train.gbdt_trainer.GBDTTrainer
+    :members:
+    :show-inheritance:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
 
 .. autoclass:: ray.train.backend.Backend
+    :members:
 
-BackendConfig
-+++++++++++++
-
 .. autoclass:: ray.train.backend.BackendConfig
+    :members:
 
 Deprecated APIs
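As context for the new "Trainer and Predictor Integrations" section above, a minimal usage sketch for one of the listed trainers, assuming the Ray 2.0-era AIR API; the toy dataset and parameters are illustrative:

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.xgboost import XGBoostTrainer

    # Toy regression dataset; any Ray Dataset with a label column works.
    train_ds = ray.data.from_items([{"x": i, "y": 2 * i} for i in range(100)])

    trainer = XGBoostTrainer(
        label_column="y",
        params={"objective": "reg:squarederror"},
        datasets={"train": train_ds},
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()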

View file

@@ -57,10 +57,11 @@ training.
 to automatically prepare your model and data for distributed training.
 
 .. note::
-    Ray Train will still work even if you don't use the ``prepare_model`` and ``prepare_data_loader`` utilities below,
+    Ray Train will still work even if you don't use the :func:`ray.train.torch.prepare_model`
+    and :func:`ray.train.torch.prepare_data_loader` utilities below,
     and instead handle the logic directly inside your training function.
 
-First, use the ``prepare_model`` function to automatically move your model to the right device and wrap it in
+First, use the :func:`~ray.train.torch.prepare_model` function to automatically move your model to the right device and wrap it in
 ``DistributedDataParallel``
 
 .. code-block:: diff
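For readers following along, a minimal sketch of the pattern this hunk documents, assuming the Ray 2.0-era API; the model and its dimensions are illustrative:

.. code-block:: python

    import torch.nn as nn
    from ray.train.torch import prepare_model

    def train_loop_per_worker(config):
        model = nn.Linear(4, 1)  # placeholder model
        # Moves the model to the right device and wraps it in
        # DistributedDataParallel when running on multiple workers.
        model = prepare_model(model)
        ...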
@@ -89,7 +90,8 @@ training.
 
 Then, use the ``prepare_data_loader`` function to automatically add a ``DistributedSampler`` to your ``DataLoader``
-and move the batches to the right device.
+and move the batches to the right device. This step is not necessary if you are passing in Ray Datasets to your Trainer
+(see :ref:`train-datasets`).
 
 .. code-block:: diff
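Similarly, a hedged sketch of ``prepare_data_loader`` on a toy in-memory dataset (shapes and batch size are illustrative):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from ray.train.torch import prepare_data_loader

    def train_loop_per_worker(config):
        dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
        # Adds a DistributedSampler and moves each batch to the right
        # device as the loader is iterated.
        data_loader = prepare_data_loader(DataLoader(dataset, batch_size=32))
        for X, y in data_loader:
            ...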
@@ -216,7 +218,7 @@ with one of the following:
         scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2)
     )
 
-To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
+To customize the backend setup, you can use the :ref:`framework-specific config objects <train-integration-api>`.
 
 .. tabbed:: PyTorch
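As a sketch of the framework-specific config objects the new cross-reference points to, assuming a ``train_loop_per_worker`` function defined elsewhere:

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchConfig, TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        # Explicitly choose the gloo process group backend.
        torch_config=TorchConfig(backend="gloo"),
        scaling_config=ScalingConfig(num_workers=2),
    )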
@@ -258,7 +260,7 @@ To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
         scaling_config=ScalingConfig(num_workers=2),
     )
 
-For more configurability, please reference the :class:`BaseTrainer` API.
+For more configurability, please reference the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.
 
 Run training function
 ~~~~~~~~~~~~~~~~~~~~~
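A minimal sketch of using :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` directly, as the corrected reference suggests; the reported metric is a placeholder:

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    def train_loop_per_worker(config):
        session.report({"loss": 1.0})  # placeholder metric

    trainer = DataParallelTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()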
@@ -327,7 +329,7 @@ Accessing Training Results
 
 .. TODO(ml-team) Flesh this section out.
 
-The return of a ``Trainer.fit`` is a :class:`Result` object, containing
+The return of a ``Trainer.fit`` is a :py:class:`~ray.air.result.Result` object, containing
 information about the training run. You can access it to obtain saved checkpoints,
 metrics and other relevant data.
@@ -370,7 +372,7 @@ For example, you can:
 
     print(result.metrics_dataframe)
 
-* Obtain the :class:`Checkpoint`, used for resuming training, prediction and serving.
+* Obtain the :py:class:`~ray.air.checkpoint.Checkpoint`, used for resuming training, prediction and serving.
 
 .. code-block:: python
@@ -385,7 +387,7 @@ Log Directory Structure
 
 Each ``Trainer`` will have a local directory created for logs and checkpoints.
 You can obtain the path to the directory by accessing the ``log_dir`` attribute
-of the :class:`Result` object returned by ``Trainer.fit``.
+of the :py:class:`~ray.air.result.Result` object returned by ``Trainer.fit()``.
 
 .. code-block:: python
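The three hunks above all touch the :py:class:`~ray.air.result.Result` object; a short sketch of the accesses they describe, assuming ``trainer`` was constructed as in the earlier examples:

.. code-block:: python

    result = trainer.fit()

    print(result.metrics)            # last reported metrics
    print(result.metrics_dataframe)  # full metrics history as a DataFrame
    print(result.checkpoint)         # latest saved Checkpoint
    print(result.log_dir)            # local directory with logs and checkpoints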
@@ -497,7 +499,7 @@ training function. This will cause the checkpoint state from the distributed
 workers to be saved on the ``Trainer`` (where your python script is executed).
 
 The latest saved checkpoint can be accessed through the ``checkpoint`` attribute of
-the :class:`Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
+the :py:class:`~ray.air.result.Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
 attribute.
 
 Concrete examples are provided to demonstrate how checkpoints (model weights but not models) are saved
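A hedged sketch of saving checkpoints from the training function as described here; the state dict is a stand-in for real model weights:

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint

    def train_loop_per_worker(config):
        for epoch in range(3):
            state = {"epoch": epoch}  # stand-in for model.state_dict()
            session.report(
                {"loss": 1.0 / (epoch + 1)},
                checkpoint=Checkpoint.from_dict(state),
            )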
@@ -619,7 +621,7 @@ Configuring checkpoints
 +++++++++++++++++++++++
 
 For more configurability of checkpointing behavior (specifically saving
-checkpoints to disk), a :class:`CheckpointConfig` can be passed into
+checkpoints to disk), a :py:class:`~ray.air.config.CheckpointConfig` can be passed into
 ``Trainer``.
 
 As an example, to completely disable writing checkpoints to disk:
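For context, a sketch of passing a :py:class:`~ray.air.config.CheckpointConfig` through ``RunConfig``; these keep/score settings are illustrative rather than the disable-checkpointing example the hunk introduces:

.. code-block:: python

    from ray.air.config import CheckpointConfig, RunConfig

    run_config = RunConfig(
        checkpoint_config=CheckpointConfig(
            num_to_keep=2,                      # keep only the two best
            checkpoint_score_attribute="loss",  # rank checkpoints by loss
            checkpoint_score_order="min",
        )
    )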
@@ -684,11 +686,11 @@ Loading checkpoints
 
 Checkpoints can be loaded into the training function in 2 steps:
 
-1. From the training function, ``session.get_checkpoint`` can be used to access
-   the most recently saved :class:`Checkpoint`. This is useful to continue training even
+1. From the training function, :func:`ray.air.session.get_checkpoint` can be used to access
+   the most recently saved :py:class:`~ray.air.checkpoint.Checkpoint`. This is useful to continue training even
    if there's a worker failure.
 2. The checkpoint to start training with can be bootstrapped by passing in a
-   :class:`Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
+   :py:class:`~ray.air.checkpoint.Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
 
 .. tabbed:: PyTorch
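A sketch of the two loading steps above; ``previous_result`` stands in for the :py:class:`~ray.air.result.Result` of an earlier ``fit()`` call:

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_loop_per_worker(config):
        start_epoch = 0
        checkpoint = session.get_checkpoint()  # step 1: restore in the loop
        if checkpoint:
            start_epoch = checkpoint.to_dict()["epoch"] + 1
        ...

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        resume_from_checkpoint=previous_result.checkpoint,  # step 2
        scaling_config=ScalingConfig(num_workers=2),
    )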
@@ -835,7 +837,7 @@ Callbacks
 
 You may want to plug in your training code with your favorite experiment management framework.
 Ray AIR provides an interface to fetch intermediate results and callbacks to process/log your intermediate results
-(the values passed into ``session.report(...)``).
+(the values passed into :func:`ray.air.session.report`).
 
 Ray AIR contains :ref:`built-in callbacks <air-builtin-callbacks>` for popular tracking frameworks, or you can implement your own callback via the :ref:`Callback <tune-callbacks-docs>` interface.
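A minimal sketch of attaching a built-in callback (the W&B callback referenced elsewhere in this commit); the project name is hypothetical:

.. code-block:: python

    from ray.air.config import RunConfig
    from ray.air.callbacks.wandb import WandbLoggerCallback

    run_config = RunConfig(
        callbacks=[WandbLoggerCallback(project="my-project")]
    )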
@@ -860,7 +862,7 @@ Custom Callbacks
 ++++++++++++++++
 
 If the provided callbacks do not cover your desired integrations or use-cases,
-you may always implement a custom callback by subclassing ``Callback``. If
+you may always implement a custom callback by subclassing :py:class:`~ray.tune.logger.LoggerCallback`. If
 the callback is general enough, please feel welcome to :ref:`add it <getting-involved>`
 to the ``ray`` `repository <https://github.com/ray-project/ray>`_.
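A toy custom callback along the lines the corrected sentence suggests, subclassing the Tune ``LoggerCallback``:

.. code-block:: python

    from typing import Dict

    from ray.tune.logger import LoggerCallback

    class StdoutLoggerCallback(LoggerCallback):
        """Prints every intermediate result that workers report."""

        def log_trial_result(self, iteration: int, trial, result: Dict):
            print(f"Trial {trial}, iteration {iteration}: {result}")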
@@ -1034,7 +1036,7 @@ Hyperparameter tuning (Ray Tune)
 
 Hyperparameter tuning with :ref:`Ray Tune <tune-main>` is natively supported
 with Ray Train. Specifically, you can take an existing ``Trainer`` and simply
-pass it into a :class:`Tuner`.
+pass it into a :py:class:`~ray.tune.tuner.Tuner`.
 
 .. code-block:: python
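For context, a hedged sketch of handing an existing trainer to a :py:class:`~ray.tune.tuner.Tuner`; the search space and sample count are illustrative:

.. code-block:: python

    from ray import tune
    from ray.tune import TuneConfig, Tuner

    tuner = Tuner(
        trainer,
        param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
        tune_config=TuneConfig(num_samples=4, metric="loss", mode="min"),
    )
    results = tuner.fit()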
@@ -1076,9 +1078,9 @@ precision datatype for operations like linear layers and convolutions.
 
 You can train your Torch model with AMP by:
 
-1. Adding ``train.torch.accelerate(amp=True)`` to the top of your training function.
-2. Wrapping your optimizer with ``train.torch.prepare_optimizer``.
-3. Replacing your backward call with ``train.torch.backward``.
+1. Adding :func:`ray.train.torch.accelerate` with ``amp=True`` to the top of your training function.
+2. Wrapping your optimizer with :func:`ray.train.torch.prepare_optimizer`.
+3. Replacing your backward call with :func:`ray.train.torch.backward`.
 
 .. code-block:: diff
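The three AMP steps above, as one compact sketch (model, optimizer, and data are placeholders):

.. code-block:: python

    import torch
    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        train.torch.accelerate(amp=True)  # 1. enable AMP first
        model = train.torch.prepare_model(nn.Linear(4, 1))
        optimizer = train.torch.prepare_optimizer(  # 2. wrap the optimizer
            torch.optim.SGD(model.parameters(), lr=1e-3)
        )
        loss = model(torch.randn(8, 4)).sum()
        train.torch.backward(loss)  # 3. use instead of loss.backward()
        optimizer.step()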
@@ -1120,7 +1122,7 @@ Reproducibility
 .. tabbed:: PyTorch
 
     To limit sources of nondeterministic behavior, add
-    ``train.torch.enable_reproducibility()`` to the top of your training
+    :func:`ray.train.torch.enable_reproducibility` to the top of your training
     function.
 
     .. code-block:: diff
@@ -1133,7 +1135,7 @@ Reproducibility
 
         ...
 
-.. warning:: ``train.torch.enable_reproducibility`` can't guarantee
+.. warning:: :func:`ray.train.torch.enable_reproducibility` can't guarantee
     completely reproducible results across executions. To learn more, read
     the `PyTorch notes on randomness <https://pytorch.org/docs/stable/notes/randomness.html>`_.
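A one-line usage sketch; the ``seed`` keyword is assumed here and worth verifying against the API reference:

.. code-block:: python

    from ray import train

    def train_loop_per_worker(config):
        # Seeds the RNGs and prefers deterministic kernels on each worker.
        train.torch.enable_reproducibility(seed=42)
        ...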

View file

@@ -143,8 +143,8 @@ class DataParallelTrainer(BaseTrainer):
     - **Use Case 1:** You want to do data parallel training, but want to have
       a predefined ``training_loop_per_worker``.
 
-    - **Use Case 2:** You want to implement a custom :ref:`Training backend
-      <train-api-backend-interfaces>` that automatically handles
+    - **Use Case 2:** You want to implement a custom
+      :py:class:`~ray.train.backend.Backend` that automatically handles
       additional setup or teardown logic on each actor, so that the users of this
       new trainer do not have to implement this logic. For example, a
       ``TensorflowTrainer`` can be built on top of ``DataParallelTrainer``
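To make the docstring's second use case concrete, a hedged sketch of a custom backend; the class names and setup logic are hypothetical:

.. code-block:: python

    from ray.train.backend import Backend, BackendConfig

    class MyBackend(Backend):
        """Hypothetical backend running extra setup on each worker."""

        def on_start(self, worker_group, backend_config):
            # Runs on trainer start, e.g. to initialize a process group.
            worker_group.execute(print, "setting up workers")

        def on_shutdown(self, worker_group, backend_config):
            worker_group.execute(print, "tearing down workers")

    class MyBackendConfig(BackendConfig):
        @property
        def backend_cls(self):
            return MyBackend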