[Docs] [Train] Update Train API reference and docs (#28192)

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to Ray Train docs.

Also updates the user guide to use proper API cross-references.
Amog Kamsetty 2022-09-01 17:47:42 -07:00 committed by GitHub
parent 118b76218a
commit b83f10dbde
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 182 additions and 53 deletions

View file

@@ -122,6 +122,8 @@ Training Result
 .. automodule:: ray.air.result
     :members:
 
+.. _air-session-ref:
+
 Training Session
 ################
@@ -199,14 +201,17 @@ XGBoost
 .. autoclass:: ray.train.xgboost.XGBoostTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.xgboost
     :members:
     :exclude-members: XGBoostTrainer
     :show-inheritance:
+    :noindex:
 
 LightGBM
 ########
@@ -214,14 +219,17 @@ LightGBM
 .. autoclass:: ray.train.lightgbm.LightGBMTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.lightgbm
     :members:
     :exclude-members: LightGBMTrainer
     :show-inheritance:
+    :noindex:
 
 TensorFlow
 ##########
@@ -229,14 +237,17 @@ TensorFlow
 .. autoclass:: ray.train.tensorflow.TensorflowTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.tensorflow
     :members:
     :exclude-members: TensorflowTrainer
     :show-inheritance:
+    :noindex:
 
 .. _air-pytorch-ref:
@@ -246,14 +257,17 @@ PyTorch
 .. autoclass:: ray.train.torch.TorchTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.torch
     :members:
     :exclude-members: TorchTrainer
     :show-inheritance:
+    :noindex:
 
 Horovod
 #######
@@ -261,14 +275,17 @@ Horovod
 .. autoclass:: ray.train.horovod.HorovodTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.horovod
     :members:
     :exclude-members: HorovodTrainer
     :show-inheritance:
+    :noindex:
 
 HuggingFace
 ###########
@@ -276,14 +293,17 @@ HuggingFace
 .. autoclass:: ray.train.huggingface.HuggingFaceTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.huggingface
     :members:
     :exclude-members: HuggingFaceTrainer
     :show-inheritance:
+    :noindex:
 
 Scikit-Learn
 ############
@@ -291,14 +311,17 @@ Scikit-Learn
 .. autoclass:: ray.train.sklearn.SklearnTrainer
     :members:
     :show-inheritance:
+    :noindex:
 
     .. automethod:: __init__
+        :noindex:
 
 .. automodule:: ray.train.sklearn
     :members:
     :exclude-members: SklearnTrainer
     :show-inheritance:
+    :noindex:
 
 Reinforcement Learning (RLlib)
@@ -307,6 +330,7 @@ Reinforcement Learning (RLlib)
 .. automodule:: ray.train.rl
     :members:
     :show-inheritance:
+    :noindex:
 
 .. _air-builtin-callbacks:
@@ -333,5 +357,3 @@ Weights and Biases
 ##################
 
 .. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback
-
-.. _air-session-ref:

View file

@@ -2,54 +2,159 @@
 Ray Train API
 =============
 
-This page covers advanced configurations for specific frameworks using Train. For core Ray AIR APIs, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
+This page covers framework specific integrations with Ray Train and Ray Train Developer APIs.
 
 For different high level trainers and their usage, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
 
-.. _train-api-backend-config:
+.. _train-integration-api:
 
-Backend Configurations
-----------------------
+Trainer and Predictor Integrations
+----------------------------------
 
-.. _train-api-torch-config:
+XGBoost
+~~~~~~~
 
-TorchConfig
-~~~~~~~~~~~
+.. autoclass:: ray.train.xgboost.XGBoostTrainer
+    :members:
+    :show-inheritance:
 
-.. autoclass:: ray.train.torch.TorchConfig
-    :noindex:
+    .. automethod:: __init__
 
-.. _train-api-tensorflow-config:
+.. automodule:: ray.train.xgboost
+    :members:
+    :exclude-members: XGBoostTrainer
+    :show-inheritance:
 
-TensorflowConfig
-~~~~~~~~~~~~~~~~
+LightGBM
+~~~~~~~~
 
-.. autoclass:: ray.train.tensorflow.TensorflowConfig
-    :noindex:
+.. autoclass:: ray.train.lightgbm.LightGBMTrainer
+    :members:
+    :show-inheritance:
 
-.. _train-api-horovod-config:
+    .. automethod:: __init__
 
-HorovodConfig
-~~~~~~~~~~~~~
+.. automodule:: ray.train.lightgbm
+    :members:
+    :exclude-members: LightGBMTrainer
+    :show-inheritance:
 
-.. autoclass:: ray.train.horovod.HorovodConfig
-    :noindex:
+TensorFlow
+~~~~~~~~~~
 
-.. _train-api-backend-interfaces:
+.. autoclass:: ray.train.tensorflow.TensorflowTrainer
+    :members:
+    :show-inheritance:
 
-Backend interfaces (for developers only)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    .. automethod:: __init__
 
-Backend
-+++++++
+.. automodule:: ray.train.tensorflow
+    :members:
+    :exclude-members: TensorflowTrainer
+    :show-inheritance:
+
+PyTorch
+~~~~~~~
+
+.. autoclass:: ray.train.torch.TorchTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.torch
+    :members:
+    :exclude-members: TorchTrainer
+    :show-inheritance:
+
+Horovod
+~~~~~~~
+
+.. autoclass:: ray.train.horovod.HorovodTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.horovod
+    :members:
+    :exclude-members: HorovodTrainer
+    :show-inheritance:
+
+HuggingFace
+~~~~~~~~~~~
+
+.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.huggingface
+    :members:
+    :exclude-members: HuggingFaceTrainer
+    :show-inheritance:
+
+Scikit-Learn
+~~~~~~~~~~~~
+
+.. autoclass:: ray.train.sklearn.SklearnTrainer
+    :members:
+    :show-inheritance:
+
+    .. automethod:: __init__
+
+.. automodule:: ray.train.sklearn
+    :members:
+    :exclude-members: SklearnTrainer
+    :show-inheritance:
+
+Reinforcement Learning (RLlib)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: ray.train.rl
+    :members:
+    :show-inheritance:
+
+Base Classes (Developer APIs)
+-----------------------------
+
+.. autoclass:: ray.train.trainer.BaseTrainer
+    :members:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
+
+.. autoclass:: ray.train.data_parallel_trainer.DataParallelTrainer
+    :members:
+    :show-inheritance:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
+
+.. autoclass:: ray.train.gbdt_trainer.GBDTTrainer
+    :members:
+    :show-inheritance:
+    :noindex:
+
+    .. automethod:: __init__
+        :noindex:
 
 .. autoclass:: ray.train.backend.Backend
+    :members:
 
-BackendConfig
-+++++++++++++
-
 .. autoclass:: ray.train.backend.BackendConfig
+    :members:
 
 Deprecated APIs
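As context for the new "Trainer and Predictor Integrations" section above, a minimal usage sketch for one of the listed trainers, assuming the Ray 2.0-era AIR API; the toy dataset and parameters are illustrative:

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.xgboost import XGBoostTrainer

    # Toy regression dataset; any Ray Dataset with a label column works.
    train_ds = ray.data.from_items([{"x": i, "y": 2 * i} for i in range(100)])

    trainer = XGBoostTrainer(
        label_column="y",
        params={"objective": "reg:squarederror"},
        datasets={"train": train_ds},
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()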

View file

@@ -57,10 +57,11 @@ training.
 to automatically prepare your model and data for distributed training.
 
 .. note::
-    Ray Train will still work even if you don't use the ``prepare_model`` and ``prepare_data_loader`` utilities below,
+    Ray Train will still work even if you don't use the :func:`ray.train.torch.prepare_model`
+    and :func:`ray.train.torch.prepare_data_loader` utilities below,
     and instead handle the logic directly inside your training function.
 
-First, use the ``prepare_model`` function to automatically move your model to the right device and wrap it in
+First, use the :func:`~ray.train.torch.prepare_model` function to automatically move your model to the right device and wrap it in
 ``DistributedDataParallel``
 
 .. code-block:: diff
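For readers following along, a minimal sketch of the pattern this hunk documents, assuming the Ray 2.0-era API; the model and its dimensions are illustrative:

.. code-block:: python

    import torch.nn as nn
    from ray.train.torch import prepare_model

    def train_loop_per_worker(config):
        model = nn.Linear(4, 1)  # placeholder model
        # Moves the model to the right device and wraps it in
        # DistributedDataParallel when running on multiple workers.
        model = prepare_model(model)
        ...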
@@ -89,7 +90,8 @@ training.
 
 Then, use the ``prepare_data_loader`` function to automatically add a ``DistributedSampler`` to your ``DataLoader``
-and move the batches to the right device.
+and move the batches to the right device. This step is not necessary if you are passing in Ray Datasets to your Trainer
+(see :ref:`train-datasets`).
 
 .. code-block:: diff
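Similarly, a hedged sketch of ``prepare_data_loader`` on a toy in-memory dataset (shapes and batch size are illustrative):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from ray.train.torch import prepare_data_loader

    def train_loop_per_worker(config):
        dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
        # Adds a DistributedSampler and moves each batch to the right
        # device as the loader is iterated.
        data_loader = prepare_data_loader(DataLoader(dataset, batch_size=32))
        for X, y in data_loader:
            ...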
@@ -216,7 +218,7 @@ with one of the following:
         scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2)
     )
 
-To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
+To customize the backend setup, you can use the :ref:`framework-specific config objects <train-integration-api>`.
 
 .. tabbed:: PyTorch
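As a sketch of the framework-specific config objects the new cross-reference points to, assuming a ``train_loop_per_worker`` function defined elsewhere:

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchConfig, TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        # Explicitly choose the gloo process group backend.
        torch_config=TorchConfig(backend="gloo"),
        scaling_config=ScalingConfig(num_workers=2),
    )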
@@ -258,7 +260,7 @@ To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
         scaling_config=ScalingConfig(num_workers=2),
     )
 
-For more configurability, please reference the :class:`BaseTrainer` API.
+For more configurability, please reference the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.
 
 Run training function
 ~~~~~~~~~~~~~~~~~~~~~
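A minimal sketch of using :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` directly, as the corrected reference suggests; the reported metric is a placeholder:

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    def train_loop_per_worker(config):
        session.report({"loss": 1.0})  # placeholder metric

    trainer = DataParallelTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()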
@@ -327,7 +329,7 @@ Accessing Training Results
 
 .. TODO(ml-team) Flesh this section out.
 
-The return of a ``Trainer.fit`` is a :class:`Result` object, containing
+The return of a ``Trainer.fit`` is a :py:class:`~ray.air.result.Result` object, containing
 information about the training run. You can access it to obtain saved checkpoints,
 metrics and other relevant data.
@@ -370,7 +372,7 @@ For example, you can:
 
     print(result.metrics_dataframe)
 
-* Obtain the :class:`Checkpoint`, used for resuming training, prediction and serving.
+* Obtain the :py:class:`~ray.air.checkpoint.Checkpoint`, used for resuming training, prediction and serving.
 
 .. code-block:: python
@@ -385,7 +387,7 @@ Log Directory Structure
 
 Each ``Trainer`` will have a local directory created for logs and checkpoints.
 You can obtain the path to the directory by accessing the ``log_dir`` attribute
-of the :class:`Result` object returned by ``Trainer.fit``.
+of the :py:class:`~ray.air.result.Result` object returned by ``Trainer.fit()``.
 
 .. code-block:: python
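The three hunks above all touch the :py:class:`~ray.air.result.Result` object; a short sketch of the accesses they describe, assuming ``trainer`` was constructed as in the earlier examples:

.. code-block:: python

    result = trainer.fit()

    print(result.metrics)            # last reported metrics
    print(result.metrics_dataframe)  # full metrics history as a DataFrame
    print(result.checkpoint)         # latest saved Checkpoint
    print(result.log_dir)            # local directory with logs and checkpoints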
@@ -497,7 +499,7 @@ training function. This will cause the checkpoint state from the distributed
 workers to be saved on the ``Trainer`` (where your python script is executed).
 
 The latest saved checkpoint can be accessed through the ``checkpoint`` attribute of
-the :class:`Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
+the :py:class:`~ray.air.result.Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
 attribute.
 
 Concrete examples are provided to demonstrate how checkpoints (model weights but not models) are saved
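A hedged sketch of saving checkpoints from the training function as described here; the state dict is a stand-in for real model weights:

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint

    def train_loop_per_worker(config):
        for epoch in range(3):
            state = {"epoch": epoch}  # stand-in for model.state_dict()
            session.report(
                {"loss": 1.0 / (epoch + 1)},
                checkpoint=Checkpoint.from_dict(state),
            )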
@@ -619,7 +621,7 @@ Configuring checkpoints
 +++++++++++++++++++++++
 
 For more configurability of checkpointing behavior (specifically saving
-checkpoints to disk), a :class:`CheckpointConfig` can be passed into
+checkpoints to disk), a :py:class:`~ray.air.config.CheckpointConfig` can be passed into
 ``Trainer``.
 
 As an example, to completely disable writing checkpoints to disk:
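For context, a sketch of passing a :py:class:`~ray.air.config.CheckpointConfig` through ``RunConfig``; these keep/score settings are illustrative rather than the disable-checkpointing example the hunk introduces:

.. code-block:: python

    from ray.air.config import CheckpointConfig, RunConfig

    run_config = RunConfig(
        checkpoint_config=CheckpointConfig(
            num_to_keep=2,                      # keep only the two best
            checkpoint_score_attribute="loss",  # rank checkpoints by loss
            checkpoint_score_order="min",
        )
    )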
@@ -684,11 +686,11 @@ Loading checkpoints
 
 Checkpoints can be loaded into the training function in 2 steps:
 
-1. From the training function, ``session.get_checkpoint`` can be used to access
-   the most recently saved :class:`Checkpoint`. This is useful to continue training even
+1. From the training function, :func:`ray.air.session.get_checkpoint` can be used to access
+   the most recently saved :py:class:`~ray.air.checkpoint.Checkpoint`. This is useful to continue training even
    if there's a worker failure.
 2. The checkpoint to start training with can be bootstrapped by passing in a
-   :class:`Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
+   :py:class:`~ray.air.checkpoint.Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
 
 .. tabbed:: PyTorch
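A sketch of the two loading steps above; ``previous_result`` stands in for the :py:class:`~ray.air.result.Result` of an earlier ``fit()`` call:

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_loop_per_worker(config):
        start_epoch = 0
        checkpoint = session.get_checkpoint()  # step 1: restore in the loop
        if checkpoint:
            start_epoch = checkpoint.to_dict()["epoch"] + 1
        ...

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        resume_from_checkpoint=previous_result.checkpoint,  # step 2
        scaling_config=ScalingConfig(num_workers=2),
    )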
@@ -835,7 +837,7 @@ Callbacks
 
 You may want to plug in your training code with your favorite experiment management framework.
 Ray AIR provides an interface to fetch intermediate results and callbacks to process/log your intermediate results
-(the values passed into ``session.report(...)``).
+(the values passed into :func:`ray.air.session.report`).
 
 Ray AIR contains :ref:`built-in callbacks <air-builtin-callbacks>` for popular tracking frameworks, or you can implement your own callback via the :ref:`Callback <tune-callbacks-docs>` interface.
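A minimal sketch of attaching a built-in callback (the W&B callback referenced elsewhere in this commit); the project name is hypothetical:

.. code-block:: python

    from ray.air.config import RunConfig
    from ray.air.callbacks.wandb import WandbLoggerCallback

    run_config = RunConfig(
        callbacks=[WandbLoggerCallback(project="my-project")]
    )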
@@ -860,7 +862,7 @@ Custom Callbacks
 ++++++++++++++++
 
 If the provided callbacks do not cover your desired integrations or use-cases,
-you may always implement a custom callback by subclassing ``Callback``. If
+you may always implement a custom callback by subclassing :py:class:`~ray.tune.logger.LoggerCallback`. If
 the callback is general enough, please feel welcome to :ref:`add it <getting-involved>`
 to the ``ray`` `repository <https://github.com/ray-project/ray>`_.
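A toy custom callback along the lines the corrected sentence suggests, subclassing the Tune ``LoggerCallback``:

.. code-block:: python

    from typing import Dict

    from ray.tune.logger import LoggerCallback

    class StdoutLoggerCallback(LoggerCallback):
        """Prints every intermediate result that workers report."""

        def log_trial_result(self, iteration: int, trial, result: Dict):
            print(f"Trial {trial}, iteration {iteration}: {result}")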
@@ -1034,7 +1036,7 @@ Hyperparameter tuning (Ray Tune)
 
 Hyperparameter tuning with :ref:`Ray Tune <tune-main>` is natively supported
 with Ray Train. Specifically, you can take an existing ``Trainer`` and simply
-pass it into a :class:`Tuner`.
+pass it into a :py:class:`~ray.tune.tuner.Tuner`.
 
 .. code-block:: python
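For context, a hedged sketch of handing an existing trainer to a :py:class:`~ray.tune.tuner.Tuner`; the search space and sample count are illustrative:

.. code-block:: python

    from ray import tune
    from ray.tune import TuneConfig, Tuner

    tuner = Tuner(
        trainer,
        param_space={"train_loop_config": {"lr": tune.loguniform(1e-4, 1e-1)}},
        tune_config=TuneConfig(num_samples=4, metric="loss", mode="min"),
    )
    results = tuner.fit()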
@@ -1076,9 +1078,9 @@ precision datatype for operations like linear layers and convolutions.
 
 You can train your Torch model with AMP by:
 
-1. Adding ``train.torch.accelerate(amp=True)`` to the top of your training function.
-2. Wrapping your optimizer with ``train.torch.prepare_optimizer``.
-3. Replacing your backward call with ``train.torch.backward``.
+1. Adding :func:`ray.train.torch.accelerate` with ``amp=True`` to the top of your training function.
+2. Wrapping your optimizer with :func:`ray.train.torch.prepare_optimizer`.
+3. Replacing your backward call with :func:`ray.train.torch.backward`.
 
 .. code-block:: diff
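The three AMP steps above, as one compact sketch (model, optimizer, and data are placeholders):

.. code-block:: python

    import torch
    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        train.torch.accelerate(amp=True)  # 1. enable AMP first
        model = train.torch.prepare_model(nn.Linear(4, 1))
        optimizer = train.torch.prepare_optimizer(  # 2. wrap the optimizer
            torch.optim.SGD(model.parameters(), lr=1e-3)
        )
        loss = model(torch.randn(8, 4)).sum()
        train.torch.backward(loss)  # 3. use instead of loss.backward()
        optimizer.step()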
@@ -1120,7 +1122,7 @@ Reproducibility
 .. tabbed:: PyTorch
 
     To limit sources of nondeterministic behavior, add
-    ``train.torch.enable_reproducibility()`` to the top of your training
+    :func:`ray.train.torch.enable_reproducibility` to the top of your training
     function.
 
     .. code-block:: diff
@@ -1133,7 +1135,7 @@ Reproducibility
 
         ...
 
-.. warning:: ``train.torch.enable_reproducibility`` can't guarantee
+.. warning:: :func:`ray.train.torch.enable_reproducibility` can't guarantee
     completely reproducible results across executions. To learn more, read
     the `PyTorch notes on randomness <https://pytorch.org/docs/stable/notes/randomness.html>`_.
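A one-line usage sketch; the ``seed`` keyword is assumed here and worth verifying against the API reference:

.. code-block:: python

    from ray import train

    def train_loop_per_worker(config):
        # Seeds the RNGs and prefers deterministic kernels on each worker.
        train.torch.enable_reproducibility(seed=42)
        ...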

View file

@@ -143,8 +143,8 @@ class DataParallelTrainer(BaseTrainer):
     - **Use Case 1:** You want to do data parallel training, but want to have
       a predefined ``training_loop_per_worker``.
 
-    - **Use Case 2:** You want to implement a custom :ref:`Training backend
-      <train-api-backend-interfaces>` that automatically handles
+    - **Use Case 2:** You want to implement a custom
+      :py:class:`~ray.train.backend.Backend` that automatically handles
       additional setup or teardown logic on each actor, so that the users of this
       new trainer do not have to implement this logic. For example, a
       ``TensorflowTrainer`` can be built on top of ``DataParallelTrainer``
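To make the docstring's second use case concrete, a hedged sketch of a custom backend; the class names and setup logic are hypothetical:

.. code-block:: python

    from ray.train.backend import Backend, BackendConfig

    class MyBackend(Backend):
        """Hypothetical backend running extra setup on each worker."""

        def on_start(self, worker_group, backend_config):
            # Runs on trainer start, e.g. to initialize a process group.
            worker_group.execute(print, "setting up workers")

        def on_shutdown(self, worker_group, backend_config):
            worker_group.execute(print, "tearing down workers")

    class MyBackendConfig(BackendConfig):
        @property
        def backend_cls(self):
            return MyBackend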