mirror of
https://github.com/vale981/ray
synced 2025-03-04 17:41:43 -05:00
[Docs] [Train] Update Train API reference and docs (#28192)
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to the Ray Train docs. Also updates the user guide with better cross-references.
This commit is contained in:
parent 118b76218a
commit b83f10dbde
4 changed files with 182 additions and 53 deletions
@@ -122,6 +122,8 @@ Training Result

.. automodule:: ray.air.result
    :members:

.. _air-session-ref:

Training Session
################
@@ -199,14 +201,17 @@ XGBoost

.. autoclass:: ray.train.xgboost.XGBoostTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.xgboost
    :members:
    :exclude-members: XGBoostTrainer
    :show-inheritance:
    :noindex:

LightGBM
########
@@ -214,14 +219,17 @@ LightGBM

.. autoclass:: ray.train.lightgbm.LightGBMTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.lightgbm
    :members:
    :exclude-members: LightGBMTrainer
    :show-inheritance:
    :noindex:

TensorFlow
##########
@@ -229,14 +237,17 @@ TensorFlow

.. autoclass:: ray.train.tensorflow.TensorflowTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.tensorflow
    :members:
    :exclude-members: TensorflowTrainer
    :show-inheritance:
    :noindex:

.. _air-pytorch-ref:
@@ -246,14 +257,17 @@ PyTorch

.. autoclass:: ray.train.torch.TorchTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.torch
    :members:
    :exclude-members: TorchTrainer
    :show-inheritance:
    :noindex:

Horovod
#######
@@ -261,14 +275,17 @@ Horovod

.. autoclass:: ray.train.horovod.HorovodTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.horovod
    :members:
    :exclude-members: HorovodTrainer
    :show-inheritance:
    :noindex:

HuggingFace
###########
@@ -276,14 +293,17 @@ HuggingFace

.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.huggingface
    :members:
    :exclude-members: HuggingFaceTrainer
    :show-inheritance:
    :noindex:

Scikit-Learn
############
@@ -291,14 +311,17 @@ Scikit-Learn

.. autoclass:: ray.train.sklearn.SklearnTrainer
    :members:
    :show-inheritance:
    :noindex:

    .. automethod:: __init__
        :noindex:

.. automodule:: ray.train.sklearn
    :members:
    :exclude-members: SklearnTrainer
    :show-inheritance:
    :noindex:

Reinforcement Learning (RLlib)
@@ -307,6 +330,7 @@ Reinforcement Learning (RLlib)

.. automodule:: ray.train.rl
    :members:
    :show-inheritance:
    :noindex:

.. _air-builtin-callbacks:
@@ -333,5 +357,3 @@ Weights and Biases

##################

.. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback

.. _air-session-ref:
@@ -2,54 +2,159 @@

Ray Train API
=============

This page covers framework specific integrations with Ray Train and Ray Train Developer APIs.
This page covers advanced configurations for specific frameworks using Train.
For core Ray AIR APIs, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.
For different high level trainers and their usage, take a look at the :ref:`AIR Trainer package reference <air-trainer-ref>`.

.. _train-integration-api:
.. _train-api-backend-config:

Trainer and Predictor Integrations
----------------------------------
Backend Configurations
----------------------

XGBoost
~~~~~~~

.. _train-api-torch-config:
.. autoclass:: ray.train.xgboost.XGBoostTrainer
:members:
:show-inheritance:

TorchConfig
.. automethod:: __init__

.. automodule:: ray.train.xgboost
:members:
:exclude-members: XGBoostTrainer
:show-inheritance:

LightGBM
~~~~~~~~

.. autoclass:: ray.train.lightgbm.LightGBMTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.lightgbm
:members:
:exclude-members: LightGBMTrainer
:show-inheritance:

TensorFlow
~~~~~~~~~~

.. autoclass:: ray.train.tensorflow.TensorflowTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.tensorflow
:members:
:exclude-members: TensorflowTrainer
:show-inheritance:

PyTorch
~~~~~~~

.. autoclass:: ray.train.torch.TorchTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.torch
:members:
:exclude-members: TorchTrainer
:show-inheritance:

Horovod
~~~~~~~

.. autoclass:: ray.train.horovod.HorovodTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.horovod
:members:
:exclude-members: HorovodTrainer
:show-inheritance:

HuggingFace
~~~~~~~~~~~

.. autoclass:: ray.train.torch.TorchConfig
.. autoclass:: ray.train.huggingface.HuggingFaceTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.huggingface
:members:
:exclude-members: HuggingFaceTrainer
:show-inheritance:

Scikit-Learn
~~~~~~~~~~~~

.. autoclass:: ray.train.sklearn.SklearnTrainer
:members:
:show-inheritance:

.. automethod:: __init__

.. automodule:: ray.train.sklearn
:members:
:exclude-members: SklearnTrainer
:show-inheritance:

Reinforcement Learning (RLlib)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: ray.train.rl
:members:
:show-inheritance:

Base Classes (Developer APIs)
-----------------------------

.. autoclass:: ray.train.trainer.BaseTrainer
:members:
:noindex:

.. _train-api-tensorflow-config:

TensorflowConfig
~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.tensorflow.TensorflowConfig
.. automethod:: __init__
:noindex:

.. _train-api-horovod-config:

HorovodConfig
~~~~~~~~~~~~~

.. autoclass:: ray.train.horovod.HorovodConfig
.. autoclass:: ray.train.data_parallel_trainer.DataParallelTrainer
:members:
:show-inheritance:
:noindex:

.. _train-api-backend-interfaces:
.. automethod:: __init__
:noindex:

Backend interfaces (for developers only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.gbdt_trainer.GBDTTrainer
:members:
:show-inheritance:
:noindex:

Backend
+++++++

.. automethod:: __init__
:noindex:

.. autoclass:: ray.train.backend.Backend

BackendConfig
+++++++++++++
:members:

.. autoclass:: ray.train.backend.BackendConfig
:members:

Deprecated APIs
@@ -57,10 +57,11 @@ training.

to automatically prepare your model and data for distributed training.

.. note::
   Ray Train will still work even if you don't use the ``prepare_model`` and ``prepare_data_loader`` utilities below,
   Ray Train will still work even if you don't use the :func:`ray.train.torch.prepare_model`
   and :func:`ray.train.torch.prepare_data_loader` utilities below,
   and instead handle the logic directly inside your training function.

First, use the ``prepare_model`` function to automatically move your model to the right device and wrap it in
First, use the :func:`~ray.train.torch.prepare_model` function to automatically move your model to the right device and wrap it in
``DistributedDataParallel``:

.. code-block:: diff
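A rough sketch of the idea in plain code, with a placeholder model (only the training function is shown; the trainer wiring is omitted):

.. code-block:: python

    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        model = nn.Linear(4, 1)
        # Moves the model to the right device and wraps it in
        # DistributedDataParallel when training with multiple workers.
        model = train.torch.prepare_model(model)
        ...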
@@ -89,7 +90,8 @@ training.

Then, use the ``prepare_data_loader`` function to automatically add a ``DistributedSampler`` to your ``DataLoader``
and move the batches to the right device.
and move the batches to the right device. This step is not necessary if you are passing in Ray Datasets to your Trainer
(see :ref:`train-datasets`).

.. code-block:: diff
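A minimal sketch of the same step (the dataset and batch size are made up for illustration):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from ray import train

    def train_loop_per_worker(config):
        dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
        data_loader = DataLoader(dataset, batch_size=32)
        # Adds a DistributedSampler and moves each batch to the right device.
        data_loader = train.torch.prepare_data_loader(data_loader)
        for X, y in data_loader:
            ...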
@@ -216,7 +218,7 @@ with one of the following:

        scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2)
    )

To customize the backend setup, you can use a :ref:`train-api-backend-config` object.
To customize the backend setup, you can use the :ref:`framework-specific config objects <train-integration-api>`.

.. tabbed:: PyTorch
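For instance, a hedged sketch of passing a framework-specific config object to the PyTorch trainer (the ``gloo`` backend choice and the training function are placeholders):

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer, TorchConfig

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        # Use the Gloo process group instead of the default backend selection.
        torch_config=TorchConfig(backend="gloo"),
        scaling_config=ScalingConfig(num_workers=2),
    )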
@@ -258,7 +260,7 @@ To customize the backend setup, you can use a :ref:`train-api-backend-config` ob

        scaling_config=ScalingConfig(num_workers=2),
    )

For more configurability, please reference the :class:`BaseTrainer` API.
For more configurability, please reference the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.
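A hedged sketch of using ``DataParallelTrainer`` directly (the worker function and metric are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.config import ScalingConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    def train_loop_per_worker(config):
        session.report({"loss": 1.0})

    trainer = DataParallelTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()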
Run training function
~~~~~~~~~~~~~~~~~~~~~
@@ -327,7 +329,7 @@ Accessing Training Results

.. TODO(ml-team) Flesh this section out.

The return of a ``Trainer.fit`` is a :class:`Result` object, containing
The return of a ``Trainer.fit`` is a :py:class:`~ray.air.result.Result` object, containing
information about the training run. You can access it to obtain saved checkpoints,
metrics and other relevant data.
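For example, a short sketch of inspecting the returned object (attribute names follow ``ray.air.result.Result``; ``trainer`` is assumed to be a configured Trainer):

.. code-block:: python

    result = trainer.fit()

    # Last reported metrics and the checkpoint attached to them, if any.
    print(result.metrics)
    print(result.checkpoint)

    # Full history of reported metrics as a pandas DataFrame.
    print(result.metrics_dataframe)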
@@ -370,7 +372,7 @@ For example, you can:

      print(result.metrics_dataframe)

* Obtain the :class:`Checkpoint`, used for resuming training, prediction and serving.
* Obtain the :py:class:`~ray.air.checkpoint.Checkpoint`, used for resuming training, prediction and serving.

.. code-block:: python
@@ -385,7 +387,7 @@ Log Directory Structure

Each ``Trainer`` will have a local directory created for logs and checkpoints.

You can obtain the path to the directory by accessing the ``log_dir`` attribute
of the :class:`Result` object returned by ``Trainer.fit``.
of the :py:class:`~ray.air.result.Result` object returned by ``Trainer.fit()``.

.. code-block:: python
@@ -497,7 +499,7 @@ training function. This will cause the checkpoint state from the distributed

workers to be saved on the ``Trainer`` (where your python script is executed).

The latest saved checkpoint can be accessed through the ``checkpoint`` attribute of
the :class:`Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
the :py:class:`~ray.air.result.Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints``
attribute.

Concrete examples are provided to demonstrate how checkpoints (model weights but not models) are saved
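A minimal sketch of reporting a checkpoint from the training function (the metric and checkpoint contents are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint

    def train_loop_per_worker(config):
        for epoch in range(3):
            # The reported checkpoint is stored by the Trainer and later
            # exposed through Result.checkpoint / Result.best_checkpoints.
            session.report(
                {"loss": 1.0 / (epoch + 1)},
                checkpoint=Checkpoint.from_dict({"epoch": epoch}),
            )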
@@ -619,7 +621,7 @@ Configuring checkpoints

+++++++++++++++++++++++

For more configurability of checkpointing behavior (specifically saving
checkpoints to disk), a :class:`CheckpointConfig` can be passed into
checkpoints to disk), a :py:class:`~ray.air.config.CheckpointConfig` can be passed into
``Trainer``.
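For instance, a hedged sketch of keeping only the best checkpoint according to a reported metric (the training function is assumed to report a ``loss`` metric):

.. code-block:: python

    from ray.air.config import CheckpointConfig, RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        run_config=RunConfig(
            checkpoint_config=CheckpointConfig(
                num_to_keep=1,
                checkpoint_score_attribute="loss",
                checkpoint_score_order="min",
            )
        ),
        scaling_config=ScalingConfig(num_workers=2),
    )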
As an example, to completely disable writing checkpoints to disk:
@@ -684,11 +686,11 @@ Loading checkpoints

Checkpoints can be loaded into the training function in 2 steps:

1. From the training function, ``session.get_checkpoint`` can be used to access
   the most recently saved :class:`Checkpoint`. This is useful to continue training even
1. From the training function, :func:`ray.air.session.get_checkpoint` can be used to access
   the most recently saved :py:class:`~ray.air.checkpoint.Checkpoint`. This is useful to continue training even
   if there's a worker failure.
2. The checkpoint to start training with can be bootstrapped by passing in a
   :class:`Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
   :py:class:`~ray.air.checkpoint.Checkpoint` to ``Trainer`` as the ``resume_from_checkpoint`` argument.
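A minimal sketch combining both steps (the checkpoint contents and epoch counts are placeholders):

.. code-block:: python

    from ray.air import session
    from ray.air.checkpoint import Checkpoint
    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_loop_per_worker(config):
        start_epoch = 0
        checkpoint = session.get_checkpoint()  # Step 1: resume if a checkpoint exists.
        if checkpoint:
            start_epoch = checkpoint.to_dict()["epoch"] + 1
        for epoch in range(start_epoch, 5):
            session.report({"epoch": epoch}, checkpoint=Checkpoint.from_dict({"epoch": epoch}))

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=2),
        # Step 2: bootstrap training from a previously saved checkpoint.
        resume_from_checkpoint=Checkpoint.from_dict({"epoch": 1}),
    )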
.. tabbed:: PyTorch
@@ -835,7 +837,7 @@ Callbacks

You may want to plug in your training code with your favorite experiment management framework.
Ray AIR provides an interface to fetch intermediate results and callbacks to process/log your intermediate results
(the values passed into ``session.report(...)``).
(the values passed into :func:`ray.air.session.report`).

Ray AIR contains :ref:`built-in callbacks <air-builtin-callbacks>` for popular tracking frameworks, or you can implement your own callback via the :ref:`Callback <tune-callbacks-docs>` interface.
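For instance, a hedged sketch of attaching one of the built-in callbacks through ``RunConfig`` (the Weights & Biases callback and project name are just one example):

.. code-block:: python

    from ray.air.callbacks.wandb import WandbLoggerCallback
    from ray.air.config import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        run_config=RunConfig(callbacks=[WandbLoggerCallback(project="my_project")]),
        scaling_config=ScalingConfig(num_workers=2),
    )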
@@ -860,7 +862,7 @@ Custom Callbacks

++++++++++++++++

If the provided callbacks do not cover your desired integrations or use-cases,
you may always implement a custom callback by subclassing ``Callback``. If
you may always implement a custom callback by subclassing :py:class:`~ray.tune.logger.LoggerCallback`. If
the callback is general enough, please feel welcome to :ref:`add it <getting-involved>`
to the ``ray`` `repository <https://github.com/ray-project/ray>`_.
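A minimal sketch of such a subclass (the hook shown is one of several that ``LoggerCallback`` exposes; what it prints is purely illustrative):

.. code-block:: python

    from ray.tune.logger import LoggerCallback

    class PrintingCallback(LoggerCallback):
        def log_trial_result(self, iteration, trial, result):
            # Called each time a trial reports an intermediate result.
            print(f"Trial {trial} reported: {result}")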
@@ -1034,7 +1036,7 @@ Hyperparameter tuning (Ray Tune)

Hyperparameter tuning with :ref:`Ray Tune <tune-main>` is natively supported
with Ray Train. Specifically, you can take an existing ``Trainer`` and simply
pass it into a :class:`Tuner`.
pass it into a :py:class:`~ray.tune.tuner.Tuner`.

.. code-block:: python
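A hedged sketch of that wiring (the search space and values are placeholders, and ``trainer`` is any configured Trainer):

.. code-block:: python

    from ray import tune
    from ray.tune.tuner import Tuner

    tuner = Tuner(
        trainer,
        # Hyperparameters under train_loop_config are forwarded to the training function.
        param_space={"train_loop_config": {"lr": tune.grid_search([0.001, 0.01])}},
    )
    results = tuner.fit()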
@@ -1076,9 +1078,9 @@ precision datatype for operations like linear layers and convolutions.

You can train your Torch model with AMP by:

1. Adding ``train.torch.accelerate(amp=True)`` to the top of your training function.
2. Wrapping your optimizer with ``train.torch.prepare_optimizer``.
3. Replacing your backward call with ``train.torch.backward``.
1. Adding :func:`ray.train.torch.accelerate` with ``amp=True`` to the top of your training function.
2. Wrapping your optimizer with :func:`ray.train.torch.prepare_optimizer`.
3. Replacing your backward call with :func:`ray.train.torch.backward`.

.. code-block:: diff
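Put together, a hedged sketch of those three steps (the model, optimizer, and loss are placeholders):

.. code-block:: python

    import torch
    import torch.nn as nn
    from ray import train

    def train_loop_per_worker(config):
        train.torch.accelerate(amp=True)  # Step 1: enable AMP before creating the model.

        model = train.torch.prepare_model(nn.Linear(4, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        optimizer = train.torch.prepare_optimizer(optimizer)  # Step 2

        loss = model(torch.randn(8, 4)).sum()
        train.torch.backward(loss)  # Step 3: replaces loss.backward().
        optimizer.step()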
@@ -1120,7 +1122,7 @@ Reproducibility

.. tabbed:: PyTorch

    To limit sources of nondeterministic behavior, add
    ``train.torch.enable_reproducibility()`` to the top of your training
    :func:`ray.train.torch.enable_reproducibility` to the top of your training
    function.

    .. code-block:: diff
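A tiny sketch, assuming the default integer seed argument:

.. code-block:: python

    from ray import train

    def train_loop_per_worker(config):
        # Seeds RNGs and enables deterministic Torch settings where possible.
        train.torch.enable_reproducibility(seed=0)
        ...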
@@ -1133,7 +1135,7 @@ Reproducibility

        ...

.. warning:: ``train.torch.enable_reproducibility`` can't guarantee
.. warning:: :func:`ray.train.torch.enable_reproducibility` can't guarantee
   completely reproducible results across executions. To learn more, read
   the `PyTorch notes on randomness <https://pytorch.org/docs/stable/notes/randomness.html>`_.
@@ -143,8 +143,8 @@ class DataParallelTrainer(BaseTrainer):

    - **Use Case 1:** You want to do data parallel training, but want to have
      a predefined ``training_loop_per_worker``.

    - **Use Case 2:** You want to implement a custom :ref:`Training backend
      <train-api-backend-interfaces>` that automatically handles
    - **Use Case 2:** You want to implement a custom
      :py:class:`~ray.train.backend.Backend` that automatically handles
      additional setup or teardown logic on each actor, so that the users of this
      new trainer do not have to implement this logic. For example, a
      ``TensorflowTrainer`` can be built on top of ``DataParallelTrainer``
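A hedged sketch of that second use case; the ``backend_cls`` property and the ``on_start`` hook are assumptions about the developer API and should be checked against :py:class:`~ray.train.backend.Backend`:

.. code-block:: python

    from ray.train.backend import Backend, BackendConfig
    from ray.train.data_parallel_trainer import DataParallelTrainer

    class MyBackend(Backend):
        def on_start(self, worker_group, backend_config):
            # Framework-specific setup that runs before the training loop starts.
            print("Setting up training workers")

    class MyBackendConfig(BackendConfig):
        @property
        def backend_cls(self):
            return MyBackend

    class MyTrainer(DataParallelTrainer):
        def __init__(self, train_loop_per_worker, **kwargs):
            super().__init__(
                train_loop_per_worker,
                backend_config=MyBackendConfig(),
                **kwargs,
            )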