
.. _train-getting-started:

Getting Started
===============

Ray Train offers multiple ``Trainers`` that implement scalable model training for different machine learning frameworks.
Here are examples for some of the commonly used trainers:
.. tabbed:: XGBoost
In this example we will train a model using distributed XGBoost.

First, we load the dataset from S3 using Ray Datasets and split it into a
training and a validation dataset.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __xgb_detail_intro_start__
:end-before: __xgb_detail_intro_end__
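
For reference, a minimal version of this step might look like the sketch below; the S3 path and the 70/30 split are illustrative rather than the exact values used in ``doc_code/gbdt_user_guide.py``:

.. code-block:: python

    import ray

    # Read a tabular dataset from S3 into a Ray Dataset, then split it
    # into a training and a validation dataset.
    dataset = ray.data.read_csv("s3://air-example-data/breast_cancer.csv")
    train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
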
In the :class:`ScalingConfig <ray.air.config.ScalingConfig>`,
we configure the number of workers to use:
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __xgb_detail_scaling_start__
:end-before: __xgb_detail_scaling_end__
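
A sketch of such a configuration, assuming two CPU workers (the worker count here is just an example):

.. code-block:: python

    from ray.air.config import ScalingConfig

    # Use two distributed training workers. Set use_gpu=True to assign
    # one GPU per worker instead of running on CPUs.
    scaling_config = ScalingConfig(num_workers=2, use_gpu=False)
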
We then instantiate our ``XGBoostTrainer`` by passing in:

- The aforementioned ``ScalingConfig``.
- ``label_column``: the name of the column in the Ray Dataset that contains the labels.
- ``params``: the `XGBoost training parameters <https://xgboost.readthedocs.io/en/stable/parameter.html>`__.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __xgb_detail_training_start__
:end-before: __xgb_detail_training_end__
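
Roughly, the trainer construction can look like the sketch below; the ``label_column`` and ``params`` values are illustrative and assume the binary classification dataset loaded above:

.. code-block:: python

    from ray.train.xgboost import XGBoostTrainer

    trainer = XGBoostTrainer(
        # The ScalingConfig from the previous sketch.
        scaling_config=scaling_config,
        label_column="target",  # column of the Ray Dataset holding the labels
        params={
            # Standard XGBoost training parameters.
            "objective": "binary:logistic",
            "eval_metric": ["logloss", "error"],
        },
        # train_dataset / valid_dataset come from the data loading step above.
        datasets={"train": train_dataset, "valid": valid_dataset},
    )
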
Lastly, we call ``trainer.fit()`` to kick off training and obtain the results.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __xgb_detail_fit_start__
:end-before: __xgb_detail_fit_end__
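
For example, assuming the trainer constructed above:

.. code-block:: python

    result = trainer.fit()

    # The returned Result object carries the metrics reported during training.
    print(result.metrics)
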
.. tabbed:: LightGBM
In this example we will train a model using distributed LightGBM.

First, we load the dataset from S3 using Ray Datasets and split it into a
training and a validation dataset.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __lgbm_detail_intro_start__
:end-before: __lgbm_detail_intro_end__
In the :class:`ScalingConfig <ray.air.config.ScalingConfig>`,
we configure the number of workers to use:
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __xgb_detail_scaling_start__
:end-before: __xgb_detail_scaling_end__
We then instantiate our ``LightGBMTrainer`` by passing in:

- The aforementioned ``ScalingConfig``.
- ``label_column``: the name of the column in the Ray Dataset that contains the labels.
- ``params``: the core `LightGBM training parameters <https://lightgbm.readthedocs.io/en/latest/Parameters.html>`__.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __lgbm_detail_training_start__
:end-before: __lgbm_detail_training_end__
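
A sketch of the corresponding construction, with illustrative ``label_column`` and ``params`` values:

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.lightgbm import LightGBMTrainer

    trainer = LightGBMTrainer(
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
        label_column="target",  # column of the Ray Dataset holding the labels
        params={
            # Core LightGBM training parameters.
            "objective": "binary",
            "metric": ["binary_logloss", "binary_error"],
        },
        # train_dataset / valid_dataset are the Ray Datasets from the loading step.
        datasets={"train": train_dataset, "valid": valid_dataset},
    )
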
Lastly, we call ``trainer.fit()`` to kick off training and obtain the results.
.. literalinclude:: doc_code/gbdt_user_guide.py
:language: python
:start-after: __lgbm_detail_fit_start__
:end-before: __lgbm_detail_fit_end__
.. tabbed:: PyTorch
This example shows how you can use Ray Train with PyTorch.
First, set up your dataset and model.
.. literalinclude:: /../../python/ray/train/examples/torch_quick_start.py
:language: python
:start-after: __torch_setup_begin__
:end-before: __torch_setup_end__
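
A condensed sketch of such a setup; the layer sizes and the random regression data below are illustrative stand-ins:

.. code-block:: python

    import torch
    import torch.nn as nn

    num_samples = 20
    input_size = 10
    output_size = 5

    # A small fully connected network used throughout the sketches below.
    class NeuralNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer1 = nn.Linear(input_size, 15)
            self.relu = nn.ReLU()
            self.layer2 = nn.Linear(15, output_size)

        def forward(self, x):
            return self.layer2(self.relu(self.layer1(x)))

    # Random regression data standing in for a real dataset.
    inputs = torch.randn(num_samples, input_size)
    labels = torch.randn(num_samples, output_size)
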
Now define your single-worker PyTorch training function.
.. literalinclude:: /../../python/ray/train/examples/torch_quick_start.py
:language: python
:start-after: __torch_single_begin__
:end-before: __torch_single_end__
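
For instance, a minimal single-worker training loop over the model and data sketched above might look like:

.. code-block:: python

    import torch.nn as nn
    import torch.optim as optim

    def train_func():
        model = NeuralNetwork()
        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(model.parameters(), lr=0.1)

        for epoch in range(3):
            output = model(inputs)
            loss = loss_fn(output, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            print(f"epoch: {epoch}, loss: {loss.item()}")
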
This training function can be executed with:
.. literalinclude:: /../../python/ray/train/examples/torch_quick_start.py
:language: python
:start-after: __torch_single_run_begin__
:end-before: __torch_single_run_end__
Now let's convert this to a distributed multi-worker training function!

All you have to do is use the ``ray.train.torch.prepare_model`` and
``ray.train.torch.prepare_data_loader`` utility functions to
set up your model and data for distributed training.
This automatically wraps the model in ``DistributedDataParallel``,
places it on the right device, and adds a ``DistributedSampler`` to your DataLoaders.
.. literalinclude:: /../../python/ray/train/examples/torch_quick_start.py
:language: python
:start-after: __torch_distributed_begin__
:end-before: __torch_distributed_end__
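
Assuming the model and data from the earlier sketches, the converted function might look like:

.. code-block:: python

    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset

    import ray.train.torch

    def train_func_distributed():
        # NeuralNetwork, inputs, and labels come from the setup sketch above.
        dataset = TensorDataset(inputs, labels)
        dataloader = DataLoader(dataset, batch_size=4)
        # Adds a DistributedSampler and moves each batch to the right device.
        dataloader = ray.train.torch.prepare_data_loader(dataloader)

        model = NeuralNetwork()
        # Wraps the model in DistributedDataParallel and moves it to the right device.
        model = ray.train.torch.prepare_model(model)

        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(model.parameters(), lr=0.1)

        for epoch in range(3):
            for batch_inputs, batch_labels in dataloader:
                output = model(batch_inputs)
                loss = loss_fn(output, batch_labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            print(f"epoch: {epoch}, loss: {loss.item()}")
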
Then, instantiate a ``TorchTrainer``
with 4 workers, and use it to run the new training function!
.. literalinclude:: /../../python/ray/train/examples/torch_quick_start.py
:language: python
:start-after: __torch_trainer_begin__
:end-before: __torch_trainer_end__
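
Putting it together, the trainer construction might look like this sketch (set ``use_gpu=True`` to train on GPUs instead):

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker=train_func_distributed,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
    )
    results = trainer.fit()
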
See :ref:`train-porting-code` for a more comprehensive example.
.. tabbed:: TensorFlow
This example shows how you can use Ray Train to set up `Multi-worker training
with Keras <https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras>`_.
First, set up your dataset and model.
.. literalinclude:: /../../python/ray/train/examples/tensorflow_quick_start.py
:language: python
:start-after: __tf_setup_begin__
:end-before: __tf_setup_end__
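
A condensed sketch of such a setup, assuming MNIST and a small Keras model (the exact architecture is illustrative):

.. code-block:: python

    import numpy as np
    import tensorflow as tf

    def mnist_dataset(batch_size):
        (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
        # Normalize the pixel values and batch the data.
        x_train = x_train / np.float32(255)
        y_train = y_train.astype(np.int64)
        return (
            tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .shuffle(60000)
            .repeat()
            .batch(batch_size)
        )

    def build_and_compile_model():
        model = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(28, 28)),
                tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
                tf.keras.layers.Conv2D(32, 3, activation="relu"),
                tf.keras.layers.Flatten(),
                tf.keras.layers.Dense(128, activation="relu"),
                tf.keras.layers.Dense(10),
            ]
        )
        model.compile(
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
            metrics=["accuracy"],
        )
        return model
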
Now define your single-worker TensorFlow training function.
.. literalinclude:: /../../python/ray/train/examples/tensorflow_quick_start.py
:language: python
:start-after: __tf_single_begin__
:end-before: __tf_single_end__
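
For instance, a minimal single-worker function built on the helpers sketched above:

.. code-block:: python

    def train_func():
        batch_size = 64
        single_worker_dataset = mnist_dataset(batch_size)
        single_worker_model = build_and_compile_model()
        single_worker_model.fit(single_worker_dataset, epochs=3, steps_per_epoch=70)
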
This training function can be executed with:
.. literalinclude:: /../../python/ray/train/examples/tensorflow_quick_start.py
:language: python
:start-after: __tf_single_run_begin__
:end-before: __tf_single_run_end__
Now let's convert this to a distributed multi-worker training function!
All you need to do is:
1. Set the per-worker batch size: each worker processes the same-size
   batch as in the single-worker code.
2. Choose your TensorFlow distributed training strategy. In this example
   we use the ``MultiWorkerMirroredStrategy``.
.. literalinclude:: /../../python/ray/train/examples/tensorflow_quick_start.py
:language: python
:start-after: __tf_distributed_begin__
:end-before: __tf_distributed_end__
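
A sketch of the converted function, assuming the helpers from the setup sketch. Ray Train populates the ``TF_CONFIG`` environment variable on each worker, so the worker count can be read from it to compute the global batch size:

.. code-block:: python

    import json
    import os

    def train_func_distributed():
        per_worker_batch_size = 64

        # TF_CONFIG is set by Ray Train on each training worker.
        tf_config = json.loads(os.environ["TF_CONFIG"])
        num_workers = len(tf_config["cluster"]["worker"])

        strategy = tf.distribute.MultiWorkerMirroredStrategy()

        global_batch_size = per_worker_batch_size * num_workers
        multi_worker_dataset = mnist_dataset(global_batch_size)

        with strategy.scope():
            # Model building and compiling must happen within strategy.scope().
            multi_worker_model = build_and_compile_model()

        multi_worker_model.fit(multi_worker_dataset, epochs=3, steps_per_epoch=70)
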
Then, instantiate a ``TensorflowTrainer`` with 4 workers,
and use it to run the new training function!
.. literalinclude:: /../../python/ray/train/examples/tensorflow_quick_start.py
:language: python
:start-after: __tf_trainer_begin__
:end-before: __tf_trainer_end__
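
A corresponding sketch of the trainer construction:

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.tensorflow import TensorflowTrainer

    trainer = TensorflowTrainer(
        train_loop_per_worker=train_func_distributed,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
    )
    trainer.fit()
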
See :ref:`train-porting-code` for a more comprehensive example.