.. _train-key-concepts:

Key Concepts
============

There are four main concepts in the Ray Train library.

1. ``Trainers`` execute distributed training.
2. ``Configuration`` objects are used to configure training.
3. ``Checkpoints`` are returned as the result of training.
4. ``Predictors`` can be used for inference and batch prediction.

.. https://docs.google.com/drawings/d/1FezcdrXJuxLZzo6Rjz1CHyJzseH8nPFZp6IUepdn3N4/edit

.. image:: images/train-specific.svg

Trainers
--------

Trainers are responsible for executing (distributed) training runs.
The output of a Trainer run is a :ref:`Result <train-key-concepts-results>` that contains
metrics from the training run and the latest saved :ref:`Checkpoint`.
Trainers can also be configured with :ref:`Datasets` and :ref:`Preprocessors`
for scalable data ingest and preprocessing.

There are three categories of built-in Trainers:

.. tabbed:: Deep Learning Trainers

    Ray Train supports the following deep learning trainers:

    - :class:`TorchTrainer <ray.train.torch.TorchTrainer>`
    - :class:`TensorflowTrainer <ray.train.tensorflow.TensorflowTrainer>`
    - :class:`HorovodTrainer <ray.train.horovod.HorovodTrainer>`

    For these trainers, you usually define your own training function that
    loads the model and executes single-worker training steps. Refer to the
    following guides for more details:

    - :ref:`Deep learning user guide`
    - :ref:`Quick overview of deep-learning trainers in the Ray AIR documentation`

.. tabbed:: Tree-Based Trainers

    Tree-based trainers utilize gradient-boosted decision trees for training.
    The most popular libraries implementing this are XGBoost and LightGBM.

    - :class:`XGBoostTrainer <ray.train.xgboost.XGBoostTrainer>`
    - :class:`LightGBMTrainer <ray.train.lightgbm.LightGBMTrainer>`

    For these trainers, you just pass a dataset and parameters; the training
    loop is configured automatically.

    - :ref:`XGBoost/LightGBM user guide`
    - :ref:`Quick overview of tree-based trainers in the Ray AIR documentation`
.. tabbed:: Other Trainers

    Some trainers don't fit into the other two categories, such as:

    - :class:`HuggingFaceTrainer <ray.train.huggingface.HuggingFaceTrainer>` for NLP
    - :class:`RLTrainer <ray.train.rl.RLTrainer>` for reinforcement learning
    - :class:`SklearnTrainer <ray.train.sklearn.SklearnTrainer>` for (non-distributed) training of sklearn models

    - :ref:`Other trainers in the Ray AIR documentation`

.. _train-key-concepts-config:

Configuration
-------------

Trainers are configured with configuration objects. There are two main
configuration classes, the :class:`ScalingConfig <ray.air.config.ScalingConfig>`
and the :class:`RunConfig <ray.air.config.RunConfig>`.
The latter contains subconfigurations, such as the
:class:`FailureConfig <ray.air.config.FailureConfig>`,
:class:`SyncConfig <ray.tune.syncer.SyncConfig>`, and
:class:`CheckpointConfig <ray.air.config.CheckpointConfig>`.

Check out the :ref:`Configurations User Guide` for an in-depth guide on using
these configurations.

.. _train-key-concepts-results:

Checkpoints
-----------

Calling ``Trainer.fit()`` returns a :class:`Result <ray.air.result.Result>` object,
which includes information about the run such as the reported metrics and the
saved checkpoints.

Checkpoints have the following purposes:

* They can be passed to a Trainer to resume training from the given model state.
* They can be used to create a Predictor / BatchPredictor for scalable batch prediction.
* They can be deployed with Ray Serve.

.. _train-key-concepts-predictors:

Predictors
----------

Predictors are the counterpart to Trainers. A Trainer trains a model on a dataset,
and a Predictor uses the resulting model to perform inference on new data.

Each Trainer has a corresponding Predictor implementation that is compatible
with its generated checkpoints.

.. dropdown:: Example: :class:`XGBoostPredictor <ray.train.xgboost.XGBoostPredictor>`

    .. literalinclude:: /train/doc_code/xgboost_train_predict.py
        :language: python
        :start-after: __train_predict_start__
        :end-before: __train_predict_end__

A Predictor can be passed into a :class:`BatchPredictor <ray.train.batch_predictor.BatchPredictor>`,
which is used to scale up prediction over a Ray cluster. It takes a Ray Dataset as input.

.. dropdown:: Example: Batch prediction with :class:`XGBoostPredictor <ray.train.xgboost.XGBoostPredictor>`
    .. literalinclude:: /train/doc_code/xgboost_train_predict.py
        :language: python
        :start-after: __batch_predict_start__
        :end-before: __batch_predict_end__

See :ref:`the Predictors user guide` for more information and examples.