
This PR adds a user guide to AIR for using Ray Train. It provides a high-level overview of the trainers and removes redundant sections. The main file to review is here: doc/source/ray-air/trainer.rst.

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
.. _train-faq:

Ray Train FAQ
=============

How fast is Ray Train compared to PyTorch, TensorFlow, etc.?
-------------------------------------------------------------

At its core, training speed should be the same. While Ray Train launches distributed training workers via Ray Actors,
communication during training (e.g. gradient synchronization) is handled by the backend training framework itself.

For example, when running Ray Train with the ``TorchTrainer``,
distributed training communication is done with PyTorch's ``DistributedDataParallel``.
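
As a rough illustration, the sketch below shows where that division of labor falls in an AIR-style ``TorchTrainer`` script: Ray starts the workers, while the model wrapped by ``prepare_model`` (PyTorch's ``DistributedDataParallel`` under the hood) performs the gradient synchronization. The toy model, the exact import paths, and the configuration values here are assumptions for the example only.

.. code-block:: python

    import torch
    import torch.nn as nn

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer, prepare_model


    def train_loop_per_worker(config):
        # prepare_model() moves the model to this worker's device and wraps it in
        # DistributedDataParallel, so gradient synchronization is handled by
        # PyTorch itself, not by Ray.
        model = prepare_model(nn.Linear(4, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        for _ in range(config["num_epochs"]):
            loss = model(torch.randn(8, 4)).sum()
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces gradients here
            optimizer.step()


    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        train_loop_config={"num_epochs": 2},
        scaling_config=ScalingConfig(num_workers=2),
    )
    trainer.fit()
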
Take a look at the :ref:`PyTorch <pytorch-training-parity>` and :ref:`TensorFlow <tf-training-parity>` benchmarks to check performance parity.

How do I set resources?
-----------------------

By default, each worker will reserve 1 CPU resource, and an additional 1 GPU resource if ``use_gpu=True``.

To override these resource requests or request additional custom resources,
you can initialize the ``Trainer`` with ``resources_per_worker`` specified in ``scaling_config``.
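
For example, a ``scaling_config`` along the following lines covers both cases. This is a sketch only: the worker count, the CPU/GPU numbers, and the ``"special_hardware"`` custom resource name are made-up values for illustration, and the ``ScalingConfig`` import path is an assumption.

.. code-block:: python

    from ray.air.config import ScalingConfig
    from ray.train.torch import TorchTrainer


    def train_loop_per_worker():
        ...  # your per-worker training code


    # Default behavior: each of the 4 workers reserves 1 CPU and 1 GPU.
    scaling_config = ScalingConfig(num_workers=4, use_gpu=True)

    # Override: each worker reserves 4 CPUs, 1 GPU, and 1 unit of a custom
    # resource (hypothetical name; it must be defined on your cluster nodes).
    scaling_config = ScalingConfig(
        num_workers=4,
        use_gpu=True,
        resources_per_worker={"CPU": 4, "GPU": 1, "special_hardware": 1},
    )

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=scaling_config,
    )
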
.. note::
    Some GPU utility functions (e.g. :ref:`train-api-torch-get-device`, :ref:`train-api-torch-prepare-model`)
    currently assume each worker is allocated exactly 1 GPU. The partial-GPU and multi-GPU use cases
    can still be run with Ray Train today without these functions.
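
For the multi-GPU case mentioned in the note, one workable pattern is to enumerate the worker's devices directly instead of calling ``get_device()``. This is a sketch that assumes Ray's default behavior of setting ``CUDA_VISIBLE_DEVICES`` to the GPUs assigned to each worker.

.. code-block:: python

    import torch


    def train_loop_per_worker(config):
        # Each worker only sees the GPUs Ray assigned to it, so enumerating the
        # visible CUDA devices yields exactly this worker's GPUs (e.g. two of
        # them if resources_per_worker={"GPU": 2} was requested).
        devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
        ...  # place model replicas or shards on `devices` yourself
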