.. _train-faq:

Ray Train FAQ
=============

How fast is Ray Train compared to PyTorch, TensorFlow, etc.?
------------------------------------------------------------

Training speed should be essentially the same. Ray Train launches distributed training workers via Ray Actors,
but communication during training (e.g. gradient synchronization) is handled by the backend training framework itself.
For example, when running Ray Train with the ``TorchTrainer``,
distributed training communication is done with Torch's ``DistributedDataParallel``.
Take a look at the :ref:`PyTorch <pytorch-training-parity>` and :ref:`TensorFlow <tf-training-parity>` benchmarks to check performance parity.

How do I set resources?
-----------------------

By default, each worker reserves 1 CPU, plus 1 GPU if ``use_gpu=True``.
To override these resource requests or request additional custom resources,
you can initialize the ``Trainer`` with ``resources_per_worker`` specified in ``scaling_config``.
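
As a rough illustration of the override described above, the sketch below builds a
``scaling_config`` with an explicit ``resources_per_worker`` entry. The helper
``make_scaling_config`` is hypothetical (it is not part of Ray), and whether the trainer
expects a plain dict or a dedicated config object depends on the Ray version you have
installed, so treat this as a sketch under those assumptions rather than the exact API.

.. code-block:: python

    def make_scaling_config(num_workers, use_gpu=False, cpus_per_worker=1, gpus_per_worker=1):
        """Hypothetical helper: assemble a scaling_config dict for a Trainer."""
        # Start from the CPU request for each worker.
        resources = {"CPU": cpus_per_worker}
        # Only request GPUs when GPU training is enabled.
        if use_gpu:
            resources["GPU"] = gpus_per_worker
        return {
            "num_workers": num_workers,
            "use_gpu": use_gpu,
            "resources_per_worker": resources,
        }


    # Two GPU workers, each reserving 4 CPUs and 1 GPU.
    scaling_config = make_scaling_config(num_workers=2, use_gpu=True, cpus_per_worker=4)

    # Assumed usage (requires Ray; train_func is a placeholder for your training loop):
    # trainer = TorchTrainer(train_loop_per_worker=train_func, scaling_config=scaling_config)

Custom resources could be requested the same way, by adding further keys to the
``resources_per_worker`` mapping.
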

.. note::
   Some GPU utility functions (e.g. :ref:`train-api-torch-get-device`, :ref:`train-api-torch-prepare-model`)
   currently assume each worker is allocated exactly 1 GPU. Partial-GPU and multi-GPU use cases
   can still be run with Ray Train today, but without these functions.