.. _raysgd-tune:

RaySGD Hyperparameter Tuning
============================

.. warning:: This is an older version of Ray SGD. A newer, more lightweight version of Ray SGD (named Ray Train) is in alpha as of Ray 1.7.
   See the documentation :ref:`here <train-docs>`. To migrate from v1 to v2, follow the :ref:`migration guide <sgd-migration>`.

RaySGD integrates with :ref:`Ray Tune <tune-60-seconds>` to easily run distributed hyperparameter tuning experiments with your RaySGD Trainer.

PyTorch
-------

.. tip:: If you want to leverage multi-node data-parallel training with PyTorch while using Ray Tune *without* using RaySGD, check out the :ref:`Tune PyTorch user guide <tune-pytorch-cifar-ref>` and Tune's lightweight :ref:`distributed PyTorch integrations <tune-ddp-doc>`.

``TorchTrainer`` naturally integrates with Tune via the ``BaseTorchTrainable`` interface. Without changing any arguments, you can call ``TorchTrainer.as_trainable(...)`` to create a Tune-compatible class.
Then, you can simply pass the returned Trainable class to ``tune.run``. The ``config`` used for each ``Trainable`` in Tune will automatically be passed down to the ``TorchTrainer``.
Therefore, each trial will have its own ``TorchTrainable`` that holds an instance of the ``TorchTrainer`` with its own unique hyperparameter configuration.
See the documentation (:ref:`BaseTorchTrainable-doc`) for more info.

.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_example__
   :end-before: __end_torch_tune_example__
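
For reference, the overall pattern looks roughly like the sketch below. This is a minimal, illustrative version assuming the Ray 1.x RaySGD API; the toy ``CustomTrainingOperator`` and the hyperparameter values are placeholders, not the exact contents of the included example.

.. code-block:: python

    import torch
    import torch.nn as nn
    from ray import tune
    from ray.util.sgd.torch import TorchTrainer, TrainingOperator


    class CustomTrainingOperator(TrainingOperator):
        # Placeholder operator with a toy model and random data; a real
        # operator would set up your own model, optimizer, and loaders.
        def setup(self, config):
            model = nn.Linear(1, 1)
            optimizer = torch.optim.SGD(
                model.parameters(), lr=config.get("lr", 1e-3))
            dataset = torch.utils.data.TensorDataset(
                torch.randn(256, 1), torch.randn(256, 1))
            loader = torch.utils.data.DataLoader(
                dataset, batch_size=config.get("batch_size", 32))
            self.model, self.optimizer, self.criterion = self.register(
                models=model, optimizers=optimizer, criterion=nn.MSELoss())
            self.register_data(train_loader=loader, validation_loader=loader)


    # Create a Tune-compatible Trainable class from the TorchTrainer.
    TorchTrainable = TorchTrainer.as_trainable(
        training_operator_cls=CustomTrainingOperator,
        num_workers=2,   # each trial trains with 2 distributed workers
        use_gpu=False,
    )

    # Tune passes each sampled ``config`` down to the TorchTrainer.
    analysis = tune.run(
        TorchTrainable,
        config={"lr": tune.grid_search([1e-2, 1e-3, 1e-4])},
        stop={"training_iteration": 2},
    )
    # Assumes the reported validation results include a "val_loss" key.
    print(analysis.get_best_config(metric="val_loss", mode="min"))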

By default, the training step for the returned ``Trainable`` runs one epoch of training and one epoch of validation, and reports the combined result dictionaries to Tune.

By combining RaySGD with Tune, each individual trial runs in a distributed fashion with ``num_workers`` workers, and multiple trials can also run in parallel.

Custom Training Step
~~~~~~~~~~~~~~~~~~~~

Sometimes it is necessary to provide a custom training step, for example if you want to run more than one epoch of training per Tune iteration, or if you need to manually update the scheduler after validation. You can easily provide a custom training step by passing an ``override_tune_step`` function to ``TorchTrainer.as_trainable(...)``.

.. literalinclude:: ../../../python/ray/util/sgd/torch/examples/tune_example.py
   :language: python
   :start-after: __torch_tune_manual_lr_example__
   :end-before: __end_torch_tune_manual_lr_example__

Your custom step function should take two arguments: an instance of the ``TorchTrainer`` and an ``info`` dict containing other potentially necessary information.
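
For example, a custom step that manually updates the learning-rate scheduler might look like the following sketch. It assumes the Ray 1.x RaySGD API and reuses the hypothetical ``CustomTrainingOperator`` from above; the ``val_loss`` metric name is illustrative.

.. code-block:: python

    from ray.tune.utils import merge_dicts


    def custom_step(trainer, info):
        train_stats = trainer.train()
        validation_stats = trainer.validate()
        # Manually step the scheduler using the validation loss. Assumes the
        # operator registered a scheduler and the trainer was created with
        # scheduler_step_freq="manual".
        trainer.update_scheduler(metric=validation_stats["val_loss"])
        # Report the combined training and validation results to Tune.
        return merge_dicts(train_stats, validation_stats)


    TorchTrainable = TorchTrainer.as_trainable(
        training_operator_cls=CustomTrainingOperator,
        num_workers=2,
        scheduler_step_freq="manual",
        override_tune_step=custom_step,
    )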

The ``info`` dict contains the following values:

.. code-block:: python

    # The current Tune iteration. This may differ from the number of epochs
    # trained if each Tune step runs more than one epoch of training.
    iteration
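
For instance, a hypothetical custom step could ignore or log ``info["iteration"]`` while running several epochs of training per Tune step:

.. code-block:: python

    def multi_epoch_step(trainer, info):
        # Run three epochs of training per Tune iteration, so the total
        # epoch count is three times ``info["iteration"]``.
        stats = {}
        for _ in range(3):
            stats = trainer.train()
        stats.update(trainer.validate())
        return stats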

If you would like any other information to be available in the ``info`` dict, please file a feature request on `GitHub Issues <https://github.com/ray-project/ray/issues>`_!

See the `Tune example script <https://github.com/ray-project/ray/blob/master/python/ray/util/sgd/torch/examples/tune_example.py>`_ for an end-to-end example.