[tune] Clarify Intro Tune Documentation (#8201)

Richard Liaw 2020-04-27 18:01:00 -07:00 committed by GitHub
parent a77e5a8cbf
commit be5235d982
10 changed files with 184 additions and 141 deletions


@ -1,3 +1,5 @@
.. _actor-guide:
Using Actors
============


@ -47,6 +47,8 @@ You can run this `toy PBT example <https://github.com/ray-project/ray/blob/maste
.. autoclass:: ray.tune.schedulers.PopulationBasedTraining
:noindex:
.. _tune-scheduler-hyperband:
Asynchronous HyperBand
----------------------


@ -89,6 +89,8 @@ An example of this can be found in `bayesopt_example.py <https://github.com/ray-
:show-inheritance:
:noindex:
.. _tune-hyperopt:
HyperOpt Search (Tree-structured Parzen Estimators)
---------------------------------------------------
@ -112,6 +114,7 @@ An example of this can be found in `hyperopt_example.py <https://github.com/ray-
:show-inheritance:
:noindex:
SigOpt Search
-------------
@ -141,6 +144,8 @@ An example of this can be found in `sigopt_example.py <https://github.com/ray-pr
:show-inheritance:
:noindex:
.. _tune-nevergrad:
Nevergrad Search
----------------
@ -217,6 +222,8 @@ An example of this can be found in `dragonfly_example.py <https://github.com/ray
:show-inheritance:
:noindex:
.. _tune-ax:
Ax Search
---------


@ -10,7 +10,7 @@ Tune: Scalable Hyperparameter Tuning
Tune is a Python library for experiment execution and hyperparameter tuning at any scale. Core features:
* Launch a multi-node :ref:`distributed hyperparameter sweep <tune-distributed>` in less than 10 lines of code.
* Supports any machine learning framework, including PyTorch, XGBoost, MXNet, and Keras. See :ref:`examples here <tune-guides-overview>`.
* Supports any machine learning framework, :ref:`including PyTorch, XGBoost, MXNet, and Keras<tune-guides-overview>`.
* Natively `integrates with optimization libraries <tune-searchalg.html>`_ such as `HyperOpt <https://github.com/hyperopt/hyperopt>`_, `Bayesian Optimization <https://github.com/fmfn/BayesianOptimization>`_, and `Facebook Ax <http://ax.dev>`_.
* Choose among `scalable algorithms <tune-schedulers.html>`_ such as `Population Based Training (PBT)`_, `Vizier's Median Stopping Rule`_, `HyperBand/ASHA`_.
* Visualize results with `TensorBoard <https://www.tensorflow.org/get_started/summaries_and_tensorboard>`__.
@ -19,14 +19,7 @@ Tune is a Python library for experiment execution and hyperparameter tuning at a
.. _`Vizier's Median Stopping Rule`: tune-schedulers.html#median-stopping-rule
.. _`HyperBand/ASHA`: tune-schedulers.html#asynchronous-hyperband
.. important:: Join our `community slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_ to discuss Ray!
For more information, check out:
* :ref:`Tune in 60 Seconds <tune-60-seconds>`: A quick overview of Tune and its key concepts.
* :ref:`Tune Guides and Examples <tune-guides-overview>`: Examples, Tutorials, and Guides for how to use Tune.
* `Code <https://github.com/ray-project/ray/tree/master/python/ray/tune>`__: GitHub repository for Tune.
**Want to get started?** Head over to the :ref:`60 second Tune tutorial <tune-60-seconds>`.
Quick Start
-----------
@ -57,14 +50,16 @@ If using TF2 and TensorBoard, Tune will also automatically generate TensorBoard
:scale: 20%
:align: center
Take a look at the :ref:`Distributed Experiments <tune-distributed>` documentation for:
1. Setting up distributed experiments on your local cluster
2. Using AWS and GCP
3. Spot instance usage/pre-emptible instances, and more.
.. tip:: Join the `Ray community slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_ to discuss Ray Tune (and other Ray libraries)!
Talks and Blogs
---------------
Guides/Materials
----------------
Here are some reference materials for Tune:
* :ref:`Tune Tutorials, Guides, and Examples <tune-guides-overview>`
* `Code <https://github.com/ray-project/ray/tree/master/python/ray/tune>`__: GitHub repository for Tune
Below are some blog posts and talks about Tune:


@ -16,9 +16,9 @@ Take a look at any of the below tutorials to get started with Tune.
<div class="sphx-glr-bigcontainer">
.. customgalleryitem::
:tooltip: A gentle 60 second tour of core Tune concepts.
:tooltip: Tune concepts in 60 seconds.
:figure: /images/tune-workflow.png
:description: :doc:`A gentle 60 second tour of Tune <tune-60-seconds>`
:description: :doc:`Tune concepts in 60 seconds <tune-60-seconds>`
.. customgalleryitem::
:tooltip: A simple Tune walkthrough.
@ -124,6 +124,7 @@ Tune Examples
If any example is broken, or if you'd like to add an example to this page, feel free to raise an issue on our GitHub repository.
.. _tune-general-examples:
General Examples
~~~~~~~~~~~~~~~~


@ -9,156 +9,167 @@ Let's quickly walk through the key concepts you need to know to use Tune. In thi
:local:
:depth: 1
Tune takes a user-defined Python function or class and evaluates it on a set of hyperparameter configurations. Each hyperparameter configuration evaluation is called a *trial*, and Tune runs multiple trials in parallel, leveraging Search Algorithms and Trial Schedulers to optimize your hyperparameters.
.. image:: /images/tune-workflow.png
Trainables
----------
To allow Tune to optimize your model, Tune will need to control your training process. This is done via the Trainable API. Each *trial* corresponds to one instance of a Trainable; Tune will create multiple instances of the Trainable.
Tune will optimize your training process using the :ref:`Trainable API <trainable-docs>`. To start, let's try to maximize this objective function:
The Trainable API is where you specify how to set up your model and track intermediate training progress. There are two types of Trainables: a **function-based API** for fast prototyping, and a **class-based API** that unlocks many Tune features such as checkpointing and pausing.
.. code-block:: python
def objective(x, a, b):
return a * (x ** 0.5) + b
Here's an example of specifying the objective function using :ref:`the function-based Trainable API <tune-function-api>`:
.. code-block:: python
def trainable(config):
# config (dict): A dict of hyperparameters.
for x in range(20):
score = objective(x, config["a"], config["b"])
tune.track.log(score=score) # This sends the score to Tune.
There are two Trainable APIs. One is the :ref:`function-based API <tune-function-api>` demonstrated above.
The other is a :ref:`class-based API <tune-class-api>` that enables :ref:`checkpointing and pausing <tune-trainable-save-restore>`. Here's an example of specifying the objective function using the :ref:`class-based API <tune-class-api>`:
.. code-block:: python
from ray import tune
class Trainable(tune.Trainable):
"""Tries to iteratively find the password."""
def _setup(self, config):
self.iter = 0
self.password = 1024
# config (dict): A dict of hyperparameters
self.x = 0
self.a = config["a"]
self.b = config["b"]
def _train(self):
"""Execute one step of 'training'. This function will be called iteratively"""
self.iter += 1
return {
"accuracy": abs(self.iter - self.password),
"training_iteration": self.iter # Tune will automatically provide this.
}
def _stop(self):
# perform any cleanup necessary.
pass
Function API example:
.. code-block:: python
def trainable(config):
"""
Args:
config (dict): Parameters provided from the search algorithm
or variant generation.
"""
while True:
# ...
tune.track.log(**kwargs)
def _train(self): # This is called iteratively.
score = objective(self.x, self.a, self.b)
self.x += 1
return {"score": score}
.. tip:: Do not use ``tune.track.log`` within a ``Trainable`` class.
See the documentation: :ref:`trainable-docs`.
See the documentation: :ref:`trainable-docs` and :ref:`examples <tune-general-examples>`.
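To see what the function-based trainable above reports, here is a minimal sketch that runs the same loop without Tune. The ``report`` callback and the ``run_trainable`` helper are hypothetical stand-ins for ``tune.track.log`` and are not part of Tune; the values ``a=2`` and ``b=4`` are chosen only for illustration.

```python
def objective(x, a, b):
    return a * (x ** 0.5) + b


def trainable(config, report):
    # `report` is a hypothetical stand-in for tune.track.log.
    for x in range(20):
        score = objective(x, config["a"], config["b"])
        report(score=score)


def run_trainable(config):
    # Collect every reported result, roughly like one trial's result history.
    results = []
    trainable(config, lambda **metrics: results.append(metrics))
    return results


results = run_trainable({"a": 2, "b": 4})
print(len(results), results[0]["score"], results[4]["score"])
```

Each call to ``report`` produces one result, so a single trial here yields 20 results, one per value of ``x``.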
tune.run
--------
Use ``tune.run`` to execute hyperparameter tuning using the core Ray APIs. This function manages your distributed experiment and provides many features such as logging, checkpointing, and early stopping.
Use ``tune.run`` to execute hyperparameter tuning using the core Ray APIs. This function manages your experiment and provides many features such as :ref:`logging <tune-logging>`, :ref:`checkpointing <tune-checkpoint>`, and :ref:`early stopping <tune-stopping>`.
.. code-block:: python
# Pass in a Trainable class or function to tune.run.
tune.run(trainable)
# Run 10 trials (each trial is one instance of a Trainable). Tune runs in
# parallel and automatically determines concurrency.
tune.run(trainable, num_samples=10)
# Run 1 trial, stop when trial has reached 10 iterations OR a mean accuracy of 0.98.
tune.run(my_trainable, stop={"training_iteration": 10, "mean_accuracy": 0.98})
# Run 1 trial, search over hyperparameters, stop after 10 iterations.
hyperparameters = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)}
tune.run(my_trainable, config=hyperparameters, stop={"training_iteration": 10})
This function will report status on the command line until all Trials stop:
This function will report status on the command line until all trials stop (each trial is one instance of a :ref:`Trainable <trainable-docs>`):
.. code-block:: bash
== Status ==
Memory usage on this node: 11.4/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/12 CPUs, 0/0 GPUs, 0.0/3.17 GiB heap, 0.0/1.07 GiB objects
Resources requested: 1/12 CPUs, 0/0 GPUs, 0.0/3.17 GiB heap, 0.0/1.07 GiB objects
Result logdir: /Users/foo/ray_results/myexp
Number of trials: 4 (4 RUNNING)
Number of trials: 1 (1 RUNNING)
+----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+
| Trial name | status | loc | param1 | param2 | acc | total time (s) | iter |
| Trial name | status | loc | a | b | score | total time (s) | iter |
|----------------------+----------+---------------------+-----------+--------+--------+----------------+-------|
| MyTrainable_a826033a | RUNNING | 10.234.98.164:31115 | 0.303706 | 0.0761 | 0.1289 | 7.54952 | 15 |
| MyTrainable_a8263fc6 | RUNNING | 10.234.98.164:31117 | 0.929276 | 0.158 | 0.4865 | 7.0501 | 14 |
| MyTrainable_a8267914 | RUNNING | 10.234.98.164:31111 | 0.068426 | 0.0319 | 0.9585 | 7.0477 | 14 |
| MyTrainable_a826b7bc | RUNNING | 10.234.98.164:31112 | 0.729127 | 0.0748 | 0.1797 | 7.05715 | 14 |
+----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+
See the documentation: :ref:`tune-run-ref`.
You can also easily run 10 trials. Tune automatically :ref:`determines how many trials will run in parallel <tune-parallelism>`.
.. code-block:: python
tune.run(trainable, num_samples=10)
Finally, you can randomly sample or grid search hyperparameters via Tune's :ref:`search space API <tune-default-search-space>`:
.. code-block:: python
space = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 1)}
tune.run(trainable, config=space, num_samples=10)
See more documentation: :ref:`tune-run-ref`.
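Conceptually, random sampling over a search space draws one concrete configuration per trial. The sketch below imitates that idea in plain Python; ``sample_config`` and the ``(low, high)`` tuple format are assumptions made for this illustration and do not reflect how Tune represents or resolves a search space internally.

```python
import random


def sample_config(space, rng):
    # Draw one concrete config; each entry of `space` is a (low, high) range.
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


rng = random.Random(0)  # Seeded so the sketch is repeatable.
space = {"a": (0, 1), "b": (0, 20)}

# Ten samples, analogous in spirit to num_samples=10.
configs = [sample_config(space, rng) for _ in range(10)]
for config in configs:
    print(config)
```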
Search Algorithms
-----------------
To optimize the hyperparameters of your training process, you will want to explore a “search space”.
Search Algorithms are Tune modules that help explore a provided search space. They use previous results from evaluating different hyperparameters to suggest better hyperparameters. Tune has SearchAlgorithms that integrate with many popular **optimization** libraries, such as `Nevergrad <https://github.com/facebookresearch/nevergrad>`_ and `Hyperopt <https://github.com/hyperopt/hyperopt/>`_.
To optimize the hyperparameters of your training process, you will want to use a :ref:`Search Algorithm <tune-search-alg>` which will help suggest better hyperparameters.
.. code-block:: python
# https://github.com/hyperopt/hyperopt/
# pip install hyperopt
# Be sure to first run `pip install hyperopt`
from hyperopt import hp
from ray.tune.suggest.hyperopt import HyperOptSearch
# Create a HyperOpt search space
space = {"momentum": hp.uniform("momentum", 0, 20), "lr": hp.uniform("lr", 0, 1)}
# Pass the search space into Tune's HyperOpt wrapper and maximize accuracy
hyperopt = HyperOptSearch(space, metric="accuracy", mode="max")
space = {
"a": hp.uniform("a", 0, 1),
"b": hp.uniform("b", 0, 20)
# Execute 20 trials using HyperOpt, stop after 20 iterations
max_iters = {"training_iteration": 20}
tune.run(trainable, search_alg=hyperopt, num_samples=20, stop=max_iters)
# Note: Arbitrary HyperOpt search spaces should be supported!
# "foo": hp.lognormal("foo", 0, 1))
}
# Specify the search space and maximize score
hyperopt = HyperOptSearch(space, metric="score", mode="max")
# Execute 20 trials using HyperOpt and stop after 20 iterations
tune.run(
trainable,
search_alg=hyperopt,
num_samples=20,
stop={"training_iteration": 20}
)
Tune has SearchAlgorithms that integrate with many popular **optimization** libraries, such as :ref:`Nevergrad <tune-nevergrad>` and :ref:`Hyperopt <tune-hyperopt>`.
See the documentation: :ref:`searchalg-ref`.
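The idea of suggesting hyperparameters based on earlier results can be sketched with a toy searcher. This is not HyperOpt's Tree-structured Parzen Estimator and not a Tune class: ``ToySearcher``, its ``suggest``/``observe`` methods, and the 10% perturbation step are all hypothetical choices for this example.

```python
import random


def objective(x, a, b):
    return a * (x ** 0.5) + b


class ToySearcher:
    """Hypothetical searcher that perturbs the best config seen so far."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.best_config = None
        self.best_score = float("-inf")

    def suggest(self):
        if self.best_config is None:
            # No history yet, so sample uniformly from the bounds.
            return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}
        config = {}
        for k, (lo, hi) in self.bounds.items():
            # Move a small random step away from the best known value.
            step = 0.1 * (hi - lo) * self.rng.uniform(-1, 1)
            config[k] = min(hi, max(lo, self.best_config[k] + step))
        return config

    def observe(self, config, score):
        if score > self.best_score:
            self.best_config, self.best_score = config, score


searcher = ToySearcher({"a": (0, 1), "b": (0, 20)})
history = []
for _ in range(20):
    config = searcher.suggest()
    score = objective(19, config["a"], config["b"])  # Score at the final step.
    searcher.observe(config, score)
    history.append(searcher.best_score)
```

Because the searcher only ever replaces its best configuration with a better one, ``history`` never decreases.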
Trial Schedulers
----------------
In addition, you can make your training process more efficient by stopping, pausing, or changing the hyperparameters of running trials.
In addition, you can make your training process more efficient by using a :ref:`Trial Scheduler <tune-schedulers>`.
Trial Schedulers are Tune modules that adjust and change distributed training runs during execution. These modules can stop/pause/tweak the hyperparameters of running trials, making your hyperparameter tuning process much faster. Population-based training and HyperBand are examples of popular optimization algorithms implemented as Trial Schedulers.
Trial Schedulers can stop/pause/tweak the hyperparameters of running trials, making your hyperparameter tuning process much faster.
.. code-block:: python
from ray.tune.schedulers import HyperBandScheduler
# Create HyperBand scheduler and maximize accuracy
hyperband = HyperBandScheduler(metric="accuracy", mode="max")
# Create HyperBand scheduler and maximize score
hyperband = HyperBandScheduler(metric="score", mode="max")
# Execute 20 trials using HyperBand using a search space
configs = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)}
tune.run(MyTrainableClass, num_samples=20, config=configs, scheduler=hyperband)
configs = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 1)}
Unlike **Search Algorithms**, Trial Schedulers do not select which hyperparameter configurations to evaluate. However, you can use them together.
tune.run(
MyTrainableClass,
config=configs,
num_samples=20,
scheduler=hyperband
)
:ref:`Population-based Training <tune-scheduler-pbt>` and :ref:`HyperBand <tune-scheduler-hyperband>` are examples of popular optimization algorithms implemented as Trial Schedulers.
Unlike **Search Algorithms**, :ref:`Trial Schedulers <tune-schedulers>` do not select which hyperparameter configurations to evaluate. However, you can use them together.
See the documentation: :ref:`schedulers-ref`.
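To illustrate how stopping trials early saves work, here is a simplified successive-halving sketch in plain Python. It is a toy relative of HyperBand, not the algorithm as implemented in Tune; the function name, the number of rounds, and the fixed candidate configurations are assumptions for this example.

```python
def objective(x, a, b):
    return a * (x ** 0.5) + b


def successive_halving(configs, rounds=3, steps_per_round=5):
    # Each trial keeps its own step counter and latest score.
    trials = [{"config": c, "x": 0, "score": None} for c in configs]
    for _ in range(rounds):
        for trial in trials:
            for _ in range(steps_per_round):
                trial["score"] = objective(trial["x"], **trial["config"])
                trial["x"] += 1
        # Keep the better half and stop the rest early.
        trials.sort(key=lambda t: t["score"], reverse=True)
        trials = trials[: max(1, len(trials) // 2)]
    return trials


# Eight hypothetical candidate configurations.
configs = [{"a": a / 10, "b": b} for a in range(1, 5) for b in (0, 10)]
survivors = successive_halving(configs)
print(survivors[0]["config"])
```

Only the surviving trial runs for all 15 steps; the others are cut after 5 or 10 steps, which is where the savings come from.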
Analysis
--------
After running a hyperparameter tuning job, you will want to analyze your results to determine what specific parameters are important and which hyperparameter values are the best.
``tune.run`` returns an :ref:`Analysis <tune-analysis-docs>` object which has methods you can use for analyzing your results. This object can also retrieve all training runs as dataframes, allowing you to do ad-hoc data analysis over your results.
``tune.run`` returns an :ref:`Analysis <tune-analysis-docs>` object which has methods you can use for analyzing your training.
.. code-block:: python
@ -167,13 +178,16 @@ After running a hyperparameter tuning job, you will want to analyze your results
# Get the best hyperparameters
best_hyperparameters = analysis.get_best_config()
# Get a dataframe for the max accuracy seen for each trial
df = analysis.dataframe(metric="mean_accuracy", mode="max")
This object can also retrieve all training runs as dataframes, allowing you to do ad-hoc data analysis over your results.
.. code-block:: python
# Get a dataframe for the max score seen for each trial
df = analysis.dataframe(metric="score", mode="max")
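The kind of question the Analysis object answers can be sketched over plain dictionaries. The ``get_best`` helper and the result values below are made up for illustration; Tune's own selection rules and data layout may differ.

```python
def get_best(results, metric, mode):
    # `results` maps a trial name to (config, list of reported metric dicts).
    pick = max if mode == "max" else min
    best_per_trial = {
        name: (config, pick(r[metric] for r in reports))
        for name, (config, reports) in results.items()
    }
    best_name = pick(best_per_trial, key=lambda name: best_per_trial[name][1])
    return best_per_trial[best_name][0], best_per_trial


# Made-up results for two hypothetical trials.
results = {
    "trial_0": ({"a": 0.2, "b": 4}, [{"score": 4.0}, {"score": 4.2}]),
    "trial_1": ({"a": 0.9, "b": 7}, [{"score": 7.0}, {"score": 7.9}]),
}
best_config, per_trial = get_best(results, metric="score", mode="max")
print(best_config)
```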
What's Next?
~~~~~~~~~~~~
Now that you have a working understanding of Tune, check out:
* :ref:`Tune Guides and Examples <tune-guides-overview>`: Examples and templates for using Tune with your preferred machine learning library.


@ -5,7 +5,7 @@ A Basic Tune Tutorial
.. image:: /images/tune-api.svg
This tutorial will walk you through the following process to set up a Tune experiment. Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) via the following steps:
This tutorial will walk you through the following process to set up a Tune experiment using PyTorch. Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) via the following steps:
1. Integrating Tune into your workflow
2. Specifying a TrialScheduler


@ -9,6 +9,8 @@ This document provides an overview of the core concepts as well as some of the c
.. contents:: :local:
.. _tune-parallelism:
Parallelism / GPUs
------------------
@ -60,6 +62,8 @@ To attach to a Ray cluster, simply run ``ray.init`` before ``tune.run``:
ray.init(address=<ray_address>)
tune.run(trainable, num_samples=100, resources_per_trial={"cpu": 2, "gpu": 1})
.. _tune-default-search-space:
Search Space (Grid/Random)
--------------------------
@ -219,6 +223,8 @@ You often will want to compute a large object (e.g., training data, model weight
tune.run(f)
.. _tune-stopping:
Stopping Trials
---------------
@ -271,6 +277,8 @@ Finally, you can implement the ``Stopper`` abstract class for stopping entire ex
Note that in the above example the currently running trials will not stop immediately but will do so once their current iterations are complete. See the :ref:`tune-stop-ref` documentation.
.. _tune-logging:
Logging/Tensorboard
-------------------


@ -7,30 +7,47 @@ Training can be done with either a **Class API** (``tune.Trainable``) or **funct
You can use the **function-based API** for fast prototyping. On the other hand, the ``tune.Trainable`` interface supports checkpoint/restore functionality and provides more control for advanced algorithms.
For the sake of example, let's maximize this objective function:
.. code-block:: python
def objective(x, a, b):
return a * (x ** 0.5) + b
.. _tune-function-api:
Function-based API
------------------
.. code-block:: python
def trainable(config):
"""
Args:
config (dict): Parameters provided from the search algorithm
or variant generation.
"""
# config (dict): A dict of hyperparameters.
while True:
# ...
tune.track.log(**kwargs)
for x in range(20):
score = objective(x, config["a"], config["b"])
tune.track.log(score=score) # This sends the score to Tune.
analysis = tune.run(
trainable,
config={
"a": 2,
"b": 4
})
print("best config: ", analysis.get_best_config(metric="score", mode="max"))
.. tip:: Do not use ``tune.track.log`` within a ``Trainable`` class.
Tune will run this function on a separate thread in a Ray actor process. Note that this API is not checkpointable, since the thread will never return control back to its caller.
.. note:: If you have a lambda function that you want to train, you will need to first register the function: ``tune.register_trainable("lambda_id", lambda x: ...)``. You can then use ``lambda_id`` in place of ``my_trainable``.
.. note:: If you want to pass in a Python lambda, you will need to first register the function: ``tune.register_trainable("lambda_id", lambda x: ...)``. You can then use ``lambda_id`` in place of ``my_trainable``.
Trainable API
-------------
.. _tune-class-api:
Trainable Class API
-------------------
.. caution:: Do not use ``tune.track.log`` within a ``Trainable`` class.
@ -40,44 +57,40 @@ The Trainable **class API** will require users to subclass ``ray.tune.Trainable`
from ray import tune
class Guesser(tune.Trainable):
"""Randomly picks a number from [1, 10000) to find the password."""
class Trainable(tune.Trainable):
def _setup(self, config):
self.guess = config["guess"]
self.iter = 0
self.password = 1024
def _train(self):
"""Execute one step of 'training'. This function will be called iteratively"""
self.iter += 1
self.guess += 1
return {
"accuracy": abs(self.guess - self.password),
"training_iteration": self.iter # Tune will automatically provide this.
}
# config (dict): A dict of hyperparameters
self.x = 0
self.a = config["a"]
self.b = config["b"]
def _train(self): # This is called iteratively.
score = objective(self.x, self.a, self.b)
self.x += 1
return {"score": score}
analysis = tune.run(
Guesser,
stop={"training_iteration": 10},
num_samples=10,
Trainable,
stop={"training_iteration": 20},
config={
"guess": tune.randint(1, 10000)
"a": 2,
"b": 4
})
print('best config: ', analysis.get_best_config(metric="diff", mode="min"))
print('best config: ', analysis.get_best_config(metric="score", mode="max"))
As a subclass of ``tune.Trainable``, Tune will create a ``Guesser`` object on a separate process (using the Ray Actor API).
As a subclass of ``tune.Trainable``, Tune will create a ``Trainable`` object on a separate process (using the :ref:`Ray Actor API <actor-guide>`).
1. ``_setup`` function is invoked once training starts.
2. ``_train`` is invoked **multiple times**. Each time, the Guesser object executes one logical iteration of training in the tuning process, which may include one or more iterations of actual training.
2. ``_train`` is invoked **multiple times**. Each time, the Trainable object executes one logical iteration of training in the tuning process, which may include one or more iterations of actual training.
3. ``_stop`` is invoked when training is finished.
.. tip:: As a rule of thumb, the execution time of ``_train`` should be large enough to avoid overheads (i.e. more than a few seconds), but short enough to report progress periodically (i.e. at most a few minutes).
In this example, we only implemented the ``_setup`` and ``_train`` methods for simplification. Next, we'll implement ``_save`` and ``_restore`` for checkpoint and fault tolerance.
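The call order described above (``_setup`` once, ``_train`` repeatedly, ``_stop`` at the end) can be sketched without Ray. ``SketchTrainable`` and ``drive`` are hypothetical stand-ins that only mirror the method names; they do not reproduce how Tune schedules actors or when it actually stops a trial.

```python
def objective(x, a, b):
    return a * (x ** 0.5) + b


class SketchTrainable:
    """Plain-Python stand-in that mirrors the method names above."""

    def __init__(self, config):
        self.calls = []
        self._setup(config)

    def _setup(self, config):
        self.calls.append("_setup")
        self.x = 0
        self.a = config["a"]
        self.b = config["b"]

    def _train(self):
        self.calls.append("_train")
        score = objective(self.x, self.a, self.b)
        self.x += 1
        return {"score": score}

    def _stop(self):
        self.calls.append("_stop")


def drive(trainable_cls, config, max_iterations):
    # Hypothetical driver loop: set up once, train repeatedly, then stop.
    trainable = trainable_cls(config)
    results = []
    for iteration in range(1, max_iterations + 1):
        result = trainable._train()
        result["training_iteration"] = iteration
        results.append(result)
    trainable._stop()
    return trainable, results


trainable, results = drive(SketchTrainable, {"a": 2, "b": 4}, max_iterations=20)
print(trainable.calls[0], trainable.calls[-1], len(results))
```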
.. _tune-trainable-save-restore:
Save and Restore
~~~~~~~~~~~~~~~~


@ -219,18 +219,19 @@ def run(run_or_experiment,
TuneError: Any trials failed and `raise_on_failed_trial` is True.
Examples:
>>> tune.run(mytrainable, scheduler=PopulationBasedTraining())
>>> tune.run(mytrainable, num_samples=5, reuse_actors=True)
.. code-block:: python
>>> tune.run(
>>> "PG",
>>> num_samples=5,
>>> config={
>>> "env": "CartPole-v0",
>>> "lr": tune.sample_from(lambda _: np.random.rand())
>>> }
>>> )
# Run 10 trials (each trial is one instance of a Trainable). Tune runs
# in parallel and automatically determines concurrency.
tune.run(trainable, num_samples=10)
# Run 1 trial, stop when trial has reached 10 iterations
tune.run(my_trainable, stop={"training_iteration": 10})
# Run 1 trial, search over hyperparameters, stop after 10 iterations.
space = {"lr": tune.uniform(0, 1), "momentum": tune.uniform(0, 1)}
tune.run(my_trainable, config=space, stop={"training_iteration": 10})
"""
trial_executor = trial_executor or RayTrialExecutor(
queue_trials=queue_trials,