# Hyperparameter Optimization
This document provides a walkthrough of the hyperparameter optimization example.
To run the application, first install this dependency.
- [TensorFlow](https://www.tensorflow.org/)
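
Depending on your platform, TensorFlow can typically be installed with pip; see the
TensorFlow site for platform-specific instructions. For example:

```
pip install tensorflow
```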
Then from the directory `ray/examples/hyperopt/` run the following.
```
source ../../setup-env.sh
python driver.py
```
Machine learning algorithms often have a number of *hyperparameters* whose
values must be chosen by the practitioner. For example, an optimization
algorithm may have a step size, a decay rate, and a regularization coefficient.
In a deep network, the network parameterization itself (e.g., the number of
layers and the number of units per layer) can be considered a hyperparameter.
Choosing these parameters can be challenging, and so a common practice is to
search over the space of hyperparameters. One approach that works surprisingly
well is to randomly sample different options.
## The serial version
Suppose that we want to train a convolutional network, but we aren't sure how to
choose the following hyperparameters:
- the learning rate
- the batch size
- the dropout probability
- the standard deviation of the distribution from which to initialize the
network weights
Suppose that we've defined a Python function `train_cnn_and_compute_accuracy`,
which takes values for these hyperparameters as its input (along with the
dataset), trains a convolutional network using those hyperparameters, and
returns the accuracy of the trained model on a validation set.
```python
def train_cnn_and_compute_accuracy(hyperparameters, train_images, train_labels,
                                   validation_images, validation_labels):
    # Construct a deep network, train it, and return the validation accuracy.
    # The argument hyperparameters is a dictionary with keys:
    #   - "learning_rate"
    #   - "batch_size"
    #   - "dropout"
    #   - "stddev"
    return validation_accuracy
```
Something that works surprisingly well is to try random values for the
hyperparameters. For example, we can write the following.
```python
import numpy as np

def generate_random_params():
    # Randomly choose values for the hyperparameters.
    learning_rate = 10 ** np.random.uniform(-5, 5)
    batch_size = np.random.randint(1, 100)
    dropout = np.random.uniform(0, 1)
    stddev = 10 ** np.random.uniform(-5, 5)
    return {"learning_rate": learning_rate,
            "batch_size": batch_size,
            "dropout": dropout,
            "stddev": stddev}
results = []
for _ in range(100):
    randparams = generate_random_params()
    accuracy = train_cnn_and_compute_accuracy(randparams, train_images, train_labels,
                                              validation_images, validation_labels)
    results.append((randparams, accuracy))
```
Then we can inspect the contents of `results` and see which set of
hyperparameters worked the best.
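
For example, since each entry pairs a hyperparameter dictionary with its validation
accuracy, the following one-liner picks out the winning configuration.

```python
# Find the hyperparameters that achieved the highest validation accuracy.
best_params, best_accuracy = max(results, key=lambda item: item[1])
```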
Of course, as there are no dependencies between the different invocations of
`train_cnn_and_compute_accuracy`, this computation could easily be parallelized
over multiple cores or multiple machines. Let's do that now.
## The distributed version
First, let's turn `train_cnn_and_compute_accuracy` into a remote function in Ray
by writing it as follows. In this example application, a slightly more
complicated version of this remote function is defined in
[hyperopt.py](hyperopt.py).
```python
@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters, train_images, train_labels,
                                   validation_images, validation_labels):
    # Actual work omitted.
    return validation_accuracy
```
The only difference is that we added the `@ray.remote` decorator.
Now a call to `train_cnn_and_compute_accuracy` does not execute the function. It
submits the task to the scheduler and returns an object ID for the output
of the eventual computation. The scheduler, at its leisure, will schedule the
task on a worker (which may live on the same machine or on a different machine
in the cluster).
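
Concretely, submitting a single task looks like this (a sketch, assuming `params`
and the data arrays are already defined):

```python
# Submitting the task returns immediately with an object ID, not the accuracy.
accuracy_id = train_cnn_and_compute_accuracy.remote(params, train_images, train_labels,
                                                    validation_images, validation_labels)
# ray.get blocks until the task has finished and its output is available.
accuracy = ray.get(accuracy_id)
```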
Now the for loop runs almost instantaneously because it does not do any actual
computation. Instead, it simply submits a number of tasks to the scheduler.
```python
result_ids = []
for _ in range(100):
    params = generate_random_params()
    accuracy_id = train_cnn_and_compute_accuracy.remote(params, train_images, train_labels,
                                                        validation_images, validation_labels)
    result_ids.append((params, accuracy_id))
```
If we wish to wait until all of the tasks have finished, we can retrieve their
values with `ray.get`.
```python
results = [(params, ray.get(result_id)) for (params, result_id) in result_ids]
```
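
The `results` list now has the same form as in the serial version, so the same
`max(results, key=lambda item: item[1])` call picks out the best configuration.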
## Additional notes
**Early Stopping:** Sometimes when running an optimization, it is clear early on
that the hyperparameters being used are bad (for example, the loss function may
start diverging). In these situations, it makes sense to end that particular run
early to save resources. This is implemented within the remote function
`train_cnn_and_compute_accuracy`. If it detects that the optimization is going
poorly, it returns early.
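
A minimal sketch of this pattern is shown below; `run_one_training_step`,
`compute_validation_accuracy`, and the divergence threshold are hypothetical
stand-ins for the actual logic in [hyperopt.py](hyperopt.py), and the imports
from the earlier snippets are assumed.

```python
@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters, train_images, train_labels,
                                   validation_images, validation_labels):
    for step in range(1000):
        # Hypothetical helper that runs one step of training and returns the loss.
        loss = run_one_training_step(hyperparameters, train_images, train_labels)
        # If the loss has blown up, abandon this hyperparameter choice early and
        # report zero accuracy rather than wasting more compute on a bad run.
        if np.isnan(loss) or loss > 1e3:
            return 0.0
    # Hypothetical helper that evaluates the trained model on the validation set.
    return compute_validation_accuracy(validation_images, validation_labels)
```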