A Guide To Callbacks & Metrics in Tune
======================================

.. _tune-callbacks:

How to work with Callbacks?
---------------------------

Ray Tune supports callbacks that are invoked at various points during the training process.
Callbacks can be passed as a parameter to ``tune.run()``, and the hook methods you implement will be invoked automatically.

This simple callback just prints a metric each time a result is received:

.. code-block:: python

    from ray import tune
    from ray.tune import Callback


    class MyCallback(Callback):
        def on_trial_result(self, iteration, trials, trial, result, **info):
            print(f"Got result: {result['metric']}")


    def train(config):
        for i in range(10):
            tune.report(metric=i)


    tune.run(
        train,
        callbacks=[MyCallback()])

For more details and available hooks, please :ref:`see the API docs for Ray Tune callbacks <tune-callbacks-docs>`.

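Several hooks can also be combined in a single callback. Below is a minimal sketch using the ``on_trial_start`` and ``on_trial_complete`` hooks; the ``TrialTracker`` name and the printed messages are purely illustrative:

.. code-block:: python

    from ray import tune
    from ray.tune import Callback


    class TrialTracker(Callback):
        """Counts trials as they start and finish (illustrative only)."""

        def __init__(self):
            self.started = 0
            self.completed = 0

        def on_trial_start(self, iteration, trials, trial, **info):
            self.started += 1
            print(f"Trial {trial} started ({self.started} so far)")

        def on_trial_complete(self, iteration, trials, trial, **info):
            self.completed += 1
            print(f"Trial {trial} completed ({self.completed} so far)")


    # Reusing the ``train`` function from the example above.
    tune.run(
        train,
        callbacks=[TrialTracker()])
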
.. _tune-autofilled-metrics:

How to log metrics in Tune?
---------------------------

You can log arbitrary values and metrics in both the Function and Class training APIs:

.. code-block:: python

    def trainable(config):
        for i in range(num_epochs):
            ...
            tune.report(acc=accuracy, metric_foo=random_metric_1, bar=metric_2)


    class Trainable(tune.Trainable):
        def step(self):
            ...
            # don't call report here!
            return dict(acc=accuracy, metric_foo=random_metric_1, bar=metric_2)

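A reported metric can then be used, for instance, as the objective that Tune optimizes. A minimal sketch, assuming the ``trainable`` above reports an ``acc`` value each iteration:

.. code-block:: python

    # Search for the configuration with the highest reported ``acc``.
    analysis = tune.run(
        trainable,
        metric="acc",
        mode="max",
        num_samples=4)

    print("Best config found:", analysis.best_config)
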
.. tip::
    Note that ``tune.report()`` is not meant to transfer large amounts of data, like models or datasets.
    Doing so can incur large overheads and slow down your Tune run significantly.

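If you do need to persist a large object such as model weights, one option is to write it to a checkpoint and keep ``tune.report()`` for small scalar values only. A minimal sketch using the function API's ``tune.checkpoint_dir`` (the model dict below is just a stand-in for a real model):

.. code-block:: python

    import os
    import pickle

    from ray import tune


    def train(config):
        model = {"weights": [0.0]}  # stand-in for a large model object
        for step in range(10):
            model["weights"][0] += 1.0  # stand-in for a training update
            # Persist the large object via a checkpoint ...
            with tune.checkpoint_dir(step=step) as checkpoint_dir:
                with open(os.path.join(checkpoint_dir, "model.pkl"), "wb") as f:
                    pickle.dump(model, f)
            # ... and report only small scalar metrics.
            tune.report(mean_loss=1.0 / (step + 1))
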
Which metrics get automatically filled in?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tune has the concept of auto-filled metrics.
During training, Tune will automatically log the below metrics in addition to any user-provided values.
All of these can be used as stopping conditions or passed as a parameter to Trial Schedulers/Search Algorithms (see the example at the end of this section).

* ``config``: The hyperparameter configuration
* ``date``: String-formatted date and time when the result was processed
* ``done``: True if the trial has been finished, False otherwise
* ``episodes_total``: Total number of episodes (for RLlib trainables)
* ``experiment_id``: Unique experiment ID
* ``experiment_tag``: Unique experiment tag (includes parameter values)
* ``hostname``: Hostname of the worker
* ``iterations_since_restore``: The number of times ``tune.report()``/``trainable.train()`` has been
  called after restoring the worker from a checkpoint
* ``node_ip``: Host IP of the worker
* ``pid``: Process ID (PID) of the worker process
* ``time_since_restore``: Time in seconds since restoring from a checkpoint
* ``time_this_iter_s``: Runtime of the current training iteration in seconds (i.e.,
  one call to the trainable function or to ``_train()`` in the class API)
* ``time_total_s``: Total runtime in seconds
* ``timestamp``: Timestamp when the result was processed
* ``timesteps_since_restore``: Number of timesteps since restoring from a checkpoint
* ``timesteps_total``: Total number of timesteps
* ``training_iteration``: The number of times ``tune.report()`` has been called
* ``trial_id``: Unique trial ID

All of these metrics can be seen in the ``Trial.last_result`` dictionary.

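For example, because ``training_iteration`` is auto-filled, it can be used as a stopping condition even if the trainable never reports it explicitly. A minimal sketch:

.. code-block:: python

    from ray import tune


    def train(config):
        while True:
            tune.report(loss=0.1)  # ``training_iteration`` is filled in automatically


    # Stop each trial after ``tune.report()`` has been called 10 times.
    tune.run(train, stop={"training_iteration": 10})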