# Using Weights & Biases with Tune

(tune-wandb-ref)=

[Weights & Biases](https://www.wandb.ai/) (Wandb) is a tool for experiment
tracking, model optimizaton, and dataset versioning. It is very popular
in the machine learning and data science community for its superb visualization
tools.

```{image} /images/wandb_logo_full.png
:align: center
:alt: Weights & Biases
:height: 80px
:target: https://www.wandb.ai/
```

Ray Tune currently offers two lightweight integrations for Weights & Biases.
One is the {ref}`WandbLoggerCallback <tune-wandb-logger>`, which automatically logs
metrics reported to Tune to the Wandb API.

The other one is the {ref}`@wandb_mixin <tune-wandb-mixin>` decorator, which can be
used with the function API. It automatically
initializes the Wandb API with Tune's training information. You can just use the
Wandb API like you would normally do, e.g. using `wandb.log()` to log your training
process.

```{contents}
:backlinks: none
:local: true
```

## Running A Weights & Biases Example

In the following example we're going to use both of the above methods, namely the `WandbLoggerCallback` and
the `wandb_mixin` decorator to log metrics.
Let's start with a few crucial imports:

In [1]:
import numpy as np
import wandb

from ray import air, tune
from ray.air import session
from ray.tune import Trainable
from ray.air.callbacks.wandb import WandbLoggerCallback
from ray.tune.integration.wandb import (
    WandbTrainableMixin,
    wandb_mixin,
)

Next, let's define an easy `objective` function (a Tune `Trainable`) that reports a random loss to Tune.
The objective function itself is not important for this example, since we want to focus on the Weights & Biases
integration primarily.

In [2]:
def objective(config, checkpoint_dir=None):
    for i in range(30):
        loss = config["mean"] + config["sd"] * np.random.randn()
        session.report({"loss": loss})

Given that you provide an `api_key_file` pointing to your Weights & Biases API key, you cna define a
simple grid-search Tune run using the `WandbLoggerCallback` as follows:

In [3]:
def tune_function(api_key_file):
    """Example for using a WandbLoggerCallback with the function API"""
    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        run_config=air.RunConfig(
            callbacks=[
                WandbLoggerCallback(api_key_file=api_key_file, project="Wandb_example")
            ],
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
        },
    )
    results = tuner.fit()

    return results.get_best_result().config

To use the `wandb_mixin` decorator, you can simply decorate the objective function from earlier.
Note that we also use `wandb.log(...)` to log the `loss` to Weights & Biases as a dictionary.
Otherwise, the decorated version of our objective is identical to its original.

In [4]:
@wandb_mixin
def decorated_objective(config, checkpoint_dir=None):
    for i in range(30):
        loss = config["mean"] + config["sd"] * np.random.randn()
        session.report({"loss": loss})
        wandb.log(dict(loss=loss))

With the `decorated_objective` defined, running a Tune experiment is as simple as providing this objective and
passing the `api_key_file` to the `wandb` key of your Tune `config`:

In [5]:
def tune_decorated(api_key_file):
    """Example for using the @wandb_mixin decorator with the function API"""
    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
            "wandb": {"api_key_file": api_key_file, "project": "Wandb_example"},
        },
    )
    results = tuner.fit()

    return results.get_best_result().config

Finally, you can also define a class-based Tune `Trainable` by using the `WandbTrainableMixin` to define your objective:

In [6]:
class WandbTrainable(WandbTrainableMixin, Trainable):
    def step(self):
        for i in range(30):
            loss = self.config["mean"] + self.config["sd"] * np.random.randn()
            wandb.log({"loss": loss})
        return {"loss": loss, "done": True}

Running Tune with this `WandbTrainable` works exactly the same as with the function API.
The below `tune_trainable` function differs from `tune_decorated` above only in the first argument we pass to
`Tuner()`:

In [8]:
def tune_trainable(api_key_file):
    """Example for using a WandTrainableMixin with the class API"""
    tuner = tune.Tuner(
        WandbTrainable,
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
        ),
        param_space={
            "mean": tune.grid_search([1, 2, 3, 4, 5]),
            "sd": tune.uniform(0.2, 0.8),
            "wandb": {"api_key_file": api_key_file, "project": "Wandb_example"},
        },
    )
    results = tuner.fit()

    return results.get_best_result().config

Since you may not have an API key for Wandb, we can _mock_ the Wandb logger and test all three of our training
functions as follows.
If you do have an API key file, make sure to set `mock_api` to `False` and pass in the right `api_key_file` below.

In [9]:
import tempfile
from unittest.mock import MagicMock

mock_api = True

api_key_file = "~/.wandb_api_key"

if mock_api:
    WandbLoggerCallback._logger_process_cls = MagicMock
    decorated_objective.__mixins__ = tuple()
    WandbTrainable._wandb = MagicMock()
    wandb = MagicMock()  # noqa: F811
    temp_file = tempfile.NamedTemporaryFile()
    temp_file.write(b"1234")
    temp_file.flush()
    api_key_file = temp_file.name

tune_function(api_key_file)
tune_decorated(api_key_file)
tune_trainable(api_key_file)

if mock_api:
    temp_file.close()

2022-07-22 15:39:38,323	INFO services.py:1483 -- View the Ray dashboard at [1m[32mhttp://127.0.0.1:8266[39m[22m

from ray.air import session

def train(config):
    # ...
    session.report({"metric": metric}, checkpoint=checkpoint)

For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session



Trial name,status,loc,mean,sd,iter,total time (s),loss
objective_1e575_00000,TERMINATED,127.0.0.1:47932,1,0.65407,30,0.203522,0.653528
objective_1e575_00001,TERMINATED,127.0.0.1:47941,2,0.72087,30,0.314281,1.14091
objective_1e575_00002,TERMINATED,127.0.0.1:47942,3,0.680016,30,0.43947,2.11278
objective_1e575_00003,TERMINATED,127.0.0.1:47943,4,0.296117,30,0.442453,4.33397
objective_1e575_00004,TERMINATED,127.0.0.1:47944,5,0.358219,30,0.362729,5.41971


2022-07-22 15:39:41,596	INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].


Result for objective_1e575_00000:
  date: 2022-07-22_15-39-44
  done: false
  experiment_id: 60ffbe63fc834195a37fabc078985531
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 0.4005309978356091
  node_ip: 127.0.0.1
  pid: 47932
  time_since_restore: 0.0001418590545654297
  time_this_iter_s: 0.0001418590545654297
  time_total_s: 0.0001418590545654297
  timestamp: 1658500784
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 1e575_00000
  warmup_time: 0.002913236618041992
  
Result for objective_1e575_00000:
  date: 2022-07-22_15-39-44
  done: true
  experiment_id: 60ffbe63fc834195a37fabc078985531
  experiment_tag: 0_mean=1,sd=0.6541
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 30
  loss: 0.6535282890948189
  node_ip: 127.0.0.1
  pid: 47932
  time_since_restore: 0.203521728515625
  time_this_iter_s: 0.003339052200317383
  time_total_s: 0.203521728515625
  timestamp: 1658500784
  timesteps_since_restore: 0
  training_iteration: 30
  

2022-07-22 15:39:47,478	INFO tune.py:738 -- Total run time: 6.95 seconds (6.00 seconds for the tuning loop).


Trial name,status,loc,mean,sd,iter,total time (s),loss
objective_227e1_00000,TERMINATED,127.0.0.1:47968,1,0.356258,30,0.0869601,1.41581
objective_227e1_00001,TERMINATED,127.0.0.1:47973,2,0.411041,30,0.371924,2.9165
objective_227e1_00002,TERMINATED,127.0.0.1:47974,3,0.359191,30,0.305055,2.57809
objective_227e1_00003,TERMINATED,127.0.0.1:47975,4,0.543202,30,0.218044,5.06532
objective_227e1_00004,TERMINATED,127.0.0.1:47976,5,0.777638,30,0.287682,6.36554


Result for objective_227e1_00000:
  date: 2022-07-22_15-39-50
  done: false
  experiment_id: e80ef3e4843c41068c733322d48e0817
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 0.27641082730463906
  node_ip: 127.0.0.1
  pid: 47968
  time_since_restore: 0.0001361370086669922
  time_this_iter_s: 0.0001361370086669922
  time_total_s: 0.0001361370086669922
  timestamp: 1658500790
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 227e1_00000
  warmup_time: 0.003004789352416992
  
Result for objective_227e1_00000:
  date: 2022-07-22_15-39-50
  done: true
  experiment_id: e80ef3e4843c41068c733322d48e0817
  experiment_tag: 0_mean=1,sd=0.3563
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 30
  loss: 1.4158135642199134
  node_ip: 127.0.0.1
  pid: 47968
  time_since_restore: 0.0869600772857666
  time_this_iter_s: 0.0022199153900146484
  time_total_s: 0.0869600772857666
  timestamp: 1658500790
  timesteps_since_restore: 0
  training_iteration: 3

2022-07-22 15:39:53,254	INFO tune.py:738 -- Total run time: 5.76 seconds (5.63 seconds for the tuning loop).


Trial name,status,loc,mean,sd,iter,total time (s),loss
WandbTrainable_25f04_00000,ERROR,127.0.0.1:47994,1,0.524531,1,0.000827789,0.994137
WandbTrainable_25f04_00001,ERROR,127.0.0.1:48005,2,0.515265,1,0.00108528,2.31254
WandbTrainable_25f04_00002,ERROR,127.0.0.1:48006,3,0.56327,1,0.00111198,3.43952
WandbTrainable_25f04_00003,ERROR,127.0.0.1:48007,4,0.507054,1,0.000993013,4.53341
WandbTrainable_25f04_00004,ERROR,127.0.0.1:48008,5,0.372142,1,0.000849962,5.13408

Trial name,# failures,error file
WandbTrainable_25f04_00000,1,"/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00000_0_mean=1,sd=0.5245_2022-07-22_15-39-53/error.txt"
WandbTrainable_25f04_00001,1,"/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00001_1_mean=2,sd=0.5153_2022-07-22_15-39-56/error.txt"
WandbTrainable_25f04_00002,1,"/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00002_2_mean=3,sd=0.5633_2022-07-22_15-39-56/error.txt"
WandbTrainable_25f04_00003,1,"/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00003_3_mean=4,sd=0.5071_2022-07-22_15-39-56/error.txt"
WandbTrainable_25f04_00004,1,"/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00004_4_mean=5,sd=0.3721_2022-07-22_15-39-56/error.txt"


2022-07-22 15:39:56,146	ERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00000: Error processing event.
ray.exceptions.RayTaskError(NotImplementedError): [36mray::WandbTrainable.save()[39m (pid=47994, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11052de10>)
  File "/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py", line 449, in save
    checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
  File "/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py", line 1014, in save_checkpoint
    raise NotImplementedError
NotImplementedError


Result for WandbTrainable_25f04_00000:
  date: 2022-07-22_15-39-56
  done: true
  experiment_id: c0ac6bf4f2af45368a3c5c3e14e47115
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 0.9941371354505734
  node_ip: 127.0.0.1
  pid: 47994
  time_since_restore: 0.000827789306640625
  time_this_iter_s: 0.000827789306640625
  time_total_s: 0.000827789306640625
  timestamp: 1658500796
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 25f04_00000
  warmup_time: 0.0031821727752685547
  
Result for WandbTrainable_25f04_00000:
  date: 2022-07-22_15-39-56
  done: true
  experiment_id: c0ac6bf4f2af45368a3c5c3e14e47115
  experiment_tag: 0_mean=1,sd=0.5245
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 0.9941371354505734
  node_ip: 127.0.0.1
  pid: 47994
  time_since_restore: 0.000827789306640625
  time_this_iter_s: 0.000827789306640625
  time_total_s: 0.000827789306640625
  timestamp: 1658500796
  timesteps_since_restore: 0
  training_iter

2022-07-22 15:39:59,299	ERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00002: Error processing event.
ray.exceptions.RayTaskError(NotImplementedError): [36mray::WandbTrainable.save()[39m (pid=48006, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11a54c8d0>)
  File "/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py", line 449, in save
    checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
  File "/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py", line 1014, in save_checkpoint
    raise NotImplementedError
NotImplementedError
2022-07-22 15:39:59,305	ERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00004: Error processing event.
ray.exceptions.RayTaskError(NotImplementedError): [36mray::WandbTrainable.save()[39m (pid=48008, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11c314d90>)
  File "/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py", line 449, in save
    checkpoint_dict_or_path = self.save_checkpoi

Result for WandbTrainable_25f04_00001:
  date: 2022-07-22_15-39-59
  done: true
  experiment_id: b0920f67a88f4993b7ec85dee2f78022
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 2.3125440070079093
  node_ip: 127.0.0.1
  pid: 48005
  time_since_restore: 0.0010852813720703125
  time_this_iter_s: 0.0010852813720703125
  time_total_s: 0.0010852813720703125
  timestamp: 1658500799
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 25f04_00001
  warmup_time: 0.0049626827239990234
  
Result for WandbTrainable_25f04_00004:
  date: 2022-07-22_15-39-59
  done: true
  experiment_id: 4435b2105eb24fbaba4778e33ce2e1a9
  hostname: Kais-MacBook-Pro.local
  iterations_since_restore: 1
  loss: 5.134083536061109
  node_ip: 127.0.0.1
  pid: 48008
  time_since_restore: 0.0008499622344970703
  time_this_iter_s: 0.0008499622344970703
  time_total_s: 0.0008499622344970703
  timestamp: 1658500799
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 25f04_00004

2022-07-22 15:39:59,455	ERROR tune.py:733 -- Trials did not complete: [WandbTrainable_25f04_00000, WandbTrainable_25f04_00001, WandbTrainable_25f04_00002, WandbTrainable_25f04_00003, WandbTrainable_25f04_00004]
2022-07-22 15:39:59,456	INFO tune.py:738 -- Total run time: 6.18 seconds (6.04 seconds for the tuning loop).


This completes our Tune and Wandb walk-through.
In the following sections you can find more details on the API of the Tune-Wandb integration.

## Tune Wandb API Reference

### WandbLoggerCallback

(tune-wandb-logger)=

```{eval-rst}
.. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback
   :noindex:
```

### Wandb-Mixin

(tune-wandb-mixin)=

```{eval-rst}
.. autofunction:: ray.tune.integration.wandb.wandb_mixin
   :noindex:
```