ray/doc/source/ray-air/examples/upload_to_wandb.ipynb
Antoni Baum 128f9e5664
[AIR] Move integration logging callbacks to AIR (#26126)
As the integration logging callbacks are commonly used with AIR Trainers, they should be moved from the tune package to the air package. The old imports will still work, but raise a deprecation warning.
2022-06-28 17:25:19 -07:00

{
"cells": [
{
"cell_type": "markdown",
"id": "f37e8a9f",
"metadata": {},
"source": [
"# Logging results and uploading models to Weights & Biases\n",
"In this example, we train a simple XGBoost model and log the training\n",
"results to Weights & Biases. We also save the resulting model checkpoints\n",
"as artifacts."
]
},
{
"cell_type": "markdown",
"id": "27d04c97",
"metadata": {},
"source": [
"Let's start by installing our dependencies:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4e697e5d",
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU \"ray[tune]\" scikit-learn xgboost_ray wandb"
]
},
{
"cell_type": "markdown",
"id": "3096e7c9",
"metadata": {},
"source": [
"Then we need some imports:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9c286701",
"metadata": {},
"outputs": [],
"source": [
"import ray\n",
"\n",
"from ray.air import RunConfig\n",
"from ray.air.result import Result\n",
"from ray.train.xgboost import XGBoostTrainer\n",
"from ray.air.callbacks.wandb import WandbLoggerCallback"
]
},
{
"cell_type": "markdown",
"id": "2efa1564",
"metadata": {},
"source": [
"We define a simple function that returns our training dataset as a Ray Dataset:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a63ebd10",
"metadata": {},
"outputs": [],
"source": [
"def get_train_dataset() -> ray.data.Dataset:\n",
"    dataset = ray.data.read_csv(\"s3://air-example-data/breast_cancer.csv\")\n",
"    return dataset"
]
},
{
"cell_type": "markdown",
"id": "d07cf41f",
"metadata": {},
"source": [
"Now we define a simple training function. All the magic happens within the `WandbLoggerCallback`:\n",
"\n",
"```python\n",
"WandbLoggerCallback(\n",
"    project=wandb_project,\n",
"    save_checkpoints=True,\n",
")\n",
"```\n",
"\n",
"It automatically logs all results to Weights & Biases and uploads the checkpoints as artifacts. It assumes you are already logged in to Weights & Biases, either via an API key or with `wandb login`."
]
},
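{
"cell_type": "markdown",
"id": "8c1f4b7a",
"metadata": {},
"source": [
"Since the callback relies on existing credentials, it can help to check for them before starting a run. The following is a minimal sketch, not part of Ray or the `wandb` package: the helper name `wandb_credentials_available` and its parameters are hypothetical, and it assumes that credentials come either from the `WANDB_API_KEY` environment variable or from an `api.wandb.ai` entry in `~/.netrc`, which is where `wandb login` typically stores the key.\n",
"\n",
"```python\n",
"import netrc\n",
"import os\n",
"from pathlib import Path\n",
"from typing import Mapping, Optional\n",
"\n",
"\n",
"def wandb_credentials_available(\n",
"    env: Optional[Mapping[str, str]] = None,\n",
"    netrc_path: Optional[Path] = None,\n",
") -> bool:\n",
"    \"\"\"Return True if W&B credentials appear to be configured (hypothetical helper).\"\"\"\n",
"    env = os.environ if env is None else env\n",
"    # Assumption: an API key exported as WANDB_API_KEY is sufficient.\n",
"    if env.get(\"WANDB_API_KEY\"):\n",
"        return True\n",
"    # Assumption: `wandb login` stored the key under api.wandb.ai in ~/.netrc.\n",
"    path = Path.home() / \".netrc\" if netrc_path is None else netrc_path\n",
"    if not path.is_file():\n",
"        return False\n",
"    try:\n",
"        return netrc.netrc(str(path)).authenticators(\"api.wandb.ai\") is not None\n",
"    except (netrc.NetrcParseError, OSError):\n",
"        return False\n",
"```\n",
"\n",
"If the check returns `False`, run `wandb login` or set `WANDB_API_KEY` before calling `trainer.fit()`."
]
},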
{
"cell_type": "code",
"execution_count": 4,
"id": "52edfde0",
"metadata": {},
"outputs": [],
"source": [
"def train_model(train_dataset: ray.data.Dataset, wandb_project: str) -> Result:\n",
"    \"\"\"Train a simple XGBoost model and return the result.\"\"\"\n",
"    trainer = XGBoostTrainer(\n",
"        scaling_config={\"num_workers\": 2},\n",
"        params={\"tree_method\": \"auto\"},\n",
"        label_column=\"target\",\n",
"        datasets={\"train\": train_dataset},\n",
"        num_boost_round=10,\n",
"        run_config=RunConfig(\n",
"            callbacks=[\n",
"                # This is the part needed to enable logging to Weights & Biases.\n",
"                # It assumes you've logged in before, e.g. with `wandb login`.\n",
"                WandbLoggerCallback(\n",
"                    project=wandb_project,\n",
"                    save_checkpoints=True,\n",
"                )\n",
"            ]\n",
"        ),\n",
"    )\n",
"    result = trainer.fit()\n",
"    return result"
]
},
{
"cell_type": "markdown",
"id": "1959ce19",
"metadata": {},
"source": [
"Let's kick off a run:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "64f80d6c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-19 15:22:11,956\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n",
"2022-05-19 15:22:15,995\tINFO wandb.py:172 -- Already logged into W&B.\n"
]
},
{
"data": {
"text/html": [
"== Status ==<br>Current time: 2022-05-19 15:22:42 (running for 00:00:26.61)<br>Memory usage on this node: 10.2/16.0 GiB<br>Using FIFO scheduling algorithm.<br>Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.6 GiB heap, 0.0/2.0 GiB objects<br>Result logdir: /Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14<br>Number of trials: 1/1 (1 TERMINATED)<br><table>\n",
"<thead>\n",
"<tr><th>Trial name </th><th>status </th><th>loc </th><th style=\"text-align: right;\"> iter</th><th style=\"text-align: right;\"> total time (s)</th><th style=\"text-align: right;\"> train-rmse</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>XGBoostTrainer_14a73_00000</td><td>TERMINATED</td><td>127.0.0.1:20065</td><td style=\"text-align: right;\"> 10</td><td style=\"text-align: right;\"> 10.2724</td><td style=\"text-align: right;\"> 0.030717</td></tr>\n",
"</tbody>\n",
"</table><br><br>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:17,422\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=-2010331134\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mkaifricke\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m UserWarning: Dataset 'train' has 1 blocks, which is less than the `num_workers` 2. This dataset will be automatically repartitioned to 2 blocks.\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:23,215\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=-2010331069\n"
]
},
{
"data": {
"text/html": [
"Tracking run with wandb version 0.12.16"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Run data is saved locally in <code>/Users/kai/coding/ray/doc/source/ray-air/examples/wandb/run-20220519_152218-14a73_00000</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Syncing run <strong><a href=\"https://wandb.ai/kaifricke/ray_air_example/runs/14a73_00000\" target=\"_blank\">XGBoostTrainer_14a73_00000</a></strong> to <a href=\"https://wandb.ai/kaifricke/ray_air_example\" target=\"_blank\">Weights & Biases</a> (<a href=\"https://wandb.me/run\" target=\"_blank\">docs</a>)<br/>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:24,711\tINFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,090\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=-2010331069\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,234\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,236\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=20 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,239\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=21 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,263\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=22 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:29,260\tINFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.\n",
"\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20130)\u001b[0m [15:22:29] task [xgboost.ray]:6859875216 got new rank 0\n",
"\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20131)\u001b[0m [15:22:29] task [xgboost.ray]:4625795280 got new rank 1\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000000)... Done. 0.1s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result for XGBoostTrainer_14a73_00000:\n",
" date: 2022-05-19_15-22-31\n",
" done: false\n",
" experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 1\n",
" node_ip: 127.0.0.1\n",
" pid: 20065\n",
" should_checkpoint: true\n",
" time_since_restore: 10.080440044403076\n",
" time_this_iter_s: 10.080440044403076\n",
" time_total_s: 10.080440044403076\n",
" timestamp: 1652970151\n",
" timesteps_since_restore: 0\n",
" train-rmse: 0.357284\n",
" training_iteration: 1\n",
" trial_id: 14a73_00000\n",
" warmup_time: 0.006903171539306641\n",
" \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000001)... Done. 0.1s\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:32,051\tINFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 7.37 seconds (2.79 pure XGBoost training time).\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000002)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000003)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000004)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000005)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000006)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000007)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000008)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n"
]
},
{
"data": {
"text/html": [
"Waiting for W&B process to finish... <strong style=\"color:green\">(success).</strong>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result for XGBoostTrainer_14a73_00000:\n",
" date: 2022-05-19_15-22-32\n",
" done: true\n",
" experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n",
" experiment_tag: '0'\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 10\n",
" node_ip: 127.0.0.1\n",
" pid: 20065\n",
" should_checkpoint: true\n",
" time_since_restore: 10.272444248199463\n",
" time_this_iter_s: 0.023891210556030273\n",
" time_total_s: 10.272444248199463\n",
" timestamp: 1652970152\n",
" timesteps_since_restore: 0\n",
" train-rmse: 0.030717\n",
" training_iteration: 10\n",
" trial_id: 14a73_00000\n",
" warmup_time: 0.006903171539306641\n",
" \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-19 15:22:42,727\tINFO tune.py:753 -- Total run time: 27.83 seconds (26.61 seconds for the tuning loop).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Label(value='0.090 MB of 0.090 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<style>\n",
" table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
" .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
" .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
" </style>\n",
"<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>iterations_since_restore</td><td>▁▂▃▃▄▅▆▆▇█</td></tr><tr><td>time_since_restore</td><td>▁▂▃▃▄▅▅▆▇█</td></tr><tr><td>time_this_iter_s</td><td>█▁▁▁▁▁▁▁▁▁</td></tr><tr><td>time_total_s</td><td>▁▂▃▃▄▅▅▆▇█</td></tr><tr><td>timestamp</td><td>▁▁▁▁▁▁▁▁██</td></tr><tr><td>timesteps_since_restore</td><td>▁▁▁▁▁▁▁▁▁▁</td></tr><tr><td>train-rmse</td><td>█▆▄▃▂▂▂▁▁▁</td></tr><tr><td>training_iteration</td><td>▁▂▃▃▄▅▆▆▇█</td></tr><tr><td>warmup_time</td><td>▁▁▁▁▁▁▁▁▁▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>iterations_since_restore</td><td>10</td></tr><tr><td>time_since_restore</td><td>10.27244</td></tr><tr><td>time_this_iter_s</td><td>0.02389</td></tr><tr><td>time_total_s</td><td>10.27244</td></tr><tr><td>timestamp</td><td>1652970152</td></tr><tr><td>timesteps_since_restore</td><td>0</td></tr><tr><td>train-rmse</td><td>0.03072</td></tr><tr><td>training_iteration</td><td>10</td></tr><tr><td>warmup_time</td><td>0.0069</td></tr></table><br/></div></div>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Synced <strong style=\"color:#cdcd00\">XGBoostTrainer_14a73_00000</strong>: <a href=\"https://wandb.ai/kaifricke/ray_air_example/runs/14a73_00000\" target=\"_blank\">https://wandb.ai/kaifricke/ray_air_example/runs/14a73_00000</a><br/>Synced 5 W&B file(s), 0 media file(s), 21 artifact file(s) and 0 other file(s)"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Find logs at: <code>./wandb/run-20220519_152218-14a73_00000/logs</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"wandb_project = \"ray_air_example\"\n",
"\n",
"train_dataset = get_train_dataset()\n",
"result = train_model(train_dataset=train_dataset, wandb_project=wandb_project)"
]
},
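{
"cell_type": "markdown",
"id": "5d2e9a36",
"metadata": {},
"source": [
"The returned `Result` can also be inspected locally. The following is a minimal sketch, assuming that `result.metrics` holds the last reported metrics (including the `train-rmse` and `training_iteration` keys shown in the output above) and that `result.checkpoint` holds the latest checkpoint. `summarize_result` is a hypothetical helper, not a Ray API.\n",
"\n",
"```python\n",
"from typing import Any, Dict\n",
"\n",
"\n",
"def summarize_result(result: Any, metric: str = \"train-rmse\") -> Dict[str, Any]:\n",
"    \"\"\"Collect a few fields from a `Result`-like object (hypothetical helper).\"\"\"\n",
"    # `metrics` is expected to be the last reported result dict; guard against None.\n",
"    metrics = result.metrics or {}\n",
"    return {\n",
"        \"metric\": metric,\n",
"        \"value\": metrics.get(metric),\n",
"        \"training_iteration\": metrics.get(\"training_iteration\"),\n",
"        \"has_checkpoint\": result.checkpoint is not None,\n",
"    }\n",
"```\n",
"\n",
"Calling `summarize_result(result)` after the run gives a quick local view of the final metric and whether a checkpoint was produced."
]
},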
{
"cell_type": "markdown",
"id": "78701c42",
"metadata": {},
"source": [
"Check out your [Weights & Biases](https://wandb.ai/) project to see the results!"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}