{
"cells": [
{
"cell_type": "markdown",
"id": "f37e8a9f",
"metadata": {},
"source": [
"# Logging results and uploading models to Weights & Biases\n",
"In this example, we train a simple XGBoost model and log the training\n",
"results to Weights & Biases. We also save the resulting model checkpoints\n",
"as artifacts."
]
},
{
"cell_type": "markdown",
"id": "27d04c97",
"metadata": {},
"source": [
"Let's start with installing our dependencies:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4e697e5d",
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU \"ray[tune]\" sklearn xgboost_ray wandb"
]
},
{
"cell_type": "markdown",
"id": "3096e7c9",
"metadata": {},
"source": [
"Then we need some imports:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9c286701",
"metadata": {},
"outputs": [],
"source": [
"import ray\n",
"\n",
"from ray.air import RunConfig\n",
"from ray.air.result import Result\n",
"from ray.air.train.integrations.xgboost import XGBoostTrainer\n",
"from ray.tune.integration.wandb import WandbLoggerCallback\n",
"from sklearn.datasets import load_breast_cancer"
]
},
{
"cell_type": "markdown",
"id": "2efa1564",
"metadata": {},
"source": [
"We define a simple function that returns our training dataset as a Ray Dataset:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a63ebd10",
"metadata": {},
"outputs": [],
"source": [
"def get_train_dataset() -> ray.data.Dataset:\n",
" \"\"\"Return the \"Breast cancer\" dataset as a Ray dataset.\"\"\"\n",
" data_raw = load_breast_cancer(as_frame=True)\n",
" df = data_raw[\"data\"]\n",
" df[\"target\"] = data_raw[\"target\"]\n",
" return ray.data.from_pandas(df)"
]
},
{
"cell_type": "markdown",
"id": "d07cf41f",
"metadata": {},
"source": [
"Now we define a simple training function. All the magic happens within the `WandbLoggerCallback`:\n",
"\n",
"```python\n",
"WandbLoggerCallback(\n",
" project=wandb_project,\n",
" save_checkpoints=True,\n",
")\n",
"```\n",
"\n",
"It will automatically log all results to Weights & Biases and upload the checkpoints as artifacts. It assumes you're logged in into Wandb via an API key or `wandb login`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "52edfde0",
"metadata": {},
"outputs": [],
"source": [
"def train_model(train_dataset: ray.data.Dataset, wandb_project: str) -> Result:\n",
" \"\"\"Train a simple XGBoost model and return the result.\"\"\"\n",
" trainer = XGBoostTrainer(\n",
" scaling_config={\"num_workers\": 2},\n",
" params={\"tree_method\": \"auto\"},\n",
" label_column=\"target\",\n",
" datasets={\"train\": train_dataset},\n",
" num_boost_round=10,\n",
" run_config=RunConfig(\n",
" callbacks=[\n",
" # This is the part needed to enable logging to Weights & Biases.\n",
" # It assumes you've logged in before, e.g. with `wandb login`.\n",
" WandbLoggerCallback(\n",
" project=wandb_project,\n",
" save_checkpoints=True,\n",
" )\n",
" ]\n",
" ),\n",
" )\n",
" result = trainer.fit()\n",
" return result"
]
},
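{
"cell_type": "markdown",
"id": "b7c1e2a4",
"metadata": {},
"source": [
"The callback above relies on an existing login. On a machine where an interactive `wandb login` is impractical, such as a remote cluster, one option is to hand the key to the callback directly. The following is a minimal sketch, not the only way to do it: `api_key_file` is an argument of `WandbLoggerCallback`, while the path shown is a hypothetical placeholder that you would replace with your own.\n",
"\n",
"```python\n",
"from ray.tune.integration.wandb import WandbLoggerCallback\n",
"\n",
"# Hypothetical location of a text file that contains only the W&B API key.\n",
"api_key_file = \"/path/to/wandb_api_key\"\n",
"\n",
"callback = WandbLoggerCallback(\n",
"    project=\"ray_air_example\",\n",
"    api_key_file=api_key_file,\n",
"    save_checkpoints=True,\n",
")\n",
"```\n",
"\n",
"Passing `api_key=...` instead works the same way but puts the secret in the notebook itself, so a key file or the `WANDB_API_KEY` environment variable is usually the safer choice."
]
},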
{
"cell_type": "markdown",
"id": "1959ce19",
"metadata": {},
"source": [
"Let's kick off a run:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "64f80d6c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-19 15:22:11,956\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n",
"2022-05-19 15:22:15,995\tINFO wandb.py:172 -- Already logged into W&B.\n"
]
},
{
"data": {
"text/html": [
"== Status ==
Current time: 2022-05-19 15:22:42 (running for 00:00:26.61)
Memory usage on this node: 10.2/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.6 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14
Number of trials: 1/1 (1 TERMINATED)
\n",
"\n",
"Trial name | status | loc | iter | total time (s) | train-rmse |
\n",
"\n",
"\n",
"XGBoostTrainer_14a73_00000 | TERMINATED | 127.0.0.1:20065 | 10 | 10.2724 | 0.030717 |
\n",
"\n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:17,422\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=-2010331134\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mkaifricke\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m UserWarning: Dataset 'train' has 1 blocks, which is less than the `num_workers` 2. This dataset will be automatically repartitioned to 2 blocks.\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:23,215\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=-2010331069\n"
]
},
{
"data": {
"text/html": [
"Tracking run with wandb version 0.12.16"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Run data is saved locally in /Users/kai/coding/ray/doc/source/ray-air/examples/wandb/run-20220519_152218-14a73_00000
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Syncing run XGBoostTrainer_14a73_00000 to Weights & Biases (docs)
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:24,711\tINFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,090\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=-2010331069\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,234\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,236\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=20 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,239\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=21 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,263\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=22 --runtime-env-hash=-2010331134\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:29,260\tINFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.\n",
"\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20130)\u001b[0m [15:22:29] task [xgboost.ray]:6859875216 got new rank 0\n",
"\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20131)\u001b[0m [15:22:29] task [xgboost.ray]:4625795280 got new rank 1\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000000)... Done. 0.1s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result for XGBoostTrainer_14a73_00000:\n",
" date: 2022-05-19_15-22-31\n",
" done: false\n",
" experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 1\n",
" node_ip: 127.0.0.1\n",
" pid: 20065\n",
" should_checkpoint: true\n",
" time_since_restore: 10.080440044403076\n",
" time_this_iter_s: 10.080440044403076\n",
" time_total_s: 10.080440044403076\n",
" timestamp: 1652970151\n",
" timesteps_since_restore: 0\n",
" train-rmse: 0.357284\n",
" training_iteration: 1\n",
" trial_id: 14a73_00000\n",
" warmup_time: 0.006903171539306641\n",
" \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000001)... Done. 0.1s\n",
"\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:32,051\tINFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 7.37 seconds (2.79 pure XGBoost training time).\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000002)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000003)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000004)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000005)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000006)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000007)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000008)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n",
"\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n"
]
},
{
"data": {
"text/html": [
"Waiting for W&B process to finish... (success)."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result for XGBoostTrainer_14a73_00000:\n",
" date: 2022-05-19_15-22-32\n",
" done: true\n",
" experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n",
" experiment_tag: '0'\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 10\n",
" node_ip: 127.0.0.1\n",
" pid: 20065\n",
" should_checkpoint: true\n",
" time_since_restore: 10.272444248199463\n",
" time_this_iter_s: 0.023891210556030273\n",
" time_total_s: 10.272444248199463\n",
" timestamp: 1652970152\n",
" timesteps_since_restore: 0\n",
" train-rmse: 0.030717\n",
" training_iteration: 10\n",
" trial_id: 14a73_00000\n",
" warmup_time: 0.006903171539306641\n",
" \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-19 15:22:42,727\tINFO tune.py:753 -- Total run time: 27.83 seconds (26.61 seconds for the tuning loop).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Label(value='0.090 MB of 0.090 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"Run history:
iterations_since_restore | ▁▂▃▃▄▅▆▆▇█ |
time_since_restore | ▁▂▃▃▄▅▅▆▇█ |
time_this_iter_s | █▁▁▁▁▁▁▁▁▁ |
time_total_s | ▁▂▃▃▄▅▅▆▇█ |
timestamp | ▁▁▁▁▁▁▁▁██ |
timesteps_since_restore | ▁▁▁▁▁▁▁▁▁▁ |
train-rmse | █▆▄▃▂▂▂▁▁▁ |
training_iteration | ▁▂▃▃▄▅▆▆▇█ |
warmup_time | ▁▁▁▁▁▁▁▁▁▁ |
Run summary:
iterations_since_restore | 10 |
time_since_restore | 10.27244 |
time_this_iter_s | 0.02389 |
time_total_s | 10.27244 |
timestamp | 1652970152 |
timesteps_since_restore | 0 |
train-rmse | 0.03072 |
training_iteration | 10 |
warmup_time | 0.0069 |
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Synced XGBoostTrainer_14a73_00000: https://wandb.ai/kaifricke/ray_air_example/runs/14a73_00000
Synced 5 W&B file(s), 0 media file(s), 21 artifact file(s) and 0 other file(s)"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Find logs at: ./wandb/run-20220519_152218-14a73_00000/logs
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"wandb_project = \"ray_air_example\"\n",
"\n",
"train_dataset = get_train_dataset()\n",
"result = train_model(train_dataset=train_dataset, wandb_project=wandb_project)"
]
},
{
"cell_type": "markdown",
"id": "78701c42",
"metadata": {},
"source": [
"Check out your [WandB](https://wandb.ai/) project to see the results!"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}