{ "cells": [ { "cell_type": "markdown", "id": "f37e8a9f", "metadata": {}, "source": [ "# Logging results and uploading models to Weights & Biases\n", "In this example, we train a simple XGBoost model and log the training\n", "results to Weights & Biases. We also save the resulting model checkpoints\n", "as artifacts." ] }, { "cell_type": "markdown", "id": "27d04c97", "metadata": {}, "source": [ "Let's start by installing our dependencies:" ] }, { "cell_type": "code", "execution_count": 1, "id": "4e697e5d", "metadata": {}, "outputs": [], "source": [ "!pip install -qU \"ray[tune]\" scikit-learn xgboost_ray wandb" ] }, { "cell_type": "markdown", "id": "3096e7c9", "metadata": {}, "source": [ "Then we need some imports:" ] }, { "cell_type": "code", "execution_count": 2, "id": "9c286701", "metadata": {}, "outputs": [], "source": [ "import ray\n", "\n", "from ray.ml import RunConfig\n", "from ray.ml.result import Result\n", "from ray.ml.train.integrations.xgboost import XGBoostTrainer\n", "from ray.tune.integration.wandb import WandbLoggerCallback\n", "from sklearn.datasets import load_breast_cancer" ] }, { "cell_type": "markdown", "id": "2efa1564", "metadata": {}, "source": [ "We define a simple function that returns our training dataset as a Ray Dataset:\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "a63ebd10", "metadata": {}, "outputs": [], "source": [ "def get_train_dataset() -> ray.data.Dataset:\n", "    \"\"\"Return the \"Breast cancer\" dataset as a Ray dataset.\"\"\"\n", "    data_raw = load_breast_cancer(as_frame=True)\n", "    df = data_raw[\"data\"]\n", "    df[\"target\"] = data_raw[\"target\"]\n", "    return ray.data.from_pandas(df)" ] }, { "cell_type": "markdown", "id": "d07cf41f", "metadata": {}, "source": [ "Now we define a simple training function. 
All the magic happens within the `WandbLoggerCallback`:\n", "\n", "```python\n", "WandbLoggerCallback(\n", "    project=wandb_project,\n", "    save_checkpoints=True,\n", ")\n", "```\n", "\n", "The callback automatically logs all reported results to Weights & Biases and, because `save_checkpoints=True`, uploads the checkpoints as artifacts. It assumes you are already logged in to W&B, either through an API key or via `wandb login`." ] }, { "cell_type": "code", "execution_count": 4, "id": "52edfde0", "metadata": {}, "outputs": [], "source": [ "def train_model(train_dataset: ray.data.Dataset, wandb_project: str) -> Result:\n", "    \"\"\"Train a simple XGBoost model and return the result.\"\"\"\n", "    trainer = XGBoostTrainer(\n", "        scaling_config={\"num_workers\": 2},\n", "        params={\"tree_method\": \"auto\"},\n", "        label_column=\"target\",\n", "        datasets={\"train\": train_dataset},\n", "        num_boost_round=10,\n", "        run_config=RunConfig(\n", "            callbacks=[\n", "                # This is the part needed to enable logging to Weights & Biases.\n", "                # It assumes you've logged in before, e.g. with `wandb login`.\n", "                WandbLoggerCallback(\n", "                    project=wandb_project,\n", "                    save_checkpoints=True,\n", "                )\n", "            ]\n", "        ),\n", "    )\n", "    result = trainer.fit()\n", "    return result" ] }, { "cell_type": "markdown", "id": "1959ce19", "metadata": {}, "source": [ "Let's kick off a run:" ] }, { "cell_type": "code", "execution_count": 5, "id": "64f80d6c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-05-19 15:22:11,956\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n", "2022-05-19 15:22:15,995\tINFO wandb.py:172 -- Already logged into W&B.\n" ] }, { "data": { "text/html": [ "== Status ==
Current time: 2022-05-19 15:22:42 (running for 00:00:26.61)
Memory usage on this node: 10.2/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.6 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14
Number of trials: 1/1 (1 TERMINATED)
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name | status | loc | iter | total time (s) | train-rmse
XGBoostTrainer_14a73_00000 | TERMINATED | 127.0.0.1:20065 | 10 | 10.2724 | 0.030717


" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:17,422\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=-2010331134\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mkaifricke\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n", "\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m UserWarning: Dataset 'train' has 1 blocks, which is less than the `num_workers` 2. 
This dataset will be automatically repartitioned to 2 blocks.\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:23,215\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=-2010331069\n" ] }, { "data": { "text/html": [ "Tracking run with wandb version 0.12.16" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Run data is saved locally in /Users/kai/coding/ray/doc/source/ray-air/examples/wandb/run-20220519_152218-14a73_00000" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Syncing run XGBoostTrainer_14a73_00000 to Weights & Biases (docs)
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:24,711\tINFO main.py:980 -- [RayXGBoost] Created 2 new actors (2 total actors). Waiting until actors are ready for training.\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,090\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=-2010331069\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,234\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=-2010331134\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,236\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py 
--node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=20 --runtime-env-hash=-2010331134\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,239\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=21 --runtime-env-hash=-2010331134\n", "\u001b[2m\u001b[33m(raylet)\u001b[0m 2022-05-19 15:22:26,263\tINFO context.py:70 -- Exec'ing worker with command: exec /Users/kai/.pyenv/versions/3.7.7/bin/python3.7 /Users/kai/coding/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=61838 --object-store-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-05-19_15-22-09_017478_19912/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=63609 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:62933 --redis-password=5241590000000000 --startup-token=22 --runtime-env-hash=-2010331134\n", "\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 
15:22:29,260\tINFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.\n", "\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20130)\u001b[0m [15:22:29] task [xgboost.ray]:6859875216 got new rank 0\n", "\u001b[2m\u001b[36m(_RemoteRayXGBoostActor pid=20131)\u001b[0m [15:22:29] task [xgboost.ray]:4625795280 got new rank 1\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000000)... Done. 0.1s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Result for XGBoostTrainer_14a73_00000:\n", " date: 2022-05-19_15-22-31\n", " done: false\n", " experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " node_ip: 127.0.0.1\n", " pid: 20065\n", " should_checkpoint: true\n", " time_since_restore: 10.080440044403076\n", " time_this_iter_s: 10.080440044403076\n", " time_total_s: 10.080440044403076\n", " timestamp: 1652970151\n", " timesteps_since_restore: 0\n", " train-rmse: 0.357284\n", " training_iteration: 1\n", " trial_id: 14a73_00000\n", " warmup_time: 0.006903171539306641\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000001)... Done. 0.1s\n", "\u001b[2m\u001b[36m(GBDTTrainable pid=20065)\u001b[0m 2022-05-19 15:22:32,051\tINFO main.py:1519 -- [RayXGBoost] Finished XGBoost training on training data with total N=569 in 7.37 seconds (2.79 pure XGBoost training time).\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000002)... Done. 
0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000003)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000004)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000005)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000006)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000007)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000008)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n", "\u001b[34m\u001b[1mwandb\u001b[0m: Adding directory to artifact (/Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-22-14/XGBoostTrainer_14a73_00000_0_2022-05-19_15-22-16/checkpoint_000009)... Done. 0.1s\n" ] }, { "data": { "text/html": [ "Waiting for W&B process to finish... (success)." 
], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Result for XGBoostTrainer_14a73_00000:\n", " date: 2022-05-19_15-22-32\n", " done: true\n", " experiment_id: 2d50bfe80d2a441e80f4ca05f7c3b607\n", " experiment_tag: '0'\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 10\n", " node_ip: 127.0.0.1\n", " pid: 20065\n", " should_checkpoint: true\n", " time_since_restore: 10.272444248199463\n", " time_this_iter_s: 0.023891210556030273\n", " time_total_s: 10.272444248199463\n", " timestamp: 1652970152\n", " timesteps_since_restore: 0\n", " train-rmse: 0.030717\n", " training_iteration: 10\n", " trial_id: 14a73_00000\n", " warmup_time: 0.006903171539306641\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-05-19 15:22:42,727\tINFO tune.py:753 -- Total run time: 27.83 seconds (26.61 seconds for the tuning loop).\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Label(value='0.090 MB of 0.090 MB uploaded (0.000 MB deduped)\\r'), FloatProgress(value=1.0, max…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

Run history:


iterations_since_restore: ▁▂▃▃▄▅▆▆▇█
time_since_restore: ▁▂▃▃▄▅▅▆▇█
time_this_iter_s: █▁▁▁▁▁▁▁▁▁
time_total_s: ▁▂▃▃▄▅▅▆▇█
timestamp: ▁▁▁▁▁▁▁▁██
timesteps_since_restore: ▁▁▁▁▁▁▁▁▁▁
train-rmse: █▆▄▃▂▂▂▁▁▁
training_iteration: ▁▂▃▃▄▅▆▆▇█
warmup_time: ▁▁▁▁▁▁▁▁▁▁

Run summary:


iterations_since_restore: 10
time_since_restore: 10.27244
time_this_iter_s: 0.02389
time_total_s: 10.27244
timestamp: 1652970152
timesteps_since_restore: 0
train-rmse: 0.03072
training_iteration: 10
warmup_time: 0.0069

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Synced XGBoostTrainer_14a73_00000: https://wandb.ai/kaifricke/ray_air_example/runs/14a73_00000
Synced 5 W&B file(s), 0 media file(s), 21 artifact file(s) and 0 other file(s)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Find logs at: ./wandb/run-20220519_152218-14a73_00000/logs" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "wandb_project = \"ray_air_example\"\n", "\n", "train_dataset = get_train_dataset()\n", "result = train_model(train_dataset=train_dataset, wandb_project=wandb_project)" ] }, { "cell_type": "markdown", "id": "78701c42", "metadata": {}, "source": [ "Check out your [Weights & Biases](https://wandb.ai/) project to see the logged results and the uploaded checkpoint artifacts!" ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "main_language": "python", "notebook_metadata_filter": "-all" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 5 }