{
"cells": [
{
"cell_type": "markdown",
"id": "98d7c620",
"metadata": {},
"source": [
"# Logging results and uploading models to Comet ML\n",
"In this example, we train a simple XGBoost model and log the training\n",
"results to Comet ML. We also save the resulting model checkpoints\n",
"as artifacts."
]
},
{
"cell_type": "markdown",
"id": "c6e66577",
"metadata": {},
"source": [
"Let's start by installing our dependencies:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6d6297ef",
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU \"ray[tune]\" scikit-learn xgboost_ray comet_ml"
]
},
{
"cell_type": "markdown",
"id": "c2e21446",
"metadata": {},
"source": [
"Then we need some imports:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dffff484",
"metadata": {},
"outputs": [],
"source": [
"import ray\n",
"\n",
"from ray.ml import RunConfig\n",
"from ray.ml.result import Result\n",
"from ray.ml.train.integrations.xgboost import XGBoostTrainer\n",
"from ray.tune.integration.comet import CometLoggerCallback\n",
"from sklearn.datasets import load_breast_cancer"
]
},
{
"cell_type": "markdown",
"id": "29fcd93b",
"metadata": {},
"source": [
"We define a simple function that returns our training dataset as a Ray Dataset:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cf830706",
"metadata": {},
"outputs": [],
"source": [
"def get_train_dataset() -> ray.data.Dataset:\n",
"    \"\"\"Return the \"Breast cancer\" dataset as a Ray dataset.\"\"\"\n",
"    data_raw = load_breast_cancer(as_frame=True)\n",
"    df = data_raw[\"data\"]\n",
"    df[\"target\"] = data_raw[\"target\"]\n",
"    return ray.data.from_pandas(df)"
]
},
{
"cell_type": "markdown",
"id": "0f48f948",
"metadata": {},
"source": [
"Now we define a simple training function. The Comet ML integration is handled entirely by the `CometLoggerCallback`:\n",
"\n",
"```python\n",
"CometLoggerCallback(\n",
"    project_name=comet_project,\n",
"    save_checkpoints=True,\n",
")\n",
"```\n",
"\n",
"The callback automatically logs all results to Comet ML and uploads the checkpoints as artifacts. It assumes you are logged in to Comet via an API key or your `~/.comet.config` file."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "230f23a3",
"metadata": {},
"outputs": [],
"source": [
"def train_model(train_dataset: ray.data.Dataset, comet_project: str) -> Result:\n",
"    \"\"\"Train a simple XGBoost model and return the result.\"\"\"\n",
"    trainer = XGBoostTrainer(\n",
"        scaling_config={\"num_workers\": 2},\n",
"        params={\"tree_method\": \"auto\"},\n",
"        label_column=\"target\",\n",
"        datasets={\"train\": train_dataset},\n",
"        num_boost_round=10,\n",
"        run_config=RunConfig(\n",
"            callbacks=[\n",
"                # This is the part needed to enable logging to Comet ML.\n",
"                # It assumes Comet ML can find a valid API key (e.g. by setting\n",
"                # the ``COMET_API_KEY`` environment variable).\n",
"                CometLoggerCallback(\n",
"                    project_name=comet_project,\n",
"                    save_checkpoints=True,\n",
"                )\n",
"            ]\n",
"        ),\n",
"    )\n",
"    result = trainer.fit()\n",
"    return result"
]
},
{
"cell_type": "markdown",
"id": "711b1d7d",
"metadata": {},
"source": [
"Let's kick off a run:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9bfd9a8d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-19 15:19:17,237\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8265\u001b[39m\u001b[22m\n"
]
},
{
"data": {
"text/html": [
"== Status ==
Current time: 2022-05-19 15:19:35 (running for 00:00:14.95)
Memory usage on this node: 10.2/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.12 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/XGBoostTrainer_2022-05-19_15-19-19
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | iter | total time (s) | train-rmse |
---|---|---|---|---|---|
XGBoostTrainer_ac544_00000 | TERMINATED | 127.0.0.1:19852 | 10 | 9.7203 | 0.030717 |