mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00

This PR updates the Ray AIR/Tune ipynb examples to use the Tuner() API instead of tune.run(). Signed-off-by: Kai Fricke <kai@anyscale.com> Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Signed-off-by: Kai Fricke <coding@kaifricke.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
356 lines
14 KiB
Text
{
"cells": [
{
"cell_type": "markdown",
"id": "3b05af3b",
"metadata": {},
"source": [
"(tune-comet-ref)=\n",
"\n",
"# Using Comet with Tune\n",
"\n",
"[Comet](https://www.comet.ml/site/) is a tool to manage and optimize the\n",
"entire ML lifecycle, from experiment tracking, model optimization and dataset\n",
"versioning to model production monitoring.\n",
"\n",
"```{image} /images/comet_logo_full.png\n",
":align: center\n",
":alt: Comet\n",
":height: 120px\n",
":target: https://www.comet.ml/site/\n",
"```\n",
"\n",
"```{contents}\n",
":backlinks: none\n",
":local: true\n",
"```\n",
"\n",
"## Example\n",
"\n",
"To illustrate logging your trial results to Comet, we'll define a simple training function\n",
"that simulates a `loss` metric:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "19e3c389",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from ray import air, tune\n",
"from ray.air import session\n",
"\n",
"\n",
"def train_function(config, checkpoint_dir=None):\n",
"    for i in range(30):\n",
"        loss = config[\"mean\"] + config[\"sd\"] * np.random.randn()\n",
"        session.report({\"loss\": loss})"
]
},
{
"cell_type": "markdown",
"id": "6fb69a24",
"metadata": {},
"source": [
"Next, set your Comet API key and project name:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "993d5be6",
"metadata": {},
"outputs": [],
"source": [
"api_key = \"YOUR_COMET_API_KEY\"\n",
"project_name = \"YOUR_COMET_PROJECT_NAME\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e9ce0d76",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# This cell is hidden from the rendered notebook. It replaces the callback's\n",
"# logger process class with a mock and sets placeholder credentials for testing.\n",
"from unittest.mock import MagicMock\n",
"from ray.air.callbacks.comet import CometLoggerCallback\n",
"\n",
"CometLoggerCallback._logger_process_cls = MagicMock\n",
"api_key = \"abc\"\n",
"project_name = \"test\""
]
},
{
"cell_type": "markdown",
"id": "d792a1b0",
"metadata": {},
"source": [
"Add the Comet logger by passing a `CometLoggerCallback` in the `callbacks` argument of your `RunConfig()`:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dbb761e7",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-07-22 15:41:21,477\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8267\u001b[39m\u001b[22m\n",
"/Users/kai/coding/ray/python/ray/tune/trainable/function_trainable.py:643: DeprecationWarning: `checkpoint_dir` in `func(config, checkpoint_dir)` is being deprecated. To save and load checkpoint in trainable functions, please use the `ray.air.session` API:\n",
"\n",
"from ray.air import session\n",
"\n",
"def train(config):\n",
" # ...\n",
" session.report({\"metric\": metric}, checkpoint=checkpoint)\n",
"\n",
"For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session\n",
"\n",
" DeprecationWarning,\n"
]
},
{
"data": {
"text/html": [
"== Status ==<br>Current time: 2022-07-22 15:41:31 (running for 00:00:06.73)<br>Memory usage on this node: 9.9/16.0 GiB<br>Using FIFO scheduling algorithm.<br>Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.5 GiB heap, 0.0/2.0 GiB objects<br>Current best trial: 5bf98_00000 with loss=1.0234101880766688 and parameters={'mean': 1, 'sd': 0.40575843135279466}<br>Result logdir: /Users/kai/ray_results/train_function_2022-07-22_15-41-18<br>Number of trials: 3/3 (3 TERMINATED)<br><table>\n",
"<thead>\n",
"<tr><th>Trial name </th><th>status </th><th>loc </th><th style=\"text-align: right;\"> mean</th><th style=\"text-align: right;\"> sd</th><th style=\"text-align: right;\"> iter</th><th style=\"text-align: right;\"> total time (s)</th><th style=\"text-align: right;\"> loss</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>train_function_5bf98_00000</td><td>TERMINATED</td><td>127.0.0.1:48140</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">0.405758</td><td style=\"text-align: right;\"> 30</td><td style=\"text-align: right;\"> 2.11758 </td><td style=\"text-align: right;\">1.02341</td></tr>\n",
"<tr><td>train_function_5bf98_00001</td><td>TERMINATED</td><td>127.0.0.1:48147</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">0.647335</td><td style=\"text-align: right;\"> 30</td><td style=\"text-align: right;\"> 0.0770731</td><td style=\"text-align: right;\">1.53993</td></tr>\n",
"<tr><td>train_function_5bf98_00002</td><td>TERMINATED</td><td>127.0.0.1:48151</td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\">0.256568</td><td style=\"text-align: right;\"> 30</td><td style=\"text-align: right;\"> 0.0728431</td><td style=\"text-align: right;\">3.0393 </td></tr>\n",
"</tbody>\n",
"</table><br><br>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-07-22 15:41:24,693\tINFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].\n",
"COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.\n",
"COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged \n",
"For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/\n",
"COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.\n",
"COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged \n",
"For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/\n",
"COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.\n",
"COMET ERROR: The given API key abc is invalid, please check it against the dashboard. Your experiment would not be logged \n",
"For more details, please refer to: https://www.comet.ml/docs/python-sdk/warnings-errors/\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result for train_function_5bf98_00000:\n",
" date: 2022-07-22_15-41-27\n",
" done: false\n",
" experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 1\n",
" loss: 1.1009860426725162\n",
" node_ip: 127.0.0.1\n",
" pid: 48140\n",
" time_since_restore: 0.000125885009765625\n",
" time_this_iter_s: 0.000125885009765625\n",
" time_total_s: 0.000125885009765625\n",
" timestamp: 1658500887\n",
" timesteps_since_restore: 0\n",
" training_iteration: 1\n",
" trial_id: 5bf98_00000\n",
" warmup_time: 0.0029532909393310547\n",
" \n",
"Result for train_function_5bf98_00000:\n",
" date: 2022-07-22_15-41-29\n",
" done: true\n",
" experiment_id: c94e6cdedd4540e4b40e4a34fbbeb850\n",
" experiment_tag: 0_mean=1,sd=0.4058\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 30\n",
" loss: 1.0234101880766688\n",
" node_ip: 127.0.0.1\n",
" pid: 48140\n",
" time_since_restore: 2.1175789833068848\n",
" time_this_iter_s: 0.0022211074829101562\n",
" time_total_s: 2.1175789833068848\n",
" timestamp: 1658500889\n",
" timesteps_since_restore: 0\n",
" training_iteration: 30\n",
" trial_id: 5bf98_00000\n",
" warmup_time: 0.0029532909393310547\n",
" \n",
"Result for train_function_5bf98_00001:\n",
" date: 2022-07-22_15-41-30\n",
" done: false\n",
" experiment_id: ba865bc613d94413a37fe027123ba031\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 1\n",
" loss: 2.3754716847171182\n",
" node_ip: 127.0.0.1\n",
" pid: 48147\n",
" time_since_restore: 0.0001590251922607422\n",
" time_this_iter_s: 0.0001590251922607422\n",
" time_total_s: 0.0001590251922607422\n",
" timestamp: 1658500890\n",
" timesteps_since_restore: 0\n",
" training_iteration: 1\n",
" trial_id: 5bf98_00001\n",
" warmup_time: 0.0036537647247314453\n",
" \n",
"Result for train_function_5bf98_00001:\n",
" date: 2022-07-22_15-41-30\n",
" done: true\n",
" experiment_id: ba865bc613d94413a37fe027123ba031\n",
" experiment_tag: 1_mean=2,sd=0.6473\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 30\n",
" loss: 1.5399275480220707\n",
" node_ip: 127.0.0.1\n",
" pid: 48147\n",
" time_since_restore: 0.0770730972290039\n",
" time_this_iter_s: 0.002664804458618164\n",
" time_total_s: 0.0770730972290039\n",
" timestamp: 1658500890\n",
" timesteps_since_restore: 0\n",
" training_iteration: 30\n",
" trial_id: 5bf98_00001\n",
" warmup_time: 0.0036537647247314453\n",
" \n",
"Result for train_function_5bf98_00002:\n",
" date: 2022-07-22_15-41-31\n",
" done: false\n",
" experiment_id: 2efb6f3c4d954bcab1ea4083f138008e\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 1\n",
" loss: 3.204653294422825\n",
" node_ip: 127.0.0.1\n",
" pid: 48151\n",
" time_since_restore: 0.00014400482177734375\n",
" time_this_iter_s: 0.00014400482177734375\n",
" time_total_s: 0.00014400482177734375\n",
" timestamp: 1658500891\n",
" timesteps_since_restore: 0\n",
" training_iteration: 1\n",
" trial_id: 5bf98_00002\n",
" warmup_time: 0.0030150413513183594\n",
" \n",
"Result for train_function_5bf98_00002:\n",
" date: 2022-07-22_15-41-31\n",
" done: true\n",
" experiment_id: 2efb6f3c4d954bcab1ea4083f138008e\n",
" experiment_tag: 2_mean=3,sd=0.2566\n",
" hostname: Kais-MacBook-Pro.local\n",
" iterations_since_restore: 30\n",
" loss: 3.0393011150182865\n",
" node_ip: 127.0.0.1\n",
" pid: 48151\n",
" time_since_restore: 0.07284307479858398\n",
" time_this_iter_s: 0.0020139217376708984\n",
" time_total_s: 0.07284307479858398\n",
" timestamp: 1658500891\n",
" timesteps_since_restore: 0\n",
" training_iteration: 30\n",
" trial_id: 5bf98_00002\n",
" warmup_time: 0.0030150413513183594\n",
" \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-07-22 15:41:31,290\tINFO tune.py:738 -- Total run time: 7.36 seconds (6.72 seconds for the tuning loop).\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'mean': 1, 'sd': 0.40575843135279466}\n"
]
}
],
"source": [
"from ray.air.callbacks.comet import CometLoggerCallback\n",
"\n",
"tuner = tune.Tuner(\n",
"    train_function,\n",
"    tune_config=tune.TuneConfig(\n",
"        metric=\"loss\",\n",
"        mode=\"min\",\n",
"    ),\n",
"    run_config=air.RunConfig(\n",
"        callbacks=[\n",
"            CometLoggerCallback(\n",
"                api_key=api_key, project_name=project_name, tags=[\"comet_example\"]\n",
"            )\n",
"        ],\n",
"    ),\n",
"    param_space={\"mean\": tune.grid_search([1, 2, 3]), \"sd\": tune.uniform(0.2, 0.8)},\n",
")\n",
"results = tuner.fit()\n",
"\n",
"print(results.get_best_result().config)"
]
},
{
"cell_type": "markdown",
"id": "d7e46189",
"metadata": {},
"source": [
"## Tune Comet Logger\n",
"\n",
"Ray Tune offers an integration with Comet through the `CometLoggerCallback`,\n",
"which automatically logs metrics and parameters reported to Tune to the Comet UI.\n",
"\n",
"Click on the following dropdown to see this callback API in detail:\n",
"\n",
"```{eval-rst}\n",
".. autoclass:: ray.air.callbacks.comet.CometLoggerCallback\n",
"    :noindex:\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"orphan": true
},
"nbformat": 4,
"nbformat_minor": 5
}