{ "cells": [ { "cell_type": "markdown", "id": "ecad719c", "metadata": {}, "source": [ "# Using Weights & Biases with Tune\n", "\n", "(tune-wandb-ref)=\n", "\n", "[Weights & Biases](https://www.wandb.ai/) (Wandb) is a tool for experiment\n", "tracking, model optimization, and dataset versioning. It is very popular\n", "in the machine learning and data science community for its superb visualization\n", "tools.\n", "\n", "```{image} /images/wandb_logo_full.png\n", ":align: center\n", ":alt: Weights & Biases\n", ":height: 80px\n", ":target: https://www.wandb.ai/\n", "```\n", "\n", "Ray Tune currently offers two lightweight integrations for Weights & Biases.\n", "One is the {ref}`WandbLoggerCallback <tune-wandb-logger>`, which automatically logs\n", "metrics reported to Tune to the Wandb API.\n", "\n", "The other one is the {ref}`@wandb_mixin <tune-wandb-mixin>` decorator, which can be\n", "used with the function API. It automatically\n", "initializes the Wandb API with Tune's training information. You can then use the\n", "Wandb API as you normally would, e.g. using `wandb.log()` to log your training\n", "process.\n", "\n", "```{contents}\n", ":backlinks: none\n", ":local: true\n", "```\n", "\n", "## Running A Weights & Biases Example\n", "\n", "In the following example we're going to use both of the above methods, namely the `WandbLoggerCallback` and\n", "the `wandb_mixin` decorator, to log metrics.\n", "Let's start with a few crucial imports:" ] }, { "cell_type": "code", "execution_count": 1, "id": "100bcf8a", "metadata": { "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "import numpy as np\n", "import wandb\n", "\n", "from ray import air, tune\n", "from ray.air import session\n", "from ray.tune import Trainable\n", "from ray.air.callbacks.wandb import WandbLoggerCallback\n", "from ray.tune.integration.wandb import (\n", "    WandbTrainableMixin,\n", "    wandb_mixin,\n", ")" ] }, { "cell_type": "markdown", "id": "9346c0f6", "metadata": {}, "source": [ "Next, let's define a simple `objective` function (a Tune `Trainable`) that reports a random loss to Tune.\n", "The objective function itself is not important for this example, since we want to focus primarily on the\n", "Weights & Biases integration." 
] }, { "cell_type": "code", "execution_count": 2, "id": "e8b4fc4d", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def objective(config, checkpoint_dir=None):\n", "    for i in range(30):\n", "        loss = config[\"mean\"] + config[\"sd\"] * np.random.randn()\n", "        session.report({\"loss\": loss})" ] }, { "cell_type": "markdown", "id": "831eed42", "metadata": {}, "source": [ "Given that you provide an `api_key_file` pointing to your Weights & Biases API key, you can define a\n", "simple grid-search Tune run using the `WandbLoggerCallback` as follows:" ] }, { "cell_type": "code", "execution_count": 3, "id": "52988599", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def tune_function(api_key_file):\n", "    \"\"\"Example for using a WandbLoggerCallback with the function API\"\"\"\n", "    tuner = tune.Tuner(\n", "        objective,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        run_config=air.RunConfig(\n", "            callbacks=[\n", "                WandbLoggerCallback(api_key_file=api_key_file, project=\"Wandb_example\")\n", "            ],\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "        },\n", "    )\n", "    results = tuner.fit()\n", "\n", "    return results.get_best_result().config" ] }, { "cell_type": "markdown", "id": "e24c05fa", "metadata": {}, "source": [ "To use the `wandb_mixin` decorator, you can simply decorate the objective function from earlier.\n", "Note that we also use `wandb.log(...)` to log the `loss` to Weights & Biases as a dictionary.\n", "Otherwise, the decorated version of our objective is identical to its original." 
] }, { "cell_type": "code", "execution_count": 4, "id": "5e30d5e7", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "@wandb_mixin\n", "def decorated_objective(config, checkpoint_dir=None):\n", "    for i in range(30):\n", "        loss = config[\"mean\"] + config[\"sd\"] * np.random.randn()\n", "        session.report({\"loss\": loss})\n", "        wandb.log(dict(loss=loss))" ] }, { "cell_type": "markdown", "id": "04040bcb", "metadata": {}, "source": [ "With the `decorated_objective` defined, running a Tune experiment is as simple as providing this objective and\n", "passing the `api_key_file` to the `wandb` key of your Tune `config`:" ] }, { "cell_type": "code", "execution_count": 5, "id": "d4fbd368", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def tune_decorated(api_key_file):\n", "    \"\"\"Example for using the @wandb_mixin decorator with the function API\"\"\"\n", "    tuner = tune.Tuner(\n", "        decorated_objective,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "            \"wandb\": {\"api_key_file\": api_key_file, \"project\": \"Wandb_example\"},\n", "        },\n", "    )\n", "    results = tuner.fit()\n", "\n", "    return results.get_best_result().config" ] }, { "cell_type": "markdown", "id": "f9521481", "metadata": {}, "source": [ "Finally, you can also define a class-based Tune `Trainable` by using the `WandbTrainableMixin` to define your objective:" ] }, { "cell_type": "code", "execution_count": 6, "id": "d27a7a35", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "class WandbTrainable(WandbTrainableMixin, Trainable):\n", "    def step(self):\n", "        for i in range(30):\n", "            loss = self.config[\"mean\"] + self.config[\"sd\"] * np.random.randn()\n", "            wandb.log({\"loss\": loss})\n", "        return {\"loss\": loss, \"done\": True}" ] }, { "cell_type": "markdown", "id": "fa189bb2", "metadata": {}, "source": [ "Running Tune with this `WandbTrainable` works exactly the same as with the function API.\n", "The `tune_trainable` function below differs from `tune_decorated` above only in the first argument we pass to\n", "`Tuner()`:" ] }, { "cell_type": "code", "execution_count": 8, "id": "6e546cc2", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [], "source": [ "def tune_trainable(api_key_file):\n", "    \"\"\"Example for using a WandbTrainableMixin with the class API\"\"\"\n", "    tuner = tune.Tuner(\n", "        WandbTrainable,\n", "        tune_config=tune.TuneConfig(\n", "            metric=\"loss\",\n", "            mode=\"min\",\n", "        ),\n", "        param_space={\n", "            \"mean\": tune.grid_search([1, 2, 3, 4, 5]),\n", "            \"sd\": tune.uniform(0.2, 0.8),\n", "            \"wandb\": {\"api_key_file\": api_key_file, \"project\": \"Wandb_example\"},\n", "        },\n", "    )\n", "    results = tuner.fit()\n", "\n", "    return results.get_best_result().config" ] }, { "cell_type": "markdown", "id": "0b736172", "metadata": {}, "source": [ "Since you may not have an API key for Wandb, we can _mock_ the Wandb logger and test all three of our training\n", "functions as follows.\n", "If you do have an API key file, make sure to set `mock_api` to `False` and pass in the right `api_key_file` below." ] }, { "cell_type": "code", "execution_count": 9, "id": "e0e7f481", "metadata": { "pycharm": { "name": "#%%\n" }, "vscode": { "languageId": "python" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:38,323\tINFO services.py:1483 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n", "/Users/kai/coding/ray/python/ray/tune/trainable/function_trainable.py:643: DeprecationWarning: `checkpoint_dir` in `func(config, checkpoint_dir)` is being deprecated. 
To save and load checkpoint in trainable functions, please use the `ray.air.session` API:\n", "\n", "from ray.air import session\n", "\n", "def train(config):\n", " # ...\n", " session.report({\"metric\": metric}, checkpoint=checkpoint)\n", "\n", "For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session\n", "\n", " DeprecationWarning,\n" ] }, { "data": { "text/html": [ "== Status ==
Current time: 2022-07-22 15:39:47 (running for 00:00:06.01)
Memory usage on this node: 9.9/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.52 GiB heap, 0.0/2.0 GiB objects
Current best trial: 1e575_00000 with loss=0.6535282890948189 and parameters={'mean': 1, 'sd': 0.6540704916919089}
Result logdir: /Users/kai/ray_results/objective_2022-07-22_15-39-35
Number of trials: 5/5 (5 TERMINATED)
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
objective_1e575_00000TERMINATED127.0.0.1:47932 10.65407 30 0.2035220.653528
objective_1e575_00001TERMINATED127.0.0.1:47941 20.72087 30 0.3142811.14091
objective_1e575_00002TERMINATED127.0.0.1:47942 30.680016 30 0.43947 2.11278
objective_1e575_00003TERMINATED127.0.0.1:47943 40.296117 30 0.4424534.33397
objective_1e575_00004TERMINATED127.0.0.1:47944 50.358219 30 0.3627295.41971


" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:41,596\tINFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Result for objective_1e575_00000:\n", " date: 2022-07-22_15-39-44\n", " done: false\n", " experiment_id: 60ffbe63fc834195a37fabc078985531\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 0.4005309978356091\n", " node_ip: 127.0.0.1\n", " pid: 47932\n", " time_since_restore: 0.0001418590545654297\n", " time_this_iter_s: 0.0001418590545654297\n", " time_total_s: 0.0001418590545654297\n", " timestamp: 1658500784\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 1e575_00000\n", " warmup_time: 0.002913236618041992\n", " \n", "Result for objective_1e575_00000:\n", " date: 2022-07-22_15-39-44\n", " done: true\n", " experiment_id: 60ffbe63fc834195a37fabc078985531\n", " experiment_tag: 0_mean=1,sd=0.6541\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 0.6535282890948189\n", " node_ip: 127.0.0.1\n", " pid: 47932\n", " time_since_restore: 0.203521728515625\n", " time_this_iter_s: 0.003339052200317383\n", " time_total_s: 0.203521728515625\n", " timestamp: 1658500784\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 1e575_00000\n", " warmup_time: 0.002913236618041992\n", " \n", "Result for objective_1e575_00002:\n", " date: 2022-07-22_15-39-46\n", " done: false\n", " experiment_id: c812a92f07134341a2908abc6e315061\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 2.7700164667438716\n", " node_ip: 127.0.0.1\n", " pid: 
47942\n", " time_since_restore: 0.00013971328735351562\n", " time_this_iter_s: 0.00013971328735351562\n", " time_total_s: 0.00013971328735351562\n", " timestamp: 1658500786\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 1e575_00002\n", " warmup_time: 0.002918720245361328\n", " \n", "Result for objective_1e575_00003:\n", " date: 2022-07-22_15-39-46\n", " done: false\n", " experiment_id: b97d28ec439342ae8dd7c7fa4ac4ccca\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 3.895346250529465\n", " node_ip: 127.0.0.1\n", " pid: 47943\n", " time_since_restore: 0.00013494491577148438\n", " time_this_iter_s: 0.00013494491577148438\n", " time_total_s: 0.00013494491577148438\n", " timestamp: 1658500786\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 1e575_00003\n", " warmup_time: 0.0031499862670898438\n", " \n", "Result for objective_1e575_00001:\n", " date: 2022-07-22_15-39-46\n", " done: false\n", " experiment_id: 7034e40ba23f495eb6974ad5bda1406d\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 1.8250068029519693\n", " node_ip: 127.0.0.1\n", " pid: 47941\n", " time_since_restore: 0.00015974044799804688\n", " time_this_iter_s: 0.00015974044799804688\n", " time_total_s: 0.00015974044799804688\n", " timestamp: 1658500786\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 1e575_00001\n", " warmup_time: 0.0026862621307373047\n", " \n", "Result for objective_1e575_00004:\n", " date: 2022-07-22_15-39-46\n", " done: false\n", " experiment_id: 6b7bf17ee7444b22b809897292864e19\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 5.098807619369106\n", " node_ip: 127.0.0.1\n", " pid: 47944\n", " time_since_restore: 0.00012803077697753906\n", " time_this_iter_s: 0.00012803077697753906\n", " time_total_s: 0.00012803077697753906\n", " timestamp: 1658500786\n", " timesteps_since_restore: 0\n", " 
training_iteration: 1\n", " trial_id: 1e575_00004\n", " warmup_time: 0.002666950225830078\n", " \n", "Result for objective_1e575_00002:\n", " date: 2022-07-22_15-39-47\n", " done: true\n", " experiment_id: c812a92f07134341a2908abc6e315061\n", " experiment_tag: 2_mean=3,sd=0.6800\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 2.1127773612837975\n", " node_ip: 127.0.0.1\n", " pid: 47942\n", " time_since_restore: 0.4394698143005371\n", " time_this_iter_s: 0.005173921585083008\n", " time_total_s: 0.4394698143005371\n", " timestamp: 1658500787\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 1e575_00002\n", " warmup_time: 0.002918720245361328\n", " \n", "Result for objective_1e575_00001:\n", " date: 2022-07-22_15-39-47\n", " done: true\n", " experiment_id: 7034e40ba23f495eb6974ad5bda1406d\n", " experiment_tag: 1_mean=2,sd=0.7209\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 1.1409060371452806\n", " node_ip: 127.0.0.1\n", " pid: 47941\n", " time_since_restore: 0.31428098678588867\n", " time_this_iter_s: 0.008217096328735352\n", " time_total_s: 0.31428098678588867\n", " timestamp: 1658500787\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 1e575_00001\n", " warmup_time: 0.0026862621307373047\n", " \n", "Result for objective_1e575_00003:\n", " date: 2022-07-22_15-39-47\n", " done: true\n", " experiment_id: b97d28ec439342ae8dd7c7fa4ac4ccca\n", " experiment_tag: 3_mean=4,sd=0.2961\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 4.333967406156947\n", " node_ip: 127.0.0.1\n", " pid: 47943\n", " time_since_restore: 0.44245290756225586\n", " time_this_iter_s: 0.005827903747558594\n", " time_total_s: 0.44245290756225586\n", " timestamp: 1658500787\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 1e575_00003\n", " warmup_time: 0.0031499862670898438\n", " \n", "Result for 
objective_1e575_00004:\n", " date: 2022-07-22_15-39-47\n", " done: true\n", " experiment_id: 6b7bf17ee7444b22b809897292864e19\n", " experiment_tag: 4_mean=5,sd=0.3582\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 5.419707275520466\n", " node_ip: 127.0.0.1\n", " pid: 47944\n", " time_since_restore: 0.3627290725708008\n", " time_this_iter_s: 0.006065845489501953\n", " time_total_s: 0.3627290725708008\n", " timestamp: 1658500787\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 1e575_00004\n", " warmup_time: 0.002666950225830078\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:47,478\tINFO tune.py:738 -- Total run time: 6.95 seconds (6.00 seconds for the tuning loop).\n" ] }, { "data": { "text/html": [ "== Status ==
Current time: 2022-07-22 15:39:53 (running for 00:00:05.64)
Memory usage on this node: 9.8/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.52 GiB heap, 0.0/2.0 GiB objects
Current best trial: 227e1_00000 with loss=1.4158135642199134 and parameters={'mean': 1, 'sd': 0.35625806806413973, 'wandb': {'api_key_file': '/var/folders/b2/0_91bd757rz02lrmr920v0gw0000gn/T/tmp9qec20eq', 'project': 'Wandb_example'}}
Result logdir: /Users/kai/ray_results/objective_2022-07-22_15-39-47
Number of trials: 5/5 (5 TERMINATED)
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
objective_227e1_00000TERMINATED127.0.0.1:47968 10.356258 30 0.08696011.41581
objective_227e1_00001TERMINATED127.0.0.1:47973 20.411041 30 0.371924 2.9165
objective_227e1_00002TERMINATED127.0.0.1:47974 30.359191 30 0.305055 2.57809
objective_227e1_00003TERMINATED127.0.0.1:47975 40.543202 30 0.218044 5.06532
objective_227e1_00004TERMINATED127.0.0.1:47976 50.777638 30 0.287682 6.36554


" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Result for objective_227e1_00000:\n", " date: 2022-07-22_15-39-50\n", " done: false\n", " experiment_id: e80ef3e4843c41068c733322d48e0817\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 0.27641082730463906\n", " node_ip: 127.0.0.1\n", " pid: 47968\n", " time_since_restore: 0.0001361370086669922\n", " time_this_iter_s: 0.0001361370086669922\n", " time_total_s: 0.0001361370086669922\n", " timestamp: 1658500790\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 227e1_00000\n", " warmup_time: 0.003004789352416992\n", " \n", "Result for objective_227e1_00000:\n", " date: 2022-07-22_15-39-50\n", " done: true\n", " experiment_id: e80ef3e4843c41068c733322d48e0817\n", " experiment_tag: 0_mean=1,sd=0.3563\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 1.4158135642199134\n", " node_ip: 127.0.0.1\n", " pid: 47968\n", " time_since_restore: 0.0869600772857666\n", " time_this_iter_s: 0.0022199153900146484\n", " time_total_s: 0.0869600772857666\n", " timestamp: 1658500790\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 227e1_00000\n", " warmup_time: 0.003004789352416992\n", " \n", "Result for objective_227e1_00001:\n", " date: 2022-07-22_15-39-52\n", " done: false\n", " experiment_id: bf0685a616354a02af154ac3601a2109\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 2.058177604134134\n", " node_ip: 127.0.0.1\n", " pid: 47973\n", " time_since_restore: 0.00015783309936523438\n", " time_this_iter_s: 0.00015783309936523438\n", " time_total_s: 0.00015783309936523438\n", " timestamp: 1658500792\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 227e1_00001\n", " warmup_time: 0.0029697418212890625\n", " \n", "Result for objective_227e1_00004:\n", " date: 
2022-07-22_15-39-52\n", " done: false\n", " experiment_id: 1f45d26f052c443d8a4aef3279f4e29e\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 5.383672927239436\n", " node_ip: 127.0.0.1\n", " pid: 47976\n", " time_since_restore: 0.00013184547424316406\n", " time_this_iter_s: 0.00013184547424316406\n", " time_total_s: 0.00013184547424316406\n", " timestamp: 1658500792\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 227e1_00004\n", " warmup_time: 0.0028159618377685547\n", " \n", "Result for objective_227e1_00003:\n", " date: 2022-07-22_15-39-52\n", " done: false\n", " experiment_id: c4b18bff67ec45939614ad8b66cecb8c\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 2.6242029842903367\n", " node_ip: 127.0.0.1\n", " pid: 47975\n", " time_since_restore: 0.00014901161193847656\n", " time_this_iter_s: 0.00014901161193847656\n", " time_total_s: 0.00014901161193847656\n", " timestamp: 1658500792\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 227e1_00003\n", " warmup_time: 0.0026941299438476562\n", " \n", "Result for objective_227e1_00002:\n", " date: 2022-07-22_15-39-52\n", " done: false\n", " experiment_id: b84e7701625e49ef8056680eb616b611\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 3.2091889147367088\n", " node_ip: 127.0.0.1\n", " pid: 47974\n", " time_since_restore: 0.00016427040100097656\n", " time_this_iter_s: 0.00016427040100097656\n", " time_total_s: 0.00016427040100097656\n", " timestamp: 1658500792\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 227e1_00002\n", " warmup_time: 0.0029571056365966797\n", " \n", "Result for objective_227e1_00003:\n", " date: 2022-07-22_15-39-53\n", " done: true\n", " experiment_id: c4b18bff67ec45939614ad8b66cecb8c\n", " experiment_tag: 3_mean=4,sd=0.5432\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 
5.065320265027247\n", " node_ip: 127.0.0.1\n", " pid: 47975\n", " time_since_restore: 0.21804404258728027\n", " time_this_iter_s: 0.011553049087524414\n", " time_total_s: 0.21804404258728027\n", " timestamp: 1658500793\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 227e1_00003\n", " warmup_time: 0.0026941299438476562\n", " \n", "Result for objective_227e1_00002:\n", " date: 2022-07-22_15-39-53\n", " done: true\n", " experiment_id: b84e7701625e49ef8056680eb616b611\n", " experiment_tag: 2_mean=3,sd=0.3592\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 2.578088712628635\n", " node_ip: 127.0.0.1\n", " pid: 47974\n", " time_since_restore: 0.3050551414489746\n", " time_this_iter_s: 0.005466938018798828\n", " time_total_s: 0.3050551414489746\n", " timestamp: 1658500793\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 227e1_00002\n", " warmup_time: 0.0029571056365966797\n", " \n", "Result for objective_227e1_00001:\n", " date: 2022-07-22_15-39-53\n", " done: true\n", " experiment_id: bf0685a616354a02af154ac3601a2109\n", " experiment_tag: 1_mean=2,sd=0.4110\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 2.9165001549045844\n", " node_ip: 127.0.0.1\n", " pid: 47973\n", " time_since_restore: 0.37192392349243164\n", " time_this_iter_s: 0.007360935211181641\n", " time_total_s: 0.37192392349243164\n", " timestamp: 1658500793\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 227e1_00001\n", " warmup_time: 0.0029697418212890625\n", " \n", "Result for objective_227e1_00004:\n", " date: 2022-07-22_15-39-53\n", " done: true\n", " experiment_id: 1f45d26f052c443d8a4aef3279f4e29e\n", " experiment_tag: 4_mean=5,sd=0.7776\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 30\n", " loss: 6.365540480426036\n", " node_ip: 127.0.0.1\n", " pid: 47976\n", " time_since_restore: 0.28768181800842285\n", " 
time_this_iter_s: 0.003290891647338867\n", " time_total_s: 0.28768181800842285\n", " timestamp: 1658500793\n", " timesteps_since_restore: 0\n", " training_iteration: 30\n", " trial_id: 227e1_00004\n", " warmup_time: 0.0028159618377685547\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:53,254\tINFO tune.py:738 -- Total run time: 5.76 seconds (5.63 seconds for the tuning loop).\n" ] }, { "data": { "text/html": [ "== Status ==
Current time: 2022-07-22 15:39:59 (running for 00:00:06.06)
Memory usage on this node: 10.1/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.52 GiB heap, 0.0/2.0 GiB objects
Current best trial: 25f04_00000 with loss=0.9941371354505734 and parameters={'mean': 1, 'sd': 0.5245309522439918}
Result logdir: /Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53
Number of trials: 5/5 (5 ERROR)
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name status loc mean sd iter total time (s) loss
WandbTrainable_25f04_00000ERROR 127.0.0.1:47994 10.524531 1 0.0008277890.994137
WandbTrainable_25f04_00001ERROR 127.0.0.1:48005 20.515265 1 0.00108528 2.31254
WandbTrainable_25f04_00002ERROR 127.0.0.1:48006 30.56327 1 0.00111198 3.43952
WandbTrainable_25f04_00003ERROR 127.0.0.1:48007 40.507054 1 0.0009930134.53341
WandbTrainable_25f04_00004ERROR 127.0.0.1:48008 50.372142 1 0.0008499625.13408

Number of errored trials: 5
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Trial name # failureserror file
WandbTrainable_25f04_00000 1/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00000_0_mean=1,sd=0.5245_2022-07-22_15-39-53/error.txt
WandbTrainable_25f04_00001 1/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00001_1_mean=2,sd=0.5153_2022-07-22_15-39-56/error.txt
WandbTrainable_25f04_00002 1/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00002_2_mean=3,sd=0.5633_2022-07-22_15-39-56/error.txt
WandbTrainable_25f04_00003 1/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00003_3_mean=4,sd=0.5071_2022-07-22_15-39-56/error.txt
WandbTrainable_25f04_00004 1/Users/kai/ray_results/WandbTrainable_2022-07-22_15-39-53/WandbTrainable_25f04_00004_4_mean=5,sd=0.3721_2022-07-22_15-39-56/error.txt

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:56,146\tERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00000: Error processing event.\n", "ray.exceptions.RayTaskError(NotImplementedError): \u001b[36mray::WandbTrainable.save()\u001b[39m (pid=47994, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11052de10>)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 449, in save\n", " checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 1014, in save_checkpoint\n", " raise NotImplementedError\n", "NotImplementedError\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Result for WandbTrainable_25f04_00000:\n", " date: 2022-07-22_15-39-56\n", " done: true\n", " experiment_id: c0ac6bf4f2af45368a3c5c3e14e47115\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 0.9941371354505734\n", " node_ip: 127.0.0.1\n", " pid: 47994\n", " time_since_restore: 0.000827789306640625\n", " time_this_iter_s: 0.000827789306640625\n", " time_total_s: 0.000827789306640625\n", " timestamp: 1658500796\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00000\n", " warmup_time: 0.0031821727752685547\n", " \n", "Result for WandbTrainable_25f04_00000:\n", " date: 2022-07-22_15-39-56\n", " done: true\n", " experiment_id: c0ac6bf4f2af45368a3c5c3e14e47115\n", " experiment_tag: 0_mean=1,sd=0.5245\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 0.9941371354505734\n", " node_ip: 127.0.0.1\n", " pid: 47994\n", " time_since_restore: 0.000827789306640625\n", " time_this_iter_s: 0.000827789306640625\n", " time_total_s: 0.000827789306640625\n", " timestamp: 1658500796\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00000\n", " 
warmup_time: 0.0031821727752685547\n", " \n", "Result for WandbTrainable_25f04_00002:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: b4174fe95248493e8dedfcbc67549339\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 3.4395203958985836\n", " node_ip: 127.0.0.1\n", " pid: 48006\n", " time_since_restore: 0.0011119842529296875\n", " time_this_iter_s: 0.0011119842529296875\n", " time_total_s: 0.0011119842529296875\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00002\n", " warmup_time: 0.004413127899169922\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:59,299\tERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00002: Error processing event.\n", "ray.exceptions.RayTaskError(NotImplementedError): \u001b[36mray::WandbTrainable.save()\u001b[39m (pid=48006, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11a54c8d0>)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 449, in save\n", " checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 1014, in save_checkpoint\n", " raise NotImplementedError\n", "NotImplementedError\n", "2022-07-22 15:39:59,305\tERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00004: Error processing event.\n", "ray.exceptions.RayTaskError(NotImplementedError): \u001b[36mray::WandbTrainable.save()\u001b[39m (pid=48008, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x11c314d90>)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 449, in save\n", " checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 1014, in save_checkpoint\n", " raise NotImplementedError\n", "NotImplementedError\n", "2022-07-22 15:39:59,310\tERROR 
trial_runner.py:921 -- Trial WandbTrainable_25f04_00001: Error processing event.\n", "ray.exceptions.RayTaskError(NotImplementedError): \u001b[36mray::WandbTrainable.save()\u001b[39m (pid=48005, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x10e56fb90>)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 449, in save\n", " checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 1014, in save_checkpoint\n", " raise NotImplementedError\n", "NotImplementedError\n", "2022-07-22 15:39:59,324\tERROR trial_runner.py:921 -- Trial WandbTrainable_25f04_00003: Error processing event.\n", "ray.exceptions.RayTaskError(NotImplementedError): \u001b[36mray::WandbTrainable.save()\u001b[39m (pid=48007, ip=127.0.0.1, repr=<__main__.WandbTrainable object at 0x10b49ee50>)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 449, in save\n", " checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)\n", " File \"/Users/kai/coding/ray/python/ray/tune/trainable/trainable.py\", line 1014, in save_checkpoint\n", " raise NotImplementedError\n", "NotImplementedError\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Result for WandbTrainable_25f04_00001:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: b0920f67a88f4993b7ec85dee2f78022\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 2.3125440070079093\n", " node_ip: 127.0.0.1\n", " pid: 48005\n", " time_since_restore: 0.0010852813720703125\n", " time_this_iter_s: 0.0010852813720703125\n", " time_total_s: 0.0010852813720703125\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00001\n", " warmup_time: 0.0049626827239990234\n", " \n", "Result for WandbTrainable_25f04_00004:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: 
4435b2105eb24fbaba4778e33ce2e1a9\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 5.134083536061109\n", " node_ip: 127.0.0.1\n", " pid: 48008\n", " time_since_restore: 0.0008499622344970703\n", " time_this_iter_s: 0.0008499622344970703\n", " time_total_s: 0.0008499622344970703\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00004\n", " warmup_time: 0.0031480789184570312\n", " \n", "Result for WandbTrainable_25f04_00002:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: b4174fe95248493e8dedfcbc67549339\n", " experiment_tag: 2_mean=3,sd=0.5633\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 3.4395203958985836\n", " node_ip: 127.0.0.1\n", " pid: 48006\n", " time_since_restore: 0.0011119842529296875\n", " time_this_iter_s: 0.0011119842529296875\n", " time_total_s: 0.0011119842529296875\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00002\n", " warmup_time: 0.004413127899169922\n", " \n", "Result for WandbTrainable_25f04_00004:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: 4435b2105eb24fbaba4778e33ce2e1a9\n", " experiment_tag: 4_mean=5,sd=0.3721\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 5.134083536061109\n", " node_ip: 127.0.0.1\n", " pid: 48008\n", " time_since_restore: 0.0008499622344970703\n", " time_this_iter_s: 0.0008499622344970703\n", " time_total_s: 0.0008499622344970703\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00004\n", " warmup_time: 0.0031480789184570312\n", " \n", "Result for WandbTrainable_25f04_00001:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: b0920f67a88f4993b7ec85dee2f78022\n", " experiment_tag: 1_mean=2,sd=0.5153\n", " hostname: Kais-MacBook-Pro.local\n", " 
iterations_since_restore: 1\n", " loss: 2.3125440070079093\n", " node_ip: 127.0.0.1\n", " pid: 48005\n", " time_since_restore: 0.0010852813720703125\n", " time_this_iter_s: 0.0010852813720703125\n", " time_total_s: 0.0010852813720703125\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00001\n", " warmup_time: 0.0049626827239990234\n", " \n", "Result for WandbTrainable_25f04_00003:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: a667aef035a1475a883c166a014b756c\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 4.533407187147774\n", " node_ip: 127.0.0.1\n", " pid: 48007\n", " time_since_restore: 0.0009930133819580078\n", " time_this_iter_s: 0.0009930133819580078\n", " time_total_s: 0.0009930133819580078\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00003\n", " warmup_time: 0.0036199092864990234\n", " \n", "Result for WandbTrainable_25f04_00003:\n", " date: 2022-07-22_15-39-59\n", " done: true\n", " experiment_id: a667aef035a1475a883c166a014b756c\n", " experiment_tag: 3_mean=4,sd=0.5071\n", " hostname: Kais-MacBook-Pro.local\n", " iterations_since_restore: 1\n", " loss: 4.533407187147774\n", " node_ip: 127.0.0.1\n", " pid: 48007\n", " time_since_restore: 0.0009930133819580078\n", " time_this_iter_s: 0.0009930133819580078\n", " time_total_s: 0.0009930133819580078\n", " timestamp: 1658500799\n", " timesteps_since_restore: 0\n", " training_iteration: 1\n", " trial_id: 25f04_00003\n", " warmup_time: 0.0036199092864990234\n", " \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-07-22 15:39:59,455\tERROR tune.py:733 -- Trials did not complete: [WandbTrainable_25f04_00000, WandbTrainable_25f04_00001, WandbTrainable_25f04_00002, WandbTrainable_25f04_00003, WandbTrainable_25f04_00004]\n", "2022-07-22 15:39:59,456\tINFO tune.py:738 -- Total run time: 6.18 seconds (6.04 seconds 
for the tuning loop).\n" ] } ], "source": [ "import tempfile\n", "from unittest.mock import MagicMock\n", "\n", "# Mock the Wandb API so that this example runs without a Wandb account.\n", "mock_api = True\n", "\n", "# File containing the Wandb API key (used when mock_api is False).\n", "api_key_file = \"~/.wandb_api_key\"\n", "\n", "if mock_api:\n", "    # Strip the Wandb mixin from the decorated function and mock all Wandb objects.\n", "    WandbLoggerCallback._logger_process_cls = MagicMock\n", "    decorated_objective.__mixins__ = tuple()\n", "    WandbTrainable._wandb = MagicMock()\n", "    wandb = MagicMock()  # noqa: F811\n", "    # Write a dummy API key to a temporary file.\n", "    temp_file = tempfile.NamedTemporaryFile()\n", "    temp_file.write(b\"1234\")\n", "    temp_file.flush()\n", "    api_key_file = temp_file.name\n", "\n", "tune_function(api_key_file)\n", "tune_decorated(api_key_file)\n", "tune_trainable(api_key_file)\n", "\n", "if mock_api:\n", "    # Closing the temporary file also deletes it.\n", "    temp_file.close()" ] }, { "cell_type": "markdown", "id": "2f6e9138", "metadata": {}, "source": [ "This completes our Tune and Wandb walk-through.\n", "The following sections describe the API of the Tune-Wandb integration in more detail.\n", "\n", "## Tune Wandb API Reference\n", "\n", "### WandbLoggerCallback\n", "\n", "(tune-wandb-logger)=\n", "\n", "```{eval-rst}\n", ".. autoclass:: ray.air.callbacks.wandb.WandbLoggerCallback\n", "   :noindex:\n", "```\n", "\n", "### Wandb-Mixin\n", "\n", "(tune-wandb-mixin)=\n", "\n", "```{eval-rst}\n", ".. autofunction:: ray.tune.integration.wandb.wandb_mixin\n", "   :noindex:\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" }, "orphan": true }, "nbformat": 4, "nbformat_minor": 5 }