From 8a306063088fcc2553168cc2ba9cd6e22f9366fc Mon Sep 17 00:00:00 2001
From: Antoni Baum <antoni.baum@protonmail.com>
Date: Tue, 30 Aug 2022 21:36:41 +0200
Subject: [PATCH] [AIR][Docs] Improve Hugging Face notebook example (#28121)

Improves the HF notebook by making use of preprocessors and adding a section on tuning. Brings it in line with the Ray Summit 2022 demo.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
---
 .../huggingface_text_classification.ipynb     | 2053 +++++++++++------
 1 file changed, 1388 insertions(+), 665 deletions(-)
diff --git a/doc/source/ray-air/examples/huggingface_text_classification.ipynb b/doc/source/ray-air/examples/huggingface_text_classification.ipynb
index 6ba93c934..a44ba0d2d 100644
--- a/doc/source/ray-air/examples/huggingface_text_classification.ipynb
+++ b/doc/source/ray-air/examples/huggingface_text_classification.ipynb
@@ -18,7 +18,7 @@
         "In this notebook, we will:\n",
         "1. [Set up Ray](#setup)\n",
         "2. [Load the dataset](#load)\n",
-        "3. [Preprocess the dataset](#preprocess)\n",
+        "3. [Preprocess the dataset with Ray AIR](#preprocess)\n",
         "4. [Run the training with Ray AIR](#train)\n",
         "5. [Predict on test data with Ray AIR](#predict)\n",
         "6. [Optionally, share the model with the community](#share)"
@@ -35,7 +35,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": null,
+      "execution_count": 1,
       "metadata": {
         "id": "YajFzmkthYbO"
       },
@@ -61,7 +61,7 @@
       "source": [
         "We will use `ray.init()` to initialize a local cluster. By default, this cluster will be compromised of only the machine you are running this notebook on. You can also run this notebook on an Anyscale cluster.\n",
         "\n",
-        "This notebook *will not* run in [Ray Client](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html) mode."
+        "Note: this notebook *will not* run in Ray Client mode."
       ]
     },
     {
@@ -75,10 +75,64 @@
         "outputId": "e527bdbb-2f28-4142-cca0-762e0566cbcd"
       },
       "outputs": [
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "2022-08-25 10:09:51,282\tINFO worker.py:1223 -- Using address localhost:9031 set in the environment variable RAY_ADDRESS\n",
+            "2022-08-25 10:09:51,697\tINFO worker.py:1333 -- Connecting to existing Ray cluster at address: 172.31.80.117:9031...\n",
+            "2022-08-25 10:09:51,706\tINFO worker.py:1509 -- Connected to Ray cluster. View the dashboard at \u001b[1m\u001b[32mhttps://session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard \u001b[39m\u001b[22m\n",
+            "2022-08-25 10:09:51,709\tINFO packaging.py:342 -- Pushing file package 'gcs://_ray_pkg_3332f64b0a461fddc20be71129115d0a.zip' (0.34MiB) to Ray cluster...\n",
+            "2022-08-25 10:09:51,714\tINFO packaging.py:351 -- Successfully pushed file package 'gcs://_ray_pkg_3332f64b0a461fddc20be71129115d0a.zip'.\n"
+          ]
+        },
         {
           "data": {
+            "text/html": [
+              "<div>\n",
+              "    <div style=\"margin-left: 50px;display: flex;flex-direction: row;align-items: center\">\n",
+              "        <h3 style=\"color: var(--jp-ui-font-color0)\">Ray</h3>\n",
+              "        <svg version=\"1.1\" id=\"ray\" width=\"3em\" viewBox=\"0 0 144.5 144.6\" style=\"margin-left: 3em;margin-right: 3em\">\n",
+              "            <g id=\"layer-1\">\n",
+              "                <path fill=\"#00a2e9\" class=\"st0\" d=\"M97.3,77.2c-3.8-1.1-6.2,0.9-8.3,5.1c-3.5,6.8-9.9,9.9-17.4,9.6S58,88.1,54.8,81.2c-1.4-3-3-4-6.3-4.1\n",
+              "                    c-5.6-0.1-9.9,0.1-13.1,6.4c-3.8,7.6-13.6,10.2-21.8,7.6C5.2,88.4-0.4,80.5,0,71.7c0.1-8.4,5.7-15.8,13.8-18.2\n",
+              "                    c8.4-2.6,17.5,0.7,22.3,8c1.3,1.9,1.3,5.2,3.6,5.6c3.9,0.6,8,0.2,12,0.2c1.8,0,1.9-1.6,2.4-2.8c3.5-7.8,9.7-11.8,18-11.9\n",
+              "                    c8.2-0.1,14.4,3.9,17.8,11.4c1.3,2.8,2.9,3.6,5.7,3.3c1-0.1,2,0.1,3,0c2.8-0.5,6.4,1.7,8.1-2.7s-2.3-5.5-4.1-7.5\n",
+              "                    c-5.1-5.7-10.9-10.8-16.1-16.3C84,38,81.9,37.1,78,38.3C66.7,42,56.2,35.7,53,24.1C50.3,14,57.3,2.8,67.7,0.5\n",
+              "                    C78.4-2,89,4.7,91.5,15.3c0.1,0.3,0.1,0.5,0.2,0.8c0.7,3.4,0.7,6.9-0.8,9.8c-1.7,3.2-0.8,5,1.5,7.2c6.7,6.5,13.3,13,19.8,19.7\n",
+              "                    c1.8,1.8,3,2.1,5.5,1.2c9.1-3.4,17.9-0.6,23.4,7c4.8,6.9,4.6,16.1-0.4,22.9c-5.4,7.2-14.2,9.9-23.1,6.5c-2.3-0.9-3.5-0.6-5.1,1.1\n",
+              "                    c-6.7,6.9-13.6,13.7-20.5,20.4c-1.8,1.8-2.5,3.2-1.4,5.9c3.5,8.7,0.3,18.6-7.7,23.6c-7.9,5-18.2,3.8-24.8-2.9\n",
+              "                    c-6.4-6.4-7.4-16.2-2.5-24.3c4.9-7.8,14.5-11,23.1-7.8c3,1.1,4.7,0.5,6.9-1.7C91.7,98.4,98,92.3,104.2,86c1.6-1.6,4.1-2.7,2.6-6.2\n",
+              "                    c-1.4-3.3-3.8-2.5-6.2-2.6C99.8,77.2,98.9,77.2,97.3,77.2z M72.1,29.7c5.5,0.1,9.9-4.3,10-9.8c0-0.1,0-0.2,0-0.3\n",
+              "                    C81.8,14,77,9.8,71.5,10.2c-5,0.3-9,4.2-9.3,9.2c-0.2,5.5,4,10.1,9.5,10.3C71.8,29.7,72,29.7,72.1,29.7z M72.3,62.3\n",
+              "                    c-5.4-0.1-9.9,4.2-10.1,9.7c0,0.2,0,0.3,0,0.5c0.2,5.4,4.5,9.7,9.9,10c5.1,0.1,9.9-4.7,10.1-9.8c0.2-5.5-4-10-9.5-10.3\n",
+              "                    C72.6,62.3,72.4,62.3,72.3,62.3z M115,72.5c0.1,5.4,4.5,9.7,9.8,9.9c5.6-0.2,10-4.8,10-10.4c-0.2-5.4-4.6-9.7-10-9.7\n",
+              "                    c-5.3-0.1-9.8,4.2-9.9,9.5C115,72.1,115,72.3,115,72.5z M19.5,62.3c-5.4,0.1-9.8,4.4-10,9.8c-0.1,5.1,5.2,10.4,10.2,10.3\n",
+              "                    c5.6-0.2,10-4.9,9.8-10.5c-0.1-5.4-4.5-9.7-9.9-9.6C19.6,62.3,19.5,62.3,19.5,62.3z M71.8,134.6c5.9,0.2,10.3-3.9,10.4-9.6\n",
+              "                    c0.5-5.5-3.6-10.4-9.1-10.8c-5.5-0.5-10.4,3.6-10.8,9.1c0,0.5,0,0.9,0,1.4c-0.2,5.3,4,9.8,9.3,10\n",
+              "                    C71.6,134.6,71.7,134.6,71.8,134.6z\"/>\n",
+              "            </g>\n",
+              "        </svg>\n",
+              "        <table>\n",
+              "            <tr>\n",
+              "                <td style=\"text-align: left\"><b>Python version:</b></td>\n",
+              "                <td style=\"text-align: left\"><b>3.8.5</b></td>\n",
+              "            </tr>\n",
+              "            <tr>\n",
+              "                <td style=\"text-align: left\"><b>Ray version:</b></td>\n",
+              "                <td style=\"text-align: left\"><b> 2.0.0</b></td>\n",
+              "            </tr>\n",
+              "            <tr>\n",
+              "    <td style=\"text-align: left\"><b>Dashboard:</b></td>\n",
+              "    <td style=\"text-align: left\"><b><a href=\"http://session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard\" target=\"_blank\">http://session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard</a></b></td>\n",
+              "</tr>\n",
+              "\n",
+              "        </table>\n",
+              "    </div>\n",
+              "</div>\n"
+            ],
             "text/plain": [
-              "RayContext(dashboard_url='', python_version='3.7.13', ray_version='2.0.0.dev0', ray_commit='e2ee2140f97ca08b70fd0f7561038b7f8d958d63', address_info={'node_ip_address': '172.28.0.2', 'raylet_ip_address': '172.28.0.2', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-05-12_18-30-10_467499_75/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-05-12_18-30-10_467499_75/sockets/raylet', 'webui_url': '', 'session_dir': '/tmp/ray/session_2022-05-12_18-30-10_467499_75', 'metrics_export_port': 64840, 'gcs_address': '172.28.0.2:58661', 'address': '172.28.0.2:58661', 'node_id': '65d091b8f504ccd72024fd0b1a8445a8f9ea43e86bcbf67868c22ba7'})"
+              "RayContext(dashboard_url='session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard', python_version='3.8.5', ray_version='2.0.0', ray_commit='cba26cc83f6b5b8a2ff166594a65cb74c0ec8740', address_info={'node_ip_address': '172.31.80.117', 'raylet_ip_address': '172.31.80.117', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-08-25_09-57-39_455459_216/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-08-25_09-57-39_455459_216/sockets/raylet', 'webui_url': 'session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard', 'session_dir': '/tmp/ray/session_2022-08-25_09-57-39_455459_216', 'metrics_export_port': 55366, 'gcs_address': '172.31.80.117:9031', 'address': '172.31.80.117:9031', 'dashboard_agent_listen_port': 52365, 'node_id': '422ff33444fd0f870aa6e718628407400a0ec9483a637c3026c3f9a3'})"
             ]
           },
           "execution_count": 2,
@@ -117,12 +171,16 @@
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "{'CPU': 2.0,\n",
-            " 'GPU': 1.0,\n",
-            " 'accelerator_type:T4': 1.0,\n",
-            " 'memory': 7855477556.0,\n",
-            " 'node:172.28.0.2': 1.0,\n",
-            " 'object_store_memory': 3927738777.0}\n"
+            "{'CPU': 208.0,\n",
+            " 'GPU': 16.0,\n",
+            " 'accelerator_type:T4': 4.0,\n",
+            " 'memory': 616693614180.0,\n",
+            " 'node:172.31.76.237': 1.0,\n",
+            " 'node:172.31.80.117': 1.0,\n",
+            " 'node:172.31.85.193': 1.0,\n",
+            " 'node:172.31.85.32': 1.0,\n",
+            " 'node:172.31.90.137': 1.0,\n",
+            " 'object_store_memory': 259318055729.0}\n"
           ]
         }
       ],
@@ -232,7 +290,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 7,
+      "execution_count": null,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/",
@@ -241,120 +299,7 @@
         "id": "MwhAeEOuhYbV",
         "outputId": "3aff8c73-d6eb-4784-890a-a419403b5bda"
       },
-      "outputs": [
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "bf499d18407642489b7f5acb9dc88ca8",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading builder script:   0%|          | 0.00/7.78k [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "032a4b0c60f04ad1839898524ffeb290",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading metadata:   0%|          | 0.00/4.47k [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Downloading and preparing dataset glue/cola (download: 368.14 KiB, generated: 596.73 KiB, post-processed: Unknown size, total: 964.86 KiB) to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n"
-          ]
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "360558368bf64c35ab14378a2183c644",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading data:   0%|          | 0.00/377k [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "1a1ff1601285496b8fd00c40f0633720",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Generating train split:   0%|          | 0/8551 [00:00<?, ? examples/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "16dde3df50d74f25adac0db6a210eef8",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Generating validation split:   0%|          | 0/1043 [00:00<?, ? examples/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "a4dd6698d1f54126b61f1fd0d0dde1f9",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Generating test split:   0%|          | 0/1063 [00:00<?, ? examples/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n"
-          ]
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "b1fa3ae216f64c0ab17b50ddc8e536b1",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "  0%|          | 0/3 [00:00<?, ?it/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        }
-      ],
+      "outputs": [],
       "source": [
         "from datasets import load_dataset\n",
         "\n",
@@ -409,7 +354,7 @@
         "id": "n9qywopnIrJH"
       },
       "source": [
-        "### Preprocessing the data <a name=\"preprocess\"></a>"
+        "### Preprocessing the data with Ray AIR <a name=\"preprocess\"></a>"
       ]
     },
     {
@@ -437,64 +382,7 @@
         "id": "eXNLu_-nIrJI",
         "outputId": "f545a7a5-f341-4315-cd89-9942a657aa31"
       },
-      "outputs": [
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "8afaa1d7c12a41db8ad9f37c4067bfd4",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "2c5849fe79464a3c990b1bdc140b3860",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading:   0%|          | 0.00/483 [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "173cb43e6d594a87bd7a8a0fc6888aeb",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "6ea57663b5244adfa0780b8aca40a035",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "Downloading:   0%|          | 0.00/455k [00:00<?, ?B/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        }
-      ],
+      "outputs": [],
       "source": [
         "from transformers import AutoTokenizer\n",
         "\n",
@@ -541,133 +429,6 @@
         "}"
       ]
     },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "2C0hcmp9IrJQ"
-      },
-      "source": [
-        "We can them write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This will ensure that an input longer that what the model selected can handle will be truncated to the maximum length accepted by the model."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 11,
-      "metadata": {
-        "id": "vc0BSBLIIrJQ"
-      },
-      "outputs": [],
-      "source": [
-        "def preprocess_function(examples, *, tokenizer):\n",
-        "    sentence1_key, sentence2_key = task_to_keys[task]\n",
-        "    if sentence2_key is None:\n",
-        "        return tokenizer(examples[sentence1_key], truncation=True)\n",
-        "    return tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "id": "zS-6iXTkIrJT"
-      },
-      "source": [
-        "To apply this function on all the sentences (or pairs of sentences) in our dataset, we just use the `map` method of our `dataset` object we created earlier. This will apply the function on all the elements of all the splits in `dataset`, so our training, validation and testing data will be preprocessed in one single command."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 12,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/",
-          "height": 113,
-          "referenced_widgets": [
-            "28ff97b9821d495088a0711191c3e12e",
-            "86872b991c15442584118f1d80fd0002",
-            "c9c3257fe113444684ecf1ae6f75c29e",
-            "b2d325c5482c438c85a70c2da36cd87f",
-            "dc0925463f3840c4abea541a92bb1ea2",
-            "9115f7abc8bc436593ba2fa467e8d5a6",
-            "161cce93c7bb46e5a4241a7d18b89684",
-            "b99ba51d60ce410fb7eda1077f62d682",
-            "d4a98c5d1c754f5ab0f9fcd077cf679e",
-            "f36ca6add4eb42e59b2942e13b10ab57",
-            "b442801df1ca42f48918520235707926",
-            "ed9f698c7c4f46ff9c520ed0597b6bf6",
-            "a92090bf5a004510bed17c915ac7ce0f",
-            "8602f09bbbda43d8846b0eccc72b4e3b",
-            "ed6adf5ad4154b7c958b91eb99944cd4",
-            "1af9d6e90a7443ec89afa9d97e887ab9",
-            "9344f70ece404d25a55280914809b9a0",
-            "ff92d4134be847aeb6119eb9a9c78954",
-            "a773b4ebad9f4c9695407a472c767bb0",
-            "7257356322214ebc80101b3348bea854",
-            "c5b34a2569c847ea846f29ca955b540f",
-            "60d3537f850a4fb5ac7cd1f1e65c3a95",
-            "4f73054b701f4684b3a44793d10d4a0f",
-            "b5a5d7e5f9bc40289acfaa955fe8055a",
-            "c26bb829a6a649fc87f0fbf7c881011f",
-            "c3b85ffc3f044f80b2ed5460570e22bb",
-            "33f6d5b837b44ff7a8baccc6d592643a",
-            "72e5b4bb569348209d574dd2777a26e3",
-            "8a72aa5ea9e14d2e9b16f1ec04590a32",
-            "6f33096dd60741af910b74719c209ec6",
-            "0bf58c047d08490da78ede70471f9af8",
-            "8bf7008eaecb4317b47d62cfbb673299",
-            "41cbe808a34e473eba315488d1a59624"
-          ]
-        },
-        "id": "DDtsaJeVIrJT",
-        "outputId": "29e116d3-9c07-47a2-9728-4e151747b6f6"
-      },
-      "outputs": [
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "28ff97b9821d495088a0711191c3e12e",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "  0%|          | 0/9 [00:00<?, ?ba/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "ed9f698c7c4f46ff9c520ed0597b6bf6",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "  0%|          | 0/2 [00:00<?, ?ba/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "data": {
-            "application/vnd.jupyter.widget-view+json": {
-              "model_id": "4f73054b701f4684b3a44793d10d4a0f",
-              "version_major": 2,
-              "version_minor": 0
-            },
-            "text/plain": [
-              "  0%|          | 0/2 [00:00<?, ?ba/s]"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        }
-      ],
-      "source": [
-        "encoded_datasets = datasets.map(preprocess_function, batched=True, fn_kwargs=dict(tokenizer=tokenizer))"
-      ]
-    },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -679,15 +440,67 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 13,
-      "metadata": {
-        "id": "OaTDkPPMhYbY"
-      },
-      "outputs": [],
+      "execution_count": 11,
+      "metadata": {},
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "{'train': Dataset(num_blocks=1, num_rows=8551, schema={sentence: string, label: int64, idx: int32}),\n",
+              " 'validation': Dataset(num_blocks=1, num_rows=1043, schema={sentence: string, label: int64, idx: int32}),\n",
+              " 'test': Dataset(num_blocks=1, num_rows=1063, schema={sentence: string, label: int64, idx: int32})}"
+            ]
+          },
+          "execution_count": 11,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
       "source": [
         "import ray.data\n",
         "\n",
-        "ray_datasets = ray.data.from_huggingface(encoded_datasets)"
+        "ray_datasets = ray.data.from_huggingface(datasets)\n",
+        "ray_datasets"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2C0hcmp9IrJQ"
+      },
+      "source": [
+        "We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that any input longer than what the selected model can handle is truncated to the maximum length the model accepts.\n",
+        "\n",
+        "We use a `BatchMapper` to create a Ray AIR preprocessor that will map the function to the dataset in a distributed fashion. It will be run during training and prediction."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 12,
+      "metadata": {
+        "id": "vc0BSBLIIrJQ"
+      },
+      "outputs": [],
+      "source": [
+        "import pandas as pd\n",
+        "from ray.data.preprocessors import BatchMapper\n",
+        "\n",
+        "def preprocess_function(examples: pd.DataFrame):\n",
+        "    # If there is only one column, we are running inference,\n",
+        "    # so there is no need to tokenize.\n",
+        "    if len(examples.columns) == 1:\n",
+        "        return examples\n",
+        "    examples = examples.to_dict(\"list\")\n",
+        "    sentence1_key, sentence2_key = task_to_keys[task]\n",
+        "    if sentence2_key is None:\n",
+        "        ret = tokenizer(examples[sentence1_key], truncation=True)\n",
+        "    else:\n",
+        "        ret = tokenizer(examples[sentence1_key], examples[sentence2_key], truncation=True)\n",
+        "    # Add back the original columns\n",
+        "    ret = {**examples, **ret}\n",
+        "    return pd.DataFrame.from_dict(ret)\n",
+        "\n",
+        "batch_encoder = BatchMapper(preprocess_function)"
       ]
     },
     {
@@ -722,7 +535,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 14,
+      "execution_count": 13,
       "metadata": {
         "id": "TlqNaB8jIrJW"
       },
@@ -748,11 +561,11 @@
         "        evaluation_strategy=\"epoch\",\n",
         "        save_strategy=\"epoch\",\n",
         "        logging_strategy=\"epoch\",\n",
-        "        learning_rate=2e-5,\n",
+        "        learning_rate=config.get(\"learning_rate\", 2e-5),\n",
         "        per_device_train_batch_size=batch_size,\n",
         "        per_device_eval_batch_size=batch_size,\n",
-        "        num_train_epochs=5,\n",
-        "        weight_decay=0.01,\n",
+        "        num_train_epochs=config.get(\"epochs\", 2),\n",
+        "        weight_decay=config.get(\"weight_decay\", 0.01),\n",
         "        push_to_hub=False,\n",
         "        disable_tqdm=True,  # declutter the output a little\n",
         "        no_cuda=not use_gpu,  # you need to explicitly set no_cuda if you want CPUs\n",
@@ -787,26 +600,30 @@
       "source": [
         "With our `trainer_init_per_worker` complete, we can now instantiate the `HuggingFaceTrainer`. Aside from the function, we set the `scaling_config`, controlling the amount of workers and resources used, and the `datasets` we will use for training and evaluation.\n",
         "\n",
-        "We specify the `MlflowLoggerCallback` inside the `run_config`."
+        "We specify the `MlflowLoggerCallback` inside the `run_config`, and pass the preprocessor we have defined earlier as an argument. It will be included with the returned `Checkpoint`, meaning it will also be applied during inference."
       ]
     },
     {
       "cell_type": "code",
-      "execution_count": 15,
+      "execution_count": 14,
       "metadata": {
         "id": "RElw7OgLhYba"
       },
       "outputs": [],
       "source": [
         "from ray.train.huggingface import HuggingFaceTrainer\n",
-        "from ray.air.config import RunConfig, ScalingConfig\n",
+        "from ray.air.config import RunConfig, ScalingConfig, CheckpointConfig\n",
         "from ray.air.callbacks.mlflow import MLflowLoggerCallback\n",
         "\n",
         "trainer = HuggingFaceTrainer(\n",
         "    trainer_init_per_worker=trainer_init_per_worker,\n",
         "    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),\n",
         "    datasets={\"train\": ray_datasets[\"train\"], \"evaluation\": ray_datasets[validation_key]},\n",
-        "    run_config=RunConfig(callbacks=[MLflowLoggerCallback(experiment_name=name)])\n",
+        "    run_config=RunConfig(\n",
+        "        callbacks=[MLflowLoggerCallback(experiment_name=name)],\n",
+        "        checkpoint_config=CheckpointConfig(num_to_keep=1, checkpoint_score_attribute=\"eval_loss\", checkpoint_score_order=\"min\"),\n",
+        "    ),\n",
+        "    preprocessor=batch_encoder,\n",
         ")"
       ]
     },
@@ -821,7 +638,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 16,
+      "execution_count": 15,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/",
@@ -834,12 +651,12 @@
         {
           "data": {
             "text/html": [
-              "== Status ==<br>Current time: 2022-05-12 18:35:14 (running for 00:03:48.08)<br>Memory usage on this node: 5.7/12.7 GiB<br>Using FIFO scheduling algorithm.<br>Resources requested: 0/2 CPUs, 0/1 GPUs, 0.0/7.32 GiB heap, 0.0/3.66 GiB objects (0.0/1.0 accelerator_type:T4)<br>Result logdir: /root/ray_results/HuggingFaceTrainer_2022-05-12_18-31-26<br>Number of trials: 1/1 (1 TERMINATED)<br><table>\n",
+              "== Status ==<br>Current time: 2022-08-25 10:14:09 (running for 00:04:06.45)<br>Memory usage on this node: 4.3/62.0 GiB<br>Using FIFO scheduling algorithm.<br>Resources requested: 0/208 CPUs, 0/16 GPUs, 0.0/574.34 GiB heap, 0.0/241.51 GiB objects (0.0/4.0 accelerator_type:T4)<br>Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-08-25_10-10-02<br>Number of trials: 1/1 (1 TERMINATED)<br><table>\n",
               "<thead>\n",
-              "<tr><th>Trial name                    </th><th>status    </th><th>loc           </th><th style=\"text-align: right;\">  iter</th><th style=\"text-align: right;\">  total time (s)</th><th style=\"text-align: right;\">  loss</th><th style=\"text-align: right;\">  learning_rate</th><th style=\"text-align: right;\">  epoch</th></tr>\n",
+              "<tr><th>Trial name                    </th><th>status    </th><th>loc              </th><th style=\"text-align: right;\">  iter</th><th style=\"text-align: right;\">  total time (s)</th><th style=\"text-align: right;\">  loss</th><th style=\"text-align: right;\">  learning_rate</th><th style=\"text-align: right;\">  epoch</th></tr>\n",
               "</thead>\n",
               "<tbody>\n",
-              "<tr><td>HuggingFaceTrainer_bb9dd_00000</td><td>TERMINATED</td><td>172.28.0.2:419</td><td style=\"text-align: right;\">     5</td><td style=\"text-align: right;\">         222.391</td><td style=\"text-align: right;\">0.1575</td><td style=\"text-align: right;\">    1.30841e-06</td><td style=\"text-align: right;\">      5</td></tr>\n",
+              "<tr><td>HuggingFaceTrainer_c1ff5_00000</td><td>TERMINATED</td><td>172.31.90.137:947</td><td style=\"text-align: right;\">     2</td><td style=\"text-align: right;\">         200.217</td><td style=\"text-align: right;\">0.3886</td><td style=\"text-align: right;\">              0</td><td style=\"text-align: right;\">      2</td></tr>\n",
               "</tbody>\n",
               "</table><br><br>"
             ],
@@ -854,294 +671,335 @@
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m 2022-05-12 18:31:33,158\tINFO torch.py:347 -- Setting up process group for: env:// [rank=0, world_size=1]\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) 2022-08-25 10:10:44,617\tINFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Is CUDA available: True\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) Is CUDA available: True\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "Downloading builder script: 5.76kB [00:00, 6.35MB/s]                   \n",
-            "Downloading:   0%|          | 0.00/256M [00:00<?, ?B/s]\n",
-            "Downloading:   2%|▏         | 5.63M/256M [00:00<00:04, 59.1MB/s]\n",
-            "Downloading:   5%|▍         | 12.2M/256M [00:00<00:03, 65.0MB/s]\n",
-            "Downloading:   7%|▋         | 18.5M/256M [00:00<00:03, 65.6MB/s]\n",
-            "Downloading:  10%|▉         | 25.3M/256M [00:00<00:03, 67.5MB/s]\n",
-            "Downloading:  12%|█▏        | 31.7M/256M [00:00<00:03, 66.6MB/s]\n",
-            "Downloading:  15%|█▌        | 38.3M/256M [00:00<00:03, 67.6MB/s]\n",
-            "Downloading:  18%|█▊        | 44.8M/256M [00:00<00:03, 67.6MB/s]\n",
-            "Downloading:  20%|██        | 51.2M/256M [00:00<00:03, 66.6MB/s]\n",
-            "Downloading:  23%|██▎       | 57.9M/256M [00:00<00:03, 67.5MB/s]\n",
-            "Downloading:  25%|██▌       | 64.7M/256M [00:01<00:02, 68.6MB/s]\n",
-            "Downloading:  28%|██▊       | 71.2M/256M [00:01<00:02, 66.6MB/s]\n",
-            "Downloading:  31%|███       | 78.0M/256M [00:01<00:02, 67.9MB/s]\n",
-            "Downloading:  33%|███▎      | 84.5M/256M [00:01<00:02, 68.0MB/s]\n",
-            "Downloading:  36%|███▌      | 91.1M/256M [00:01<00:02, 68.2MB/s]\n",
-            "Downloading:  38%|███▊      | 97.7M/256M [00:01<00:02, 68.5MB/s]\n",
-            "Downloading:  41%|████      | 104M/256M [00:01<00:02, 62.8MB/s] \n",
-            "Downloading:  43%|████▎     | 110M/256M [00:01<00:02, 58.5MB/s]\n",
-            "Downloading:  46%|████▌     | 117M/256M [00:01<00:02, 60.5MB/s]\n",
-            "Downloading:  48%|████▊     | 123M/256M [00:01<00:02, 61.7MB/s]\n",
-            "Downloading:  50%|█████     | 129M/256M [00:02<00:02, 63.0MB/s]\n",
-            "Downloading:  53%|█████▎    | 135M/256M [00:02<00:01, 64.0MB/s]\n",
-            "Downloading:  55%|█████▌    | 142M/256M [00:02<00:01, 62.2MB/s]\n",
-            "Downloading:  58%|█████▊    | 148M/256M [00:02<00:01, 61.0MB/s]\n",
-            "Downloading:  60%|██████    | 154M/256M [00:02<00:01, 62.2MB/s]\n",
-            "Downloading:  62%|██████▏   | 160M/256M [00:02<00:01, 62.1MB/s]\n",
-            "Downloading:  65%|██████▌   | 166M/256M [00:02<00:01, 64.1MB/s]\n",
-            "Downloading:  67%|██████▋   | 172M/256M [00:02<00:01, 64.4MB/s]\n",
-            "Downloading:  73%|███████▎  | 186M/256M [00:02<00:01, 67.3MB/s]\n",
-            "Downloading:  75%|███████▌  | 192M/256M [00:03<00:00, 68.0MB/s]\n",
-            "Downloading:  78%|███████▊  | 199M/256M [00:03<00:00, 70.0MB/s]\n",
-            "Downloading:  81%|████████  | 206M/256M [00:03<00:00, 69.6MB/s]\n",
-            "Downloading:  83%|████████▎ | 213M/256M [00:03<00:00, 70.1MB/s]\n",
-            "Downloading:  86%|████████▌ | 220M/256M [00:03<00:00, 69.1MB/s]\n",
-            "Downloading:  89%|████████▊ | 226M/256M [00:03<00:00, 68.4MB/s]\n",
-            "Downloading:  91%|█████████ | 233M/256M [00:03<00:00, 62.3MB/s]\n",
-            "Downloading:  93%|█████████▎| 239M/256M [00:03<00:00, 60.2MB/s]\n",
-            "Downloading:  96%|█████████▌| 245M/256M [00:03<00:00, 61.8MB/s]\n",
-            "Downloading: 100%|██████████| 256M/256M [00:04<00:00, 65.0MB/s]\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_projector.bias', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias']\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight']\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m /usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   FutureWarning,\n"
+            "Downloading builder script: 5.76kB [00:00, 6.45MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.91MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.44MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.94MB/s]                   \n",
+            "Downloading tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 30.5kB/s]\n",
+            "Downloading config.json: 100%|██████████| 483/483 [00:00<00:00, 817kB/s]\n",
+            "Downloading vocab.txt:   0%|          | 0.00/226k [00:00<?, ?B/s]\n",
+            "Downloading vocab.txt:  18%|█▊        | 41.0k/226k [00:00<00:00, 353kB/s]\n",
+            "Downloading vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 773kB/s] \n",
+            "Downloading tokenizer.json:   0%|          | 0.00/455k [00:00<?, ?B/s]\n",
+            "Downloading tokenizer.json:   6%|▌         | 28.0k/455k [00:00<00:01, 227kB/s]\n",
+            "Downloading tokenizer.json:  24%|██▍       | 111k/455k [00:00<00:00, 488kB/s] \n",
+            "Downloading tokenizer.json:  42%|████▏     | 191k/455k [00:00<00:00, 559kB/s]\n",
+            "Downloading tokenizer.json:  67%|██████▋   | 303k/455k [00:00<00:00, 694kB/s]\n",
+            "Downloading tokenizer.json: 100%|██████████| 455k/455k [00:00<00:00, 815kB/s]\n",
+            "Downloading pytorch_model.bin:   0%|          | 0.00/256M [00:00<?, ?B/s]\n",
+            "Downloading pytorch_model.bin:   0%|          | 1.20M/256M [00:00<00:21, 12.6MB/s]\n",
+            "Downloading pytorch_model.bin:   2%|▏         | 6.02M/256M [00:00<00:07, 34.9MB/s]\n",
+            "Downloading pytorch_model.bin:   6%|▌         | 15.0M/256M [00:00<00:04, 62.0MB/s]\n",
+            "Downloading pytorch_model.bin:   9%|▉         | 24.0M/256M [00:00<00:03, 74.8MB/s]\n",
+            "Downloading pytorch_model.bin:  13%|█▎        | 33.1M/256M [00:00<00:02, 82.3MB/s]\n",
+            "Downloading pytorch_model.bin:  17%|█▋        | 42.2M/256M [00:00<00:02, 86.7MB/s]\n",
+            "Downloading pytorch_model.bin:  20%|██        | 51.4M/256M [00:00<00:02, 89.8MB/s]\n",
+            "Downloading pytorch_model.bin:  24%|██▎       | 60.6M/256M [00:00<00:02, 91.8MB/s]\n",
+            "Downloading pytorch_model.bin:  27%|██▋       | 69.8M/256M [00:00<00:02, 93.3MB/s]\n",
+            "Downloading pytorch_model.bin:  31%|███       | 78.9M/256M [00:01<00:01, 94.2MB/s]\n",
+            "Downloading pytorch_model.bin:  34%|███▍      | 88.0M/256M [00:01<00:01, 94.6MB/s]\n",
+            "Downloading pytorch_model.bin:  38%|███▊      | 97.2M/256M [00:01<00:01, 95.1MB/s]\n",
+            "Downloading pytorch_model.bin:  42%|████▏     | 106M/256M [00:01<00:01, 95.6MB/s] \n",
+            "Downloading pytorch_model.bin:  45%|████▌     | 116M/256M [00:01<00:01, 96.0MB/s]\n",
+            "Downloading pytorch_model.bin:  49%|████▉     | 125M/256M [00:01<00:01, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  52%|█████▏    | 134M/256M [00:01<00:01, 96.0MB/s]\n",
+            "Downloading pytorch_model.bin:  56%|█████▌    | 143M/256M [00:01<00:01, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  60%|█████▉    | 152M/256M [00:01<00:01, 96.0MB/s]\n",
+            "Downloading pytorch_model.bin:  63%|██████▎   | 162M/256M [00:01<00:01, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  67%|██████▋   | 171M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  70%|███████   | 180M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  74%|███████▍  | 189M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  78%|███████▊  | 198M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  81%|████████  | 208M/256M [00:02<00:00, 95.9MB/s]\n",
+            "Downloading pytorch_model.bin:  85%|████████▍ | 217M/256M [00:02<00:00, 95.9MB/s]\n",
+            "Downloading pytorch_model.bin:  88%|████████▊ | 226M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  92%|█████████▏| 235M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  96%|█████████▌| 244M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin: 100%|██████████| 256M/256M [00:02<00:00, 91.6MB/s]\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias']\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'classifier.weight', 'pre_classifier.weight']\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_transform.weight']\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'classifier.bias']\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.weight', 'vocab_projector.weight']\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight']\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight']\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.bias']\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Starting training\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1116, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1117, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1115, ip=172.31.90.137) Starting training\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running training *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 8551\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num Epochs = 5\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Instantaneous batch size per device = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Total train batch size (w. parallel, distributed & accumulation) = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Gradient Accumulation steps = 1\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Total optimization steps = 2675\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m [W reducer.cpp:1289] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) ***** Running training *****\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Num examples = 8551\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Num Epochs = 2\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Instantaneous batch size per device = 16\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Total train batch size (w. parallel, distributed & accumulation) = 64\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Gradient Accumulation steps = 1\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Total optimization steps = 1070\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'loss': 0.5441, 'learning_rate': 1.6261682242990654e-05, 'epoch': 0.93}\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) {'loss': 0.5437, 'learning_rate': 1e-05, 'epoch': 1.0}\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running Evaluation *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 1043\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Batch size = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Batch size = 16\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'eval_loss': 0.4999416470527649, 'eval_matthews_correlation': 0.3991733676966143, 'eval_runtime': 1.0378, 'eval_samples_per_second': 1004.976, 'eval_steps_per_second': 63.594, 'epoch': 1.0}\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) {'eval_loss': 0.5794203281402588, 'eval_matthews_correlation': 0.3293676852500821, 'eval_runtime': 0.9804, 'eval_samples_per_second': 277.441, 'eval_steps_per_second': 5.1, 'epoch': 1.0}\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "Trial HuggingFaceTrainer_bb9dd_00000 reported loss=0.5441,learning_rate=1.6261682242990654e-05,epoch=1.0,step=535,eval_loss=0.4999416470527649,eval_matthews_correlation=0.3991733676966143,eval_runtime=1.0378,eval_samples_per_second=1004.976,eval_steps_per_second=63.594,_timestamp=1652380362,_time_this_iter_s=66.77899646759033,_training_iteration=1,should_checkpoint=True with parameters={}.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'loss': 0.3886, 'learning_rate': 1.2523364485981309e-05, 'epoch': 1.87}\n"
+            "Result for HuggingFaceTrainer_c1ff5_00000:\n",
+            "  _time_this_iter_s: 90.87123560905457\n",
+            "  _timestamp: 1661447540\n",
+            "  _training_iteration: 1\n",
+            "  date: 2022-08-25_10-12-20\n",
+            "  done: false\n",
+            "  epoch: 1.0\n",
+            "  eval_loss: 0.5794203281402588\n",
+            "  eval_matthews_correlation: 0.3293676852500821\n",
+            "  eval_runtime: 0.9804\n",
+            "  eval_samples_per_second: 277.441\n",
+            "  eval_steps_per_second: 5.1\n",
+            "  experiment_id: 592e02b25b254bd1a3743904313dc85b\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 1\n",
+            "  learning_rate: 1.0e-05\n",
+            "  loss: 0.5437\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 947\n",
+            "  should_checkpoint: true\n",
+            "  step: 535\n",
+            "  time_since_restore: 103.24057936668396\n",
+            "  time_this_iter_s: 103.24057936668396\n",
+            "  time_total_s: 103.24057936668396\n",
+            "  timestamp: 1661447540\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 1\n",
+            "  trial_id: c1ff5_00000\n",
+            "  warmup_time: 0.003858327865600586\n",
+            "  \n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running Evaluation *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 1043\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Batch size = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1070\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/special_tokens_map.json\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'eval_loss': 0.5397436618804932, 'eval_matthews_correlation': 0.5085739436587455, 'eval_runtime': 1.0792, 'eval_samples_per_second': 966.488, 'eval_steps_per_second': 61.158, 'epoch': 2.0}\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) {'loss': 0.3886, 'learning_rate': 0.0, 'epoch': 2.0}\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1070\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/special_tokens_map.json\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137)   Batch size = 16\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "Trial HuggingFaceTrainer_bb9dd_00000 reported loss=0.3886,learning_rate=1.2523364485981309e-05,epoch=2.0,step=1070,eval_loss=0.5397436618804932,eval_matthews_correlation=0.5085739436587455,eval_runtime=1.0792,eval_samples_per_second=966.488,eval_steps_per_second=61.158,_timestamp=1652380400,_time_this_iter_s=37.84357762336731,_training_iteration=2,should_checkpoint=True with parameters={}.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'loss': 0.2746, 'learning_rate': 8.785046728971963e-06, 'epoch': 2.8}\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) {'eval_loss': 0.6215357184410095, 'eval_matthews_correlation': 0.42957017514952434, 'eval_runtime': 0.9956, 'eval_samples_per_second': 273.204, 'eval_steps_per_second': 5.022, 'epoch': 2.0}\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running Evaluation *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 1043\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Batch size = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1605\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/config.json\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1070\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/special_tokens_map.json\n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'eval_loss': 0.6648283004760742, 'eval_matthews_correlation': 0.5141951979542654, 'eval_runtime': 1.1148, 'eval_samples_per_second': 935.563, 'eval_steps_per_second': 59.202, 'epoch': 3.0}\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) {'train_runtime': 174.4696, 'train_samples_per_second': 98.023, 'train_steps_per_second': 6.133, 'train_loss': 0.4661755713346963, 'epoch': 2.0}\n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/special_tokens_map.json\n"
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) Training completed. Do not forget to share your model on huggingface.co/models =)\n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1114, ip=172.31.90.137) \n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "Trial HuggingFaceTrainer_bb9dd_00000 reported loss=0.2746,learning_rate=8.785046728971963e-06,epoch=3.0,step=1605,eval_loss=0.6648283004760742,eval_matthews_correlation=0.5141951979542654,eval_runtime=1.1148,eval_samples_per_second=935.563,eval_steps_per_second=59.202,_timestamp=1652380437,_time_this_iter_s=36.976723432540894,_training_iteration=3,should_checkpoint=True with parameters={}.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'loss': 0.196, 'learning_rate': 5.046728971962617e-06, 'epoch': 3.74}\n"
-          ]
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running Evaluation *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 1043\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Batch size = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-2140\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/config.json\n"
+            "Result for HuggingFaceTrainer_c1ff5_00000:\n",
+            "  _time_this_iter_s: 96.96447467803955\n",
+            "  _timestamp: 1661447637\n",
+            "  _training_iteration: 2\n",
+            "  date: 2022-08-25_10-13-57\n",
+            "  done: false\n",
+            "  epoch: 2.0\n",
+            "  eval_loss: 0.6215357184410095\n",
+            "  eval_matthews_correlation: 0.42957017514952434\n",
+            "  eval_runtime: 0.9956\n",
+            "  eval_samples_per_second: 273.204\n",
+            "  eval_steps_per_second: 5.022\n",
+            "  experiment_id: 592e02b25b254bd1a3743904313dc85b\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 2\n",
+            "  learning_rate: 0.0\n",
+            "  loss: 0.3886\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 947\n",
+            "  should_checkpoint: true\n",
+            "  step: 1070\n",
+            "  time_since_restore: 200.21722102165222\n",
+            "  time_this_iter_s: 96.97664165496826\n",
+            "  time_total_s: 200.21722102165222\n",
+            "  timestamp: 1661447637\n",
+            "  timesteps_since_restore: 0\n",
+            "  train_loss: 0.4661755713346963\n",
+            "  train_runtime: 174.4696\n",
+            "  train_samples_per_second: 98.023\n",
+            "  train_steps_per_second: 6.133\n",
+            "  training_iteration: 2\n",
+            "  trial_id: c1ff5_00000\n",
+            "  warmup_time: 0.003858327865600586\n",
+            "  \n"
           ]
         },
         {
           "name": "stdout",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'eval_loss': 0.7566447854042053, 'eval_matthews_correlation': 0.5518326707011334, 'eval_runtime': 1.1113, 'eval_samples_per_second': 938.535, 'eval_steps_per_second': 59.39, 'epoch': 4.0}\n"
+            "Result for HuggingFaceTrainer_c1ff5_00000:\n",
+            "  _time_this_iter_s: 96.96447467803955\n",
+            "  _timestamp: 1661447637\n",
+            "  _training_iteration: 2\n",
+            "  date: 2022-08-25_10-13-57\n",
+            "  done: true\n",
+            "  epoch: 2.0\n",
+            "  eval_loss: 0.6215357184410095\n",
+            "  eval_matthews_correlation: 0.42957017514952434\n",
+            "  eval_runtime: 0.9956\n",
+            "  eval_samples_per_second: 273.204\n",
+            "  eval_steps_per_second: 5.022\n",
+            "  experiment_id: 592e02b25b254bd1a3743904313dc85b\n",
+            "  experiment_tag: '0'\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 2\n",
+            "  learning_rate: 0.0\n",
+            "  loss: 0.3886\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 947\n",
+            "  should_checkpoint: true\n",
+            "  step: 1070\n",
+            "  time_since_restore: 200.21722102165222\n",
+            "  time_this_iter_s: 96.97664165496826\n",
+            "  time_total_s: 200.21722102165222\n",
+            "  timestamp: 1661447637\n",
+            "  timesteps_since_restore: 0\n",
+            "  train_loss: 0.4661755713346963\n",
+            "  train_runtime: 174.4696\n",
+            "  train_samples_per_second: 98.023\n",
+            "  train_steps_per_second: 6.133\n",
+            "  training_iteration: 2\n",
+            "  trial_id: c1ff5_00000\n",
+            "  warmup_time: 0.003858327865600586\n",
+            "  \n"
           ]
         },
         {
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/special_tokens_map.json\n"
-          ]
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Trial HuggingFaceTrainer_bb9dd_00000 reported loss=0.196,learning_rate=5.046728971962617e-06,epoch=4.0,step=2140,eval_loss=0.7566447854042053,eval_matthews_correlation=0.5518326707011334,eval_runtime=1.1113,eval_samples_per_second=938.535,eval_steps_per_second=59.39,_timestamp=1652380474,_time_this_iter_s=36.68935775756836,_training_iteration=4,should_checkpoint=True with parameters={}.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'loss': 0.1575, 'learning_rate': 1.308411214953271e-06, 'epoch': 4.67}\n"
-          ]
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-2675\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/special_tokens_map.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m ***** Running Evaluation *****\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Num examples = 1043\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m   Batch size = 16\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-2675\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/config.json\n"
-          ]
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'eval_loss': 0.8616615533828735, 'eval_matthews_correlation': 0.5420036503219092, 'eval_runtime': 1.2577, 'eval_samples_per_second': 829.302, 'eval_steps_per_second': 52.477, 'epoch': 5.0}\n"
-          ]
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/pytorch_model.bin\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/tokenizer_config.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2675/special_tokens_map.json\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m \n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m \n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m Training completed. Do not forget to share your model on huggingface.co/models =)\n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m \n",
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m \n"
-          ]
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "\u001b[2m\u001b[36m(RayTrainWorker pid=455)\u001b[0m {'train_runtime': 187.8585, 'train_samples_per_second': 227.592, 'train_steps_per_second': 14.239, 'train_loss': 0.30010223103460865, 'epoch': 5.0}\n",
-            "Trial HuggingFaceTrainer_bb9dd_00000 reported loss=0.1575,learning_rate=1.308411214953271e-06,epoch=5.0,step=2675,eval_loss=0.8616615533828735,eval_matthews_correlation=0.5420036503219092,eval_runtime=1.2577,eval_samples_per_second=829.302,eval_steps_per_second=52.477,train_runtime=187.8585,train_samples_per_second=227.592,train_steps_per_second=14.239,train_loss=0.30010223103460865,_timestamp=1652380513,_time_this_iter_s=39.63672137260437,_training_iteration=5,should_checkpoint=True with parameters={}.\n",
-            "Trial HuggingFaceTrainer_bb9dd_00000 completed. Last result: loss=0.1575,learning_rate=1.308411214953271e-06,epoch=5.0,step=2675,eval_loss=0.8616615533828735,eval_matthews_correlation=0.5420036503219092,eval_runtime=1.2577,eval_samples_per_second=829.302,eval_steps_per_second=52.477,train_runtime=187.8585,train_samples_per_second=227.592,train_steps_per_second=14.239,train_loss=0.30010223103460865,_timestamp=1652380513,_time_this_iter_s=39.63672137260437,_training_iteration=5,should_checkpoint=True\n"
-          ]
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "2022-05-12 18:35:14,803\tINFO tune.py:753 -- Total run time: 228.34 seconds (228.07 seconds for the tuning loop).\n"
+            "2022-08-25 10:14:09,300\tINFO tune.py:758 -- Total run time: 246.67 seconds (246.44 seconds for the tuning loop).\n"
           ]
         }
       ],
@@ -1160,7 +1018,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 17,
+      "execution_count": 16,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/"
@@ -1172,10 +1030,10 @@
         {
           "data": {
             "text/plain": [
-              "Result(metrics={'loss': 0.1575, 'learning_rate': 1.308411214953271e-06, 'epoch': 5.0, 'step': 2675, 'eval_loss': 0.8616615533828735, 'eval_matthews_correlation': 0.5420036503219092, 'eval_runtime': 1.2577, 'eval_samples_per_second': 829.302, 'eval_steps_per_second': 52.477, 'train_runtime': 187.8585, 'train_samples_per_second': 227.592, 'train_steps_per_second': 14.239, 'train_loss': 0.30010223103460865, '_timestamp': 1652380513, '_time_this_iter_s': 39.63672137260437, '_training_iteration': 5, 'time_this_iter_s': 39.64510202407837, 'should_checkpoint': True, 'done': True, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 5, 'trial_id': 'bb9dd_00000', 'experiment_id': 'db0c5ea784a44980819bf5e1bfb72c04', 'date': '2022-05-12_18-35-13', 'timestamp': 1652380513, 'time_total_s': 222.39091277122498, 'pid': 419, 'hostname': 'e618da00601e', 'node_ip': '172.28.0.2', 'config': {}, 'time_since_restore': 222.39091277122498, 'timesteps_since_restore': 0, 'iterations_since_restore': 5, 'warmup_time': 0.004034996032714844, 'experiment_tag': '0'}, checkpoint=<ray.air.checkpoint.Checkpoint object at 0x7f9ffd9d9c90>, error=None)"
+              "Result(metrics={'loss': 0.3886, 'learning_rate': 0.0, 'epoch': 2.0, 'step': 1070, 'eval_loss': 0.6215357184410095, 'eval_matthews_correlation': 0.42957017514952434, 'eval_runtime': 0.9956, 'eval_samples_per_second': 273.204, 'eval_steps_per_second': 5.022, 'train_runtime': 174.4696, 'train_samples_per_second': 98.023, 'train_steps_per_second': 6.133, 'train_loss': 0.4661755713346963, '_timestamp': 1661447637, '_time_this_iter_s': 96.96447467803955, '_training_iteration': 2, 'should_checkpoint': True, 'done': True, 'trial_id': 'c1ff5_00000', 'experiment_tag': '0'}, error=None, log_dir=PosixPath('/home/ray/ray_results/HuggingFaceTrainer_2022-08-25_10-10-02/HuggingFaceTrainer_c1ff5_00000_0_2022-08-25_10-10-04'))"
             ]
           },
-          "execution_count": 17,
+          "execution_count": 16,
           "metadata": {},
           "output_type": "execute_result"
         }
@@ -1184,6 +1042,996 @@
         "result"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Tune hyperparameters with Ray AIR <a name=\"tune\"></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "To tune the model's hyperparameters, we pass our `HuggingFaceTrainer` into a `Tuner` and define the search space in `param_space`.\n",
+        "\n",
+        "We can also take advantage of the search algorithms and schedulers provided by Ray Tune. In this example, we use an `ASHAScheduler` to aggressively terminate underperforming trials."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 17,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from ray import tune\n",
+        "from ray.tune import Tuner\n",
+        "from ray.tune.schedulers import ASHAScheduler\n",
+        "\n",
+        "tune_epochs = 4\n",
+        "tuner = Tuner(\n",
+        "    trainer,\n",
+        "    param_space={\n",
+        "        \"trainer_init_config\": {\n",
+        "            \"learning_rate\": tune.grid_search([2e-5, 2e-4, 2e-3, 2e-2]),\n",
+        "            \"epochs\": tune_epochs,\n",
+        "        }\n",
+        "    },\n",
+        "    tune_config=tune.TuneConfig(\n",
+        "        metric=\"eval_loss\",\n",
+        "        mode=\"min\",\n",
+        "        num_samples=1,\n",
+        "        scheduler=ASHAScheduler(\n",
+        "            max_t=tune_epochs,\n",
+        "        )\n",
+        "    ),\n",
+        "    run_config=RunConfig(\n",
+        "        checkpoint_config=CheckpointConfig(num_to_keep=1, checkpoint_score_attribute=\"eval_loss\", checkpoint_score_order=\"min\")\n",
+        "    ),\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 18,
+      "metadata": {},
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "== Status ==<br>Current time: 2022-08-25 10:20:13 (running for 00:06:01.75)<br>Memory usage on this node: 4.4/62.0 GiB<br>Using AsyncHyperBand: num_stopped=4\n",
+              "Bracket: Iter 4.000: -0.8064090609550476 | Iter 1.000: -0.6378736793994904<br>Resources requested: 0/208 CPUs, 0/16 GPUs, 0.0/574.34 GiB heap, 0.0/241.51 GiB objects (0.0/4.0 accelerator_type:T4)<br>Current best trial: 5654d_00001 with eval_loss=0.6492420434951782 and parameters={'trainer_init_config': {'learning_rate': 0.0002, 'epochs': 4}}<br>Result logdir: /home/ray/ray_results/HuggingFaceTrainer_2022-08-25_10-14-11<br>Number of trials: 4/4 (4 TERMINATED)<br><table>\n",
+              "<thead>\n",
+              "<tr><th>Trial name                    </th><th>status    </th><th>loc               </th><th style=\"text-align: right;\">  trainer_init_conf...</th><th style=\"text-align: right;\">  iter</th><th style=\"text-align: right;\">  total time (s)</th><th style=\"text-align: right;\">  loss</th><th style=\"text-align: right;\">  learning_rate</th><th style=\"text-align: right;\">  epoch</th></tr>\n",
+              "</thead>\n",
+              "<tbody>\n",
+              "<tr><td>HuggingFaceTrainer_5654d_00000</td><td>TERMINATED</td><td>172.31.90.137:1729</td><td style=\"text-align: right;\">                2e-05 </td><td style=\"text-align: right;\">     4</td><td style=\"text-align: right;\">        347.171 </td><td style=\"text-align: right;\">0.1958</td><td style=\"text-align: right;\">        0      </td><td style=\"text-align: right;\">      4</td></tr>\n",
+              "<tr><td>HuggingFaceTrainer_5654d_00001</td><td>TERMINATED</td><td>172.31.76.237:1805</td><td style=\"text-align: right;\">                0.0002</td><td style=\"text-align: right;\">     1</td><td style=\"text-align: right;\">         95.2492</td><td style=\"text-align: right;\">0.6225</td><td style=\"text-align: right;\">        0.00015</td><td style=\"text-align: right;\">      1</td></tr>\n",
+              "<tr><td>HuggingFaceTrainer_5654d_00002</td><td>TERMINATED</td><td>172.31.85.32:1322 </td><td style=\"text-align: right;\">                0.002 </td><td style=\"text-align: right;\">     1</td><td style=\"text-align: right;\">         93.7613</td><td style=\"text-align: right;\">0.6463</td><td style=\"text-align: right;\">        0.0015 </td><td style=\"text-align: right;\">      1</td></tr>\n",
+              "<tr><td>HuggingFaceTrainer_5654d_00003</td><td>TERMINATED</td><td>172.31.85.193:1060</td><td style=\"text-align: right;\">                0.02  </td><td style=\"text-align: right;\">     1</td><td style=\"text-align: right;\">         99.3677</td><td style=\"text-align: right;\">0.926 </td><td style=\"text-align: right;\">        0.015  </td><td style=\"text-align: right;\">      1</td></tr>\n",
+              "</tbody>\n",
+              "</table><br><br>"
+            ],
+            "text/plain": [
+              "<IPython.core.display.HTML object>"
+            ]
+          },
+          "metadata": {},
+          "output_type": "display_data"
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) 2022-08-25 10:14:23,379\tINFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1792, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1790, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1791, ip=172.31.90.137) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Is CUDA available: True\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) 2022-08-25 10:14:29,354\tINFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1977, ip=172.31.76.237) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1976, ip=172.31.76.237) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1975, ip=172.31.76.237) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Is CUDA available: True\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) 2022-08-25 10:14:35,313\tINFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1790, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1792, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1791, ip=172.31.90.137) Starting training\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Starting training\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) ***** Running training *****\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num examples = 8551\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num Epochs = 4\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Instantaneous batch size per device = 16\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Total train batch size (w. parallel, distributed & accumulation) = 64\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Gradient Accumulation steps = 1\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Total optimization steps = 2140\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1485, ip=172.31.85.32) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1486, ip=172.31.85.32) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1484, ip=172.31.85.32) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1977, ip=172.31.76.237) Starting training\n",
+            "(RayTrainWorker pid=1976, ip=172.31.76.237) Starting training\n",
+            "(RayTrainWorker pid=1975, ip=172.31.76.237) Starting training\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Starting training\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) ***** Running training *****\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Num examples = 8551\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Num Epochs = 4\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Instantaneous batch size per device = 16\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Total train batch size (w. parallel, distributed & accumulation) = 64\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Gradient Accumulation steps = 1\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Total optimization steps = 2140\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Starting training\n",
+            "(RayTrainWorker pid=1485, ip=172.31.85.32) Starting training\n",
+            "(RayTrainWorker pid=1486, ip=172.31.85.32) Starting training\n",
+            "(RayTrainWorker pid=1484, ip=172.31.85.32) Starting training\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) ***** Running training *****\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Num examples = 8551\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Num Epochs = 4\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Instantaneous batch size per device = 16\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Total train batch size (w. parallel, distributed & accumulation) = 64\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Gradient Accumulation steps = 1\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Total optimization steps = 2140\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) 2022-08-25 10:14:48,193\tINFO config.py:71 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1224, ip=172.31.85.193) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1226, ip=172.31.85.193) Is CUDA available: True\n",
+            "(RayTrainWorker pid=1225, ip=172.31.85.193) Is CUDA available: True\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "Downloading builder script: 5.76kB [00:00, 6.59MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.52MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.07MB/s]                   \n",
+            "Downloading builder script: 5.76kB [00:00, 6.81MB/s]                   \n",
+            "Downloading tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 46.0kB/s]\n",
+            "Downloading config.json: 100%|██████████| 483/483 [00:00<00:00, 766kB/s]\n",
+            "Downloading vocab.txt:   0%|          | 0.00/226k [00:00<?, ?B/s]\n",
+            "Downloading vocab.txt:  32%|███▏      | 72.0k/226k [00:00<00:00, 624kB/s]\n",
+            "Downloading vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 966kB/s] \n",
+            "Downloading tokenizer.json:   0%|          | 0.00/455k [00:00<?, ?B/s]\n",
+            "Downloading tokenizer.json:   6%|▋         | 29.0k/455k [00:00<00:01, 233kB/s]\n",
+            "Downloading tokenizer.json:  30%|██▉       | 136k/455k [00:00<00:00, 600kB/s] \n",
+            "Downloading tokenizer.json: 100%|██████████| 455k/455k [00:00<00:00, 1.44MB/s]\n",
+            "Downloading pytorch_model.bin:   0%|          | 0.00/256M [00:00<?, ?B/s]\n",
+            "Downloading pytorch_model.bin:   1%|          | 2.32M/256M [00:00<00:10, 24.4MB/s]\n",
+            "Downloading pytorch_model.bin:   4%|▍         | 11.0M/256M [00:00<00:04, 63.4MB/s]\n",
+            "Downloading pytorch_model.bin:   8%|▊         | 20.0M/256M [00:00<00:03, 77.7MB/s]\n",
+            "Downloading pytorch_model.bin:  11%|█▏        | 29.1M/256M [00:00<00:02, 84.8MB/s]\n",
+            "Downloading pytorch_model.bin:  15%|█▍        | 38.2M/256M [00:00<00:02, 88.5MB/s]\n",
+            "Downloading pytorch_model.bin:  18%|█▊        | 47.3M/256M [00:00<00:02, 90.7MB/s]\n",
+            "Downloading pytorch_model.bin:  22%|██▏       | 56.4M/256M [00:00<00:02, 92.4MB/s]\n",
+            "Downloading pytorch_model.bin:  26%|██▌       | 65.5M/256M [00:00<00:02, 93.4MB/s]\n",
+            "Downloading pytorch_model.bin:  29%|██▉       | 74.7M/256M [00:00<00:02, 94.2MB/s]\n",
+            "Downloading pytorch_model.bin:  33%|███▎      | 83.8M/256M [00:01<00:01, 94.8MB/s]\n",
+            "Downloading pytorch_model.bin:  36%|███▋      | 93.0M/256M [00:01<00:01, 95.1MB/s]\n",
+            "Downloading pytorch_model.bin:  40%|███▉      | 102M/256M [00:01<00:01, 95.4MB/s] \n",
+            "Downloading pytorch_model.bin:  44%|████▎     | 111M/256M [00:01<00:01, 95.6MB/s]\n",
+            "Downloading pytorch_model.bin:  47%|████▋     | 120M/256M [00:01<00:01, 95.7MB/s]\n",
+            "Downloading pytorch_model.bin:  51%|█████     | 130M/256M [00:01<00:01, 95.8MB/s]\n",
+            "Downloading pytorch_model.bin:  54%|█████▍    | 139M/256M [00:01<00:01, 95.8MB/s]\n",
+            "Downloading pytorch_model.bin:  58%|█████▊    | 148M/256M [00:01<00:01, 95.9MB/s]\n",
+            "Downloading pytorch_model.bin:  61%|██████▏   | 157M/256M [00:01<00:01, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  65%|██████▌   | 166M/256M [00:01<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  69%|██████▊   | 175M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  72%|███████▏  | 185M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  76%|███████▌  | 194M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  79%|███████▉  | 203M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  83%|████████▎ | 212M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  87%|████████▋ | 221M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  90%|█████████ | 231M/256M [00:02<00:00, 96.2MB/s]\n",
+            "Downloading pytorch_model.bin:  94%|█████████▍| 240M/256M [00:02<00:00, 96.1MB/s]\n",
+            "Downloading pytorch_model.bin:  97%|█████████▋| 249M/256M [00:02<00:00, 96.0MB/s]\n",
+            "Downloading pytorch_model.bin: 100%|██████████| 256M/256M [00:02<00:00, 93.2MB/s]\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Starting training\n",
+            "(RayTrainWorker pid=1226, ip=172.31.85.193) Starting training\n",
+            "(RayTrainWorker pid=1225, ip=172.31.85.193) Starting training\n",
+            "(RayTrainWorker pid=1224, ip=172.31.85.193) Starting training\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) ***** Running training *****\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Num examples = 8551\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Num Epochs = 4\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Instantaneous batch size per device = 16\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Total train batch size (w. parallel, distributed & accumulation) = 64\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Gradient Accumulation steps = 1\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Total optimization steps = 2140\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Batch size = 16\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'loss': 0.5458, 'learning_rate': 1.5000000000000002e-05, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'eval_loss': 0.6037685871124268, 'eval_matthews_correlation': 0.3654892178274207, 'eval_runtime': 0.9847, 'eval_samples_per_second': 276.225, 'eval_steps_per_second': 5.078, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00000:\n",
+            "  _time_this_iter_s: 85.01727724075317\n",
+            "  _timestamp: 1661447753\n",
+            "  _training_iteration: 1\n",
+            "  date: 2022-08-25_10-15-53\n",
+            "  done: false\n",
+            "  epoch: 1.0\n",
+            "  eval_loss: 0.6037685871124268\n",
+            "  eval_matthews_correlation: 0.3654892178274207\n",
+            "  eval_runtime: 0.9847\n",
+            "  eval_samples_per_second: 276.225\n",
+            "  eval_steps_per_second: 5.078\n",
+            "  experiment_id: cee1b96afcf344e89482e3c5e298a412\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 1\n",
+            "  learning_rate: 1.5000000000000002e-05\n",
+            "  loss: 0.5458\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 1729\n",
+            "  should_checkpoint: true\n",
+            "  step: 535\n",
+            "  time_since_restore: 94.93232989311218\n",
+            "  time_this_iter_s: 94.93232989311218\n",
+            "  time_total_s: 94.93232989311218\n",
+            "  timestamp: 1661447753\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 1\n",
+            "  trial_id: 5654d_00000\n",
+            "  warmup_time: 0.0037021636962890625\n",
+            "  \n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) {'loss': 0.6225, 'learning_rate': 0.00015000000000000001, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237)   Batch size = 16\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) {'eval_loss': 0.6492420434951782, 'eval_matthews_correlation': 0.0, 'eval_runtime': 1.0157, 'eval_samples_per_second': 267.792, 'eval_steps_per_second': 4.923, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1974, ip=172.31.76.237) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00001:\n",
+            "  _time_this_iter_s: 84.79700112342834\n",
+            "  _timestamp: 1661447759\n",
+            "  _training_iteration: 1\n",
+            "  date: 2022-08-25_10-16-00\n",
+            "  done: true\n",
+            "  epoch: 1.0\n",
+            "  eval_loss: 0.6492420434951782\n",
+            "  eval_matthews_correlation: 0.0\n",
+            "  eval_runtime: 1.0157\n",
+            "  eval_samples_per_second: 267.792\n",
+            "  eval_steps_per_second: 4.923\n",
+            "  experiment_id: 88145f9344584715a4bd7d018f751b12\n",
+            "  hostname: ip-172-31-76-237\n",
+            "  iterations_since_restore: 1\n",
+            "  learning_rate: 0.00015000000000000001\n",
+            "  loss: 0.6225\n",
+            "  node_ip: 172.31.76.237\n",
+            "  pid: 1805\n",
+            "  should_checkpoint: true\n",
+            "  step: 535\n",
+            "  time_since_restore: 95.24916434288025\n",
+            "  time_this_iter_s: 95.24916434288025\n",
+            "  time_total_s: 95.24916434288025\n",
+            "  timestamp: 1661447760\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 1\n",
+            "  trial_id: 5654d_00001\n",
+            "  warmup_time: 0.003660917282104492\n",
+            "  \n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) {'loss': 0.6463, 'learning_rate': 0.0015, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32)   Batch size = 16\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) {'eval_loss': 0.6586529612541199, 'eval_matthews_correlation': 0.0, 'eval_runtime': 0.9576, 'eval_samples_per_second': 284.05, 'eval_steps_per_second': 5.222, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1483, ip=172.31.85.32) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00002:\n",
+            "  _time_this_iter_s: 84.01720070838928\n",
+            "  _timestamp: 1661447764\n",
+            "  _training_iteration: 1\n",
+            "  date: 2022-08-25_10-16-04\n",
+            "  done: true\n",
+            "  epoch: 1.0\n",
+            "  eval_loss: 0.6586529612541199\n",
+            "  eval_matthews_correlation: 0.0\n",
+            "  eval_runtime: 0.9576\n",
+            "  eval_samples_per_second: 284.05\n",
+            "  eval_steps_per_second: 5.222\n",
+            "  experiment_id: 5f8ab183779d40379d59ea615f9d5411\n",
+            "  hostname: ip-172-31-85-32\n",
+            "  iterations_since_restore: 1\n",
+            "  learning_rate: 0.0015\n",
+            "  loss: 0.6463\n",
+            "  node_ip: 172.31.85.32\n",
+            "  pid: 1322\n",
+            "  should_checkpoint: true\n",
+            "  step: 535\n",
+            "  time_since_restore: 93.76131749153137\n",
+            "  time_this_iter_s: 93.76131749153137\n",
+            "  time_total_s: 93.76131749153137\n",
+            "  timestamp: 1661447764\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 1\n",
+            "  trial_id: 5654d_00002\n",
+            "  warmup_time: 0.004533290863037109\n",
+            "  \n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193)   Batch size = 16\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) {'loss': 0.926, 'learning_rate': 0.015, 'epoch': 1.0}\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) {'eval_loss': 0.6529427766799927, 'eval_matthews_correlation': 0.0, 'eval_runtime': 0.9428, 'eval_samples_per_second': 288.51, 'eval_steps_per_second': 5.303, 'epoch': 1.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-535\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/config.json\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1223, ip=172.31.85.193) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-535/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00003:\n",
+            "  _time_this_iter_s: 89.4301290512085\n",
+            "  _timestamp: 1661447782\n",
+            "  _training_iteration: 1\n",
+            "  date: 2022-08-25_10-16-22\n",
+            "  done: true\n",
+            "  epoch: 1.0\n",
+            "  eval_loss: 0.6529427766799927\n",
+            "  eval_matthews_correlation: 0.0\n",
+            "  eval_runtime: 0.9428\n",
+            "  eval_samples_per_second: 288.51\n",
+            "  eval_steps_per_second: 5.303\n",
+            "  experiment_id: 8495977eeefd405fa4d9c1ea8fa735e1\n",
+            "  hostname: ip-172-31-85-193\n",
+            "  iterations_since_restore: 1\n",
+            "  learning_rate: 0.015\n",
+            "  loss: 0.926\n",
+            "  node_ip: 172.31.85.193\n",
+            "  pid: 1060\n",
+            "  should_checkpoint: true\n",
+            "  step: 535\n",
+            "  time_since_restore: 99.36774587631226\n",
+            "  time_this_iter_s: 99.36774587631226\n",
+            "  time_total_s: 99.36774587631226\n",
+            "  timestamp: 1661447782\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 1\n",
+            "  trial_id: 5654d_00003\n",
+            "  warmup_time: 0.004132509231567383\n",
+            "  \n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Batch size = 16\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'loss': 0.3841, 'learning_rate': 1e-05, 'epoch': 2.0}\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'eval_loss': 0.5994958281517029, 'eval_matthews_correlation': 0.4573244914254411, 'eval_runtime': 0.9442, 'eval_samples_per_second': 288.066, 'eval_steps_per_second': 5.295, 'epoch': 2.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1070\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1070/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00000:\n",
+            "  _time_this_iter_s: 76.82565689086914\n",
+            "  _timestamp: 1661447830\n",
+            "  _training_iteration: 2\n",
+            "  date: 2022-08-25_10-17-10\n",
+            "  done: false\n",
+            "  epoch: 2.0\n",
+            "  eval_loss: 0.5994958281517029\n",
+            "  eval_matthews_correlation: 0.4573244914254411\n",
+            "  eval_runtime: 0.9442\n",
+            "  eval_samples_per_second: 288.066\n",
+            "  eval_steps_per_second: 5.295\n",
+            "  experiment_id: cee1b96afcf344e89482e3c5e298a412\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 2\n",
+            "  learning_rate: 1.0e-05\n",
+            "  loss: 0.3841\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 1729\n",
+            "  should_checkpoint: true\n",
+            "  step: 1070\n",
+            "  time_since_restore: 171.76071190834045\n",
+            "  time_this_iter_s: 76.82838201522827\n",
+            "  time_total_s: 171.76071190834045\n",
+            "  timestamp: 1661447830\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 2\n",
+            "  trial_id: 5654d_00000\n",
+            "  warmup_time: 0.0037021636962890625\n",
+            "  \n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Batch size = 16\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'loss': 0.2687, 'learning_rate': 5e-06, 'epoch': 3.0}\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'eval_loss': 0.6935313940048218, 'eval_matthews_correlation': 0.5300538425561, 'eval_runtime': 1.0176, 'eval_samples_per_second': 267.305, 'eval_steps_per_second': 4.914, 'epoch': 3.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-1605\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-1605/special_tokens_map.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Result for HuggingFaceTrainer_5654d_00000:\n",
+            "  _time_this_iter_s: 76.47252488136292\n",
+            "  _timestamp: 1661447906\n",
+            "  _training_iteration: 3\n",
+            "  date: 2022-08-25_10-18-26\n",
+            "  done: false\n",
+            "  epoch: 3.0\n",
+            "  eval_loss: 0.6935313940048218\n",
+            "  eval_matthews_correlation: 0.5300538425561\n",
+            "  eval_runtime: 1.0176\n",
+            "  eval_samples_per_second: 267.305\n",
+            "  eval_steps_per_second: 4.914\n",
+            "  experiment_id: cee1b96afcf344e89482e3c5e298a412\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 3\n",
+            "  learning_rate: 5.0e-06\n",
+            "  loss: 0.2687\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 1729\n",
+            "  should_checkpoint: true\n",
+            "  step: 1605\n",
+            "  time_since_restore: 248.23273348808289\n",
+            "  time_this_iter_s: 76.47202157974243\n",
+            "  time_total_s: 248.23273348808289\n",
+            "  timestamp: 1661447906\n",
+            "  timesteps_since_restore: 0\n",
+            "  training_iteration: 3\n",
+            "  trial_id: 5654d_00000\n",
+            "  warmup_time: 0.0037021636962890625\n",
+            "  \n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-2140\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/special_tokens_map.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) ***** Running Evaluation *****\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Num examples = 1043\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137)   Batch size = 16\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'loss': 0.1958, 'learning_rate': 0.0, 'epoch': 4.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Saving model checkpoint to distilbert-base-uncased-finetuned-cola/checkpoint-2140\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Configuration saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/config.json\n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'eval_loss': 0.8064090609550476, 'eval_matthews_correlation': 0.5322860764824153, 'eval_runtime': 1.0006, 'eval_samples_per_second': 271.827, 'eval_steps_per_second': 4.997, 'epoch': 4.0}\n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Model weights saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/pytorch_model.bin\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) tokenizer config file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/tokenizer_config.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Special tokens file saved in distilbert-base-uncased-finetuned-cola/checkpoint-2140/special_tokens_map.json\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) Training completed. Do not forget to share your model on huggingface.co/models =)\n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) \n",
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) \n"
+          ]
+        },
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "(RayTrainWorker pid=1789, ip=172.31.90.137) {'train_runtime': 329.1948, 'train_samples_per_second': 103.902, 'train_steps_per_second': 6.501, 'train_loss': 0.34860724689804506, 'epoch': 4.0}\n",
+            "Result for HuggingFaceTrainer_5654d_00000:\n",
+            "  _time_this_iter_s: 98.92064905166626\n",
+            "  _timestamp: 1661448005\n",
+            "  _training_iteration: 4\n",
+            "  date: 2022-08-25_10-20-05\n",
+            "  done: true\n",
+            "  epoch: 4.0\n",
+            "  eval_loss: 0.8064090609550476\n",
+            "  eval_matthews_correlation: 0.5322860764824153\n",
+            "  eval_runtime: 1.0006\n",
+            "  eval_samples_per_second: 271.827\n",
+            "  eval_steps_per_second: 4.997\n",
+            "  experiment_id: cee1b96afcf344e89482e3c5e298a412\n",
+            "  hostname: ip-172-31-90-137\n",
+            "  iterations_since_restore: 4\n",
+            "  learning_rate: 0.0\n",
+            "  loss: 0.1958\n",
+            "  node_ip: 172.31.90.137\n",
+            "  pid: 1729\n",
+            "  should_checkpoint: true\n",
+            "  step: 2140\n",
+            "  time_since_restore: 347.1705844402313\n",
+            "  time_this_iter_s: 98.93785095214844\n",
+            "  time_total_s: 347.1705844402313\n",
+            "  timestamp: 1661448005\n",
+            "  timesteps_since_restore: 0\n",
+            "  train_loss: 0.34860724689804506\n",
+            "  train_runtime: 329.1948\n",
+            "  train_samples_per_second: 103.902\n",
+            "  train_steps_per_second: 6.501\n",
+            "  training_iteration: 4\n",
+            "  trial_id: 5654d_00000\n",
+            "  warmup_time: 0.0037021636962890625\n",
+            "  \n"
+          ]
+        },
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "2022-08-25 10:20:13,409\tINFO tune.py:758 -- Total run time: 361.90 seconds (361.74 seconds for the tuning loop).\n"
+          ]
+        }
+      ],
+      "source": [
+        "tune_results = tuner.fit()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We can view the results of the tuning run as a dataframe and retrieve the best result."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 19,
+      "metadata": {},
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "<div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>loss</th>\n",
+              "      <th>learning_rate</th>\n",
+              "      <th>epoch</th>\n",
+              "      <th>step</th>\n",
+              "      <th>eval_loss</th>\n",
+              "      <th>eval_matthews_correlation</th>\n",
+              "      <th>eval_runtime</th>\n",
+              "      <th>eval_samples_per_second</th>\n",
+              "      <th>eval_steps_per_second</th>\n",
+              "      <th>_timestamp</th>\n",
+              "      <th>...</th>\n",
+              "      <th>pid</th>\n",
+              "      <th>hostname</th>\n",
+              "      <th>node_ip</th>\n",
+              "      <th>time_since_restore</th>\n",
+              "      <th>timesteps_since_restore</th>\n",
+              "      <th>iterations_since_restore</th>\n",
+              "      <th>warmup_time</th>\n",
+              "      <th>config/trainer_init_config/epochs</th>\n",
+              "      <th>config/trainer_init_config/learning_rate</th>\n",
+              "      <th>logdir</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>0.6225</td>\n",
+              "      <td>0.00015</td>\n",
+              "      <td>1.0</td>\n",
+              "      <td>535</td>\n",
+              "      <td>0.649242</td>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>1.0157</td>\n",
+              "      <td>267.792</td>\n",
+              "      <td>4.923</td>\n",
+              "      <td>1661447759</td>\n",
+              "      <td>...</td>\n",
+              "      <td>1805</td>\n",
+              "      <td>ip-172-31-76-237</td>\n",
+              "      <td>172.31.76.237</td>\n",
+              "      <td>95.249164</td>\n",
+              "      <td>0</td>\n",
+              "      <td>1</td>\n",
+              "      <td>0.003661</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.00020</td>\n",
+              "      <td>/home/ray/ray_results/HuggingFaceTrainer_2022-...</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>3</th>\n",
+              "      <td>0.9260</td>\n",
+              "      <td>0.01500</td>\n",
+              "      <td>1.0</td>\n",
+              "      <td>535</td>\n",
+              "      <td>0.652943</td>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>0.9428</td>\n",
+              "      <td>288.510</td>\n",
+              "      <td>5.303</td>\n",
+              "      <td>1661447782</td>\n",
+              "      <td>...</td>\n",
+              "      <td>1060</td>\n",
+              "      <td>ip-172-31-85-193</td>\n",
+              "      <td>172.31.85.193</td>\n",
+              "      <td>99.367746</td>\n",
+              "      <td>0</td>\n",
+              "      <td>1</td>\n",
+              "      <td>0.004133</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.02000</td>\n",
+              "      <td>/home/ray/ray_results/HuggingFaceTrainer_2022-...</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>0.6463</td>\n",
+              "      <td>0.00150</td>\n",
+              "      <td>1.0</td>\n",
+              "      <td>535</td>\n",
+              "      <td>0.658653</td>\n",
+              "      <td>0.000000</td>\n",
+              "      <td>0.9576</td>\n",
+              "      <td>284.050</td>\n",
+              "      <td>5.222</td>\n",
+              "      <td>1661447764</td>\n",
+              "      <td>...</td>\n",
+              "      <td>1322</td>\n",
+              "      <td>ip-172-31-85-32</td>\n",
+              "      <td>172.31.85.32</td>\n",
+              "      <td>93.761317</td>\n",
+              "      <td>0</td>\n",
+              "      <td>1</td>\n",
+              "      <td>0.004533</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.00200</td>\n",
+              "      <td>/home/ray/ray_results/HuggingFaceTrainer_2022-...</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>0.1958</td>\n",
+              "      <td>0.00000</td>\n",
+              "      <td>4.0</td>\n",
+              "      <td>2140</td>\n",
+              "      <td>0.806409</td>\n",
+              "      <td>0.532286</td>\n",
+              "      <td>1.0006</td>\n",
+              "      <td>271.827</td>\n",
+              "      <td>4.997</td>\n",
+              "      <td>1661448005</td>\n",
+              "      <td>...</td>\n",
+              "      <td>1729</td>\n",
+              "      <td>ip-172-31-90-137</td>\n",
+              "      <td>172.31.90.137</td>\n",
+              "      <td>347.170584</td>\n",
+              "      <td>0</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.003702</td>\n",
+              "      <td>4</td>\n",
+              "      <td>0.00002</td>\n",
+              "      <td>/home/ray/ray_results/HuggingFaceTrainer_2022-...</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "<p>4 rows × 33 columns</p>\n",
+              "</div>"
+            ],
+            "text/plain": [
+              "     loss  learning_rate  epoch  step  eval_loss  eval_matthews_correlation  \\\n",
+              "1  0.6225        0.00015    1.0   535   0.649242                   0.000000   \n",
+              "3  0.9260        0.01500    1.0   535   0.652943                   0.000000   \n",
+              "2  0.6463        0.00150    1.0   535   0.658653                   0.000000   \n",
+              "0  0.1958        0.00000    4.0  2140   0.806409                   0.532286   \n",
+              "\n",
+              "   eval_runtime  eval_samples_per_second  eval_steps_per_second  _timestamp  \\\n",
+              "1        1.0157                  267.792                  4.923  1661447759   \n",
+              "3        0.9428                  288.510                  5.303  1661447782   \n",
+              "2        0.9576                  284.050                  5.222  1661447764   \n",
+              "0        1.0006                  271.827                  4.997  1661448005   \n",
+              "\n",
+              "   ...   pid          hostname        node_ip  time_since_restore  \\\n",
+              "1  ...  1805  ip-172-31-76-237  172.31.76.237           95.249164   \n",
+              "3  ...  1060  ip-172-31-85-193  172.31.85.193           99.367746   \n",
+              "2  ...  1322   ip-172-31-85-32   172.31.85.32           93.761317   \n",
+              "0  ...  1729  ip-172-31-90-137  172.31.90.137          347.170584   \n",
+              "\n",
+              "   timesteps_since_restore  iterations_since_restore  warmup_time  \\\n",
+              "1                        0                         1     0.003661   \n",
+              "3                        0                         1     0.004133   \n",
+              "2                        0                         1     0.004533   \n",
+              "0                        0                         4     0.003702   \n",
+              "\n",
+              "   config/trainer_init_config/epochs config/trainer_init_config/learning_rate  \\\n",
+              "1                                  4                                  0.00020   \n",
+              "3                                  4                                  0.02000   \n",
+              "2                                  4                                  0.00200   \n",
+              "0                                  4                                  0.00002   \n",
+              "\n",
+              "                                              logdir  \n",
+              "1  /home/ray/ray_results/HuggingFaceTrainer_2022-...  \n",
+              "3  /home/ray/ray_results/HuggingFaceTrainer_2022-...  \n",
+              "2  /home/ray/ray_results/HuggingFaceTrainer_2022-...  \n",
+              "0  /home/ray/ray_results/HuggingFaceTrainer_2022-...  \n",
+              "\n",
+              "[4 rows x 33 columns]"
+            ]
+          },
+          "execution_count": 19,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "tune_results.get_dataframe().sort_values(\"eval_loss\")"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 20,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "best_result = tune_results.get_best_result()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -1202,7 +2050,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 18,
+      "execution_count": 21,
       "metadata": {
         "colab": {
           "base_uri": "https://localhost:8080/",
@@ -1216,156 +2064,36 @@
           "name": "stderr",
           "output_type": "stream",
           "text": [
-            "Map Progress (2 actors 1 pending):   0%|          | 0/1 [00:12<?, ?it/s]\u001b[2m\u001b[36m(BlockWorker pid=735)\u001b[0m 2022-05-12 18:36:08.491769: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected\n",
-            "Map Progress (2 actors 1 pending): 100%|██████████| 1/1 [00:16<00:00, 16.63s/it]\n"
+            "Map_Batches: 100%|██████████| 1/1 [00:00<00:00, 12.41it/s]\n",
+            "Map_Batches: 100%|██████████| 1/1 [00:00<00:00,  7.46it/s]\n",
+            "Map Progress (1 actors 1 pending): 100%|██████████| 1/1 [00:18<00:00, 18.46s/it]\n"
           ]
         },
         {
-          "data": {
-            "text/html": [
-              "\n",
-              "  <div id=\"df-6bcebc1c-5de9-4e2b-802f-7d04902ab976\">\n",
-              "    <div class=\"colab-df-container\">\n",
-              "      <div>\n",
-              "<style scoped>\n",
-              "    .dataframe tbody tr th:only-of-type {\n",
-              "        vertical-align: middle;\n",
-              "    }\n",
-              "\n",
-              "    .dataframe tbody tr th {\n",
-              "        vertical-align: top;\n",
-              "    }\n",
-              "\n",
-              "    .dataframe thead th {\n",
-              "        text-align: right;\n",
-              "    }\n",
-              "</style>\n",
-              "<table border=\"1\" class=\"dataframe\">\n",
-              "  <thead>\n",
-              "    <tr style=\"text-align: right;\">\n",
-              "      <th></th>\n",
-              "      <th>label</th>\n",
-              "      <th>score</th>\n",
-              "    </tr>\n",
-              "  </thead>\n",
-              "  <tbody>\n",
-              "    <tr>\n",
-              "      <th>0</th>\n",
-              "      <td>LABEL_1</td>\n",
-              "      <td>0.998539</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <th>1</th>\n",
-              "      <td>LABEL_1</td>\n",
-              "      <td>0.997706</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <th>2</th>\n",
-              "      <td>LABEL_1</td>\n",
-              "      <td>0.998476</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <th>3</th>\n",
-              "      <td>LABEL_1</td>\n",
-              "      <td>0.998498</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <th>4</th>\n",
-              "      <td>LABEL_0</td>\n",
-              "      <td>0.533578</td>\n",
-              "    </tr>\n",
-              "  </tbody>\n",
-              "</table>\n",
-              "</div>\n",
-              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-6bcebc1c-5de9-4e2b-802f-7d04902ab976')\"\n",
-              "              title=\"Convert this dataframe to an interactive table.\"\n",
-              "              style=\"display:none;\">\n",
-              "        \n",
-              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
-              "       width=\"24px\">\n",
-              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
-              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
-              "  </svg>\n",
-              "      </button>\n",
-              "      \n",
-              "  <style>\n",
-              "    .colab-df-container {\n",
-              "      display:flex;\n",
-              "      flex-wrap:wrap;\n",
-              "      gap: 12px;\n",
-              "    }\n",
-              "\n",
-              "    .colab-df-convert {\n",
-              "      background-color: #E8F0FE;\n",
-              "      border: none;\n",
-              "      border-radius: 50%;\n",
-              "      cursor: pointer;\n",
-              "      display: none;\n",
-              "      fill: #1967D2;\n",
-              "      height: 32px;\n",
-              "      padding: 0 0 0 0;\n",
-              "      width: 32px;\n",
-              "    }\n",
-              "\n",
-              "    .colab-df-convert:hover {\n",
-              "      background-color: #E2EBFA;\n",
-              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
-              "      fill: #174EA6;\n",
-              "    }\n",
-              "\n",
-              "    [theme=dark] .colab-df-convert {\n",
-              "      background-color: #3B4455;\n",
-              "      fill: #D2E3FC;\n",
-              "    }\n",
-              "\n",
-              "    [theme=dark] .colab-df-convert:hover {\n",
-              "      background-color: #434B5C;\n",
-              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
-              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
-              "      fill: #FFFFFF;\n",
-              "    }\n",
-              "  </style>\n",
-              "\n",
-              "      <script>\n",
-              "        const buttonEl =\n",
-              "          document.querySelector('#df-6bcebc1c-5de9-4e2b-802f-7d04902ab976 button.colab-df-convert');\n",
-              "        buttonEl.style.display =\n",
-              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
-              "\n",
-              "        async function convertToInteractive(key) {\n",
-              "          const element = document.querySelector('#df-6bcebc1c-5de9-4e2b-802f-7d04902ab976');\n",
-              "          const dataTable =\n",
-              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
-              "                                                     [key], {});\n",
-              "          if (!dataTable) return;\n",
-              "\n",
-              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
-              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
-              "            + ' to learn more about interactive tables.';\n",
-              "          element.innerHTML = '';\n",
-              "          dataTable['output_type'] = 'display_data';\n",
-              "          await google.colab.output.renderOutput(dataTable, element);\n",
-              "          const docLink = document.createElement('div');\n",
-              "          docLink.innerHTML = docLinkHtml;\n",
-              "          element.appendChild(docLink);\n",
-              "        }\n",
-              "      </script>\n",
-              "    </div>\n",
-              "  </div>\n",
-              "  "
-            ],
-            "text/plain": [
-              "     label     score\n",
-              "0  LABEL_1  0.998539\n",
-              "1  LABEL_1  0.997706\n",
-              "2  LABEL_1  0.998476\n",
-              "3  LABEL_1  0.998498\n",
-              "4  LABEL_0  0.533578"
-            ]
-          },
-          "execution_count": 18,
-          "metadata": {},
-          "output_type": "execute_result"
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "{'label': 'LABEL_1', 'score': 0.6822417974472046}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822402477264404}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822407841682434}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822386980056763}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822428107261658}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822453737258911}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822437047958374}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822428703308105}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822431683540344}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822426915168762}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822447776794434}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822456121444702}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822471022605896}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822477579116821}\n",
+            "{'label': 'LABEL_1', 'score': 0.682244598865509}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822422742843628}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822470426559448}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822417378425598}\n",
+            "{'label': 'LABEL_1', 'score': 0.6822449564933777}\n",
+            "{'label': 'LABEL_1', 'score': 0.682239294052124}\n"
+          ]
         }
       ],
       "source": [
@@ -1373,18 +2101,13 @@
         "from ray.train.batch_predictor import BatchPredictor\n",
         "import pandas as pd\n",
         "\n",
-        "sentences = ['Bill whistled past the house.',\n",
-        "  'The car honked its way down the road.',\n",
-        "  'Bill pushed Harry off the sofa.',\n",
-        "  'the kittens yawned awake and played.',\n",
-        "  'I demand that the more John eats, the more he pay.']\n",
         "predictor = BatchPredictor.from_checkpoint(\n",
-        "    checkpoint=result.checkpoint,\n",
+        "    checkpoint=best_result.checkpoint,\n",
         "    predictor_cls=HuggingFacePredictor,\n",
         "    task=\"text-classification\",\n",
+        "    device=0 if use_gpu else -1,  # -1 is CPU, otherwise device index\n",
         ")\n",
-        "data = ray.data.from_pandas(pd.DataFrame(sentences, columns=[\"sentence\"]))\n",
-        "prediction = predictor.predict(data)\n",
+        "prediction = predictor.predict(ray_datasets[\"test\"].map_batches(lambda x: x[[\"sentence\"]]), num_gpus_per_worker=int(use_gpu))\n",
         "prediction.show()"
       ]
     },
@@ -1532,7 +2255,7 @@
       "provenance": []
     },
     "kernelspec": {
-      "display_name": "Python 3.9.12 ('.venv': venv)",
+      "display_name": "Python 3.8.10 ('venv': venv)",
       "language": "python",
       "name": "python3"
     },
@@ -1546,11 +2269,11 @@
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
-      "version": "3.9.12"
+      "version": "3.8.10"
     },
     "vscode": {
       "interpreter": {
-        "hash": "a658351b4133f922c5967ed6133cfc05c9f16c53a5161e5843ace3f528fccaf5"
+        "hash": "3c0d54d489a08ae47a06eae2fd00ff032d6cddb527c382959b7b2575f6a8167f"
       }
     }
   },

Python version:	3.8.5
Ray version:	2.0.0
Dashboard:	http://session-i8ddtfaxhwypbvnyb9uzg7xs.i.anyscaleuserdata-staging.com/auth/?token=agh0_CkcwRQIhAJXwvxwq31GryaWthvXGCXZebsijbuqi7qL2pCa5uROOAiBGjzsyXAJFHLlaEI9zSlNI8ewtghKg5UV3t8NmlxuMcRJmEiCtvjcKE0VPiU7iQx51P9oPQjfpo5g1RJXccVSS5005cBgCIgNuL2E6DAj9xazjBhDwj4veAUIMCP3ClJgGEPCPi94B-gEeChxzZXNfaThERFRmQVhId1lwYlZueWI5dVpnN3hT&redirect_to=dashboard
Trial name	status	loc	iter	total time (s)	loss	learning_rate	epoch
HuggingFaceTrainer_bb9dd_00000	TERMINATED	172.28.0.2:419	5	222.391	0.1575	1.30841e-06	5
HuggingFaceTrainer_c1ff5_00000	TERMINATED	172.31.90.137:947	2	200.217	0.3886	0	2
Trial name	status	loc	trainer_init_conf...	iter	total time (s)	loss	learning_rate	epoch
HuggingFaceTrainer_5654d_00000	TERMINATED	172.31.90.137:1729	2e-05	4	347.171	0.1958	0	4
HuggingFaceTrainer_5654d_00001	TERMINATED	172.31.76.237:1805	0.0002	1	95.2492	0.6225	0.00015	1
HuggingFaceTrainer_5654d_00002	TERMINATED	172.31.85.32:1322	0.002	1	93.7613	0.6463	0.0015	1
HuggingFaceTrainer_5654d_00003	TERMINATED	172.31.85.193:1060	0.02	1	99.3677	0.926	0.015	1
	loss	learning_rate	epoch	step	eval_loss	eval_matthews_correlation	eval_runtime	eval_samples_per_second	eval_steps_per_second	_timestamp	...	pid	hostname	node_ip	time_since_restore	iterations_since_restore	warmup_time	config/trainer_init_config/epochs	config/trainer_init_config/learning_rate	logdir
1	0.6225	0.00015	1.0	535	0.649242	0.000000	1.0157	267.792	4.923	1661447759	...	1805	ip-172-31-76-237	172.31.76.237	95.249164	1	0.003661	4	0.00020	/home/ray/ray_results/HuggingFaceTrainer_2022-...
3	0.9260	0.01500	1.0	535	0.652943	0.000000	0.9428	288.510	5.303	1661447782	...	1060	ip-172-31-85-193	172.31.85.193	99.367746	1	0.004133	4	0.02000	/home/ray/ray_results/HuggingFaceTrainer_2022-...
2	0.6463	0.00150	1.0	535	0.658653	0.000000	0.9576	284.050	5.222	1661447764	...	1322	ip-172-31-85-32	172.31.85.32	93.761317	1	0.004533	4	0.00200	/home/ray/ray_results/HuggingFaceTrainer_2022-...
0	0.1958	0.00000	4.0	2140	0.806409	0.532286	1.0006	271.827	4.997	1661448005	...	1729	ip-172-31-90-137	172.31.90.137	347.170584	4	0.003702	4	0.00002	/home/ray/ray_results/HuggingFaceTrainer_2022-...
	label	score
0	LABEL_1	0.998539
1	LABEL_1	0.997706
2	LABEL_1	0.998476
3	LABEL_1	0.998498
4	LABEL_0	0.533578