diff --git a/doc/source/_toc.yml b/doc/source/_toc.yml
index db4a0a42d..b87e1297e 100644
--- a/doc/source/_toc.yml
+++ b/doc/source/_toc.yml
@@ -34,6 +34,7 @@ parts:
- file: ray-air/examples/rl_serving_example
- file: ray-air/examples/rl_online_example
- file: ray-air/examples/rl_offline_example
+ - file: ray-air/examples/feast_example
- file: ray-air/package-ref
- caption: AIR Libraries
diff --git a/doc/source/ray-air/examples/feast_example.ipynb b/doc/source/ray-air/examples/feast_example.ipynb
new file mode 100644
index 000000000..6404fcd49
--- /dev/null
+++ b/doc/source/ray-air/examples/feast_example.ipynb
@@ -0,0 +1,1510 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Integrate Ray Air with Feast feature store"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !pip install feast==0.20.1 \"ray[air]>=1.13\" xgboost_ray"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "INyNIaeB1Kza"
+ },
+ "source": [
+ "In this example, we showcase how to use Ray Air with the Feast feature store, leveraging both historical features for training a model and online features for inference.\n",
+ "\n",
+ "The task is adapted from the [Feast credit scoring tutorial](https://github.com/feast-dev/feast-aws-credit-scoring-tutorial). We train an XGBoost model and run predictions on an incoming loan request to see whether it is approved or rejected."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "sBC9CCrpzQLF"
+ },
+ "source": [
+ "Let's first set up our workspace and prepare the data to work with."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "id": "DcPIskZlzSal"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "WORKING_DIR = os.path.expanduser(\"~/ray-air-feast-example/\")\n",
+ "%env WORKING_DIR=$WORKING_DIR"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "BcyCKjV3zTCK",
+ "outputId": "afdfa24d-e5ce-49db-c904-e961e1eb910c"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2022-06-02 14:22:50-- https://github.com/ray-project/air-sample-data/raw/main/air-feast-example.zip\n",
+ "Resolving github.com (github.com)... 192.30.255.113\n",
+ "Connecting to github.com (github.com)|192.30.255.113|:443... connected.\n",
+ "HTTP request sent, awaiting response... 302 Found\n",
+ "Location: https://raw.githubusercontent.com/ray-project/air-sample-data/main/air-feast-example.zip [following]\n",
+ "--2022-06-02 14:22:50-- https://raw.githubusercontent.com/ray-project/air-sample-data/main/air-feast-example.zip\n",
+ "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n",
+ "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 23715107 (23M) [application/zip]\n",
+ "Saving to: ‘air-feast-example.zip’\n",
+ "\n",
+ "air-feast-example.z 100%[===================>] 22.62M 114MB/s in 0.2s \n",
+ "\n",
+ "2022-06-02 14:22:51 (114 MB/s) - ‘air-feast-example.zip’ saved [23715107/23715107]\n",
+ "\n",
+ "Archive: air-feast-example.zip\n",
+ " creating: air-feast-example/\n",
+ " creating: air-feast-example/feature_repo/\n",
+ " inflating: air-feast-example/feature_repo/.DS_Store \n",
+ " extracting: air-feast-example/feature_repo/__init__.py \n",
+ " inflating: air-feast-example/feature_repo/features.py \n",
+ " creating: air-feast-example/feature_repo/data/\n",
+ " inflating: air-feast-example/feature_repo/data/.DS_Store \n",
+ " inflating: air-feast-example/feature_repo/data/credit_history_sample.csv \n",
+ " inflating: air-feast-example/feature_repo/data/zipcode_table_sample.csv \n",
+ " inflating: air-feast-example/feature_repo/data/credit_history.parquet \n",
+ " inflating: air-feast-example/feature_repo/data/zipcode_table.parquet \n",
+ " inflating: air-feast-example/feature_repo/feature_store.yaml \n",
+ " inflating: air-feast-example/.DS_Store \n",
+ " creating: air-feast-example/data/\n",
+ " inflating: air-feast-example/data/loan_table.parquet \n",
+ " inflating: air-feast-example/data/loan_table_sample.csv \n"
+ ]
+ }
+ ],
+ "source": [
+ "! mkdir -p $WORKING_DIR\n",
+ "! wget --no-check-certificate https://github.com/ray-project/air-sample-data/raw/main/air-feast-example.zip\n",
+ "! unzip air-feast-example.zip \n",
+ "! mv air-feast-example/* $WORKING_DIR\n",
+ "%cd $WORKING_DIR"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "iNbC-Qqi3Lq_",
+ "outputId": "99576086-12dd-4f96-fb51-de40b77b15ce"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "data feature_repo\n"
+ ]
+ }
+ ],
+ "source": [
+ "! ls"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c_3wlEus4dYO"
+ },
+ "source": [
+ "There is already a feature repository set up in `feature_repo/`. It isn't necessary to create a new feature repository, but it can be done using the following command: `feast init -t local feature_repo`.\n",
+ "\n",
+ "Now let's take a look at the schema in the Feast feature store, which is defined by `feature_repo/features.py`. There are two feature views: `zipcode_features` and `credit_history`. Both are generated from Parquet files: `feature_repo/data/zipcode_table.parquet` and `feature_repo/data/credit_history.parquet`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "5VGLhPLLzlGW",
+ "outputId": "a3f3499e-c140-4ceb-a66d-2f1a6b8a2142"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mdatetime\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m timedelta\n",
+ "\n",
+ "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mfeast\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m (Entity, Field, FeatureView, FileSource, ValueType)\n",
+ "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mfeast\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mtypes\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m Float32, Int64, String\n",
+ "\n",
+ "\n",
+ "zipcode = Entity(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mzipcode\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, value_type=Int64)\n",
+ "\n",
+ "zipcode_source = FileSource(\n",
+ " path=\u001b[33m\"\u001b[39;49;00m\u001b[33mfeature_repo/data/zipcode_table.parquet\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " timestamp_field=\u001b[33m\"\u001b[39;49;00m\u001b[33mevent_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " created_timestamp_column=\u001b[33m\"\u001b[39;49;00m\u001b[33mcreated_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ ")\n",
+ "\n",
+ "zipcode_features = FeatureView(\n",
+ " name=\u001b[33m\"\u001b[39;49;00m\u001b[33mzipcode_features\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " entities=[\u001b[33m\"\u001b[39;49;00m\u001b[33mzipcode\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m],\n",
+ " ttl=timedelta(days=\u001b[34m3650\u001b[39;49;00m),\n",
+ " schema=[\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mcity\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mstate\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mlocation_type\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mtax_returns_filed\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mpopulation\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mtotal_wages\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " ],\n",
+ " source=zipcode_source,\n",
+ ")\n",
+ "\n",
+ "dob_ssn = Entity(\n",
+ " name=\u001b[33m\"\u001b[39;49;00m\u001b[33mdob_ssn\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " value_type=ValueType.STRING,\n",
+ " description=\u001b[33m\"\u001b[39;49;00m\u001b[33mDate of birth and last four digits of social security number\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ ")\n",
+ "\n",
+ "credit_history_source = FileSource(\n",
+ " path=\u001b[33m\"\u001b[39;49;00m\u001b[33mfeature_repo/data/credit_history.parquet\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " timestamp_field=\u001b[33m\"\u001b[39;49;00m\u001b[33mevent_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " created_timestamp_column=\u001b[33m\"\u001b[39;49;00m\u001b[33mcreated_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ ")\n",
+ "\n",
+ "credit_history = FeatureView(\n",
+ " name=\u001b[33m\"\u001b[39;49;00m\u001b[33mcredit_history\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n",
+ " entities=[\u001b[33m\"\u001b[39;49;00m\u001b[33mdob_ssn\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m],\n",
+ " ttl=timedelta(days=\u001b[34m90\u001b[39;49;00m),\n",
+ " schema=[\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mcredit_card_due\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mmortgage_due\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mstudent_loan_due\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mvehicle_loan_due\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mhard_pulls\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mmissed_payments_2y\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mmissed_payments_1y\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mmissed_payments_6m\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mbankruptcies\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n",
+ " ],\n",
+ " source=credit_history_source,\n",
+ ")\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pygmentize feature_repo/features.py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HQmrfEV33_SM"
+ },
+ "source": [
+ "Deploy the feature store defined above by running `feast apply` from within the `feature_repo/` folder."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "SbL_EbMC2MFS",
+ "outputId": "13b07f1f-d52a-4c4e-a73f-f5478c0304de"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage\n",
+ "Created entity \u001b[1m\u001b[32mdob_ssn\u001b[0m\n",
+ "Created entity \u001b[1m\u001b[32mzipcode\u001b[0m\n",
+ "Created feature view \u001b[1m\u001b[32mcredit_history\u001b[0m\n",
+ "Created feature view \u001b[1m\u001b[32mzipcode_features\u001b[0m\n",
+ "\n",
+ "Created sqlite table \u001b[1m\u001b[32mfeature_repo_credit_history\u001b[0m\n",
+ "Created sqlite table \u001b[1m\u001b[32mfeature_repo_zipcode_features\u001b[0m\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "! (cd feature_repo && feast apply)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "D-9Kr-kdzg1r",
+ "outputId": "beabd2f7-c1a6-4fe3-b087-b35b7b9ab56b"
+ },
+ "outputs": [],
+ "source": [
+ "import feast\n",
+ "fs = feast.FeatureStore(repo_path=\"feature_repo\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5nR8uNE8z-YQ"
+ },
+ "source": [
+ "## Generate training data\n",
+ "On top of the features in Feast, we also have labeled training data at `data/loan_table.parquet`. At training time, the loan table is passed into Feast as an entity dataframe for training data generation. Feast performs a point-in-time join of the `credit_history` and `zipcode_features` feature views against the entity dataframe to build the feature vectors that augment the training data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "id": "twBCJMzVzV0X",
+ "outputId": "efb41c7a-2802-4169-906e-7db1c37d8c8e"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " loan_id dob_ssn zipcode person_age person_income \\\n",
+ "0 10000 19530219_5179 76104 22 59000 \n",
+ "1 10001 19520816_8737 70380 21 9600 \n",
+ "2 10002 19860413_2537 97039 25 9600 \n",
+ "3 10003 19760701_8090 63785 23 65500 \n",
+ "4 10004 19830125_8297 82223 24 54400 \n",
+ "... ... ... ... ... ... \n",
+ "28633 38633 19491126_1487 43205 57 53000 \n",
+ "28634 38634 19681208_6537 24872 54 120000 \n",
+ "28635 38635 19880422_2592 68826 65 76000 \n",
+ "28636 38636 19901017_6108 92014 56 150000 \n",
+ "28637 38637 19960703_3449 69033 66 42000 \n",
+ "\n",
+ " person_home_ownership person_emp_length loan_intent loan_amnt \\\n",
+ "0 RENT 123.0 PERSONAL 35000 \n",
+ "1 OWN 5.0 EDUCATION 1000 \n",
+ "2 MORTGAGE 1.0 MEDICAL 5500 \n",
+ "3 RENT 4.0 MEDICAL 35000 \n",
+ "4 RENT 8.0 MEDICAL 35000 \n",
+ "... ... ... ... ... \n",
+ "28633 MORTGAGE 1.0 PERSONAL 5800 \n",
+ "28634 MORTGAGE 4.0 PERSONAL 17625 \n",
+ "28635 RENT 3.0 HOMEIMPROVEMENT 35000 \n",
+ "28636 MORTGAGE 5.0 PERSONAL 15000 \n",
+ "28637 RENT 2.0 MEDICAL 6475 \n",
+ "\n",
+ " loan_int_rate loan_status event_timestamp \\\n",
+ "0 16.02 1 2021-08-25 20:34:41.361000+00:00 \n",
+ "1 11.14 0 2021-08-25 20:16:20.128000+00:00 \n",
+ "2 12.87 1 2021-08-25 19:57:58.896000+00:00 \n",
+ "3 15.23 1 2021-08-25 19:39:37.663000+00:00 \n",
+ "4 14.27 1 2021-08-25 19:21:16.430000+00:00 \n",
+ "... ... ... ... \n",
+ "28633 13.16 0 2020-08-25 21:48:06.292000+00:00 \n",
+ "28634 7.49 0 2020-08-25 21:29:45.059000+00:00 \n",
+ "28635 10.99 1 2020-08-25 21:11:23.826000+00:00 \n",
+ "28636 11.48 0 2020-08-25 20:53:02.594000+00:00 \n",
+ "28637 9.99 0 2020-08-25 20:34:41.361000+00:00 \n",
+ "\n",
+ " created_timestamp \n",
+ "0 2021-08-25 20:34:41.361000+00:00 \n",
+ "1 2021-08-25 20:16:20.128000+00:00 \n",
+ "2 2021-08-25 19:57:58.896000+00:00 \n",
+ "3 2021-08-25 19:39:37.663000+00:00 \n",
+ "4 2021-08-25 19:21:16.430000+00:00 \n",
+ "... ... \n",
+ "28633 2020-08-25 21:48:06.292000+00:00 \n",
+ "28634 2020-08-25 21:29:45.059000+00:00 \n",
+ "28635 2020-08-25 21:11:23.826000+00:00 \n",
+ "28636 2020-08-25 20:53:02.594000+00:00 \n",
+ "28637 2020-08-25 20:34:41.361000+00:00 \n",
+ "\n",
+ "[28638 rows x 13 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "loan_df = pd.read_parquet(\"data/loan_table.parquet\")\n",
+ "display(loan_df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "id": "hAHAb2nS6ClR"
+ },
+ "outputs": [],
+ "source": [
+ "feast_features = [\n",
+ " \"zipcode_features:city\",\n",
+ " \"zipcode_features:state\",\n",
+ " \"zipcode_features:location_type\",\n",
+ " \"zipcode_features:tax_returns_filed\",\n",
+ " \"zipcode_features:population\",\n",
+ " \"zipcode_features:total_wages\",\n",
+ " \"credit_history:credit_card_due\",\n",
+ " \"credit_history:mortgage_due\",\n",
+ " \"credit_history:student_loan_due\",\n",
+ " \"credit_history:vehicle_loan_due\",\n",
+ " \"credit_history:hard_pulls\",\n",
+ " \"credit_history:missed_payments_2y\",\n",
+ " \"credit_history:missed_payments_1y\",\n",
+ " \"credit_history:missed_payments_6m\",\n",
+ " \"credit_history:bankruptcies\",\n",
+ "]\n",
+ "\n",
+ "loan_w_offline_feature = fs.get_historical_features(\n",
+ " entity_df=loan_df, features=feast_features\n",
+ ").to_df()\n",
+ "\n",
+ "# Drop some unnecessary columns for simplicity\n",
+ "loan_w_offline_feature = loan_w_offline_feature.drop([\"event_timestamp\", \"created_timestamp__\", \"loan_id\", \"zipcode\", \"dob_ssn\"], axis=1)"
+ ]
+ },
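The `get_historical_features()` call above performs a point-in-time join: for each row of the entity dataframe, Feast looks up the latest feature values at or before that row's `event_timestamp`, so no future information leaks into the training data. As a rough illustration of the idea on toy data (hypothetical values; `pandas.merge_asof` stands in here for Feast's internal join logic):

```python
import pandas as pd

# Toy entity dataframe (like loan_df) and feature table (like credit_history).
entity_df = pd.DataFrame({
    "dob_ssn": ["a", "a"],
    "event_timestamp": pd.to_datetime(["2021-01-10", "2021-03-10"]),
})
features = pd.DataFrame({
    "dob_ssn": ["a", "a"],
    "event_timestamp": pd.to_datetime(["2021-01-01", "2021-02-01"]),
    "credit_card_due": [100, 200],
})

# For each entity row, take the latest feature row at or before its timestamp.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="dob_ssn",
)
print(joined["credit_card_due"].tolist())  # [100, 200]
```

Feast applies this per feature view and also respects each view's `ttl`, so feature rows older than the TTL are treated as missing.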
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TxDFT1776XgP"
+ },
+ "source": [
+ "Now let's take a look at the training data as it is augmented by Feast."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 540
+ },
+ "id": "s-w2696D6h78",
+ "outputId": "f98c1dbe-1916-41bc-d543-590bac08caf5"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " person_age person_income person_home_ownership person_emp_length \\\n",
+ "1358886 55 24543 RENT 3.0 \n",
+ "1358815 58 20000 RENT 0.0 \n",
+ "1353348 64 24000 RENT 1.0 \n",
+ "1354200 55 34000 RENT 0.0 \n",
+ "1354271 51 74628 MORTGAGE 3.0 \n",
+ "... ... ... ... ... \n",
+ "674285 23 74000 RENT 3.0 \n",
+ "668250 21 200000 MORTGAGE 2.0 \n",
+ "668321 24 200000 MORTGAGE 3.0 \n",
+ "670025 23 215000 MORTGAGE 7.0 \n",
+ "2034006 22 59000 RENT 123.0 \n",
+ "\n",
+ " loan_intent loan_amnt loan_int_rate loan_status \\\n",
+ "1358886 VENTURE 4000 13.92 0 \n",
+ "1358815 EDUCATION 4000 9.99 0 \n",
+ "1353348 MEDICAL 3000 6.99 0 \n",
+ "1354200 DEBTCONSOLIDATION 12000 6.92 1 \n",
+ "1354271 PERSONAL 3000 13.49 0 \n",
+ "... ... ... ... ... \n",
+ "674285 MEDICAL 25000 10.36 1 \n",
+ "668250 DEBTCONSOLIDATION 25000 13.99 0 \n",
+ "668321 VENTURE 24000 7.49 0 \n",
+ "670025 MEDICAL 35000 14.79 0 \n",
+ "2034006 PERSONAL 35000 16.02 1 \n",
+ "\n",
+ " city state ... total_wages credit_card_due \\\n",
+ "1358886 SLIDELL LA ... 315061217 1777 \n",
+ "1358815 CHOUTEAU OK ... 59412230 1791 \n",
+ "1353348 BISMARCK ND ... 469621263 5917 \n",
+ "1354200 SANTA BARBARA CA ... 24537583 8091 \n",
+ "1354271 HUNTINGTON BEACH CA ... 19749601 3679 \n",
+ "... ... ... ... ... ... \n",
+ "674285 MANSFIELD MO ... 33180988 5176 \n",
+ "668250 SALISBURY MD ... 470634058 5297 \n",
+ "668321 STRUNK KY ... 10067358 6549 \n",
+ "670025 HAWTHORN PA ... 5956835 9079 \n",
+ "2034006 FORT WORTH TX ... 142325465 8419 \n",
+ "\n",
+ " mortgage_due student_loan_due vehicle_loan_due hard_pulls \\\n",
+ "1358886 690650 46372 10439 5 \n",
+ "1358815 462670 19421 3583 8 \n",
+ "1353348 1780959 11835 27910 8 \n",
+ "1354200 364271 30248 22640 2 \n",
+ "1354271 1659968 37582 20284 0 \n",
+ "... ... ... ... ... \n",
+ "674285 1089963 44642 2877 1 \n",
+ "668250 1288915 22471 22630 0 \n",
+ "668321 22399 11806 13005 0 \n",
+ "670025 876038 4556 21588 0 \n",
+ "2034006 91803 22328 15078 0 \n",
+ "\n",
+ " missed_payments_2y missed_payments_1y missed_payments_6m \\\n",
+ "1358886 1 2 1 \n",
+ "1358815 7 1 0 \n",
+ "1353348 3 2 1 \n",
+ "1354200 7 3 0 \n",
+ "1354271 1 0 0 \n",
+ "... ... ... ... \n",
+ "674285 6 1 0 \n",
+ "668250 5 2 1 \n",
+ "668321 1 0 0 \n",
+ "670025 1 0 0 \n",
+ "2034006 1 0 0 \n",
+ "\n",
+ " bankruptcies \n",
+ "1358886 0 \n",
+ "1358815 2 \n",
+ "1353348 0 \n",
+ "1354200 0 \n",
+ "1354271 0 \n",
+ "... ... \n",
+ "674285 0 \n",
+ "668250 0 \n",
+ "668321 0 \n",
+ "670025 0 \n",
+ "2034006 0 \n",
+ "\n",
+ "[28638 rows x 23 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display(loan_w_offline_feature)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "KpiOm00789Rd"
+ },
+ "outputs": [],
+ "source": [
+ "# Convert into Train and Validation datasets.\n",
+ "import ray\n",
+ "\n",
+ "loan_ds = ray.data.from_pandas(loan_w_offline_feature)\n",
+ "train_ds, validation_ds = loan_ds.split_proportionately([0.8])\n"
+ ]
+ },
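`split_proportionately([0.8])` splits the dataset by row count into an 80% training set and a 20% validation set. Assuming floor rounding, the 28,638-row table splits as follows (this matches the total N=22,910 reported in the training log):

```python
# Expected split sizes for an 80/20 proportional split of the loan table.
total_rows = 28638           # rows in loan_w_offline_feature
train_rows = int(total_rows * 0.8)
valid_rows = total_rows - train_rows
print(train_rows, valid_rows)  # 22910 5728
```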
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uFNp9sUCEGc2"
+ },
+ "source": [
+ "## Define Preprocessors\n",
+ "\n",
+ "A [Preprocessor](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors) performs last-mile processing on Ray Datasets before they are fed into the training model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "id": "qravnrt9EBuW"
+ },
+ "outputs": [],
+ "source": [
+ "categorical_features = [\n",
+ " \"person_home_ownership\",\n",
+ " \"loan_intent\",\n",
+ " \"city\",\n",
+ " \"state\",\n",
+ " \"location_type\",\n",
+ "]\n",
+ "\n",
+ "from ray.ml.preprocessors import Chain, OrdinalEncoder, SimpleImputer\n",
+ "\n",
+ "imputer = SimpleImputer(categorical_features, strategy=\"most_frequent\")\n",
+ "encoder = OrdinalEncoder(columns=categorical_features)\n",
+ "chained_preprocessor = Chain(imputer, encoder)"
+ ]
+ },
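To build intuition for what the chained preprocessor does, here is a toy pandas sketch of the same two steps, most-frequent imputation followed by ordinal encoding (hypothetical data; the actual category-to-integer mapping used by Ray's `OrdinalEncoder` may differ):

```python
import pandas as pd

df = pd.DataFrame({"loan_intent": ["MEDICAL", None, "PERSONAL", "MEDICAL"]})

# Step 1: impute missing values with the most frequent category.
most_frequent = df["loan_intent"].mode()[0]
df["loan_intent"] = df["loan_intent"].fillna(most_frequent)

# Step 2: ordinal-encode the category as an integer code.
codes = df["loan_intent"].astype("category").cat.codes
print(codes.tolist())  # [0, 0, 1, 0]
```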
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SqPGGd6bEn0x"
+ },
+ "source": [
+ "## Train XGBoost model using Ray Air Trainer\n",
+ "Ray Air provides a variety of [Trainers](https://docs.ray.io/en/latest/ray-air/getting-started.html#trainer) that are integrated with popular machine learning frameworks. Using the intuitive `trainer.fit()` API, you can train a distributed model at scale on Ray. The output is a Ray Air [Checkpoint](https://docs.ray.io/en/latest/ray-air/getting-started.html#module-ray.ml.checkpoint), which seamlessly transfers the workload from training to prediction. Let's take a look!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 670
+ },
+ "id": "995W14MdFmxl",
+ "outputId": "417c1188-edf6-4310-dba8-366b71d77806"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/ray/anaconda3/lib/python3.8/site-packages/xgboost_ray/main.py:131: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n",
+ " XGBOOST_LOOSE_VERSION = LooseVersion(xgboost_version)\n",
+ "E0602 14:26:17.861773834 4511 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers\n",
+ "/home/ray/anaconda3/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "== Status ==\n",
+ "Current time: 2022-06-02 14:26:33 (running for 00:00:14.95)\n",
+ "Memory usage on this node: 3.5/30.6 GiB\n",
+ "Using FIFO scheduling algorithm.\n",
+ "Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/18.04 GiB heap, 0.0/9.02 GiB objects\n",
+ "Result logdir: /home/ray/ray_results/XGBoostTrainer_2022-06-02_14-26-17\n",
+ "Number of trials: 1/1 (1 TERMINATED)\n",
+ "\n",
+ "| Trial name | status | loc | iter | total time (s) | train-logloss | train-error | validation-logloss |\n",
+ "| XGBoostTrainer_a3a2c_00000 | TERMINATED | 172.31.71.98:12634 | 100 | 11.9561 | 0.0578837 | 0.0127019 | 0.225994 |"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "(GBDTTrainable pid=12634) 2022-06-02 14:26:23,018\tINFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.\n",
+ "(GBDTTrainable pid=12634) 2022-06-02 14:26:25,230\tINFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.\n",
+ "(GBDTTrainable pid=12634) E0602 14:26:25.231635524 12691 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers\n",
+ "(_RemoteRayXGBoostActor pid=12769) [14:26:25] task [xgboost.ray]:139712002042896 got new rank 0\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Result for XGBoostTrainer_a3a2c_00000:\n",
+ " date: 2022-06-02_14-26-26\n",
+ " done: false\n",
+ " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n",
+ " hostname: ip-172-31-71-98\n",
+ " iterations_since_restore: 1\n",
+ " node_ip: 172.31.71.98\n",
+ " pid: 12634\n",
+ " should_checkpoint: true\n",
+ " time_since_restore: 5.432286262512207\n",
+ " time_this_iter_s: 5.432286262512207\n",
+ " time_total_s: 5.432286262512207\n",
+ " timestamp: 1654205186\n",
+ " timesteps_since_restore: 0\n",
+ " train-error: 0.09502400698384984\n",
+ " train-logloss: 0.5147884634112437\n",
+ " training_iteration: 1\n",
+ " trial_id: a3a2c_00000\n",
+ " validation-error: 0.1627094972067039\n",
+ " validation-logloss: 0.5557328870197414\n",
+ " warmup_time: 0.004194974899291992\n",
+ " \n",
+ "Result for XGBoostTrainer_a3a2c_00000:\n",
+ " date: 2022-06-02_14-26-31\n",
+ " done: false\n",
+ " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n",
+ " hostname: ip-172-31-71-98\n",
+ " iterations_since_restore: 81\n",
+ " node_ip: 172.31.71.98\n",
+ " pid: 12634\n",
+ " should_checkpoint: true\n",
+ " time_since_restore: 10.460175275802612\n",
+ " time_this_iter_s: 0.03925156593322754\n",
+ " time_total_s: 10.460175275802612\n",
+ " timestamp: 1654205191\n",
+ " timesteps_since_restore: 0\n",
+ " train-error: 0.01802706241815801\n",
+ " train-logloss: 0.07058723671627426\n",
+ " training_iteration: 81\n",
+ " trial_id: a3a2c_00000\n",
+ " validation-error: 0.0824022346368715\n",
+ " validation-logloss: 0.22556984905196217\n",
+ " warmup_time: 0.004194974899291992\n",
+ " \n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "(GBDTTrainable pid=12634) 2022-06-02 14:26:32,801\tINFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=22,910 in 9.81 seconds (7.57 pure XGBoost training time).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Result for XGBoostTrainer_a3a2c_00000:\n",
+ " date: 2022-06-02_14-26-32\n",
+ " done: true\n",
+ " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n",
+ " experiment_tag: '0'\n",
+ " hostname: ip-172-31-71-98\n",
+ " iterations_since_restore: 100\n",
+ " node_ip: 172.31.71.98\n",
+ " pid: 12634\n",
+ " should_checkpoint: true\n",
+ " time_since_restore: 11.956074953079224\n",
+ " time_this_iter_s: 0.022900104522705078\n",
+ " time_total_s: 11.956074953079224\n",
+ " timestamp: 1654205192\n",
+ " timesteps_since_restore: 0\n",
+ " train-error: 0.01270187690964644\n",
+ " train-logloss: 0.05788368908741939\n",
+ " training_iteration: 100\n",
+ " trial_id: a3a2c_00000\n",
+ " validation-error: 0.0825768156424581\n",
+ " validation-logloss: 0.22599449588356768\n",
+ " warmup_time: 0.004194974899291992\n",
+ " \n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "2022-06-02 14:26:33,481\tINFO tune.py:752 -- Total run time: 15.62 seconds (14.94 seconds for the tuning loop).\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'checkpoint'"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "LABEL = \"loan_status\"\n",
+ "CHECKPOINT_PATH = \"checkpoint\"\n",
+ "NUM_WORKERS = 1 # Change this based on the resources in the cluster.\n",
+ "\n",
+ "\n",
+ "from ray.ml.train.integrations.xgboost import XGBoostTrainer\n",
+ "\n",
+ "params = {\n",
+ " \"tree_method\": \"approx\",\n",
+ " \"objective\": \"binary:logistic\",\n",
+ " \"eval_metric\": [\"logloss\", \"error\"],\n",
+ "}\n",
+ "\n",
+ "trainer = XGBoostTrainer(\n",
+ " scaling_config={\n",
+ " \"num_workers\": NUM_WORKERS,\n",
+ " \"use_gpu\": False,\n",
+ " },\n",
+ " label_column=LABEL,\n",
+ " params=params,\n",
+ " datasets={\"train\": train_ds, \"validation\": validation_ds},\n",
+ " preprocessor=chained_preprocessor,\n",
+ " num_boost_round=100,\n",
+ ")\n",
+ "checkpoint = trainer.fit().checkpoint\n",
+ "# This saves the checkpoint to disk.\n",
+ "checkpoint.to_directory(CHECKPOINT_PATH)"
+ ]
+ },
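+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a side note, the two `eval_metric`s reported in the trainer output above (`logloss` and `error`) are standard binary-classification metrics. A quick, self-contained numpy sketch with made-up labels and probabilities (illustrative values, not taken from this dataset) shows how they are computed:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical labels and predicted probabilities, for illustration only.\n",
+ "y_true = np.array([0, 1, 1, 0])\n",
+ "y_prob = np.array([0.1, 0.8, 0.6, 0.3])\n",
+ "\n",
+ "# logloss: average negative log-likelihood of the true labels.\n",
+ "logloss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n",
+ "# error: fraction of examples misclassified at a 0.5 threshold.\n",
+ "error = np.mean((y_prob > 0.5).astype(int) != y_true)\n",
+ "```"
+ ]
+ },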
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QJr73gOvKBza"
+ },
+ "source": [
+ "## Inference\n",
+ "Now, from the `Checkpoint` object obtained in the previous step, we can construct a Ray AIR [Predictor](https://docs.ray.io/en/latest/ray-air/getting-started.html#predictors) that encapsulates everything needed for inference.\n",
+ "\n",
+ "The API for using a Predictor is also intuitive - simply call `Predictor.predict()`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "id": "wE8ielQlKYSi"
+ },
+ "outputs": [],
+ "source": [
+ "from ray.ml.checkpoint import Checkpoint\n",
+ "from ray.ml.predictors.integrations.xgboost import XGBoostPredictor\n",
+ "predictor = XGBoostPredictor.from_checkpoint(Checkpoint.from_directory(CHECKPOINT_PATH))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "id": "K9Y9UiD3KqSW"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\">\n",
+ "<thead>\n",
+ "<tr><th></th><th>person_age</th><th>person_income</th><th>person_home_ownership</th><th>person_emp_length</th><th>loan_intent</th><th>loan_amnt</th><th>loan_int_rate</th><th>state</th><th>population</th><th>location_type</th><th>...</th><th>tax_returns_filed</th><th>student_loan_due</th><th>missed_payments_1y</th><th>hard_pulls</th><th>mortgage_due</th><th>bankruptcies</th><th>credit_card_due</th><th>missed_payments_2y</th><th>missed_payments_6m</th><th>vehicle_loan_due</th></tr>\n",
+ "</thead>\n",
+ "<tbody>\n",
+ "<tr><th>0</th><td>133.0</td><td>59000.0</td><td>RENT</td><td>123.0</td><td>PERSONAL</td><td>35000.0</td><td>16.02</td><td>NaN</td><td>NaN</td><td>NaN</td><td>...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+ "</tbody>\n",
+ "</table>\n",
+ "<p>1 rows × 22 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " person_age person_income person_home_ownership person_emp_length \\\n",
+ "0 133.0 59000.0 RENT 123.0 \n",
+ "\n",
+ " loan_intent loan_amnt loan_int_rate state population location_type \\\n",
+ "0 PERSONAL 35000.0 16.02 NaN NaN NaN \n",
+ "\n",
+ " ... tax_returns_filed student_loan_due missed_payments_1y hard_pulls \\\n",
+ "0 ... NaN NaN NaN NaN \n",
+ "\n",
+ " mortgage_due bankruptcies credit_card_due missed_payments_2y \\\n",
+ "0 NaN NaN NaN NaN \n",
+ "\n",
+ " missed_payments_6m vehicle_loan_due \n",
+ "0 NaN NaN \n",
+ "\n",
+ "[1 rows x 22 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "# Now let's run a prediction on an incoming loan request.\n",
+ "loan_request_dict = {\n",
+ " \"zipcode\": [76104],\n",
+ " \"dob_ssn\": [\"19630621_4278\"],\n",
+ " \"person_age\": [133],\n",
+ " \"person_income\": [59000],\n",
+ " \"person_home_ownership\": [\"RENT\"],\n",
+ " \"person_emp_length\": [123.0],\n",
+ " \"loan_intent\": [\"PERSONAL\"],\n",
+ " \"loan_amnt\": [35000],\n",
+ " \"loan_int_rate\": [16.02],\n",
+ "}\n",
+ "\n",
+ "# Now augment the request with online features.\n",
+ "zipcode = loan_request_dict[\"zipcode\"][0]\n",
+ "dob_ssn = loan_request_dict[\"dob_ssn\"][0]\n",
+ "online_features = fs.get_online_features(\n",
+ " entity_rows=[{\"zipcode\": zipcode, \"dob_ssn\": dob_ssn}],\n",
+ " features=feast_features,\n",
+ ").to_dict()\n",
+ "loan_request_dict.update(online_features)\n",
+ "loan_request_df = pd.DataFrame.from_dict(loan_request_dict, dtype=float)\n",
+ "loan_request_df = loan_request_df.drop([\"zipcode\", \"dob_ssn\"], axis=1)\n",
+ "display(loan_request_df)"
+ ]
+ },
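+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The merge step above relies on the shape of Feast's response: `get_online_features(...).to_dict()` returns a mapping of `{feature_name: [values]}`, so a plain `dict.update()` grafts the online features onto the request columns, and the entity keys (`zipcode`, `dob_ssn`) are only needed for the lookup and are dropped before inference. A minimal sketch with a mocked (hypothetical) stand-in for the Feast response:\n",
+ "\n",
+ "```python\n",
+ "# Request columns, keyed by column name with one value per row.\n",
+ "request = {\"zipcode\": [76104], \"person_age\": [133], \"loan_amnt\": [35000]}\n",
+ "\n",
+ "# Mocked stand-in for fs.get_online_features(...).to_dict(); the real call\n",
+ "# returns the requested features plus the entity keys.\n",
+ "online_features = {\"zipcode\": [76104], \"credit_card_due\": [1928], \"bankruptcies\": [0]}\n",
+ "\n",
+ "request.update(online_features)\n",
+ "# Entity keys are lookup-only; drop them before building the model input.\n",
+ "model_input = {k: v for k, v in request.items() if k not in (\"zipcode\", \"dob_ssn\")}\n",
+ "```"
+ ]
+ },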
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {
+ "id": "eS7_n1GPLL1e"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loan rejected!\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Run the request through our predictor using the `Predictor.predict()` API.\n",
+ "loan_result = np.round(predictor.predict(loan_request_df)[\"predictions\"][0])\n",
+ "\n",
+ "if loan_result == 0:\n",
+ " print(\"Loan approved!\")\n",
+ "elif loan_result == 1:\n",
+ " print(\"Loan rejected!\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "collapsed_sections": [],
+ "name": "air + feast",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/doc/source/ray-air/examples/index.rst b/doc/source/ray-air/examples/index.rst
index d1e570984..3f8a1e8cb 100644
--- a/doc/source/ray-air/examples/index.rst
+++ b/doc/source/ray-air/examples/index.rst
@@ -39,3 +39,4 @@ Advanced
--------
- :doc:`/ray-air/examples/torch_incremental_learning`: Incrementally train and deploy a PyTorch CV model
+- :doc:`/ray-air/examples/feast_example`: Integrate with the Feast feature store for both training and inference
diff --git a/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb b/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb
index 3e92a3fac..3831fd5e4 100644
--- a/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb
+++ b/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb
@@ -15,7 +15,7 @@
"every step from data ingestion to pushing a model to serving.\n",
"\n",
"1. Read a CSV into [Ray Dataset](https://docs.ray.io/en/latest/data/dataset.html).\n",
- "2. Process the dataset by chaining [Ray AIR preprocessors](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n",
+ "2. Process the dataset by chaining [Ray AIR preprocessors](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n",
"3. Train the model using the TensorflowTrainer from AIR.\n",
"4. Serve the model using Ray Serve and the above preprocessors."
]
@@ -453,14 +453,14 @@
"a modularized component so that the same logic can be applied to both\n",
"training data as well as data for online serving or offline batch prediction.\n",
"\n",
- "In AIR, this component is a [`Preprocessor`](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n",
+ "In AIR, this component is a [`Preprocessor`](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n",
"It is constructed in a way that allows easy composition.\n",
"\n",
"Now let's construct a chained preprocessor composed of simple preprocessors, including\n",
"1. Imputer for filling missing features;\n",
"2. OneHotEncoder for encoding categorical features;\n",
"3. BatchMapper where arbitrary user-defined function can be applied to batches of records;\n",
- "and so on. Take a look at [`Preprocessor`](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n",
+ "and so on. Take a look at [`Preprocessor`](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n",
"The output of the preprocessing step goes into model for training."
]
},