diff --git a/doc/source/_toc.yml b/doc/source/_toc.yml index db4a0a42d..b87e1297e 100644 --- a/doc/source/_toc.yml +++ b/doc/source/_toc.yml @@ -34,6 +34,7 @@ parts: - file: ray-air/examples/rl_serving_example - file: ray-air/examples/rl_online_example - file: ray-air/examples/rl_offline_example + - file: ray-air/examples/feast_example - file: ray-air/package-ref - caption: AIR Libraries diff --git a/doc/source/ray-air/examples/feast_example.ipynb b/doc/source/ray-air/examples/feast_example.ipynb new file mode 100644 index 000000000..6404fcd49 --- /dev/null +++ b/doc/source/ray-air/examples/feast_example.ipynb @@ -0,0 +1,1510 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Integrate Ray AIR with the Feast feature store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# !pip install feast==0.20.1 ray[air]>=1.13 xgboost_ray" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "INyNIaeB1Kza" + }, + "source": [ + "In this example, we showcase how to use Ray AIR with the Feast feature store, leveraging both historical features for training a model and online features for inference.\n", + "\n", + "The task is adapted from the [Feast credit scoring tutorial](https://github.com/feast-dev/feast-aws-credit-scoring-tutorial). We train an XGBoost model and run predictions on an incoming loan request to see whether it is approved or rejected." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBC9CCrpzQLF" + }, + "source": [ + "Let's first set up our workspace and prepare the data to work with."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "DcPIskZlzSal" + }, + "outputs": [], + "source": [ + "import os\n", + "WORKING_DIR = os.path.expanduser(\"~/ray-air-feast-example/\")\n", + "%env WORKING_DIR=$WORKING_DIR" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BcyCKjV3zTCK", + "outputId": "afdfa24d-e5ce-49db-c904-e961e1eb910c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2022-06-02 14:22:50-- https://github.com/ray-project/air-sample-data/raw/main/air-feast-example.zip\n", + "Resolving github.com (github.com)... 192.30.255.113\n", + "Connecting to github.com (github.com)|192.30.255.113|:443... connected.\n", + "HTTP request sent, awaiting response... 302 Found\n", + "Location: https://raw.githubusercontent.com/ray-project/air-sample-data/main/air-feast-example.zip [following]\n", + "--2022-06-02 14:22:50-- https://raw.githubusercontent.com/ray-project/air-sample-data/main/air-feast-example.zip\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 23715107 (23M) [application/zip]\n", + "Saving to: ‘air-feast-example.zip’\n", + "\n", + "air-feast-example.z 100%[===================>] 22.62M 114MB/s in 0.2s \n", + "\n", + "2022-06-02 14:22:51 (114 MB/s) - ‘air-feast-example.zip’ saved [23715107/23715107]\n", + "\n", + "Archive: air-feast-example.zip\n", + " creating: air-feast-example/\n", + " creating: air-feast-example/feature_repo/\n", + " inflating: air-feast-example/feature_repo/.DS_Store \n", + " extracting: air-feast-example/feature_repo/__init__.py \n", + " inflating: air-feast-example/feature_repo/features.py \n", + " creating: air-feast-example/feature_repo/data/\n", + " inflating: air-feast-example/feature_repo/data/.DS_Store \n", + " inflating: air-feast-example/feature_repo/data/credit_history_sample.csv \n", + " inflating: air-feast-example/feature_repo/data/zipcode_table_sample.csv \n", + " inflating: air-feast-example/feature_repo/data/credit_history.parquet \n", + " inflating: air-feast-example/feature_repo/data/zipcode_table.parquet \n", + " inflating: air-feast-example/feature_repo/feature_store.yaml \n", + " inflating: air-feast-example/.DS_Store \n", + " creating: air-feast-example/data/\n", + " inflating: air-feast-example/data/loan_table.parquet \n", + " inflating: air-feast-example/data/loan_table_sample.csv \n" + ] + } + ], + "source": [ + "! mkdir -p $WORKING_DIR\n", + "! wget --no-check-certificate https://github.com/ray-project/air-sample-data/raw/main/air-feast-example.zip\n", + "! unzip air-feast-example.zip \n", + "! mv air-feast-example/* $WORKING_DIR\n", + "%cd $WORKING_DIR" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "iNbC-Qqi3Lq_", + "outputId": "99576086-12dd-4f96-fb51-de40b77b15ce" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "data feature_repo\n" + ] + } + ], + "source": [ + "! 
ls" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c_3wlEus4dYO" + }, + "source": [ + "A feature repository is already set up in `feature_repo/`, so it isn't necessary to create a new one; if needed, you could create one with `feast init -t local feature_repo`.\n", + "\n", + "Now let's take a look at the schema of the Feast feature store, which is defined by `feature_repo/features.py`. It contains two feature views, `zipcode_features` and `credit_history`, both generated from Parquet files: `feature_repo/data/zipcode_table.parquet` and `feature_repo/data/credit_history.parquet`." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5VGLhPLLzlGW", + "outputId": "a3f3499e-c140-4ceb-a66d-2f1a6b8a2142" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mdatetime\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m timedelta\n", + "\n", + "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mfeast\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m (Entity, Field, FeatureView, FileSource, ValueType)\n", + "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mfeast\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mtypes\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m Float32, Int64, String\n", + "\n", + "\n", + "zipcode = Entity(name=\u001b[33m"\u001b[39;49;00m\u001b[33mzipcode\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, value_type=Int64)\n", + "\n", + "zipcode_source = FileSource(\n", + " path=\u001b[33m"\u001b[39;49;00m\u001b[33mfeature_repo/data/zipcode_table.parquet\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m,\n", + " timestamp_field=\u001b[33m"\u001b[39;49;00m\u001b[33mevent_timestamp\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m,\n", + " 
created_timestamp_column=\u001b[33m\"\u001b[39;49;00m\u001b[33mcreated_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + ")\n", + "\n", + "zipcode_features = FeatureView(\n", + " name=\u001b[33m\"\u001b[39;49;00m\u001b[33mzipcode_features\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + " entities=[\u001b[33m\"\u001b[39;49;00m\u001b[33mzipcode\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m],\n", + " ttl=timedelta(days=\u001b[34m3650\u001b[39;49;00m),\n", + " schema=[\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mcity\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mstate\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mlocation_type\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=String),\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mtax_returns_filed\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mpopulation\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m\"\u001b[39;49;00m\u001b[33mtotal_wages\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, dtype=Int64),\n", + " ],\n", + " source=zipcode_source,\n", + ")\n", + "\n", + "dob_ssn = Entity(\n", + " name=\u001b[33m\"\u001b[39;49;00m\u001b[33mdob_ssn\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + " value_type=ValueType.STRING,\n", + " description=\u001b[33m\"\u001b[39;49;00m\u001b[33mDate of birth and last four digits of social security number\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + ")\n", + "\n", + "credit_history_source = FileSource(\n", + " path=\u001b[33m\"\u001b[39;49;00m\u001b[33mfeature_repo/data/credit_history.parquet\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + " timestamp_field=\u001b[33m\"\u001b[39;49;00m\u001b[33mevent_timestamp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", + " 
created_timestamp_column=\u001b[33m"\u001b[39;49;00m\u001b[33mcreated_timestamp\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m,\n", + ")\n", + "\n", + "credit_history = FeatureView(\n", + " name=\u001b[33m"\u001b[39;49;00m\u001b[33mcredit_history\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m,\n", + " entities=[\u001b[33m"\u001b[39;49;00m\u001b[33mdob_ssn\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m],\n", + " ttl=timedelta(days=\u001b[34m90\u001b[39;49;00m),\n", + " schema=[\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mcredit_card_due\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mmortgage_due\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mstudent_loan_due\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mvehicle_loan_due\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mhard_pulls\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mmissed_payments_2y\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mmissed_payments_1y\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mmissed_payments_6m\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " Field(name=\u001b[33m"\u001b[39;49;00m\u001b[33mbankruptcies\u001b[39;49;00m\u001b[33m"\u001b[39;49;00m, dtype=Int64),\n", + " ],\n", + " source=credit_history_source,\n", + ")\n" + ] + } + ], + "source": [ + "!pygmentize feature_repo/features.py" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HQmrfEV33_SM" + }, + "source": [ + "Deploy the feature store defined above by running `feast apply` from within the `feature_repo/` folder."
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SbL_EbMC2MFS", + "outputId": "13b07f1f-d52a-4c4e-a73f-f5478c0304de" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage\n", + "Created entity \u001b[1m\u001b[32mdob_ssn\u001b[0m\n", + "Created entity \u001b[1m\u001b[32mzipcode\u001b[0m\n", + "Created feature view \u001b[1m\u001b[32mcredit_history\u001b[0m\n", + "Created feature view \u001b[1m\u001b[32mzipcode_features\u001b[0m\n", + "\n", + "Created sqlite table \u001b[1m\u001b[32mfeature_repo_credit_history\u001b[0m\n", + "Created sqlite table \u001b[1m\u001b[32mfeature_repo_zipcode_features\u001b[0m\n", + "\n" + ] + } + ], + "source": [ + "! (cd feature_repo && feast apply)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "D-9Kr-kdzg1r", + "outputId": "beabd2f7-c1a6-4fe3-b087-b35b7b9ab56b" + }, + "outputs": [], + "source": [ + "import feast\n", + "fs = feast.FeatureStore(repo_path=\"feature_repo\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5nR8uNE8z-YQ" + }, + "source": [ + "## Generate training data\n", + "In addition to the features in Feast, we also have labeled training data at `data/loan_table.parquet`. At training time, the loan table is passed into Feast as an entity dataframe for training data generation. Feast intelligently joins the `credit_history` and `zipcode_features` feature views to create relevant feature vectors that augment the training data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "twBCJMzVzV0X", + "outputId": "efb41c7a-2802-4169-906e-7db1c37d8c8e" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "(HTML table rendering omitted; the data appears in the plain-text output below.)\n", + "
" + ], + "text/plain": [ + " loan_id dob_ssn zipcode person_age person_income \\\n", + "0 10000 19530219_5179 76104 22 59000 \n", + "1 10001 19520816_8737 70380 21 9600 \n", + "2 10002 19860413_2537 97039 25 9600 \n", + "3 10003 19760701_8090 63785 23 65500 \n", + "4 10004 19830125_8297 82223 24 54400 \n", + "... ... ... ... ... ... \n", + "28633 38633 19491126_1487 43205 57 53000 \n", + "28634 38634 19681208_6537 24872 54 120000 \n", + "28635 38635 19880422_2592 68826 65 76000 \n", + "28636 38636 19901017_6108 92014 56 150000 \n", + "28637 38637 19960703_3449 69033 66 42000 \n", + "\n", + " person_home_ownership person_emp_length loan_intent loan_amnt \\\n", + "0 RENT 123.0 PERSONAL 35000 \n", + "1 OWN 5.0 EDUCATION 1000 \n", + "2 MORTGAGE 1.0 MEDICAL 5500 \n", + "3 RENT 4.0 MEDICAL 35000 \n", + "4 RENT 8.0 MEDICAL 35000 \n", + "... ... ... ... ... \n", + "28633 MORTGAGE 1.0 PERSONAL 5800 \n", + "28634 MORTGAGE 4.0 PERSONAL 17625 \n", + "28635 RENT 3.0 HOMEIMPROVEMENT 35000 \n", + "28636 MORTGAGE 5.0 PERSONAL 15000 \n", + "28637 RENT 2.0 MEDICAL 6475 \n", + "\n", + " loan_int_rate loan_status event_timestamp \\\n", + "0 16.02 1 2021-08-25 20:34:41.361000+00:00 \n", + "1 11.14 0 2021-08-25 20:16:20.128000+00:00 \n", + "2 12.87 1 2021-08-25 19:57:58.896000+00:00 \n", + "3 15.23 1 2021-08-25 19:39:37.663000+00:00 \n", + "4 14.27 1 2021-08-25 19:21:16.430000+00:00 \n", + "... ... ... ... \n", + "28633 13.16 0 2020-08-25 21:48:06.292000+00:00 \n", + "28634 7.49 0 2020-08-25 21:29:45.059000+00:00 \n", + "28635 10.99 1 2020-08-25 21:11:23.826000+00:00 \n", + "28636 11.48 0 2020-08-25 20:53:02.594000+00:00 \n", + "28637 9.99 0 2020-08-25 20:34:41.361000+00:00 \n", + "\n", + " created_timestamp \n", + "0 2021-08-25 20:34:41.361000+00:00 \n", + "1 2021-08-25 20:16:20.128000+00:00 \n", + "2 2021-08-25 19:57:58.896000+00:00 \n", + "3 2021-08-25 19:39:37.663000+00:00 \n", + "4 2021-08-25 19:21:16.430000+00:00 \n", + "... ... 
\n", + "28633 2020-08-25 21:48:06.292000+00:00 \n", + "28634 2020-08-25 21:29:45.059000+00:00 \n", + "28635 2020-08-25 21:11:23.826000+00:00 \n", + "28636 2020-08-25 20:53:02.594000+00:00 \n", + "28637 2020-08-25 20:34:41.361000+00:00 \n", + "\n", + "[28638 rows x 13 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "loan_df = pd.read_parquet(\"data/loan_table.parquet\")\n", + "display(loan_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "hAHAb2nS6ClR" + }, + "outputs": [], + "source": [ + "feast_features = [\n", + " \"zipcode_features:city\",\n", + " \"zipcode_features:state\",\n", + " \"zipcode_features:location_type\",\n", + " \"zipcode_features:tax_returns_filed\",\n", + " \"zipcode_features:population\",\n", + " \"zipcode_features:total_wages\",\n", + " \"credit_history:credit_card_due\",\n", + " \"credit_history:mortgage_due\",\n", + " \"credit_history:student_loan_due\",\n", + " \"credit_history:vehicle_loan_due\",\n", + " \"credit_history:hard_pulls\",\n", + " \"credit_history:missed_payments_2y\",\n", + " \"credit_history:missed_payments_1y\",\n", + " \"credit_history:missed_payments_6m\",\n", + " \"credit_history:bankruptcies\",\n", + "]\n", + "\n", + "loan_w_offline_feature = fs.get_historical_features(\n", + " entity_df=loan_df, features=feast_features\n", + ").to_df()\n", + "\n", + "# Drop some unnecessary columns for simplicity\n", + "loan_w_offline_feature = loan_w_offline_feature.drop([\"event_timestamp\", \"created_timestamp__\", \"loan_id\", \"zipcode\", \"dob_ssn\"], axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TxDFT1776XgP" + }, + "source": [ + "Now let's take a look at the training data as it is augmented by Feast." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 540 + }, + "id": "s-w2696D6h78", + "outputId": "f98c1dbe-1916-41bc-d543-590bac08caf5" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "(HTML table rendering omitted; the data appears in the plain-text output below.)\n", + "
" + ], + "text/plain": [ + " person_age person_income person_home_ownership person_emp_length \\\n", + "1358886 55 24543 RENT 3.0 \n", + "1358815 58 20000 RENT 0.0 \n", + "1353348 64 24000 RENT 1.0 \n", + "1354200 55 34000 RENT 0.0 \n", + "1354271 51 74628 MORTGAGE 3.0 \n", + "... ... ... ... ... \n", + "674285 23 74000 RENT 3.0 \n", + "668250 21 200000 MORTGAGE 2.0 \n", + "668321 24 200000 MORTGAGE 3.0 \n", + "670025 23 215000 MORTGAGE 7.0 \n", + "2034006 22 59000 RENT 123.0 \n", + "\n", + " loan_intent loan_amnt loan_int_rate loan_status \\\n", + "1358886 VENTURE 4000 13.92 0 \n", + "1358815 EDUCATION 4000 9.99 0 \n", + "1353348 MEDICAL 3000 6.99 0 \n", + "1354200 DEBTCONSOLIDATION 12000 6.92 1 \n", + "1354271 PERSONAL 3000 13.49 0 \n", + "... ... ... ... ... \n", + "674285 MEDICAL 25000 10.36 1 \n", + "668250 DEBTCONSOLIDATION 25000 13.99 0 \n", + "668321 VENTURE 24000 7.49 0 \n", + "670025 MEDICAL 35000 14.79 0 \n", + "2034006 PERSONAL 35000 16.02 1 \n", + "\n", + " city state ... total_wages credit_card_due \\\n", + "1358886 SLIDELL LA ... 315061217 1777 \n", + "1358815 CHOUTEAU OK ... 59412230 1791 \n", + "1353348 BISMARCK ND ... 469621263 5917 \n", + "1354200 SANTA BARBARA CA ... 24537583 8091 \n", + "1354271 HUNTINGTON BEACH CA ... 19749601 3679 \n", + "... ... ... ... ... ... \n", + "674285 MANSFIELD MO ... 33180988 5176 \n", + "668250 SALISBURY MD ... 470634058 5297 \n", + "668321 STRUNK KY ... 10067358 6549 \n", + "670025 HAWTHORN PA ... 5956835 9079 \n", + "2034006 FORT WORTH TX ... 142325465 8419 \n", + "\n", + " mortgage_due student_loan_due vehicle_loan_due hard_pulls \\\n", + "1358886 690650 46372 10439 5 \n", + "1358815 462670 19421 3583 8 \n", + "1353348 1780959 11835 27910 8 \n", + "1354200 364271 30248 22640 2 \n", + "1354271 1659968 37582 20284 0 \n", + "... ... ... ... ... 
\n", + "674285 1089963 44642 2877 1 \n", + "668250 1288915 22471 22630 0 \n", + "668321 22399 11806 13005 0 \n", + "670025 876038 4556 21588 0 \n", + "2034006 91803 22328 15078 0 \n", + "\n", + " missed_payments_2y missed_payments_1y missed_payments_6m \\\n", + "1358886 1 2 1 \n", + "1358815 7 1 0 \n", + "1353348 3 2 1 \n", + "1354200 7 3 0 \n", + "1354271 1 0 0 \n", + "... ... ... ... \n", + "674285 6 1 0 \n", + "668250 5 2 1 \n", + "668321 1 0 0 \n", + "670025 1 0 0 \n", + "2034006 1 0 0 \n", + "\n", + " bankruptcies \n", + "1358886 0 \n", + "1358815 2 \n", + "1353348 0 \n", + "1354200 0 \n", + "1354271 0 \n", + "... ... \n", + "674285 0 \n", + "668250 0 \n", + "668321 0 \n", + "670025 0 \n", + "2034006 0 \n", + "\n", + "[28638 rows x 23 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(loan_w_offline_feature)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KpiOm00789Rd" + }, + "outputs": [], + "source": [ + "# Split into train and validation datasets.\n", + "import ray\n", + "\n", + "loan_ds = ray.data.from_pandas(loan_w_offline_feature)\n", + "train_ds, validation_ds = loan_ds.split_proportionately([0.8])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uFNp9sUCEGc2" + }, + "source": [ + "## Define Preprocessors\n", + "\n", + "A [Preprocessor](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors) performs last-mile processing on Ray Datasets before they are fed into the training model."
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "qravnrt9EBuW" + }, + "outputs": [], + "source": [ + "categorical_features = [\n", + " \"person_home_ownership\",\n", + " \"loan_intent\",\n", + " \"city\",\n", + " \"state\",\n", + " \"location_type\",\n", + "]\n", + "\n", + "from ray.ml.preprocessors import Chain, OrdinalEncoder, SimpleImputer\n", + "\n", + "imputer = SimpleImputer(categorical_features, strategy=\"most_frequent\")\n", + "encoder = OrdinalEncoder(columns=categorical_features)\n", + "chained_preprocessor = Chain(imputer, encoder)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SqPGGd6bEn0x" + }, + "source": [ + "## Train an XGBoost model using the Ray AIR Trainer\n", + "Ray AIR provides a variety of [Trainers](https://docs.ray.io/en/latest/ray-air/getting-started.html#trainer) that are integrated with popular machine learning frameworks. You can train a distributed model at scale on Ray with the intuitive `trainer.fit()` API. The output is a Ray AIR [Checkpoint](https://docs.ray.io/en/latest/ray-air/getting-started.html#module-ray.ml.checkpoint), which seamlessly transfers the workload from training to prediction. Let's take a look!" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 670 + }, + "id": "995W14MdFmxl", + "outputId": "417c1188-edf6-4310-dba8-366b71d77806" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/ray/anaconda3/lib/python3.8/site-packages/xgboost_ray/main.py:131: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", + " XGBOOST_LOOSE_VERSION = LooseVersion(xgboost_version)\n", + "E0602 14:26:17.861773834 4511 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers\n", + "/home/ray/anaconda3/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + }, + { + "data": { + "text/html": [ + "== Status ==
Current time: 2022-06-02 14:26:33 (running for 00:00:14.95)
Memory usage on this node: 3.5/30.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/18.04 GiB heap, 0.0/9.02 GiB objects
Result logdir: /home/ray/ray_results/XGBoostTrainer_2022-06-02_14-26-17
Number of trials: 1/1 (1 TERMINATED)
\n", + "Trial name: XGBoostTrainer_a3a2c_00000 | status: TERMINATED | loc: 172.31.71.98:12634 | iter: 100 | total time (s): 11.9561 | train-logloss: 0.0578837 | train-error: 0.0127019 | validation-logloss: 0.225994\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "(GBDTTrainable pid=12634) 2022-06-02 14:26:23,018\tINFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.\n", + "(GBDTTrainable pid=12634) 2022-06-02 14:26:25,230\tINFO main.py:1025 -- [RayXGBoost] Starting XGBoost training.\n", + "(GBDTTrainable pid=12634) E0602 14:26:25.231635524 12691 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers\n", + "(_RemoteRayXGBoostActor pid=12769) [14:26:25] task [xgboost.ray]:139712002042896 got new rank 0\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Result for XGBoostTrainer_a3a2c_00000:\n", + " date: 2022-06-02_14-26-26\n", + " done: false\n", + " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n", + " hostname: ip-172-31-71-98\n", + " iterations_since_restore: 1\n", + " node_ip: 172.31.71.98\n", + " pid: 12634\n", + " should_checkpoint: true\n", + " time_since_restore: 5.432286262512207\n", + " time_this_iter_s: 5.432286262512207\n", + " time_total_s: 5.432286262512207\n", + " timestamp: 1654205186\n", + " timesteps_since_restore: 0\n", + " train-error: 0.09502400698384984\n", + " train-logloss: 0.5147884634112437\n", + " training_iteration: 1\n", + " trial_id: a3a2c_00000\n", + " validation-error: 0.1627094972067039\n", + " validation-logloss: 0.5557328870197414\n", + " warmup_time: 0.004194974899291992\n", + " \n", + "Result for XGBoostTrainer_a3a2c_00000:\n", + " date: 2022-06-02_14-26-31\n", + " done: false\n", + " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n", + " hostname: ip-172-31-71-98\n", + " iterations_since_restore: 81\n", + " node_ip: 172.31.71.98\n", + " pid: 12634\n", + " should_checkpoint: true\n", + " time_since_restore: 10.460175275802612\n", + " time_this_iter_s: 0.03925156593322754\n", + " time_total_s: 
10.460175275802612\n", + " timestamp: 1654205191\n", + " timesteps_since_restore: 0\n", + " train-error: 0.01802706241815801\n", + " train-logloss: 0.07058723671627426\n", + " training_iteration: 81\n", + " trial_id: a3a2c_00000\n", + " validation-error: 0.0824022346368715\n", + " validation-logloss: 0.22556984905196217\n", + " warmup_time: 0.004194974899291992\n", + " \n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "(GBDTTrainable pid=12634) 2022-06-02 14:26:32,801\tINFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=22,910 in 9.81 seconds (7.57 pure XGBoost training time).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Result for XGBoostTrainer_a3a2c_00000:\n", + " date: 2022-06-02_14-26-32\n", + " done: true\n", + " experiment_id: 14b63d641b8e4583b3551b1af113ec6d\n", + " experiment_tag: '0'\n", + " hostname: ip-172-31-71-98\n", + " iterations_since_restore: 100\n", + " node_ip: 172.31.71.98\n", + " pid: 12634\n", + " should_checkpoint: true\n", + " time_since_restore: 11.956074953079224\n", + " time_this_iter_s: 0.022900104522705078\n", + " time_total_s: 11.956074953079224\n", + " timestamp: 1654205192\n", + " timesteps_since_restore: 0\n", + " train-error: 0.01270187690964644\n", + " train-logloss: 0.05788368908741939\n", + " training_iteration: 100\n", + " trial_id: a3a2c_00000\n", + " validation-error: 0.0825768156424581\n", + " validation-logloss: 0.22599449588356768\n", + " warmup_time: 0.004194974899291992\n", + " \n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2022-06-02 14:26:33,481\tINFO tune.py:752 -- Total run time: 15.62 seconds (14.94 seconds for the tuning loop).\n" + ] + }, + { + "data": { + "text/plain": [ + "'checkpoint'" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "LABEL = \"loan_status\"\n", + "CHECKPOINT_PATH = \"checkpoint\"\n", + "NUM_WORKERS = 1 
# Change this based on the resources in the cluster.\n", + "\n", + "\n", + "from ray.ml.train.integrations.xgboost import XGBoostTrainer\n", + "params = {\n", + " \"tree_method\": \"approx\",\n", + " \"objective\": \"binary:logistic\",\n", + " \"eval_metric\": [\"logloss\", \"error\"],\n", + "}\n", + "\n", + "trainer = XGBoostTrainer(\n", + " scaling_config={\n", + " \"num_workers\": NUM_WORKERS,\n", + " \"use_gpu\": False,\n", + " },\n", + " label_column=LABEL,\n", + " params=params,\n", + " datasets={\"train\": train_ds, \"validation\": validation_ds},\n", + " preprocessor=chained_preprocessor,\n", + " num_boost_round=100,\n", + ")\n", + "checkpoint = trainer.fit().checkpoint\n", + "# This saves the checkpoint to disk.\n", + "checkpoint.to_directory(CHECKPOINT_PATH)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QJr73gOvKBza" + }, + "source": [ + "## Inference\n", + "Now, from the `Checkpoint` object obtained in the last section, we can construct a Ray AIR [Predictor](https://docs.ray.io/en/latest/ray-air/getting-started.html#predictors) that encapsulates everything needed for inference.\n", + "\n", + "The `Predictor` API is intuitive: simply call `Predictor.predict()`." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "id": "wE8ielQlKYSi" + }, + "outputs": [], + "source": [ + "from ray.ml.checkpoint import Checkpoint\n", + "from ray.ml.predictors.integrations.xgboost import XGBoostPredictor\n", + "predictor = XGBoostPredictor.from_checkpoint(Checkpoint.from_directory(CHECKPOINT_PATH))" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "K9Y9UiD3KqSW" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_4511/2153939661.py:23: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. 
If you specifically wanted the numpy scalar type, use `np.float64` here.\n", + "Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n", + " loan_request_df = pd.DataFrame.from_dict(loan_request_dict, dtype=np.float)\n", + "/tmp/ipykernel_4511/2153939661.py:23: FutureWarning: Could not cast to float64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised.\n", + " loan_request_df = pd.DataFrame.from_dict(loan_request_dict, dtype=np.float)\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
person_ageperson_incomeperson_home_ownershipperson_emp_lengthloan_intentloan_amntloan_int_ratestatepopulationlocation_type...tax_returns_filedstudent_loan_duemissed_payments_1yhard_pullsmortgage_duebankruptciescredit_card_duemissed_payments_2ymissed_payments_6mvehicle_loan_due
0133.059000.0RENT123.0PERSONAL35000.016.02NaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", + "

1 rows × 22 columns

\n", + "
" + ], + "text/plain": [ + " person_age person_income person_home_ownership person_emp_length \\\n", + "0 133.0 59000.0 RENT 123.0 \n", + "\n", + " loan_intent loan_amnt loan_int_rate state population location_type \\\n", + "0 PERSONAL 35000.0 16.02 NaN NaN NaN \n", + "\n", + " ... tax_returns_filed student_loan_due missed_payments_1y hard_pulls \\\n", + "0 ... NaN NaN NaN NaN \n", + "\n", + " mortgage_due bankruptcies credit_card_due missed_payments_2y \\\n", + "0 NaN NaN NaN NaN \n", + "\n", + " missed_payments_6m vehicle_loan_due \n", + "0 NaN NaN \n", + "\n", + "[1 rows x 22 columns]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import numpy as np\n", + "## Now let's do some prediciton.\n", + "loan_request_dict = {\n", + " \"zipcode\": [76104],\n", + " \"dob_ssn\": [\"19630621_4278\"],\n", + " \"person_age\": [133],\n", + " \"person_income\": [59000],\n", + " \"person_home_ownership\": [\"RENT\"],\n", + " \"person_emp_length\": [123.0],\n", + " \"loan_intent\": [\"PERSONAL\"],\n", + " \"loan_amnt\": [35000],\n", + " \"loan_int_rate\": [16.02],\n", + "}\n", + "\n", + "# Now augment the request with online features.\n", + "zipcode = loan_request_dict[\"zipcode\"][0]\n", + "dob_ssn = loan_request_dict[\"dob_ssn\"][0]\n", + "online_features = fs.get_online_features(\n", + " entity_rows=[{\"zipcode\": zipcode, \"dob_ssn\": dob_ssn}],\n", + " features=feast_features,\n", + ").to_dict()\n", + "loan_request_dict.update(online_features)\n", + "loan_request_df = pd.DataFrame.from_dict(loan_request_dict, dtype=np.float)\n", + "loan_request_df = loan_request_df.drop([\"zipcode\", \"dob_ssn\"], axis=1)\n", + "display(loan_request_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "eS7_n1GPLL1e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loan rejected!\n" + ] + } + ], + "source": [ + "# Run through our predictor using `Predictor.predict()` API.\n", + 
"loan_result = np.round(predictor.predict(loan_request_df)[\"predictions\"][0])\n", + "\n", + "if loan_result == 0:\n", + " print(\"Loan approved!\")\n", + "elif loan_result == 1:\n", + " print(\"Loan rejected!\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "air + feast", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/source/ray-air/examples/index.rst b/doc/source/ray-air/examples/index.rst index d1e570984..3f8a1e8cb 100644 --- a/doc/source/ray-air/examples/index.rst +++ b/doc/source/ray-air/examples/index.rst @@ -39,3 +39,4 @@ Advanced -------- - :doc:`/ray-air/examples/torch_incremental_learning`: Incrementally train and deploy a PyTorch CV model +- :doc:`/ray-air/examples/feast_example`: Integrate with Feast feature store in both train and inference diff --git a/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb b/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb index 3e92a3fac..3831fd5e4 100644 --- a/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb +++ b/doc/source/ray-air/examples/tfx_tabular_train_to_serve.ipynb @@ -15,7 +15,7 @@ "every step from data ingestion to pushing a model to serving.\n", "\n", "1. Read a CSV into [Ray Dataset](https://docs.ray.io/en/latest/data/dataset.html).\n", - "2. Process the dataset by chaining [Ray AIR preprocessors](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n", + "2. Process the dataset by chaining [Ray AIR preprocessors](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n", "3. 
Train the model using the TensorflowTrainer from AIR.\n", "4. Serve the model using Ray Serve and the above preprocessors." ] @@ -453,14 +453,14 @@ "a modularized component so that the same logic can be applied to both\n", "training data as well as data for online serving or offline batch prediction.\n", "\n", - "In AIR, this component is a [`Preprocessor`](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n", + "In AIR, this component is a [`Preprocessor`](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n", "It is constructed in a way that allows easy composition.\n", "\n", "Now let's construct a chained preprocessor composed of simple preprocessors, including\n", "1. Imputer for filling missing features;\n", "2. OneHotEncoder for encoding categorical features;\n", "3. BatchMapper where arbitrary user-defined function can be applied to batches of records;\n", - "and so on. Take a look at [`Preprocessor`](https://docs.ray.io/en/master/ray-air/package-ref.html#preprocessors).\n", + "and so on. Take a look at [`Preprocessor`](https://docs.ray.io/en/latest/ray-air/getting-started.html#preprocessors).\n", "The output of the preprocessing step goes into model for training." ] },