Oss-inference-examples (#2915)

* New notebooks for OSS inference examples
* removed outputs
* Updated sub rg ws
Azure · Dec 14, 2023 · 4c80148
1 parent 3699394
commit 4c80148

Showing 5 changed files with 1,676 additions and 0 deletions.
diff --git a/sdk/python/foundation-models/system/inference/fill-mask/fill-mask-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/fill-mask/fill-mask-online-endpoint-oss.ipynb
@@ -0,0 +1,349 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Fill Mask Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `fill-mask` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "The `fill-mask` task is about predicting masked words in a sentence. Models that perform this task have a good understanding of the language structure and domain of the dataset they were trained on. `fill-mask` models are typically used as foundation models for more scenario-oriented tasks such as `text-classification` or `token-classification`.\n",
+    "\n",
+    "### Model\n",
+    "Models that can perform the `fill-mask` task are tagged with `task: fill-mask`. We will use the `bert-base-uncased` model in this notebook. If you opened this notebook from a specific model card, remember to replace the model name accordingly. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference.\n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [book corpus](https://huggingface.co/datasets/bookcorpus) dataset.\n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Download and prepare data for inference.\n",
+    "* Deploy the model for real-time inference.\n",
+    "* Test the endpoint.\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `fill-mask` task. In this example, we use the `bert-base-uncased` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"bert-base-uncased\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows.\n",
+    "* Replace one word in each sentence with the model's mask token so that the model can predict the masked word.\n",
+    "* Save a few samples in a format that can be passed as input to the online inference endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./book-corpus-dataset directory\n",
+    "%run ./book-corpus-dataset/download-dataset.py --download_dir ./book-corpus-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the ./book-corpus-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "train_df = pd.read_json(\"./book-corpus-dataset/train.jsonl\", lines=True)\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the right mask token from huggingface\n",
+    "import urllib.request, json\n",
+    "\n",
+    "with urllib.request.urlopen(f\"https://huggingface.co/api/models/{model_name}\") as url:\n",
+    "    data = json.load(url)\n",
+    "    mask_token = data[\"mask_token\"]\n",
+    "\n",
+    "# take the value of the \"text\" column, replace a random word with the mask token and save the result in the \"masked_text\" column\n",
+    "import random, os\n",
+    "\n",
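+    "def mask_random_word(text):\n",
+    "    # mask one whole whitespace-delimited word; str.replace could also match inside a longer word\n",
+    "    words = text.split()\n",
+    "    words[random.randrange(len(words))] = mask_token\n",
+    "    return \" \".join(words)\n",
+    "\n",
+    "train_df[\"masked_text\"] = train_df[\"text\"].apply(mask_random_word)\n",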
+    "# save the train_df dataframe to a jsonl file in the ./book-corpus-dataset folder with the masked_ prefix\n",
+    "train_df.to_json(\n",
+    "    os.path.join(\".\", \"book-corpus-dataset\", \"masked_train.jsonl\"),\n",
+    "    orient=\"records\",\n",
+    "    lines=True,\n",
+    ")\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"fill-mask-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \" + foundation_model.name + \", for fill-mask task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"Standard_DS3_v2\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will pick a sample from the masked dataset prepared above and submit it to the online endpoint for inference. We will then display the predicted sequence alongside the ground truth sequence."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "# read the ./book-corpus-dataset/masked_train.jsonl file into a pandas dataframe\n",
+    "df = pd.read_json(\"./book-corpus-dataset/masked_train.jsonl\", lines=True)\n",
+    "# escape single and double quotes in the masked_text column\n",
+    "df[\"masked_text\"] = df[\"masked_text\"].str.replace(\"'\", \"\\\\'\").str.replace('\"', '\\\\\"')\n",
+    "# pick 1 random row\n",
+    "sample_df = df.sample(1)\n",
+    "# create a json object with the key \"input_data\" and, as its value, the list of values from the masked_text column of the sample_df dataframe\n",
+    "test_json = {\"input_data\": sample_df[\"masked_text\"].tolist()}\n",
+    "# save the json object to a file named sample_score.json in the ./book-corpus-dataset folder\n",
+    "with open(os.path.join(\".\", \"book-corpus-dataset\", \"sample_score.json\"), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=\"./book-corpus-dataset/sample_score.json\",\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compare the predicted sequence with the ground truth sequence\n",
+    "compare_df = pd.DataFrame(\n",
+    "    {\n",
+    "        \"ground_truth_sequence\": sample_df[\"text\"].tolist(),\n",
+    "        \"predicted_sequence\": [\n",
+    "            sample_df[\"masked_text\"].tolist()[0].replace(mask_token, response_df[0][0])\n",
+    "        ],\n",
+    "    }\n",
+    ")\n",
+    "compare_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}