diff --git a/sdk/python/foundation-models/system/inference/fill-mask/fill-mask-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/fill-mask/fill-mask-online-endpoint-oss.ipynb
new file mode 100644
index 00000000000..5c3d8ea6310
--- /dev/null
+++ b/sdk/python/foundation-models/system/inference/fill-mask/fill-mask-online-endpoint-oss.ipynb
@@ -0,0 +1,349 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Fill Mask Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `fill-mask` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "The `fill-mask` task predicts masked words in a sentence. Models that perform this task have a good understanding of the language structure and domain of the dataset they were trained on. `fill-mask` models are typically used as foundation models for more scenario-oriented tasks such as `text-classification` or `token-classification`.\n",
+    "\n",
+    "### Model\n",
+    "Models that can perform the `fill-mask` task are tagged with `task: fill-mask`. We will use the `bert-base-uncased` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. \n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [book corpus](https://huggingface.co/datasets/bookcorpus) dataset.\n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Download and prepare data for inference. \n",
+    "* Deploy the model for real-time inference.\n",
+    "* Test the endpoint.\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `fill-mask` task. In this example, we use the `bert-base-uncased` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"bert-base-uncased\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows\n",
+    "* Mask one word in each sentence with the model's mask token so that the model can predict the masked word.\n",
+    "* Save a few samples in a format that can be passed as input to the online inference endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./book-corpus-dataset directory\n",
+    "%run ./book-corpus-dataset/download-dataset.py --download_dir ./book-corpus-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the ./book-corpus-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "train_df = pd.read_json(\"./book-corpus-dataset/train.jsonl\", lines=True)\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the right mask token from huggingface\n",
+    "import urllib.request, json\n",
+    "\n",
+    "with urllib.request.urlopen(f\"https://huggingface.co/api/models/{model_name}\") as url:\n",
+    "    data = json.load(url)\n",
+    "    mask_token = data[\"mask_token\"]\n",
+    "\n",
+    "# take the value of the \"text\" column, replace a random word with the mask token and save the result in the \"masked_text\" column\n",
+    "import random, os\n",
+    "\n",
+    "train_df[\"masked_text\"] = train_df[\"text\"].apply(\n",
+    "    lambda x: x.replace(random.choice(x.split()), mask_token, 1)\n",
+    ")\n",
+    "# save the train_df dataframe to a jsonl file in the ./book-corpus-dataset folder with the masked_ prefix\n",
+    "train_df.to_json(\n",
+    "    os.path.join(\".\", \"book-corpus-dataset\", \"masked_train.jsonl\"),\n",
+    "    orient=\"records\",\n",
+    "    lines=True,\n",
+    ")\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"fill-mask-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \" + foundation_model.name + \", for fill-mask task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"Standard_DS3_v2\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will fetch some sample data from the masked dataset and submit it to the online endpoint for inference. We will then display the predicted sequence alongside the ground truth sequence."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "# read the ./book-corpus-dataset/masked_train.jsonl file into a pandas dataframe\n",
+    "df = pd.read_json(\"./book-corpus-dataset/masked_train.jsonl\", lines=True)\n",
+    "# escape single and double quotes in the masked_text column\n",
+    "df[\"masked_text\"] = df[\"masked_text\"].str.replace(\"'\", \"\\\\'\").str.replace('\"', '\\\\\"')\n",
+    "# pick 1 random row\n",
+    "sample_df = df.sample(1)\n",
+    "# create a json object with the key as \"input_data\" and value as a list of values from the masked_text column of the sample_df dataframe\n",
+    "test_json = {\"input_data\": sample_df[\"masked_text\"].tolist()}\n",
+    "# save the json object to a file named sample_score.json in the ./book-corpus-dataset folder\n",
+    "with open(os.path.join(\".\", \"book-corpus-dataset\", \"sample_score.json\"), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=\"./book-corpus-dataset/sample_score.json\",\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compare the predicted sequence with the ground truth sequence\n",
+    "compare_df = pd.DataFrame(\n",
+    "    {\n",
+    "        \"ground_truth_sequence\": sample_df[\"text\"].tolist(),\n",
+    "        \"predicted_sequence\": [\n",
+    "            sample_df[\"masked_text\"].tolist()[0].replace(mask_token, response_df[0][0])\n",
+    "        ],\n",
+    "    }\n",
+    ")\n",
+    "compare_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise, the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/sdk/python/foundation-models/system/inference/question-answering/question-answering-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/question-answering/question-answering-online-endpoint-oss.ipynb
new file mode 100644
index 00000000000..acdea6d115b
--- /dev/null
+++ b/sdk/python/foundation-models/system/inference/question-answering/question-answering-online-endpoint-oss.ipynb
@@ -0,0 +1,328 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Question Answering Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `question-answering` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "`question-answering` tasks return an answer given a question. There are two common types of `question-answering` tasks:\n",
+    "\n",
+    "* Extractive: extract the answer from the given context.\n",
+    "* Abstractive: generate an answer from the context that correctly answers the question.\n",
+    " \n",
+    "### Model\n",
+    "Models that can perform the `question-answering` task are tagged with `task: question-answering`. We will use the `deepset-minilm-uncased-squad2` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. \n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [SQuAD](https://huggingface.co/datasets/squad) dataset. The [original source](https://rajpurkar.github.io/SQuAD-explorer/) of the dataset describes it as follows: _\"Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.\"_\n",
+    "\n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Download and prepare data for inference. \n",
+    "* Deploy the model for real-time inference.\n",
+    "* Test the endpoint.\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `question-answering` task. In this example, we use the `deepset-minilm-uncased-squad2` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"deepset-minilm-uncased-squad2\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows\n",
+    "* Save a few samples in a format that can be passed as input to the online inference endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./squad-dataset directory\n",
+    "%run ./squad-dataset/download-dataset.py --download_dir ./squad-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load the ./squad-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "train_df = pd.read_json(\"./squad-dataset/train.jsonl\", lines=True)\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"question-answering-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \"\n",
+    "    + foundation_model.name\n",
+    "    + \", for question-answering task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"Standard_DS3_v2\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will fetch some sample data from the downloaded dataset and submit it to the online endpoint for inference. We will then display the predicted answer alongside the ground truth answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import os\n",
+    "\n",
+    "# read the ./squad-dataset/train.jsonl file into a pandas dataframe\n",
+    "df = pd.read_json(\"./squad-dataset/train.jsonl\", lines=True)\n",
+    "# escape single and double quotes in the question and context columns\n",
+    "df[\"question\"] = df[\"question\"].str.replace(\"'\", \"\\\\'\").str.replace('\"', '\\\\\"')\n",
+    "df[\"context\"] = df[\"context\"].str.replace(\"'\", \"\\\\'\").str.replace('\"', '\\\\\"')\n",
+    "# pick 1 random row\n",
+    "sample_df = df.sample(1)\n",
+    "# create a json object with the key as \"input_data\" and value as the question and context lists from columns of the sample_df dataframe\n",
+    "test_json = {\n",
+    "    \"input_data\": {\n",
+    "        \"question\": sample_df[\"question\"].to_list(),\n",
+    "        \"context\": sample_df[\"context\"].to_list(),\n",
+    "    },\n",
+    "    \"params\": {},\n",
+    "}\n",
+    "# save the json object to a file named sample_score.json in the ./squad-dataset folder\n",
+    "with open(os.path.join(\".\", \"squad-dataset\", \"sample_score.json\"), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=\"./squad-dataset/sample_score.json\",\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compare the predicted answer with the actual answer\n",
+    "response_df = pd.DataFrame({\"predicted_answer\": [response_df[0][0]]})\n",
+    "response_df[\"ground_truth_answer\"] = sample_df[\"answers\"].to_list()[0][\"text\"]\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise, the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/sdk/python/foundation-models/system/inference/text-classification/text-classification-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/text-classification/text-classification-online-endpoint-oss.ipynb
new file mode 100644
index 00000000000..854694e4798
--- /dev/null
+++ b/sdk/python/foundation-models/system/inference/text-classification/text-classification-online-endpoint-oss.ipynb
@@ -0,0 +1,352 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Text Classification Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `text-classification` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "`text-classification` is a generic task type that can be used for scenarios such as sentiment analysis, emotion detection, grammar checking, spam filtering, etc. In this example, we will do sentiment analysis on movie reviews to determine whether a review is positive or negative. \n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [imdb](https://huggingface.co/datasets/imdb) dataset.\n",
+    "\n",
+    "### Model\n",
+    "Look for models tagged with `text-classification` in the system registry. Filtering by `text-classification` alone is not sufficient: check that the model is specifically fine-tuned for sentiment analysis by studying the model card and looking at the input/output samples or signatures of the model. In this notebook, we use the `finiteautomata-bertweet-base-sentiment-analysis` model.\n",
+    "\n",
+    "   \n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Download and prepare data for inference. \n",
+    "* Deploy the model for real-time inference.\n",
+    "* Test the endpoint.\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `text-classification` task. In this example, we use the `finiteautomata-bertweet-base-sentiment-analysis` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"finiteautomata-bertweet-base-sentiment-analysis\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows\n",
+    "* Replace numerical categories in data with the actual string labels. This mapping is available in [label.json](./imdb-dataset/label.json). This step is needed because the selected model returns labels such as `pos`, `neg`, etc. when running prediction. If the labels in your ground truth data are left as `0`, `1`, `2`, etc., they will not match the prediction labels returned by the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./imdb-dataset directory\n",
+    "%run ./imdb-dataset/download-dataset.py --download_dir ./imdb-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "dataset_dir = \"./imdb-dataset\"\n",
+    "data_file = \"train.jsonl\"\n",
+    "\n",
+    "# load the train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "df = pd.read_json(os.path.join(dataset_dir, data_file), lines=True)\n",
+    "df.head()\n",
+    "\n",
+    "# load the id2label json element of the label.json file into pandas table with keys as 'label' column of int64 type and values as 'label_string' column as string type\n",
+    "import json\n",
+    "\n",
+    "label_file = \"label.json\"\n",
+    "with open(os.path.join(dataset_dir, label_file)) as f:\n",
+    "    id2label = json.load(f)\n",
+    "    id2label = id2label[\"id2label\"]\n",
+    "    label_df = pd.DataFrame.from_dict(\n",
+    "        id2label, orient=\"index\", columns=[\"label_string\"]\n",
+    "    )\n",
+    "    label_df[\"label\"] = label_df.index.astype(\"int64\")\n",
+    "    label_df = label_df[[\"label\", \"label_string\"]]\n",
+    "\n",
+    "# join the dataframe with the id2label dataframe to get the label_string column\n",
+    "df = df.merge(label_df, on=\"label\", how=\"left\")\n",
+    "# rename the label_string column to ground_truth_label\n",
+    "df = df.rename(columns={\"label_string\": \"ground_truth_label\"})\n",
+    "\n",
+    "import re\n",
+    "\n",
+    "df[\"text\"] = df[\"text\"].apply(lambda x: re.sub(r\"<.*?>\", \"\", x))\n",
+    "\n",
+    "df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"fill-mask-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \"\n",
+    "    + foundation_model.name\n",
+    "    + \", for fill-mask task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"Standard_DS4_v2\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will fetch some sample data from the test dataset and submit it to the online endpoint for inference. We will then display the scored labels alongside the ground truth labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "score_file = \"sample_score.json\"\n",
+    "# pick 5 random rows\n",
+    "sample_df = df.sample(5)\n",
+    "# reset the index of sample_df\n",
+    "sample_df = sample_df.reset_index(drop=True)\n",
+    "sample_df.drop(columns=[\"label\"], inplace=True)\n",
+    "\n",
+    "# create a json object with the key \"input_data\" and, as value, the list of texts from the text column of sample_df\n",
+    "test_json = {\n",
+    "    \"input_data\": sample_df[\"text\"].tolist(),\n",
+    "    \"params\": {\"return_all_scores\": True},\n",
+    "}\n",
+    "# save the json object to a file named sample_score.json in the ./imdb-dataset folder\n",
+    "with open(os.path.join(\".\", dataset_dir, score_file), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=os.path.join(\".\", dataset_dir, score_file),\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "# rename label column to predicted_label\n",
+    "response_df = response_df.rename(columns={0: \"predicted_label\"})\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# merge the sample_df and response_df dataframes\n",
+    "merged_df = sample_df.merge(response_df, left_index=True, right_index=True)\n",
+    "merged_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise, the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/sdk/python/foundation-models/system/inference/token-classification/token-classification-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/token-classification/token-classification-online-endpoint-oss.ipynb
new file mode 100644
index 00000000000..c6d5cd72add
--- /dev/null
+++ b/sdk/python/foundation-models/system/inference/token-classification/token-classification-online-endpoint-oss.ipynb
@@ -0,0 +1,327 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Token Classification Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `token-classification` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "`token-classification` assigns a label to individual tokens in a sentence. One of the most common `token-classification` tasks is Named Entity Recognition (NER). NER attempts to find a label for each entity in a sentence, such as a person, location, or organization.\n",
+    "\n",
+    "### Model\n",
+    "Models that can perform the `token-classification` task are tagged with `task: token-classification`. We will use the `Jean-Baptiste-camembert-ner` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. \n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [polyglot_ner](https://huggingface.co/datasets/polyglot_ner/viewer/fr/train) dataset. \\\n",
+    "Please note that the dataset used here is a French dataset, as the Jean-Baptiste/camembert-ner model was trained on French text.\n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Prepare data for inference. \n",
+    "* Deploy the model for real time inference.\n",
+    "* Test the endpoint\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace  `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `token-classification` task. In this example, we use the `Jean-Baptiste-camembert-ner` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"Jean-Baptiste-camembert-ner\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows\n",
+    "* Save a few samples in a format that can be passed as input to the online inference endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./polyglot_ner-dataset directory\n",
+    "%run ./polyglot_ner-dataset/download-dataset.py --download_dir ./polyglot_ner-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the ./polyglot_ner-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "train_df = pd.read_json(\"./polyglot_ner-dataset/train.jsonl\", lines=True)\n",
+    "\n",
+    "train_df.drop(columns=[\"words\", \"id\", \"lang\"], inplace=True)\n",
+    "train_df.rename(columns={\"ner\": \"ground_truth_labels\"}, inplace=True)\n",
+    "\n",
+    "train_df = train_df[[\"text\", \"ground_truth_labels\"]]\n",
+    "\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"token-classification-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \"\n",
+    "    + foundation_model.name\n",
+    "    + \", for token-classification task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"STANDARD_E8S_V3\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will fetch some sample data from the test dataset and submit it to the online endpoint for inference. We will then display the scored labels alongside the ground truth labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import os\n",
+    "\n",
+    "# pick 1 random row\n",
+    "sample_df = train_df.sample(1)\n",
+    "# create a json object with the key \"input_data\" and, as value, the texts from the text column of sample_df wrapped in a list\n",
+    "test_json = {\"input_data\": [sample_df[\"text\"].tolist()], \"params\": {}}\n",
+    "# save the json object to a file named sample_score.json in the ./polyglot_ner-dataset folder\n",
+    "with open(os.path.join(\".\", \"polyglot_ner-dataset\", \"sample_score.json\"), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=\"./polyglot_ner-dataset/sample_score.json\",\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compare the predicted labels with the actual labels\n",
+    "predicted_labels = response_df[0][0]\n",
+    "compare_df = pd.DataFrame(\n",
+    "    {\n",
+    "        \"ground_truth_labels\": sample_df[\"ground_truth_labels\"].tolist(),\n",
+    "        \"predicted_labels\": [predicted_labels],\n",
+    "    }\n",
+    ")\n",
+    "compare_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise, the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/sdk/python/foundation-models/system/inference/translation/translation-online-endpoint-oss.ipynb b/sdk/python/foundation-models/system/inference/translation/translation-online-endpoint-oss.ipynb
new file mode 100644
index 00000000000..e79fcab36fa
--- /dev/null
+++ b/sdk/python/foundation-models/system/inference/translation/translation-online-endpoint-oss.ipynb
@@ -0,0 +1,320 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Translation Inference using Online Endpoints\n",
+    "\n",
+    "This sample shows how to deploy `translation` type models to an online endpoint for inference.\n",
+    "\n",
+    "### Task\n",
+    "`translation` converts a sequence of text from one language to another. It is one of several tasks you can formulate as a sequence-to-sequence problem, a powerful framework for returning some output from an input, like translation or summarization. `translation` systems are commonly used to translate text between languages, but they can also be used for speech, or for some combination in between such as text-to-speech or speech-to-text.\n",
+    "\n",
+    "### Model\n",
+    "Models that can perform the `translation` task are tagged with `task: text-translation`. We will use the `t5-small` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. If you don't find a model that suits your scenario or domain, you can discover and [import models from HuggingFace hub](../../import/import_model_into_registry.ipynb) and then use them for inference. \n",
+    "\n",
+    "### Inference data\n",
+    "We will use the [wmt16 (ro-en)](https://huggingface.co/datasets/wmt16) dataset.\n",
+    "\n",
+    "### Outline\n",
+    "* Set up pre-requisites.\n",
+    "* Pick a model to deploy.\n",
+    "* Download and prepare data for inference. \n",
+    "* Deploy the model for real time inference.\n",
+    "* Test the endpoint\n",
+    "* Clean up resources."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Set up pre-requisites\n",
+    "* Install dependencies\n",
+    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace  `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
+    "* Connect to `azureml` system registry"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.ml import MLClient\n",
+    "from azure.identity import (\n",
+    "    DefaultAzureCredential,\n",
+    "    InteractiveBrowserCredential,\n",
+    "    ClientSecretCredential,\n",
+    ")\n",
+    "from azure.ai.ml.entities import AmlCompute\n",
+    "import time\n",
+    "\n",
+    "try:\n",
+    "    credential = DefaultAzureCredential()\n",
+    "    credential.get_token(\"https://management.azure.com/.default\")\n",
+    "except Exception as ex:\n",
+    "    credential = InteractiveBrowserCredential()\n",
+    "\n",
+    "workspace_ml_client = MLClient(\n",
+    "    credential,\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group_name=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    ")\n",
+    "# The models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
+    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Pick a model to deploy\n",
+    "\n",
+    "Browse models in the Model Catalog in the AzureML Studio, filtering by the `translation` task. In this example, we use the `t5-small` model. If you have opened this notebook for a different model, replace the model name and version accordingly. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"t5-small\"\n",
+    "version_list = list(registry_ml_client.models.list(model_name))\n",
+    "if len(version_list) == 0:\n",
+    "    print(\"Model not found in registry\")\n",
+    "else:\n",
+    "    model_version = version_list[0].version\n",
+    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
+    "    print(\n",
+    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
+    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
+    "        )\n",
+    "    )"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Download and prepare data for inference.\n",
+    "\n",
+    "The next few cells show basic data preparation:\n",
+    "* Visualize some data rows\n",
+    "* Save a few samples in a format that can be passed as input to the online inference endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download a small sample of the dataset into the ./wmt16-en-ro-dataset directory\n",
+    "%run ./wmt16-en-ro-dataset/download-dataset.py --download_dir ./wmt16-en-ro-dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load the ./wmt16-en-ro-dataset/train.jsonl file into a pandas dataframe and show the first 5 rows\n",
+    "import pandas as pd\n",
+    "\n",
+    "pd.set_option(\n",
+    "    \"display.max_colwidth\", 0\n",
+    ")  # set the max column width to 0 to display the full text\n",
+    "train_df = pd.read_json(\"./wmt16-en-ro-dataset/train.jsonl\", lines=True)\n",
+    "train_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Deploy the model to an online endpoint\n",
+    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time, sys\n",
+    "from azure.ai.ml.entities import (\n",
+    "    ManagedOnlineEndpoint,\n",
+    "    ManagedOnlineDeployment,\n",
+    "    ProbeSettings,\n",
+    ")\n",
+    "\n",
+    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
+    "timestamp = int(time.time())\n",
+    "online_endpoint_name = \"translation-\" + str(timestamp)\n",
+    "# create an online endpoint\n",
+    "endpoint = ManagedOnlineEndpoint(\n",
+    "    name=online_endpoint_name,\n",
+    "    description=\"Online endpoint for \"\n",
+    "    + foundation_model.name\n",
+    "    + \", for translation task\",\n",
+    "    auth_mode=\"key\",\n",
+    ")\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployment\n",
+    "demo_deployment = ManagedOnlineDeployment(\n",
+    "    name=\"demo\",\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    model=foundation_model.id,\n",
+    "    instance_type=\"Standard_DS4_v2\",\n",
+    "    instance_count=2,\n",
+    "    liveness_probe=ProbeSettings(\n",
+    "        failure_threshold=30,\n",
+    "        success_threshold=1,\n",
+    "        timeout=2,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    "    readiness_probe=ProbeSettings(\n",
+    "        failure_threshold=10,\n",
+    "        success_threshold=1,\n",
+    "        timeout=10,\n",
+    "        period=10,\n",
+    "        initial_delay=1000,\n",
+    "    ),\n",
+    ")\n",
+    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
+    "endpoint.traffic = {\"demo\": 100}\n",
+    "workspace_ml_client.begin_create_or_update(endpoint).result()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Test the endpoint with sample data\n",
+    "\n",
+    "We will fetch some sample data from the test dataset and submit it to the online endpoint for inference. We will then display the predicted translation alongside the ground truth translation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import os\n",
+    "\n",
+    "# read the ./wmt16-en-ro-dataset/train.jsonl file into a pandas dataframe\n",
+    "df = pd.read_json(\"./wmt16-en-ro-dataset/train.jsonl\", lines=True)\n",
+    "# pick 1 random row\n",
+    "sample_df = df.sample(1)\n",
+    "# create a json object with the key \"input_data\" and, as value, one prompt per row of the en column;\n",
+    "# json.dump escapes quotes in the text, so no manual escaping is needed\n",
+    "test_json = {\n",
+    "    \"input_data\": [\n",
+    "        \"translate English to Romanian: \" + text for text in sample_df[\"en\"].tolist()\n",
+    "    ],\n",
+    "    \"params\": {},\n",
+    "}\n",
+    "# save the json object to a file named sample_score.json in the ./wmt16-en-ro-dataset folder\n",
+    "with open(os.path.join(\".\", \"wmt16-en-ro-dataset\", \"sample_score.json\"), \"w\") as f:\n",
+    "    json.dump(test_json, f)\n",
+    "sample_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n",
+    "response = workspace_ml_client.online_endpoints.invoke(\n",
+    "    endpoint_name=online_endpoint_name,\n",
+    "    deployment_name=\"demo\",\n",
+    "    request_file=\"./wmt16-en-ro-dataset/sample_score.json\",\n",
+    ")\n",
+    "print(\"raw response: \\n\", response, \"\\n\")\n",
+    "# convert the json response to a pandas dataframe\n",
+    "response_df = pd.read_json(response)\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compare the predicted translation with the ground truth translation\n",
+    "response_df.rename(columns={0: \"predicted_translation\"}, inplace=True)\n",
+    "response_df[\"ground_truth_translation\"] = sample_df[\"ro\"].tolist()\n",
+    "response_df.head()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Delete the online endpoint\n",
+    "Don't forget to delete the online endpoint; otherwise, the billing meter keeps running for the compute used by the endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "2f394aca7ca06fed1e6064aef884364492d7cdda3614a461e02e6407fc40ba69"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}