diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 00000000..5709ef14
Binary files /dev/null and b/.DS_Store differ
diff --git a/00_Intro/README.md b/00_Intro/README.md
new file mode 100644
index 00000000..a17baa97
--- /dev/null
+++ b/00_Intro/README.md
@@ -0,0 +1,118 @@
+# Prerequisites
+
+- This workshop must be run in your own account with access to Amazon Bedrock.
+- Run this workshop in the **us-east-1 (N. Virginia)** region.
+- If you are running on SageMaker Studio, the recommended kernel configuration is:
+    - Image: Data Science 3.0
+    - Instance Type: ml.t3.medium
+
+## IAM Policy for Bedrock
+
+The following IAM policy should be created to grant access to the Bedrock APIs:
+
+```json
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Sid": "Statement1",
+            "Effect": "Allow",
+            "Action": "bedrock:*",
+            "Resource": "*"
+        }
+    ]
+}
+```
+
+Now, please proceed to the environment setup.
+
+# Environment Setup
+
+## Boto3 Setup
+
+First, to run this workshop you need [Python](https://www.python.org/getit/) installed.
+
+> Python version >= 3.9
+
+Next, you need to install the `boto3` and `botocore` libraries of the [AWS SDK for Python (Boto3)](https://aws.amazon.com/sdk-for-python/). Both libraries contain the dependencies required for the Bedrock APIs.
+
+They can be downloaded using the script at the root of this repository:
+
+```sh
+bash download-dependencies.sh
+```
+
+To install them, run the following commands:
+
+```python
+%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl
+%pip install ../dependencies/boto3-1.26.162-py3-none-any.whl
+%pip install ../dependencies/awscli-1.27.162-py3-none-any.whl
+```
+
+> `boto3` version must be >= 1.26.162 and `botocore` version must be >= 1.29.162
+
+## AWS Credentials (optional)
+
+If you are running this workshop on your local computer (for example, in Microsoft VS Code or PyCharm), run the following snippet, which configures your credentials so that AWS API calls can be made:
+
+```python
+import sys, os
+module_path = "../utils"
+sys.path.append(os.path.abspath(module_path))
+import bedrock as util_w
+
+os.environ['LANGCHAIN_ASSUME_ROLE'] = ''
+boto3_bedrock = util_w.get_bedrock_client(os.environ['LANGCHAIN_ASSUME_ROLE'])
+```
+
+## LangChain installation
+
+It is also necessary to install [LangChain](https://python.langchain.com/en/latest/index.html). LangChain is a framework for developing applications powered by language models.
+
+```python
+%pip install langchain==0.0.190 --quiet
+```
+
+> `LangChain` version must be >= `0.0.190`
+
+## Explanation of Bedrock API
+
+First, you instantiate a `boto3` client to make calls to the Bedrock API:
+
+```python
+import boto3
+
+bedrock = boto3.client(service_name='bedrock',
+                       region_name='us-east-1',
+                       endpoint_url='https://bedrock.us-east-1.amazonaws.com')
+```
+
+Next, you call the `InvokeModel` API to send requests to a foundation model.
+The following is an example API request that sends text to the Amazon Titan model.
+Note that `body` must be a JSON string, and that `accept` and `contentType` are both `application/json`:
+
+```python
+import json
+
+body = json.dumps({
+    "inputText": "this is where you place your input text",
+    "textGenerationConfig": {
+        "maxTokenCount": 4096,
+        "stopSequences": [],
+        "temperature": 0,
+        "topP": 1
+    }
+})
+
+response = bedrock.invoke_model(body=body,
+                                modelId="amazon.titan-tg1-large",
+                                accept="application/json",
+                                contentType="application/json")
+```
+
+Where:
+
+* **inputText**: Text prompt to be sent to the Bedrock API.
+* **textGenerationConfig**: Model-specific parameters (these vary by model) that are taken into account at inference time.
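+
+The response `body` is a stream whose payload is a JSON document. Below is a minimal sketch of reading it and extracting the generated text, reusing the `response` object returned by the `invoke_model` call above (the same pattern is used in the notebooks that follow):
+
+```python
+import json
+
+# Read the streamed response body and decode the JSON payload
+response_body = json.loads(response.get("body").read())
+
+# For the Amazon Titan model, generations are returned under "results"
+print(response_body.get("results")[0].get("outputText"))
+```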
+
+## Conclusion
+
+**Congratulations!** In this section, you installed all the prerequisites and learned the basics of the Bedrock API.
+
+**Now, you can start the workshop**.
\ No newline at end of file
diff --git a/00_Intro/bedrock_boto3_setup.ipynb b/00_Intro/bedrock_boto3_setup.ipynb
new file mode 100644
index 00000000..bd5ba526
--- /dev/null
+++ b/00_Intro/bedrock_boto3_setup.ipynb
@@ -0,0 +1,1307 @@
+{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "d0916a3a-e402-48b7-a775-ce739e4aeaf4", + "metadata": {}, + "source": [ + "# Bedrock boto3 Setup" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "bbab02f1-3eac-4274-b06b-d51ce586df2c", + "metadata": { + "tags": [] + }, + "source": [ + "--- \n", + "\n", + "In this demo notebook, we demonstrate how to use the `boto3` Python SDK to work with [Bedrock](https://aws.amazon.com/bedrock/) Foundation Models.\n", + "\n", + "---" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4c1fda97-9150-484a-8cfa-86ec9568fc61", + "metadata": {}, + "source": [ + "## Prerequisites" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "27a83a8d-9527-48b4-92ff-fce963fbe3b5", + "metadata": {}, + "source": [ + "---\n", + "Before executing any of the notebooks in this workshop, execute the following cells to add the Bedrock extensions to the `boto3` Python SDK.\n", + "\n", + "---" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "108c611c-7246-45c4-9f1e-76888b5076eb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3f1c8940", + "metadata": {}, + "source": [ + "You also need to install [langchain](https://github.com/hwchase17/langchain):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e692c0d3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "be703e81", + "metadata": {}, + "source": [ + "## Create the boto3 client\n", + "\n", + "Interaction with the Bedrock API is done via the boto3 SDK. To create the Bedrock client, we provide a utility method that supports different options for passing credentials to boto3. \n", + "If you are running these notebooks from your own computer, make sure you have [installed the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) before proceeding.\n", + "\n", + "\n", + "#### Use default credential chain\n", + "\n", + "If you are running this notebook from a SageMaker Studio notebook and your SageMaker Studio role has permissions to access Bedrock, you can just run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default credentials have access to Bedrock.\n", + "\n", + "#### Use a different role\n", + "\n", + "In case you or your company has set up a specific role to access Bedrock, you can specify that role by uncommenting the line `#os.environ['BEDROCK_ASSUME_ROLE'] = ''` in the cell below before executing it. 
Ensure that your current user or role has permissions to assume that role.\n", + "\n", + "#### Use a specific profile\n", + "\n", + "In case you are running these notebooks from your own computer and you have set up the AWS CLI with multiple profiles, and the profile which has access to Bedrock is not the default one, you can uncomment the line `#os.environ['AWS_PROFILE'] = ''` and specify the profile to use.\n", + "\n", + "#### Note about `langchain`\n", + "\n", + "The Bedrock classes provided by `langchain` create a default Bedrock boto3 client. We recommend explicitly creating the Bedrock client using the instructions below, and passing it to the class instantiation methods using `client=boto3_bedrock`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b031c34a", + "metadata": {}, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89bfd1e4", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "import json\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "if('BEDROCK_ASSUME_ROLE' in os.environ):\n", + " boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))\n", + "else:\n", + " boto3_bedrock = bedrock.get_bedrock_client()\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9e9174c4-326a-463e-92e1-8c7e47111269", + "metadata": {}, + "source": [ + "#### We can validate our connection by testing out the `list_foundation_models()` method, which will tell us all the models available for us to use " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f67b4466-12ff-4975-9811-7a19c6206604", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "boto3_bedrock.list_foundation_models()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "be9044d4-9d04-47c3-86ca-5b206585b784", + "metadata": {}, + "source": [ + "#### In this notebook we will be using the `invoke_model()` method of Amazon Bedrock. This will be the primary method we use for most of our text generation and processing tasks. 
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "881928fb-4daf-47e5-a2b6-b2292a679a81", + "metadata": {}, + "source": [ + "# `InvokeModel` body and output\n", + "\n", + "#### We provide the details about the format for the input and output format of `invoke_model()` for the different foundation models\n", + "\n", + "## Titan Large\n", + "\n", + "#### Input\n", + "```json\n", + "{ \n", + " \"inputText\": \"\",\n", + " \"textGenerationConfig\" : { \n", + " \"maxTokenCount\": 512,\n", + " \"stopSequences\": [],\n", + " \"temperature\":0.1, \n", + " \"topP\":0.9\n", + " }\n", + "}\n", + "```\n", + "\n", + "#### Output\n", + "\n", + "```json\n", + "{\n", + " \"inputTextTokenCount\": 613,\n", + " \"results\": [{\n", + " \"tokenCount\": 219,\n", + " \"outputText\": \"\"\n", + " }]\n", + "}\n", + "```\n", + "\n", + "## Jurassic Grande and Jumbo \n", + "\n", + "#### Input\n", + "\n", + "```json\n", + "{\n", + " \"prompt\": \"\",\n", + " \"maxTokens\": 200,\n", + " \"temperature\": 0.5,\n", + " \"topP\": 0.5,\n", + " \"stopSequences\": [],\n", + " \"countPenalty\": {\n", + " \"scale\": 0\n", + " },\n", + " \"presencePenalty\": {\n", + " \"scale\": 0\n", + " },\n", + " \"frequencyPenalty\": {\n", + " \"scale\": 0\n", + " }\n", + "}\n", + "```\n", + "\n", + "#### Output\n", + "\n", + "```json\n", + "{\n", + " \"id\": 1234,\n", + " \"prompt\": {\n", + " \"text\": \"\",\n", + " \"tokens\": [\n", + " {\n", + " \"generatedToken\": {\n", + " \"token\": \"\\u2581who\\u2581is\",\n", + " \"logprob\": -12.980147361755371,\n", + " \"raw_logprob\": -12.980147361755371\n", + " },\n", + " \"topTokens\": null,\n", + " \"textRange\": {\n", + " \"start\": 0,\n", + " \"end\": 6\n", + " }\n", + " },\n", + " ...\n", + " ]\n", + " },\n", + " \"completions\": [\n", + " {\n", + " \"data\": {\n", + " \"text\": \"\",\n", + " \"tokens\": [\n", + " {\n", + " \"generatedToken\": {\n", + " \"token\": \"<|newline|>\",\n", + " \"logprob\": 0.0,\n", + " \"raw_logprob\": -0.01293118204921484\n", + " },\n", + " \"topTokens\": null,\n", + " \"textRange\": {\n", + " \"start\": 0,\n", + " \"end\": 1\n", + " }\n", + " },\n", + " ...\n", + " ]\n", + " },\n", + " \"finishReason\": {\n", + " \"reason\": \"endoftext\"\n", + " }\n", + " }\n", + " ]\n", + "}\n", + "```\n", + "\n", + "## Claude\n", + "\n", + "#### Input\n", + "\n", + "```json\n", + "{\n", + " \"prompt\": \"\\n\\nHuman:\\n\\nAnswer:\",\n", + " \"max_tokens_to_sample\": 300,\n", + " \"temperature\": 0.5,\n", + " \"top_k\": 250,\n", + " \"top_p\": 1,\n", + " \"stop_sequences\": [\n", + " \"\\n\\nHuman:\"\n", + " ]\n", + "}\n", + "```\n", + "\n", + "#### Output\n", + "\n", + "```json\n", + "{\n", + " \"completion\": \" \",\n", + " \"stop_reason\": \"stop_sequence\"\n", + "}\n", + "```\n", + "\n", + "## Stable Diffusion XL\n", + "\n", + "### Input\n", + "\n", + "```json\n", + "{\n", + " \"text_prompts\": [\n", + " { \n", + " \"text\": \"this is where you place your input text\" \n", + " }\n", + " ],\n", + " \"cfg_scale\":10,\n", + " \"seed\":0,\n", + " \"steps\":50\n", + "}\n", + "```\n", + "\n", + "### Output\n", + "\n", + "```json\n", + "{ \n", + " \"result\": \"success\", \n", + " \"artifacts\": [\n", + " {\n", + " \"seed\": 123, \n", + " \"base64\": \"\",\n", + " \"finishReason\": \"SUCCESS\"\n", + " }\n", + "}\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "80f4adca-cfc4-439b-84b7-e528398684e3", + "metadata": {}, + "source": [ + "# Common inference parameter definitions\n", + "\n", + "## Randomness and Diversity\n", + 
"\n", + "Foundation models support the following parameters to control randomness and diversity in the \n", + "response.\n", + "\n", + "**Temperature** – Large language models use probability to construct the words in a sequence. For any \n", + "given next word, there is a probability distribution of options for the next word in the sequence. When \n", + "you set the temperature closer to zero, the model tends to select the higher-probability words. When \n", + "you set the temperature further away from zero, the model may select a lower-probability word.\n", + "\n", + "In technical terms, the temperature modulates the probability density function for the next tokens, \n", + "implementing the temperature sampling technique. This parameter can deepen or flatten the density \n", + "function curve. A lower value results in a steeper curve with more deterministic responses, and a higher \n", + "value results in a flatter curve with more random responses.\n", + "\n", + "**Top K** – Temperature defines the probability distribution of potential words, and Top K defines the cut \n", + "off where the model no longer selects the words. For example, if K=50, the model selects from 50 of the \n", + "most probable words that could be next in a given sequence. This reduces the probability that an unusual \n", + "word gets selected next in a sequence.\n", + "In technical terms, Top K is the number of the highest-probability vocabulary tokens to keep for Top-\n", + "K-filtering - This limits the distribution of probable tokens, so the model chooses one of the highest-\n", + "probability tokens.\n", + "\n", + "**Top P** – Top P defines a cut off based on the sum of probabilities of the potential choices. If you set Top \n", + "P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is \n", + "similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their \n", + "probabilities.\n", + "For the example prompt \"I hear the hoof beats of ,\" you may want the model to provide \"horses,\" \n", + "\"zebras\" or \"unicorns\" as the next word. If you set the temperature to its maximum, without capping \n", + "Top K or Top P, you increase the probability of getting unusual results such as \"unicorns.\" If you set the \n", + "temperature to 0, you increase the probability of \"horses.\" If you set a high temperature and set Top K or \n", + "Top P to the maximum, you increase the probability of \"horses\" or \"zebras,\" and decrease the probability \n", + "of \"unicorns.\"\n", + "\n", + "## Length\n", + "\n", + "The following parameters control the length of the generated response.\n", + "\n", + "**Response length** – Configures the minimum and maximum number of tokens to use in the generated \n", + "response.\n", + "\n", + "**Length penalty** – Length penalty optimizes the model to be more concise in its output by penalizing \n", + "longer responses. Length penalty differs from response length as the response length is a hard cut off for \n", + "the minimum or maximum response length.\n", + "\n", + "In technical terms, the length penalty penalizes the model exponentially for lengthy responses. 0.0 \n", + "means no penalty. 
Set a value less than 0.0 for the model to generate longer sequences, or set a value \n", + "greater than 0.0 for the model to produce shorter sequences.\n", + "\n", + "## Repetitions\n", + "\n", + "The following parameters help control repetition in the generated response.\n", + "\n", + "**Repetition penalty (presence penalty)** – Prevents repetitions of the same words (tokens) in responses. \n", + "1.0 means no penalty. Greater than 1.0 decreases repetition." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "722bf913-3957-457f-804a-89900dd85c79", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "prompt_data = \"\"\"Command: Write me a blog about making strong business decisions as a leader.\\nBlog:\"\"\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ce22c308-ebbf-4ef5-a823-832b7c236e31", + "metadata": {}, + "source": [ + "## 2. Accessing Bedrock Foundation Models" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "893872fe-04fa-4f09-9736-6c6173ec1fc2", + "metadata": { + "tags": [] + }, + "source": [ + "### Let's try the prompt with the Titan Model on Bedrock" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7df55eed-a3cf-426c-95ea-ec60dade6477", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "prompt_data = \"\"\"Command: Write me a blog about making strong business decisions as a leader.\\nBlog:\"\"\" # If you'd like to try your own prompt, edit this parameter!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd2bb671-6b10-4948-9e5e-95d6ced3b86f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\"inputText\": prompt_data})\n", + "modelId = \"amazon.titan-tg1-large\" \n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "\n", + "print(response_body.get(\"results\")[0].get(\"outputText\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3d7c0fe6-576a-4380-89aa-726bab5d65ff", + "metadata": {}, + "source": [ + "### Let's try the prompt with the Anthropic Claude Instant Model on Bedrock" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ba33ac0-fa16-4c4f-b882-e838d0cb5830", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\"prompt\": prompt_data, \"max_tokens_to_sample\": 500})\n", + "modelId = \"anthropic.claude-instant-v1\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "\n", + "print(response_body.get(\"completion\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ed0e3144-c6df-400d-aab1-1540614dbbde", + "metadata": {}, + "source": [ + "### Let's try the prompt with the Jurassic Grande Model on Bedrock" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c02d1585-945e-45d1-99d2-171e956138f8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\"prompt\": prompt_data, \"maxTokens\": 200})\n", + "modelId = 
\"ai21.j2-grande-instruct\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "\n", + "print(response_body.get(\"completions\")[0].get(\"data\").get(\"text\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4b3619b5", + "metadata": {}, + "source": [ + "### Let's try the streaming output from Bedrock" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c69627e3", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, display_markdown, Markdown, clear_output\n", + "\n", + "body = json.dumps({\"prompt\": prompt_data, \"max_tokens_to_sample\": 200})\n", + "modelId = \"anthropic.claude-instant-v1\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = []\n", + "\n", + "if stream:\n", + " for event in stream:\n", + " chunk = event.get('chunk')\n", + " if chunk:\n", + " chunk_obj = json.loads(chunk.get('bytes').decode())\n", + " text = chunk_obj['completion']\n", + " clear_output(wait=True)\n", + " output.append(text)\n", + " display_markdown(Markdown(''.join(output)))\n", + " " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "bc498bea", + "metadata": {}, + "source": [ + "### Let's try the prompt with the Stable Diffusion XL on Bedrock" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "173e51a2", + "metadata": {}, + "outputs": [], + "source": [ + "prompt_data = \"a fine image of an astronaut riding a horse on Mars\"\n", + "body = json.dumps({\n", + " \"text_prompts\": [\n", + " { \n", + " \"text\": prompt_data \n", + " }\n", + " ],\n", + " \"cfg_scale\":10,\n", + " \"seed\":20,\n", + " \"steps\":50\n", + "})\n", + "modelId = \"stability.stable-diffusion-xl\" \n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "\n", + "print(response_body['result'])\n", + "print(f'{response_body.get(\"artifacts\")[0].get(\"base64\")[0:80]}...')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "2a00bb66", + "metadata": {}, + "source": [ + "The output is a base64 encoded string of the image. You can use ans image processing library such as Pillow to decode the image as in the example below:\n", + "\n", + "```python\n", + "base_64_img_str = response_body.get(\"artifacts\")[0].get(\"base64\")\n", + "image = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1ef3451d-b66a-4b11-a1ed-734bf9e7bbec", + "metadata": {}, + "source": [ + "# Embeddings\n", + "\n", + "Use text embeddings to convert text into meaningful vector representations. You input a body of text \n", + "and the output is a (1 x n) vector. 
You can use embedding vectors for a wide variety of applications. \n", + "Bedrock currently offers one model for text embedding that supports text similarity (finding the \n", + "semantic similarity between bodies of text) and text retrieval (such as search).\n", + "For the text embeddings model, the input text size is 512 tokens and the output vector length is 4096.\n", + "To use a text embeddings model, use the InvokeModel API operation or the Python SDK.\n", + "Use InvokeModel to retrieve the vector representation of the input text from the specified model.\n", + "\n", + "At the time of writing you can only use `amazon.titan-e1t-medium` as embedding model via the API.\n", + "\n", + "#### Input\n", + "\n", + "```json\n", + "{\n", + " \"inputText\": \"\"\n", + "}\n", + "```\n", + "\n", + "#### Output\n", + "\n", + "```json\n", + "{\n", + " \"embedding\": []\n", + "}\n", + "```\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9645dbd8", + "metadata": {}, + "source": [ + "Let's see how to generate embeddings of some text:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1085cc56", + "metadata": {}, + "outputs": [], + "source": [ + "prompt_data = \"Amazon Bedrock supports foundation models from industry-leading providers such as \\\n", + "AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving your unique goals.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c54b424", + "metadata": {}, + "outputs": [], + "source": [ + "body = json.dumps({\"inputText\": prompt_data})\n", + "modelId = \"amazon.titan-e1t-medium\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "\n", + "embedding = response_body.get(\"embedding\")\n", + "print(f\"The embedding vector has {len(embedding)} values\\n{embedding[0:3]+['...']+embedding[-3:]}\")" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": 
"ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": 
"Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": 
"ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/01_Generation/00_generate_w_bedrock.ipynb b/01_Generation/00_generate_w_bedrock.ipynb new file mode 100644 index 00000000..dd2d9e4a --- /dev/null +++ b/01_Generation/00_generate_w_bedrock.ipynb @@ -0,0 +1,922 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "dc40c48b-0c95-4757-a067-563cfccd51a5", + "metadata": { + "tags": [] + }, + "source": [ + "# Invoke Bedrock model for text generation using zero-shot prompt" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c9a413e2-3c34-4073-9000-d8556537bb6a", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "In this notebook we show you how to use an LLM to generate an email response to a customer who provided negative feedback on the quality of customer service that they received from the support engineer. \n", + "\n", + "We will use Bedrock's Amazon Titan Text Large model via the Boto3 API. \n", + "\n", + "The prompt used in this example is called a zero-shot prompt because we are not providing any examples other than the prompt itself.\n", + "\n", + "**Note:** *This notebook can be run within or outside of the AWS environment.*\n", + "\n", + "#### Context\n", + "To demonstrate the text generation capability of Amazon Bedrock, we will explore the use of the Boto3 client to communicate with the Amazon Bedrock API. We will demonstrate the different configurations available as well as how a simple input can lead to desired outputs.\n", + "\n", + "#### Pattern\n", + "We will simply provide the Amazon Bedrock API with an input consisting of a task, an instruction, and an input for the model under the hood, and let it generate an output without providing any additional examples. The purpose here is to demonstrate how powerful LLMs easily understand the task at hand and generate compelling outputs.\n", + "\n", + "![](./images/bedrock.jpg)\n", + "\n", + "#### Use case\n", + "To demonstrate the generation capability of models in Amazon Bedrock, let's take the use case of email generation.\n", + "\n", + "#### Persona\n", + "You are Bob, a Customer Service Manager at AnyCompany, and some of your customers are not happy with the customer service and are providing negative feedback on the service provided by customer support engineers. Now, you would like to respond to those customers, humbly apologizing for the poor service and regaining their trust. You need the help of an LLM to generate a batch of emails for you which are human-friendly and personalized to the customer's sentiment from previous email correspondence.\n", + "\n", + "#### Implementation\n", + "To fulfill this use case, in this notebook we will show how to generate an email with a thank you note based on the customer's previous email. We will use the Amazon Titan Text Large model via the Amazon Bedrock API with the Boto3 client. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "64baae27-2660-4a1e-b2e5-3de49d069362", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "37115f13", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38b791ad-e6c5-4da5-96af-5c356a36e19d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "776fd083", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "568a6d26-e3ee-4b0c-a1eb-efc4bff99994", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4f634211-3de1-4390-8c3f-367af5554c39", + "metadata": {}, + "source": [ + "## Generate text\n", + "\n", + "Following on the use case explained above, let's prepare an input for the Amazon Bedrock service to generate an email" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45ee2bae-6415-4dba-af98-a19028305c98", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# create the prompt\n", + "prompt_data = \"\"\"\n", + "Command: Write an email from Bob, Customer Service Manager, to the customer \"John Doe\" \n", + "who provided negative feedback on the service provided by our customer support \n", + "engineer\"\"\"\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "cc9784e5-5e9d-472d-8ef1-34108ee4968b", + "metadata": {}, + "source": [ + "Let's start by using the Amazon Titan Large model. Amazon Titan Large supports a context window of ~4k tokens and accepts the following parameters:\n", + "- `inputText`: Prompt to the LLM\n", + "- `textGenerationConfig`: These are the parameters that model will take into account while generating the output." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8af670eb-ad02-40df-a19c-3ed835fac8d9", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\n", + " \"inputText\": prompt_data, \n", + " \"textGenerationConfig\":{\n", + " \"maxTokenCount\":4096,\n", + " \"stopSequences\":[],\n", + " \"temperature\":0,\n", + " \"topP\":0.9\n", + " }\n", + " }) " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c4ca6751", + "metadata": {}, + "source": [ + "The Amazon Bedrock API provides an `invoke_model` API which accepts the following:\n", + "- `modelId`: The identifier of the foundation model available under Amazon Bedrock\n", + "- `accept`: The desired MIME type of the response\n", + "- `contentType`: The MIME type of the input data in the request body\n", + "- `body`: A JSON string consisting of the prompt and the configurations\n", + "\n", + "Available text generation models under Amazon Bedrock have the following IDs:\n", + "- `amazon.titan-tg1-large`\n", + "- `ai21.j2-grande-instruct`\n", + "- `ai21.j2-jumbo-instruct`\n", + "- `anthropic.claude-instant-v1`\n", + "- `anthropic.claude-v1`" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "088cf6bf-dd73-4710-a0cc-6c11d220c431", + "metadata": {}, + "source": [ + "#### Invoke the Amazon Titan Large language model" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "379498f2", + "metadata": {}, + "source": [ + "First, we explore how the model generates an output based on the prompt created earlier.\n", + "\n", + "##### Complete Output Generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ecaceef1-0f7f-4ae5-8007-ff7c25335251", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider\n", + "accept = 'application/json'\n", + "contentType = 'application/json'\n", + "\n", + "response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "response_body = json.loads(response.get('body').read())\n", + "\n", + "outputText = response_body.get('results')[0].get('outputText')\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3748383a-c140-407f-a7f6-8f140ad57680", + "metadata": {}, + "outputs": [], + "source": [ + "# The relevant portion of the response begins after the first newline character\n", + "# Below we print the response beginning after the first occurrence of '\\n'.\n", + "\n", + "email = outputText[outputText.index('\\n')+1:]\n", + "print_ww(email)\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "2d69e1a0", + "metadata": {}, + "source": [ + "##### Streaming Output Generation\n", + "Above is an example email generated by the Amazon Titan Large model by interpreting the input request with its built-in language understanding. This request to the API is synchronous and waits for the entire output to be generated by the model.\n", + "\n", + "Bedrock also supports streaming the output as it is generated by the model, in the form of chunks. Below is an example of invoking the model with the streaming option. `invoke_model_with_response_stream` returns a `ResponseStream` which you can read from."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad073290", + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = []\n", + "i = 1\n", + "if stream:\n", + " for event in stream:\n", + " chunk = event.get('chunk')\n", + " if chunk:\n", + " chunk_obj = json.loads(chunk.get('bytes').decode())\n", + " text = chunk_obj['outputText']\n", + " output.append(text)\n", + " print(f'\\t\\t\\x1b[31m**Chunk {i}**\\x1b[0m\\n{text}\\n')\n", + " i+=1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9a788be5", + "metadata": {}, + "source": [ + "The above helps to quickly get output of the model and let the service complete it as you read. This assists in use-cases where there are longer pieces of text that you request the model to generate. You can later combine all the chunks generated to form the complete output and use it for your use-case" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02d48c73", + "metadata": {}, + "outputs": [], + "source": [ + "print('\\t\\t\\x1b[31m**COMPLETE OUTPUT**\\x1b[0m\\n')\n", + "complete_output = ''.join(output)\n", + "print(complete_output)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "64b08b3b", + "metadata": {}, + "source": [ + "## Conclusion\n", + "You have now experimented with using `boto3` SDK which provides a vanilla exposure to Amazon Bedrock API. Using this API you have seen the use case of generating an email responding to a customer due to their negative feedback.\n", + "\n", + "### Take aways\n", + "- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Anthropic Claude and AI21 Labs Jurassic models.\n", + "- Change the prompts to your specific usecase and evaluate the output of different models.\n", + "- Play with the token length to understand the latency and responsiveness of the service.\n", + "- Apply different prompt engineering principles to get better outputs.\n", + "\n", + "## Thank You" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": 
"ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": 
"Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": 
"ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "tmp-bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/01_Generation/01_zero_shot_generation.ipynb b/01_Generation/01_zero_shot_generation.ipynb new file mode 100644 index 00000000..aad69b2b --- /dev/null +++ b/01_Generation/01_zero_shot_generation.ipynb @@ -0,0 +1,836 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "497d5095-1305-4435-8970-f7fc40e2635b", + "metadata": {}, + "source": [ + "# Invoke Bedrock model using LangChain and a zero-shot prompt" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "406280e0-6c82-48e7-af07-4c18282f1b9d", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "In this notebook we show how to use an LLM to generate an email response to a customer who provided negative feedback on the quality of customer service that they received from the support engineer. \n", + "\n", + "We will use Anthropic's Claude model provided by Bedrock in this example. We will use the Bedrock version that is integrated with [LangChain](https://python.langchain.com/docs/get_started/introduction.html). LangChain is a framework for developing applications powered by language models. The key aspects of this framework allow us to augment large language models by chaining together various components to create advanced use cases.\n", + "\n", + "In this notebook we will use the Bedrock API provided by LangChain. The prompt used in this example is called a zero-shot prompt because we are not providing any additional context other than the prompt.\n", + "\n", + "**Note:** *This notebook can be run within or outside of the AWS environment*.\n", + "\n", + "#### Context\n", + "In the previous example `00_generate_w_bedrock.ipynb`, we explored how to use the Boto3 client to communicate with the Amazon Bedrock API. In this notebook, we add a bit more complexity by leveraging the LangChain framework for a similar use case. We will explore the use of the Amazon Bedrock integration within the LangChain framework and how it can be used to generate text with the help of `PromptTemplate`.\n", + "\n", + "#### Pattern\n", + "We will simply provide the LangChain implementation of the Amazon Bedrock API with an input consisting of a task, an instruction and an input for the model under the hood to generate an output, without providing any additional example. The purpose here is to demonstrate how powerful LLMs easily understand the task at hand and generate compelling outputs.\n", + "\n", + "![](./images/bedrock_langchain.jpg)\n", + "\n", + "#### Use case\n", + "To demonstrate the generation capability of models in Amazon Bedrock, let's take the use case of email generation.\n", + "\n", + "#### Persona\n", + "You are Bob, a Customer Service Manager at AnyCompany, and some of your customers are not happy with the customer service and are providing negative feedback on the service provided by customer support engineers. Now, you would like to respond to those customers, humbly apologizing for the poor service and regaining their trust. You need the help of an LLM to generate a batch of emails that are human friendly and personalized to each customer's sentiment from previous email correspondence.\n", + "\n", + "#### Implementation\n", + "To fulfill this use case, in this notebook we will show how to generate an email with a thank you note based on the customer's previous email. We will use Anthropic's Claude model through the Amazon Bedrock LangChain integration. 
" ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b7daa1a8-d21a-410c-adbf-b253c2dabf80", + "metadata": { + "tags": [] + }, + "source": [ + "## Invoke the Bedrock client using LangChain Integration\n", + "\n", + "Let's begin by creating an instance of the Bedrock class from llms. This expects a `model_id` of the model available in Amazon Bedrock. \n", + "\n", + "Optionally, you can pass in a previously created boto3 client as well as some `model_kwargs`, which can hold parameters such as `temperature`, `topP`, `maxTokenCount` or `stopSequences` (more on parameters can be explored in the Amazon Bedrock console).\n", + "\n", + "Available text generation models under Amazon Bedrock have the following IDs:\n", + "\n", + "- amazon.titan-tg1-large\n", + "- ai21.j2-grande-instruct\n", + "- ai21.j2-jumbo-instruct\n", + "- anthropic.claude-instant-v1\n", + "- anthropic.claude-v1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "de0361e7-168f-46a3-be55-7071c4f0500e", + "metadata": {}, + "source": [ + "## Setup " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "d5f12f5d", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49e2c0a9-4838-4f2b-bb36-61c0cbcd62af", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee2be60b-480a-4524-8a1d-3529ebcb812d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fef011e8", + "metadata": {}, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f26378e7", + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "66824ae7", + "metadata": {}, + "source": [ + "The Amazon Bedrock API returns, alongside the generated text, token counts such as:\n", + "- `inputTextTokenCount`: The number of tokens in the input prompt\n", + "- `tokenCount`: The number of tokens of the entire prompt plus the output\n", + "\n", + "The generated text itself is returned as:\n", + "- `outputText`: The text generated by the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13f75ea1-dce1-4794-84bf-68d9c22a2d97", 
+ "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.llms.bedrock import Bedrock\n", + "\n", + "inference_modifier = {'max_tokens_to_sample':4096, \n", + " \"temperature\":0.5,\n", + " \"top_k\":250,\n", + " \"top_p\":1,\n", + " \"stop_sequences\": [\"\\n\\nHuman\"]\n", + " }\n", + "\n", + "textgen_llm = Bedrock(model_id = \"anthropic.claude-v1\",\n", + " client = boto3_bedrock, \n", + " model_kwargs = inference_modifier \n", + " )" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "c9fc4301", + "metadata": {}, + "source": [ + "LangChain has abstracted away the Amazon Bedrock API and made it easy to build use cases. You can pass in your prompt and it is automatically routed to the appropriate API to generate the response. You simply get the text output as-is and don't have to extract the results out of the response body.\n", + "\n", + "Let's prepare the prompt to generate an email for the Customer Service Manager to send to the customer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4e3304c", + "metadata": {}, + "outputs": [], + "source": [ + "response = textgen_llm(\"\"\"Write an email from Bob, Customer Service Manager, \n", + "to the customer \"John Doe\" that provided negative feedback on the service \n", + "provided by our customer support engineer.\\n\\nHuman:\"\"\")\n", + "\n", + "print_ww(response)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "f8ed23ea", + "metadata": {}, + "source": [ + "## Conclusion\n", + "You have now experimented with using `LangChain` framework which provides an abstraction layer on Amazon Bedrock API. Using this framework you have seen the usecase of generating an email responding to a customer due to their negative feedback.\n", + "\n", + "### Take aways\n", + "- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Anthropic Claude and AI21 Labs Jurassic models.\n", + "- Change the prompts to your specific usecase and evaluate the output of different models.\n", + "- Play with the different parameters to understand the latency and responsiveness of the service.\n", + "- Apply different prompt engineering principles to get better outputs.\n", + "\n", + "## Thank You" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b604d5da-3a28-476c-b1ff-20aedf46c898", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + 
"gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { 
+ "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 
0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + 
"version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/01_Generation/02_contextual_generation.ipynb b/01_Generation/02_contextual_generation.ipynb new file mode 100644 index 00000000..716fe3ee --- /dev/null +++ b/01_Generation/02_contextual_generation.ipynb @@ -0,0 +1,897 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "af3f88dd-0f5e-427e-84ee-8934982300d1", + "metadata": { + "tags": [] + }, + "source": [ + "# Bedrock with LangChain using a Prompt that includes Context" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b920ca4a-a71d-4630-a6e4-577d95192ad1", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "In this notebook we show you how to generate an email response to a customer who was not happy with the quality of customer service that they received from the customer support engineer. We will provide additional context to the model by providing the contents of the actual email that was received from the unhappy customer.\n", + "\n", + "Because of additional context in the prompt, the text produced by the Amazon Titan Large language model in this notebook is of much better quality and relevance than the content produced earlier through zero-shot prompts.\n", + "\n", + "[LangChain](https://python.langchain.com/docs/get_started/introduction.html) is a framework for developing applications powered by language models. The key aspects of this framework allow us to augment the Large Language Models by chaining together various components to create advanced use cases.\n", + "\n", + "In this notebook we will use the Bedrock API provided by LangChain. The prompt used in this example creates a custom LangChain prompt template for adding context to the text generation request. \n", + "\n", + "**Note:** *This notebook can be run within or outside of AWS environment.*\n", + "\n", + "#### Context\n", + "In the previous example `01_zero_shot_generation.ipynb`, we explored how to use LangChain framework to communicate with Amazon Bedrock API. In this notebook we will try to add a bit more complexity with the help of `PromptTemplates` to leverage the LangChain framework for the similar use case. `PrompTemplates` allow you to create generic shells which can be populated with information later and get model outputs based on different scenarios.\n", + "\n", + "As part of this notebook we will explore the use of Amazon Bedrock integration within LangChain framework and how it could be used to generate text with the help of `PromptTemplate`.\n", + "\n", + "#### Pattern\n", + "We will simply provide the LangChain implementation of Amazon Bedrock API with an input consisting of a task, an instruction and an input for the model under the hood to generate an output without providing any additional example. 
The purpose here is to demonstrate how powerful LLMs easily understand the task at hand and generate compelling outputs.\n", + "\n", + "![](./images/bedrock_langchain.jpg)\n", + "\n", + "#### Use case\n", + "To demonstrate the generation capability of models in Amazon Bedrock, let's take the use case of email generation.\n", + "\n", + "#### Persona\n", + "You are Bob, a Customer Service Manager at AnyCompany, and some of your customers are not happy with the customer service and are providing negative feedback on the service provided by customer support engineers. Now, you would like to respond to those customers, humbly apologizing for the poor service and regaining their trust. You need the help of an LLM to generate a batch of emails that are human friendly and personalized to each customer's sentiment from previous email correspondence.\n", + "\n", + "#### Implementation\n", + "To fulfill this use case, we will show you how to generate an email with a thank you note based on the customer's previous email. We will use the Amazon Titan Text Large model using the Amazon Bedrock LangChain integration. \n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "aa11828a-243d-4808-9c92-e8caf4cebd37", + "metadata": {}, + "source": [ + "## Setup\n", + "Before we get started with the implementation, we have to make sure that the required boto3 and botocore packages are installed. These will be used to leverage the Amazon Bedrock API client." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49e2c0a9-4838-4f2b-bb36-61c0cbcd62af", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ac4dc1bc", + "metadata": {}, + "source": [ + "Let's begin by creating an instance of the Bedrock class from llms. This expects a `model_id`, which is the ID of the model available in Amazon Bedrock. \n", + "\n", + "Optionally, you can pass in a previously created boto3 client as well as some `model_kwargs`, which can hold parameters such as `temperature`, `topP`, `maxTokenCount` or `stopSequences` (more on parameters can be explored in the Amazon Bedrock console).\n", + "\n", + "Available text generation models under Amazon Bedrock have the following IDs:\n", + "\n", + "- amazon.titan-tg1-large\n", + "- ai21.j2-grande-instruct\n", + "- ai21.j2-jumbo-instruct\n", + "- anthropic.claude-instant-v1\n", + "- anthropic.claude-v1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b8365753", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "827b80d8", + "metadata": {}, + "source": [ + "For this notebook we also need langchain version >= 0.0.190, which has the Amazon Bedrock class implemented under the llms module. We also install the transformers framework from HuggingFace, which we will use to quickly count the number of tokens in the input prompt."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f84b2e23", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install langchain==0.0.190 --quiet\n", + "%pip install transformers==4.24.0 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e0a377a2", + "metadata": {}, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cb5fe2d", + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b7daa1a8-d21a-410c-adbf-b253c2dabf80", + "metadata": { + "tags": [] + }, + "source": [ + "## Invoke the Bedrock LLM Model\n", + "\n", + "For more details for the parameters please refer to the Bedrock API page." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8ffa1250-56cd-4b6d-b3d8-c62baac143ce", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.llms.bedrock import Bedrock\n", + "\n", + "inference_modifier = {'max_tokens_to_sample':4096, \n", + " \"temperature\":0.5,\n", + " \"top_k\":250,\n", + " \"top_p\":1,\n", + " \"stop_sequences\": [\"\\n\\nHuman\"]\n", + " }\n", + "\n", + "textgen_llm = Bedrock(model_id = \"anthropic.claude-v1\",\n", + " client = boto3_bedrock, \n", + " model_kwargs = inference_modifier \n", + " )\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "de2678ed-f0d6-444f-9a57-5170dd1952f7", + "metadata": {}, + "source": [ + "## Create a LangChain custom prompt template\n", + "\n", + "By creating a template for the prompt we can pass it different input variables to it on every run. This is useful when you have to generate content with different input variables that you may be fetching from a database.\n", + "\n", + "Previously we hardcoded the prompt, it might be the case that you have multiple customers sending similar negative feedback and you now want to use each of those customer's emails and respond to them with an apology but you also want to keep the response a bit personalized. In the following cell we are exploring how you can create a `PromptTemplate` to achieve this pattern." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dbec103a-97ae-4e9e-9d80-dc20f354a228", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain import PromptTemplate\n", + "\n", + "# Create a prompt template that has multiple input variables\n", + "multi_var_prompt = PromptTemplate(\n", + " input_variables=[\"customerServiceManager\", \"customerName\", \"feedbackFromCustomer\"], \n", + " template=\"\"\"Create an apology email from the Service Manager {customerServiceManager} to {customerName}. 
\n", + " in response to the following feedback that was received from the customer: {feedbackFromCustomer}.\n", + " \"\"\"\n", + " \n", + ")\n", + "\n", + "# Pass in values to the input variables\n", + "prompt = multi_var_prompt.format(customerServiceManager=\"Bob\", \n", + " customerName=\"John Doe\", \n", + " feedbackFromCustomer=\"\"\"Hello Bob,\n", + " I am very disappointed with the recent experience I had when I called your customer support.\n", + " I was expecting an immediate call back but it took three days for us to get a call back.\n", + " The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.\n", + " We are very unhappy with the response provided and may consider taking our business elsewhere.\n", + " \"\"\"\n", + " )\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e45bdfd5-ce76-42e9-81cd-b0892337d163", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "num_tokens = textgen_llm.get_num_tokens(prompt)\n", + "print(f\"Our prompt has {num_tokens} tokens\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a1bf31e9-56c0-408f-a652-9e23de446aef", + "metadata": {}, + "source": [ + "## Invoke again\n", + "\n", + "invoke using the prompt template and expect to see a curated response back" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1064c57-27a4-48c5-911b-e4f1dfeff122", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "response = textgen_llm(prompt)\n", + "\n", + "email = response[response.index('\\n')+1:] \n", + "\n", + "print_ww(email)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9e9abc40", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "To conclude we learnt that invoking the LLM without any context might not yield the desired results. 
By adding context and further using the the prompt template to constrain the output from the LLM we are able to successfully get our desired output" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "0403c457", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 
0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + 
"name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": 
"Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/01_Generation/README.md b/01_Generation/README.md new file mode 100644 index 00000000..f2153131 --- /dev/null +++ b/01_Generation/README.md @@ -0,0 +1,26 @@ +## Overview + +In this lab, you will learn how to generate text using LLMs on Amazon Bedrock. We will demonstrate the use of LLMs using the Bedrock API as well as how to utilize the LangChain framework that integrates with Bedrock. + +We will first generate text using a zero-shot prompt. The zero-shot prompt provides instruction to generate text content without providing a detailed context. We will explore zero-shot email generation using two approaches: Bedrock API (BoTo3) and Bedrock integration with LangChain. Then we will show how to improve the quality of the generated text by providing additional context in the prompt. + +## Audience + +Architects and developer who want to learn how to use Amazon Bedrock LLMs to generate text. +Some of the business use cases for text generation include: + +- Generating product descriptions based on product features and benefits for marketing teams +- Generation of media articles and marketing campaigns +- Email and reports generation + +## Workshop Notebooks + +We will generate an email response to a customer where the customer had provided negative feedback on service received from a customer support engineer. The text generation workshop includes the following three notebooks. +1. [Generate Email with Amazon Titan](./00_generate_w_bedrock.ipynb) - Invokes Amazon Titan large text model using Bedrock API to generate an email response to a customer. It uses a zero-shot prompt without context as instruction to the model. +2. [Zero-shot Text Generation with Anthropic Claude](01_zero_shot_generation.ipynb) - Invokes Anthropic's Claude Text model using the LangChain framework integration with Bedrock to generate an email to a customer. It uses a zero-shot prompt without context as instruction to the model. 
+3. [Contextual Text Generation using LangChain](./02_contextual_generation.ipynb) - We provide additional context in the prompt which includes the original email from the customer that we would like the model to generate a response for. The example includes a custom prompt template in LangChain, so that variable values can be substitued in the prompt at runtime. + +## Architecture + +![Bedrock](./images/bedrock.jpg) +![Bedrock](./images/bedrock_langchain.jpg) \ No newline at end of file diff --git a/01_Generation/images/bedrock.jpg b/01_Generation/images/bedrock.jpg new file mode 100644 index 00000000..661f05a0 Binary files /dev/null and b/01_Generation/images/bedrock.jpg differ diff --git a/01_Generation/images/bedrock_langchain.jpg b/01_Generation/images/bedrock_langchain.jpg new file mode 100644 index 00000000..b0d08a03 Binary files /dev/null and b/01_Generation/images/bedrock_langchain.jpg differ diff --git a/02_Summarization/.DS_Store b/02_Summarization/.DS_Store new file mode 100644 index 00000000..402f9b10 Binary files /dev/null and b/02_Summarization/.DS_Store differ diff --git a/02_Summarization/01.small-text-summarization-claude.ipynb b/02_Summarization/01.small-text-summarization-claude.ipynb new file mode 100644 index 00000000..c098fb6d --- /dev/null +++ b/02_Summarization/01.small-text-summarization-claude.ipynb @@ -0,0 +1,923 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "fded102b", + "metadata": {}, + "source": [ + "# Text summarization with small files with Anthropic Claude" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fab8b2cf", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "In this example, you are going to ingest a small amount of data (String data) directly into Amazon Bedrock API (using Anthropic Claude model) and give it an instruction to summarize the respective text.\n", + "\n", + "### Architecture\n", + "\n", + "![](./images/41-text-simple-1.png)\n", + "\n", + "In this architecture:\n", + "\n", + "1. A small piece of text (or small file) is loaded\n", + "1. A foundational model processes the input data\n", + "1. Model returns a response with the summary of the ingested text\n", + "\n", + "### Use case\n", + "\n", + "This approach can be used to summarize call transcripts, meetings transcripts, books, articles, blog posts, and other relevant content.\n", + "\n", + "### Challenges\n", + "\n", + "This approach can be used when the input text or file fits within the model context length. In notebook `02.long-text-summarization-titan.ipynb`, we will explore an approach to address the challenge when users have large document(s) that exceed the token limit.\n", + "\n", + "## Setup" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "7eaf6ce4", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a77295e8-364e-4a29-b320-670d697a0b3e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66edf151", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "871b730e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "342796d0", + "metadata": {}, + "source": [ + "## Summarizing a short text with boto3\n", + " \n", + "To learn the details of the API request to Amazon Bedrock, this notebook shows how to create the API request and send it via Boto3, rather than relying on LangChain, which provides a simpler API by wrapping the Boto3 operations. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9da4d9ee", + "metadata": {}, + "source": [ + "### Request Syntax of InvokeModel in Boto3\n", + "\n", + "\n", + "We use the `InvokeModel` API for sending a request to a foundation model. Here is an example of an API request for sending text to Anthropic Claude. Inference parameters depend on the model that you are about to use. The inference parameters for Anthropic Claude are:\n", + "\n", + "- **temperature** tunes the degree of randomness in generation. Lower temperatures mean less random generations.\n", + "- **top_p** less than one keeps only the smallest set of most probable tokens with probabilities that add up to top_p or higher for generation.\n", + "- **top_k** limits sampling to the k most likely candidate tokens at each step; lower values restrict generation to higher-probability tokens.\n", + "- **max_tokens_to_sample** is the maximum number of tokens to generate. Responses are not guaranteed to fill up to the maximum desired length.\n", + "- **stop_sequences** are sequences where the API will stop generating further tokens. 
The returned text will not contain the stop sequence.\n", + "\n", + "```python\n", + "response = bedrock.invoke_model(body=\n", + " {\"prompt\":\"this is where you place your input text\",\n", + " \"max_tokens_to_sample\":4096,\n", + " \"temperature\":0.5,\n", + " \"top_k\":250,\n", + " \"top_p\":0.5,\n", + " \"stop_sequences\":[]\n", + " },\n", + " modelId=\"anthropic.claude-v1\", \n", + " accept=accept, \n", + " contentType=contentType)\n", + "\n", + "```\n", + "\n", + "### Writing a prompt with text to be summarized\n", + "\n", + "In this notebook, you can use any short text whose token count is less than the maximum number of input tokens the foundation model supports. As an example of a short text, let's take one paragraph of an [AWS blog post](https://aws.amazon.com/jp/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/) about the announcement of Amazon Bedrock.\n", + "\n", + "The prompt starts with the instruction `Please provide a summary of the following text.`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ece0c069", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "prompt = \"\"\"\n", + "Please provide a summary of the following text.\n", + "\n", + "AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \\\n", + "a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \\\n", + "Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \\\n", + "democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \\\n", + "for text and images—including Amazons Titan FMs, which consist of two new LLMs we’re also announcing \\\n", + "today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \\\n", + "customers can easily find the right model for what they’re trying to get done, get started quickly, privately \\\n", + "customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \\\n", + "tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \\\n", + "with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).\n", + "\n", + "\"\"\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3efddbb0", + "metadata": {}, + "source": [ + "## Creating the request body with prompt and inference parameters \n", + "\n", + "Following the request syntax of `invoke_model`, you create the request body with the above prompt and inference parameters." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60d191eb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\"prompt\": prompt,\n", + " \"max_tokens_to_sample\":4096,\n", + " \"temperature\":0.5,\n", + " \"top_k\":250,\n", + " \"top_p\":0.5,\n", + " \"stop_sequences\":[]\n", + " }) " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "cc9f3326", + "metadata": {}, + "source": [ + "## Invoke foundation model via Boto3\n", + "\n", + "Here we send the API request to Amazon Bedrock, specifying the request parameters `modelId`, `accept`, and `contentType`. Following the prompt, the foundation model in Amazon Bedrock summarizes the text."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f400d76", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "modelId = 'anthropic.claude-v1' # change this to use a different version from the model provider\n", + "accept = 'application/json'\n", + "contentType = 'application/json'\n", + "\n", + "response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "response_body = json.loads(response.get('body').read())\n", + "\n", + "print_ww(response_body.get('completion'))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "180c84a0", + "metadata": {}, + "source": [ + "In the above the Bedrock service generates the entire summary for the given prompt in a single output, this can be slow if the output contains large amount of tokens. \n", + "\n", + "Below we explore the option how we can use Bedrock to stream the output such that the user could start consuming it as it is being generated by the model. For this Bedrock supports `invoke_model_with_response_stream` API providing `ResponseStream` that streams the output in form of chunks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94e5ca2f", + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = list(stream)\n", + "output" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fc9c1b3b", + "metadata": {}, + "source": [ + "Instead of generating the entire output, Bedrock sends smaller chunks from the model. This can be displayed in a consumable manner as well." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01ab3461", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display_markdown,Markdown,clear_output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0148858", + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = []\n", + "i = 1\n", + "if stream:\n", + " for event in stream:\n", + " chunk = event.get('chunk')\n", + " if chunk:\n", + " chunk_obj = json.loads(chunk.get('bytes').decode())\n", + " text = chunk_obj['completion']\n", + " clear_output(wait=True)\n", + " output.append(text)\n", + " display_markdown(Markdown(''.join(output)))\n", + " i+=1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "93e8ee83", + "metadata": {}, + "source": [ + "## Conclusion\n", + "You have now experimented with using `boto3` SDK which provides a vanilla exposure to Amazon Bedrock API. 
Using this API you have seen the use case of generating a summary of AWS news about Amazon Bedrock in 2 different ways: entire output and streaming output generation.\n", + "\n", + "### Take aways\n", + "- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Amazon Titan and AI21 Labs Jurassic models.\n", + "- Change the prompts to your specific usecase and evaluate the output of different models.\n", + "- Play with the token length to understand the latency and responsiveness of the service.\n", + "- Apply different prompt engineering principles to get better outputs.\n", + "\n", + "## Thank You" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + 
"hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", 
+ "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated 
computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "tmp-bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/02_Summarization/01.small-text-summarization-titan.ipynb b/02_Summarization/01.small-text-summarization-titan.ipynb new file mode 100644 index 00000000..c273298c --- /dev/null +++ b/02_Summarization/01.small-text-summarization-titan.ipynb @@ -0,0 +1,928 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "fded102b", + "metadata": {}, + "source": [ + "# Text summarization with small files with Amazon Titan" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fab8b2cf", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "In this example, you are going to ingest a small amount of data (String data) directly into Amazon Bedrock API (using Amazon Titan model) and give it an instruction to summarize the respective text.\n", + "\n", + "### Architecture\n", + "\n", + "![](./images/41-text-simple-1.png)\n", + "\n", + "In this architecture:\n", + "\n", + "1. A small piece of text (or small file) is loaded\n", + "1. A foundational model processes those data\n", + "1. Model returns a response with the summary of the ingested text\n", + "\n", + "### Use case\n", + "\n", + "This approach can be used to summarize call transcripts, meetings transcripts, books, articles, blog posts, and other relevant content.\n", + "\n", + "### Challenges\n", + "This approach can be used when the input text or file fits within the model context length. 
In the notebook `02.long-text-summarization-titan.ipynb`, we will explore an approach to address the challenge of large documents that exceed the model's token limit.\n", + "\n", + "\n", + "## Setup" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "e9c888b8", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "229c048f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9e86d86b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d0e24c6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "342796d0", + "metadata": {}, + "source": [ + "## Summarizing a short text with boto3\n", + " \n", + "To learn the details of the Amazon Bedrock API, this notebook shows how to build an API request and send it directly via Boto3, rather than relying on LangChain, which offers a simpler interface by wrapping the Boto3 calls. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "9da4d9ee", + "metadata": {}, + "source": [ + "### Request Syntax of InvokeModel in Boto3\n", + "\n", + "\n", + "We use the `InvokeModel` API to send a request to a foundation model. Here is an example of an API request that sends text to Amazon Titan Text Large. Inference parameters in `textGenerationConfig` depend on the model that you are about to use. The inference parameters for Amazon Titan Text are:\n", + "- **maxTokenCount** configures the maximum number of tokens to use in the generated response. (int, defaults to 512)\n", + "- **stopSequences** is used to make the model stop at a desired point, such as the end of a sentence or a list. The returned response will not contain the stop sequence.\n", + "- **temperature** modulates the probability density function for the next tokens, implementing the temperature sampling technique. This parameter can be used to deepen or flatten the density function curve. A lower value results in a steeper curve and more deterministic responses, whereas a higher value results in a flatter curve and more random responses. 
(float, defaults to 0, max value is 1.5)\n", + "- **topP** controls token choices, based on the probability of the potential choices. If you set Top P below 1.0, the model considers only the most probable options and ignores less probable options. The result is more stable and repetitive completions.\n", + "\n", + "```python\n", + "response = bedrock.invoke_model(body={\n", + " \"inputText\": \"this is where you place your input text\",\n", + " \"textGenerationConfig\": {\n", + " \"maxTokenCount\": 4096,\n", + " \"stopSequences\": [],\n", + " \"temperature\":0,\n", + " \"topP\":1\n", + " },\n", + " },\n", + " modelId=\"amazon.titan-tg1-large\", \n", + " accept=accept, \n", + " contentType=contentType)\n", + "\n", + "```\n", + "\n", + "### Writing prompt with text to be summarized\n", + "\n", + "In this notebook, you can use any short text whose tokens are less than the maximum token of a foundation model. As an exmple of short text, let's take one paragraph of an [AWS blog post](https://aws.amazon.com/jp/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/) about announcement of Amazon Bedrock.\n", + "\n", + "The prompt starts with an instruction `Please provide a summary of the following text.`, and includes text surrounded by `` tag. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ece0c069", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "prompt = \"\"\"\n", + "Please provide a summary of the following text. Do not add any information that is not mentioned in the text below.\n", + "\n", + "\n", + "AWS took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, \\\n", + "a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. \\\n", + "Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, \\\n", + "democratizing access for all builders. Bedrock will offer the ability to access a range of powerful FMs \\\n", + "for text and images—including Amazons Titan FMs, which consist of two new LLMs we’re also announcing \\\n", + "today—through a scalable, reliable, and secure AWS managed service. With Bedrock’s serverless experience, \\\n", + "customers can easily find the right model for what they’re trying to get done, get started quickly, privately \\\n", + "customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS \\\n", + "tools and capabilities they are familiar with, without having to manage any infrastructure (including integrations \\\n", + "with Amazon SageMaker ML features like Experiments to test different models and Pipelines to manage their FMs at scale).\n", + "\n", + "\n", + "\"\"\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3efddbb0", + "metadata": {}, + "source": [ + "## Creating request body with prompt and inference parameters \n", + "\n", + "Following the request syntax of `invoke_model`, you create request body with the above prompt and inference parameters." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60d191eb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = json.dumps({\"inputText\": prompt, \n", + " \"textGenerationConfig\":{\n", + " \"maxTokenCount\":4096,\n", + " \"stopSequences\":[],\n", + " \"temperature\":0,\n", + " \"topP\":1\n", + " },\n", + " }) " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "cc9f3326", + "metadata": {}, + "source": [ + "## Invoke foundation model via Boto3\n", + "\n", + "Here we send the API request to Amazon Bedrock, specifying the request parameters `modelId`, `accept`, and `contentType`. Following the prompt, the foundation model in Amazon Bedrock summarizes the text." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f400d76", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider\n", + "accept = 'application/json'\n", + "contentType = 'application/json'\n", + "\n", + "response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "response_body = json.loads(response.get('body').read())\n", + "\n", + "print_ww(response_body.get('results')[0].get('outputText'))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "3c527882", + "metadata": {}, + "source": [ + "In the example above, the Bedrock service generates the entire summary for the given prompt in a single output; this can be slow if the output contains a large number of tokens. \n", + "\n", + "Below we explore how to use Bedrock to stream the output so that the user can start consuming it as it is being generated by the model. For this, Bedrock supports the `invoke_model_with_response_stream` API, which provides a `ResponseStream` that returns the output in chunks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62787950", + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = list(stream)\n", + "output" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "2ec4a584", + "metadata": {}, + "source": [ + "Instead of generating the entire output, Bedrock sends smaller chunks from the model. This can be displayed in a consumable manner as well."
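The cells below render the streamed chunks with IPython display helpers. Outside a notebook, the same stream can simply be accumulated into a string; the following is a minimal sketch assuming the `body`, `modelId`, `accept`, `contentType`, and `boto3_bedrock` objects defined above:

```python
import json

# Minimal sketch: collect the streamed Titan chunks into a plain string
# instead of rendering them with IPython display helpers.
response = boto3_bedrock.invoke_model_with_response_stream(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)
chunks = []
for event in response.get("body"):
    chunk = event.get("chunk")
    if chunk:
        chunk_obj = json.loads(chunk.get("bytes").decode())
        chunks.append(chunk_obj["outputText"])  # Titan returns partial text under 'outputText'

print("".join(chunks))
```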
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ffc08b2e", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display_markdown, Markdown, clear_output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06b84ff2", + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = []\n", + "i = 1\n", + "if stream:\n", + " for event in stream:\n", + " chunk = event.get('chunk')\n", + " if chunk:\n", + " chunk_obj = json.loads(chunk.get('bytes').decode())\n", + " text = chunk_obj['outputText']\n", + " clear_output(wait=True)\n", + " output.append(text)\n", + " display_markdown(Markdown(''.join(output)))\n", + " i+=1\n", + "\n", + "clear_output(wait=True)\n", + "print_ww(''.join(output))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "62a93aeb", + "metadata": {}, + "source": [ + "## Conclusion\n", + "You have now experimented with using `boto3` SDK which provides a vanilla exposure to Amazon Bedrock API. Using this API you have seen the use case of generating a summary of AWS news about Amazon Bedrock in 2 different ways: entire output and streaming output generation.\n", + "\n", + "### Take aways\n", + "- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Anthropic Claude and AI21 Labs Jurassic models.\n", + "- Change the prompts to your specific usecase and evaluate the output of different models.\n", + "- Play with the token length to understand the latency and responsiveness of the service.\n", + "- Apply different prompt engineering principles to get better outputs.\n", + "\n", + "## Thank You" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + 
"hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + 
"_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + 
"hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "tmp-bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/02_Summarization/02.long-text-summarization-titan.ipynb b/02_Summarization/02.long-text-summarization-titan.ipynb new file mode 100644 index 00000000..1532304f --- /dev/null +++ b/02_Summarization/02.long-text-summarization-titan.ipynb @@ -0,0 +1,883 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "fded102b", + 
"metadata": {}, + "source": [ + "# Abstractive Text Summarization with Amazon Titan" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fab8b2cf", + "metadata": {}, + "source": [ + "## Overview\n", + "When we work with large documents, we can face some challenges as the input text might not fit into the model context length, or the model hallucinates with large documents, or, out of memory errors, etc.\n", + "\n", + "To solve those problems, we are going to show an architecture that is based on the concept of chunking and chaining prompts. This architecture is leveraging [LangChain](https://python.langchain.com/docs/get_started/introduction.html) which is a popular framework for developing applications powered by language models.\n", + "\n", + "### Architecture\n", + "\n", + "![](./images/42-text-summarization-2.png)\n", + "\n", + "In this architecture:\n", + "\n", + "1. A large document (or a giant file appending small ones) is loaded\n", + "1. Langchain utility is used to split it into multiple smaller chunks (chunking)\n", + "1. First chunk is sent to the model; Model returns the corresponding summary\n", + "1. Langchain gets next chunk and appends it to the returned summary and sends the combined text as a new request to the model; the process repeats until all chunks are processed\n", + "1. In the end, you have final summary based on entire content\n", + "\n", + "### Use case\n", + "This approach can be used to summarize call transcripts, meetings transcripts, books, articles, blog posts, and other relevant content.\n", + "\n", + "## Setup" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fcc7dfe4", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a8925f3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install transformers==4.24.0 --quiet\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3f0f9067", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5315afb7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "49ae9a41", + "metadata": {}, + "source": [ + "## Summarize long text \n", + "\n", + "### Configuring LangChain with Boto3\n", + "\n", + "LangChain allows you to access Bedrock once you pass boto3 session information to it. If you pass None as the boto3 session information, LangChain tries to get the session information from your environment.\n", + "In order to ensure the right client is used, we are going to instantiate one with a utility method.\n", + "\n", + "You need to specify an LLM for the LangChain Bedrock class, and you can pass inference arguments. Here you specify Amazon Titan Text Large in `model_id` and pass Titan's inference parameters in `textGenerationConfig`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93df2442", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.llms.bedrock import Bedrock\n", + "\n", + "llm = Bedrock(model_id=\"amazon.titan-tg1-large\", \n", + " model_kwargs ={\n", + " \"textGenerationConfig\": {\n", + " \"maxTokenCount\": 4096,\n", + " \"stopSequences\": [],\n", + " \"temperature\":0,\n", + " \"topP\":1\n", + " },\n", + " },\n", + " client=boto3_bedrock)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "31223056", + "metadata": {}, + "source": [ + "### Loading a text file with many tokens\n", + "\n", + "In the `letters` directory, you can find a text file of [Amazon’s CEO letter to shareholders in 2022](https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2022-letter-to-shareholders). The following cell loads the text file and counts the number of tokens it contains. \n", + "\n", + "You will see a warning indicating that the number of tokens in the text file exceeds the maximum number of tokens for this model."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c70352ae", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "shareholder_letter = \"./letters/2022-letter.txt\"\n", + "\n", + "with open(shareholder_letter, \"r\") as file:\n", + " letter = file.read()\n", + " \n", + "llm.get_num_tokens(letter)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "dc8ec39d", + "metadata": {}, + "source": [ + "### Splitting the long text into chunks\n", + "\n", + "The text is too long to fit in the prompt, so we will split it into smaller chunks.\n", + "`RecursiveCharacterTextSplitter` in LangChain supports splitting long text into chunks recursively until the size of each chunk becomes smaller than `chunk_size`. The text is separated with `separators=[\"\\n\\n\", \"\\n\"]` into chunks, which avoids splitting a single paragraph across multiple chunks.\n", + "\n", + "Using 4,000 characters per chunk, we can get summaries for each portion separately. The number of tokens, or word pieces, in a chunk depends on the text." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2e7c372b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " separators=[\"\\n\\n\", \"\\n\"], chunk_size=4000, chunk_overlap=100\n", + ")\n", + "\n", + "docs = text_splitter.create_documents([letter])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f66569f0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "num_docs = len(docs)\n", + "\n", + "num_tokens_first_doc = llm.get_num_tokens(docs[0].page_content)\n", + "\n", + "print(\n", + " f\"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens\"\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a5f8ae45", + "metadata": {}, + "source": [ + "### Summarizing chunks and combining them" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "b61d49f5", + "metadata": {}, + "source": [ + "Assuming that the number of tokens is consistent in the other docs, we should be good to go. Let's use LangChain's [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) to summarize the text. `load_summarize_chain` provides three ways of summarization: `stuff`, `map_reduce`, and `refine`. \n", + "- `stuff` puts all the chunks into one prompt. Thus, this would hit the maximum limit of tokens.\n", + "- `map_reduce` summarizes each chunk, combines the summaries, and then summarizes the combined summary. If the combined summary is too large, it would raise an error.\n", + "- `refine` summarizes the first chunk, and then summarizes the second chunk together with the first summary. The same process repeats until all chunks are summarized.\n", + "\n", + "`map_reduce` and `refine` invoke the LLM multiple times and take longer to produce the final summary. \n", + "Let's try `map_reduce` here. 
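The next cells run the `map_reduce` chain. For comparison, a `refine` chain can be built the same way; the following is only a sketch assuming the same `llm` and `docs` objects defined above (it will also invoke the model multiple times):

```python
from langchain.chains.summarize import load_summarize_chain

# Sketch: the `refine` chain feeds each new chunk together with the running
# summary back into the model, refining the summary chunk by chunk.
refine_chain = load_summarize_chain(llm=llm, chain_type="refine", verbose=False)
# refine_output = refine_chain.run(docs)
# print_ww(refine_output.strip())
```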
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3b08c54", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Set verbose=True if you want to see the prompts being used\n", + "from langchain.chains.summarize import load_summarize_chain\n", + "summary_chain = load_summarize_chain(llm=llm, chain_type=\"map_reduce\", verbose=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba73121e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "output = summary_chain.run(docs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f3f7eb9b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print_ww(output.strip())" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + 
"_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated 
computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + 
"vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "tmp-bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/02_Summarization/README.md b/02_Summarization/README.md new file mode 100644 index 00000000..01c27afa --- /dev/null +++ b/02_Summarization/README.md @@ -0,0 +1,53 @@ + +# Text summarization + + +## Overview + +Text summarization is a Natural Language Processing (NLP) technique that involves extracting the most relevant information from a text document and presenting it in a concise and coherent format. + +Summarization works by sending a prompt instruction to the model, asking the model to summarize your text, like following example: + +```xml +Please summarize the following text: + + +Lorem ipsum dolor sit amet, consectetur adipiscing elit, +sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. +Sem fringilla ut morbi tincidunt augue interdum velit euismod in. +Quis hendrerit dolor magna eget est. + +``` + +In order to get the model running a summarization task, we use a technique called prompt engineering, which sends to the model instructions (on plain text) about what is expected when it processes our data and about the response. If you would like to learn more about it, please look into [this](https://www.promptingguide.ai/). + +## Why is it relevant + +People in general are busy with amount of stuff to do. You have meetings to attend, articles and blogs to read, etc. Summarization is a good technique to help you be up to date with important subjects. 
+ +In this module, you will be able to work with Amazon Bedrock API to quickly summarize small and large texts, simplifying the underneath understanding. + +The idea in this demonstration is to show the art of the possible and how to replicate this example to summarize other common scenarios as: + +- Academical papers +- Transcriptions: + - After business calls + - Call center +- Legal documentation +- Financial reports + +## Target Audience + +This module can be executed by any developer familiar with Python, also by data scientists and other technical people. + +## Patterns + +On this workshop, you will be able to learn following patterns on summarization: + +1. [Text summarization with small files](./01.small-text-summarization-claude.ipynb) + + ![small text](./images/41-text-simple-1.png) + +2. [Abstractive Text Summarization](./02.long-text-summarization-titan.ipynb) + + ![large text](./images/42-text-summarization-2.png) \ No newline at end of file diff --git a/02_Summarization/images/41-text-simple-1.png b/02_Summarization/images/41-text-simple-1.png new file mode 100644 index 00000000..849cc64e Binary files /dev/null and b/02_Summarization/images/41-text-simple-1.png differ diff --git a/02_Summarization/images/42-text-summarization-2.png b/02_Summarization/images/42-text-summarization-2.png new file mode 100644 index 00000000..04e53685 Binary files /dev/null and b/02_Summarization/images/42-text-summarization-2.png differ diff --git a/02_Summarization/letters/2022-letter.txt b/02_Summarization/letters/2022-letter.txt new file mode 100644 index 00000000..878ccef7 --- /dev/null +++ b/02_Summarization/letters/2022-letter.txt @@ -0,0 +1,61 @@ +As I sit down to write my second annual shareholder letter as CEO, I find myself optimistic and energized by what lies ahead for Amazon. Despite 2022 being one of the harder macroeconomic years in recent memory, and with some of our own operating challenges to boot, we still found a way to grow demand (on top of the unprecedented growth we experienced in the first half of the pandemic). We innovated in our largest businesses to meaningfully improve customer experience short and long term. And, we made important adjustments in our investment decisions and the way in which we’ll invent moving forward, while still preserving the long-term investments that we believe can change the future of Amazon for customers, shareholders, and employees. + +While there were an unusual number of simultaneous challenges this past year, the reality is that if you operate in large, dynamic, global market segments with many capable and well-funded competitors (the conditions in which Amazon operates all of its businesses), conditions rarely stay stagnant for long. + +In the 25 years I’ve been at Amazon, there has been constant change, much of which we’ve initiated ourselves. When I joined Amazon in 1997, we had booked $15M in revenue in 1996, were a books-only retailer, did not have a third-party marketplace, and only shipped to addresses in the US. Today, Amazon sells nearly every physical and digital retail item you can imagine, with a vibrant third-party seller ecosystem that accounts for 60% of our unit sales, and reaches customers in virtually every country around the world. Similarly, building a business around a set of technology infrastructure services in the cloud was not obvious in 2003 when we started pursuing AWS, and still wasn’t when we launched our first services in 2006. 
Having virtually every book at your fingertips in 60 seconds, and then being able to store and retrieve them on a lightweight digital reader was not “a thing” yet when we launched Kindle in 2007, nor was a voice-driven personal assistant like Alexa (launched in 2014) that you could use to access entertainment, control your smart home, shop, and retrieve all sorts of information. + +There have also been times when macroeconomic conditions or operating inefficiencies have presented us with new challenges. For instance, in the 2001 dot-com crash, we had to secure letters of credit to buy inventory for the holidays, streamline costs to deliver better profitability for the business, yet still prioritized the long-term customer experience and business we were trying to build (if you remember, we actually lowered prices in most of our categories during that tenuous 2001 period). You saw this sort of balancing again in 2008-2009 as we endured the recession provoked by the mortgage-backed securities financial crisis. We took several actions to manage the cost structure and efficiency of our Stores business, but we also balanced this streamlining with investment in customer experiences that we believed could be substantial future businesses with strong returns for shareholders. In 2008, AWS was still a fairly small, fledgling business. We knew we were on to something, but it still required substantial capital investment. There were voices inside and outside of the company questioning why Amazon (known mostly as an online retailer then) would be investing so much in cloud computing. But, we knew we were inventing something special that could create a lot of value for customers and Amazon in the future. We had a head start on potential competitors; and if anything, we wanted to accelerate our pace of innovation. We made the long-term decision to continue investing in AWS. Fifteen years later, AWS is now an $85B annual revenue run rate business, with strong profitability, that has transformed how customers from start-ups to multinational companies to public sector organizations manage their technology infrastructure. Amazon would be a different company if we’d slowed investment in AWS during that 2008-2009 period. + +Change is always around the corner. Sometimes, you proactively invite it in, and sometimes it just comes a-knocking. But, when you see it’s coming, you have to embrace it. And, the companies that do this well over a long period of time usually succeed. I’m optimistic about our future prospects because I like the way our team is responding to the changes we see in front of us. + +Over the last several months, we took a deep look across the company, business by business, invention by invention, and asked ourselves whether we had conviction about each initiative’s long-term potential to drive enough revenue, operating income, free cash flow, and return on invested capital. In some cases, it led to us shuttering certain businesses. For instance, we stopped pursuing physical store concepts like our Bookstores and 4 Star stores, closed our Amazon Fabric and Amazon Care efforts, and moved on from some newer devices where we didn’t see a path to meaningful returns. In other cases, we looked at some programs that weren’t producing the returns we’d hoped (e.g. free shipping for all online grocery orders over $35) and amended them. We also reprioritized where to spend our resources, which ultimately led to the hard decision to eliminate 27,000 corporate roles. 
There are a number of other changes that we’ve made over the last several months to streamline our overall costs, and like most leadership teams, we’ll continue to evaluate what we’re seeing in our business and proceed adaptively. + +We also looked hard at how we were working together as a team and asked our corporate employees to come back to the office at least three days a week, beginning in May. During the pandemic, our employees rallied to get work done from home and did everything possible to keep up with the unexpected circumstances that presented themselves. It was impressive and I’m proud of the way our collective team came together to overcome unprecedented challenges for our customers, communities, and business. But, we don’t think it’s the best long-term approach. We’ve become convinced that collaborating and inventing is easier and more effective when we’re working together and learning from one another in person. The energy and riffing on one another’s ideas happen more freely, and many of the best Amazon inventions have had their breakthrough moments from people staying behind after a meeting and working through ideas on a whiteboard, or continuing the conversation on the walk back from a meeting, or just popping by a teammate’s office later that day with another thought. Invention is often messy. It wanders and meanders and marinates. Serendipitous interactions help it, and there are more of those in-person than virtually. It’s also significantly easier to learn, model, practice, and strengthen our culture when we’re in the office together most of the time and surrounded by our colleagues. Innovation and our unique culture have been incredibly important in our first 29 years as a company, and I expect it will be comparably so in the next 29. + +A critical challenge we’ve continued to tackle is the rising cost to serve in our Stores fulfillment network (i.e. the cost to get a product from Amazon to a customer)—and we’ve made several changes that we believe will meaningfully improve our fulfillment costs and speed of delivery. + +During the early part of the pandemic, with many physical stores shut down, our consumer business grew at an extraordinary clip, with annual revenue increasing from $245B in 2019 to $434B in 2022. This meant that we had to double the fulfillment center footprint that we’d built over the prior 25 years and substantially accelerate building a last-mile transportation network that’s now the size of UPS (along with a new sortation center network to assist with efficiency and speed when items needed to traverse long distances)—all in the span of about two years. This was no easy feat, and hundreds of thousands of Amazonians worked very hard to make this happen. However, not surprisingly, with that rate and scale of change, there was a lot of optimization needed to yield the intended productivity. Over the last several months, we’ve scrutinized every process path in our fulfillment centers and transportation network and redesigned scores of processes and mechanisms, resulting in steady productivity gains and cost reductions over the last few quarters. There’s more work to do, but we’re pleased with our trajectory and the meaningful upside in front of us. + +We also took this occasion to make larger structural changes that set us up better to deliver lower costs and faster speed for many years to come. A good example was reevaluating how our US fulfillment network was organized. 
Until recently, Amazon operated one national US fulfillment network that distributed inventory from fulfillment centers spread across the entire country. If a local fulfillment center didn’t have the product a customer ordered, we’d end up shipping it from other parts of the country, costing us more and increasing delivery times. This challenge became more pronounced as our fulfillment network expanded to hundreds of additional nodes over the last few years, distributing inventory across more locations and increasing the complexity of connecting the fulfillment center and delivery station nodes efficiently. Last year, we started rearchitecting our inventory placement strategy and leveraging our larger fulfillment center footprint to move from a national fulfillment network to a regionalized network model. We made significant internal changes (e.g. placement and logistics software, processes, physical operations) to create eight interconnected regions in smaller geographic areas. Each of these regions has broad, relevant selection to operate in a largely self-sufficient way, while still being able to ship nationally when necessary. Some of the most meaningful and hard work came from optimizing the connections between this large amount of infrastructure. We also continue to improve our advanced machine learning algorithms to better predict what customers in various parts of the country will need so that we have the right inventory in the right regions at the right time. We’ve recently completed this regional roll out and like the early results. Shorter travel distances mean lower cost to serve, less impact on the environment, and customers getting their orders faster. On the latter, we’re excited about seeing more next day and same-day deliveries, and we’re on track to have our fastest Prime delivery speeds ever in 2023. Overall, we remain confident about our plans to lower costs, reduce delivery times, and build a meaningfully larger retail business with healthy operating margins. + +AWS has an $85B annualized revenue run rate, is still early in its adoption curve, but at a juncture where it’s critical to stay focused on what matters most to customers over the long-haul. Despite growing 29% year-over-year (“YoY”) in 2022 on a $62B revenue base, AWS faces short-term headwinds right now as companies are being more cautious in spending given the challenging, current macroeconomic conditions. While some companies might obsess over how they could extract as much money from customers as possible in these tight times, it’s neither what customers want nor best for customers in the long term, so we’re taking a different tack. One of the many advantages of AWS and cloud computing is that when your business grows, you can seamlessly scale up; and conversely, if your business contracts, you can choose to give us back that capacity and cease paying for it. This elasticity is unique to the cloud, and doesn’t exist when you’ve already made expensive capital investments in your own on-premises datacenters, servers, and networking gear. In AWS, like all our businesses, we’re not trying to optimize for any one quarter or year. We’re trying to build customer relationships (and a business) that outlast all of us; and as a result, our AWS sales and support teams are spending much of their time helping customers optimize their AWS spend so they can better weather this uncertain economy. 
Many of these AWS customers tell us that they’re not cost-cutting as much as cost-optimizing so they can take their resources and apply them to emerging and inventive new customer experiences they’re planning. Customers have appreciated this customer-focused, long-term approach, and we think it’ll bode well for both customers and AWS. + +While these short-term headwinds soften our growth rate, we like a lot of the fundamentals that we’re seeing in AWS. Our new customer pipeline is robust, as are our active migrations. Many companies use discontinuous periods like this to step back and determine what they strategically want to change, and we find an increasing number of enterprises opting out of managing their own infrastructure, and preferring to move to AWS to enjoy the agility, innovation, cost-efficiency, and security benefits. And most importantly for customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launched in 2022), and invest in long-term inventions that change what’s possible. + +Chip development is a good example. In last year’s letter, I mentioned the investment we were making in our general-purpose CPU processors named Graviton. Graviton2-based compute instances deliver up to 40% better price-performance than the comparable latest generation x86-based instances; and in 2022, we delivered our Graviton3 chips, providing 25% better performance than the Graviton2 processors. Further, as machine learning adoption has continued to accelerate, customers have yearned for lower-cost GPUs (the chips most commonly used for machine learning). AWS started investing years ago in these specialized chips for machine learning training and inference (inferences are the predictions or answers that a machine learning model provides). We delivered our first training chip in 2022 (“Trainium”); and for the most common machine learning models, Trainium-based instances are up to 140% faster than GPU-based instances at up to 70% lower cost. Most companies are still in the training stage, but as they develop models that graduate to large-scale production, they’ll find that most of the cost is in inference because models are trained periodically whereas inferences are happening all the time as their associated application is being exercised. We launched our first inference chips (“Inferentia”) in 2019, and they have saved companies like Amazon over a hundred million dollars in capital expense already. Our Inferentia2 chip, which just launched, offers up to four times higher throughput and ten times lower latency than our first Inferentia processor. With the enormous upcoming growth in machine learning, customers will be able to get a lot more done with AWS’s training and inference chips at a significantly lower cost. We’re not close to being done innovating here, and this long-term investment should prove fruitful for both customers and AWS. AWS is still in the early stages of its evolution, and has a chance for unusual growth in the next decade. + +Similarly high potential, Amazon’s Advertising business is uniquely effective for brands, which is part of why it continues to grow at a brisk clip. Akin to physical retailers’ advertising businesses selling shelf space, end-caps, and placement in their circulars, our sponsored products and brands offerings have been an integral part of the Amazon shopping experience for more than a decade. 
However, unlike physical retailers, Amazon can tailor these sponsored products to be relevant to what customers are searching for given what we know about shopping behaviors and our very deep investment in machine learning algorithms. This leads to advertising that’s more useful for customers; and as a result, performs better for brands. This is part of why our Advertising revenue has continued to grow rapidly (23% YoY in Q4 2022, 25% YoY overall for 2022 on a $31B revenue base), even as most large advertising-focused businesses’ growth have slowed over the last several quarters. + +We strive to be the best place for advertisers to build their brands. We have near and long-term opportunities that will help us achieve that mission. We’re continuing to make large investments in machine learning to keep honing our advertising selection algorithms. For the past couple of years, we’ve invested in building comprehensive, flexible, and durable planning and measurement solutions, giving marketers greater insight into advertising effectiveness. An example is Amazon Marketing Cloud (“AMC”). AMC is a “clean room” (i.e. secure digital environment) in which advertisers can run custom audience and campaign analytics across a range of first and third-party inputs, in a privacy-safe manner, to generate advertising and business insights to inform their broader marketing and sales strategies. The Advertising and AWS teams have collaborated to enable companies to store their data in AWS, operate securely in AMC with Amazon and other third-party data sources, perform analytics in AWS, and have the option to activate advertising on Amazon or third-party publishers through the Amazon Demand-Side Platform. Customers really like this concerted capability. We also see future opportunity to thoughtfully integrate advertising into our video, live sports, audio, and grocery products. We’ll continue to work hard to help brands uniquely engage with the right audience, and grow this part of our business. + +While it’s tempting in turbulent times only to focus on your existing large businesses, to build a sustainable, long-lasting, growing company that helps customers across a large number of dimensions, you can’t stop inventing and working on long-term customer experiences that can meaningfully impact customers and your company. + +When we look at new investment opportunities, we ask ourselves a few questions: + +If we were successful, could it be big and have a reasonable return on invested capital? +Is the opportunity being well-served today? +Do we have a differentiated approach? +And, do we have competence in that area? And if not, can we acquire it quickly? +If we like the answers to those questions, then we’ll invest. This process has led to some expansions that seem straightforward, and others that some folks might not have initially guessed. + +The earliest example is when we chose to expand from just selling Books, to adding categories like Music, Video, Electronics, and Toys. Back then (1998-1999), it wasn’t universally applauded, but in retrospect, it seems fairly obvious. + +The same could be said for our international Stores expansion. In 2022, our international consumer segment drove $118B of revenue. 
In our larger, established international consumer businesses, we’re big enough to be impacted by the slowing macroeconomic conditions; however, the growth in 2019-2021 on a large base was remarkable—30% compound annual growth rate (“CAGR”) in the UK, 26% in Germany, and 21% in Japan (excluding the impact of FX). Over the past several years, we’ve invested in new international geographies, including India, Brazil, Mexico, Australia, various European countries, the Middle East, and parts of Africa. These new countries take a certain amount of fixed investment to get started and to scale, but we like the trajectory they’re on, and their growth patterns resemble what we’ve seen in North America and our established international geographies. Emerging countries sometimes lack some of the infrastructure and services that our business relies on (e.g. payment methods, transportation services, and internet/telecom infrastructure). To solve these challenges, we continue to work with various partners to deliver solutions for customers. Ultimately, we believe that this investment in serving a broader geographical footprint will allow us to help more customers across the world, as well as build a larger free cash flow-generating consumer business. + +Beyond geographic expansion, we’ve been working to expand our customer offerings across some large, unique product retail market segments. Grocery is an $800B market segment in the US alone, with the average household shopping three to four times per week. Amazon has built a somewhat unusual, but significant grocery business over nearly 20 years. Similar to how other mass merchants entered the grocery space in the 1980s, we began by adding products typically found in supermarket aisles that don’t require temperature control such as paper products, canned and boxed food, candy and snacks, pet care, health and personal care, and beauty. However, we offer more than three million items compared to a typical supermarket’s 30K for the same categories. To date, we’ve also focused on larger pack sizes, given the current cost to serve online delivery. While we’re pleased with the size and growth of our grocery business, we aspire to serve more of our customers’ grocery needs than we do today. To do so, we need a broader physical store footprint given that most of the grocery shopping still happens in physical venues. Whole Foods Market pioneered the natural and organic specialty grocery store concept 40 years ago. Today, it’s a large and growing business that continues to raise the bar for healthy and sustainable food. Over the past year, we’ve continued to invest in the business while also making changes to drive better profitability. Whole Foods is on an encouraging path, but to have a larger impact on physical grocery, we must find a mass grocery format that we believe is worth expanding broadly. Amazon Fresh is the brand we’ve been experimenting with for a few years, and we’re working hard to identify and build the right mass grocery format for Amazon scale. Grocery is a big growth opportunity for Amazon. + +Amazon Business is another example of an investment where our ecommerce and logistics capabilities position us well to pursue this large market segment. Amazon Business allows businesses, municipalities, and organizations to procure products like office supplies and other bulk items easily and at great savings. While some areas of the economy have struggled over the past few years, Amazon Business has thrived. Why? 
Because the team has translated what it means to deliver selection, value, and convenience into a business procurement setting, constantly listening to and learning from customers, and innovating on their behalf. Some people have never heard of Amazon Business, but, our business customers love it. Amazon Business launched in 2015 and today drives roughly $35B in annualized gross sales. More than six million active customers, including 96 of the global Fortune 100 companies, are enjoying Amazon Business’ one-stop shopping, real-time analytics, and broad selection on hundreds of millions of business supplies. We believe that we’ve only scratched the surface of what’s possible to date, and plan to keep building the features our business customers tell us they need and want. + +While many brands and merchants successfully sell their products on Amazon’s marketplace, there are also a large number of brands and sellers who have launched their own direct-to-consumer websites. One of the challenges for these merchants is driving conversion from views to purchases. We invented Buy with Prime to help with this challenge. Buy with Prime allows third-party brands and sellers to offer their products on their own websites to our large Amazon Prime membership, and offer those customers fast, free Prime shipping and seamless checkout with their Amazon account. Buy with Prime provides merchants several additional benefits, including Amazon handling the product storage, picking, packing, delivery, payment, and any returns, all through Amazon Pay and Fulfillment by Amazon. Buy with Prime has recently been made available to all US merchants; and so far, Buy with Prime has increased shopper conversion on third-party shopping sites by 25% on average. Merchants are excited about converting more sales and fulfilling these shipments more easily, Prime members love that they can use their Prime benefits on more destinations, and Buy with Prime allows us to improve the shopping experience across more of the web. + +Expanding internationally, pursuing large retail market segments that are still nascent for Amazon, and using our unique assets to help merchants sell more effectively on their own websites are somewhat natural extensions for us. There are also a few investments we’re making that are further from our core businesses, but where we see unique opportunity. In 2003, AWS would have been a classic example. In 2023, Amazon Healthcare and Kuiper are potential analogues. + +Our initial efforts in Healthcare began with pharmacy, which felt less like a major departure from ecommerce. For years, Amazon customers had asked us when we’d offer them an online pharmacy as their frustrations mounted with current providers. Launched in 2020, Amazon Pharmacy is a full-service, online pharmacy that offers transparent pricing, easy refills, and savings for Prime members. The business is growing quickly, and continues to innovate. An example is Amazon Pharmacy’s recent launch of RxPass, which for a $5 per month flat fee, enables Prime members to get as many of the eligible prescription medications as they need for dozens of common conditions, like high blood pressure, acid reflux, and anxiety. However, our customers have continued to express a strong desire for Amazon to provide a better alternative to the inefficient and unsatisfying broader healthcare experience. We decided to start with primary care as it’s a prevalent first stop in the patient journey. 
We evaluated and studied the existing landscape extensively, including some early Amazon experiments like Amazon Care. During this process, we identified One Medical’s patient-focused experience as an excellent foundation upon which to build our future business; and in July 2022, we announced our acquisition of One Medical. There are several elements that customers love about One Medical. It has a fantastic digital app that makes it easy for patients to discuss issues with a medical practitioner via chat or video conference. If a physical visit is required, One Medical has offices in cities across the US where patients can book same or next day appointments. One Medical has relationships with specialty physicians in each of its cities and works closely with local hospital systems to make seeing specialists easy, so One Medical members can quickly access these resources when needed. Going forward, we strongly believe that One Medical and Amazon will continue to innovate together to change what primary care will look like for customers. + +Kuiper is another example of Amazon innovating for customers over the long term in an area where there’s high customer need. Our vision for Kuiper is to create a low-Earth orbit satellite system to deliver high-quality broadband internet service to places around the world that don’t currently have it. There are hundreds of millions of households and businesses who don’t have reliable access to the internet. Imagine what they’ll be able to do with reliable connectivity, from people taking online education courses, using financial services, starting their own businesses, doing their shopping, enjoying entertainment, to businesses and governments improving their coverage, efficiency, and operations. Kuiper will deliver not only accessibility, but affordability. Our teams have developed low-cost antennas (i.e. customer terminals) that will lower the barriers to access. We recently unveiled the new terminals that will communicate with the satellites passing overhead, and we expect to be able to produce our standard residential version for less than $400 each. They’re small: 11 inches square, 1 inch thick, and weigh less than 5 pounds without their mounting bracket, but they deliver speeds up to 400 megabits per second. And they’re powered by Amazon-designed baseband chips. We’re preparing to launch two prototype satellites to test the entire end-to-end communications network this year, and plan to be in beta with commercial customers in 2024. The customer reaction to what we’ve shared thus far about Kuiper has been very positive, and we believe Kuiper represents a very large potential opportunity for Amazon. It also shares several similarities to AWS in that it’s capital intensive at the start, but has a large prospective consumer, enterprise, and government customer base, significant revenue and operating profit potential, and relatively few companies with the technical and inventive aptitude, as well as the investment hypothesis to go after it. + +One final investment area that I’ll mention, that’s core to setting Amazon up to invent in every area of our business for many decades to come, and where we’re investing heavily is Large Language Models (“LLMs”) and Generative AI. Machine learning has been a technology with high promise for several decades, but it’s only been the last five to ten years that it’s started to be used more pervasively by companies. 
This shift was driven by several factors, including access to higher volumes of compute capacity at lower prices than was ever available. Amazon has been using machine learning extensively for 25 years, employing it in everything from personalized ecommerce recommendations, to fulfillment center pick paths, to drones for Prime Air, to Alexa, to the many machine learning services AWS offers (where AWS has the broadest machine learning functionality and customer base of any cloud provider). More recently, a newer form of machine learning, called Generative AI, has burst onto the scene and promises to significantly accelerate machine learning adoption. Generative AI is based on very Large Language Models (trained on up to hundreds of billions of parameters, and growing), across expansive datasets, and has radically general and broad recall and learning capabilities. We have been working on our own LLMs for a while now, believe it will transform and improve virtually every customer experience, and will continue to invest substantially in these models across all of our consumer, seller, brand, and creator experiences. Additionally, as we’ve done for years in AWS, we’re democratizing this technology so companies of all sizes can leverage Generative AI. AWS is offering the most price-performant machine learning chips in Trainium and Inferentia so small and large companies can afford to train and run their LLMs in production. We enable companies to choose from various LLMs and build applications with all of the AWS security, privacy and other features that customers are accustomed to using. And, we’re delivering applications like AWS’s CodeWhisperer, which revolutionizes developer productivity by generating code suggestions in real time. I could write an entire letter on LLMs and Generative AI as I think they will be that transformative, but I’ll leave that for a future letter. Let’s just say that LLMs and Generative AI are going to be a big deal for customers, our shareholders, and Amazon. + +So, in closing, I’m optimistic that we’ll emerge from this challenging macroeconomic time in a stronger position than when we entered it. There are several reasons for it and I’ve mentioned many of them above. But, there are two relatively simple statistics that underline our immense future opportunity. While we have a consumer business that’s $434B in 2022, the vast majority of total market segment share in global retail still resides in physical stores (roughly 80%). And, it’s a similar story for Global IT spending, where we have AWS revenue of $80B in 2022, with about 90% of Global IT spending still on-premises and yet to migrate to the cloud. As these equations steadily flip—as we’re already seeing happen—we believe our leading customer experiences, relentless invention, customer focus, and hard work will result in significant growth in the coming years. And, of course, this doesn’t include the other businesses and experiences we’re pursuing at Amazon, all of which are still in their early days. + +I strongly believe that our best days are in front of us, and I look forward to working with my teammates at Amazon to make it so. 
\ No newline at end of file diff --git a/03_QuestionAnswering/00_qa_w_bedrock_titan.ipynb b/03_QuestionAnswering/00_qa_w_bedrock_titan.ipynb new file mode 100644 index 00000000..4633fb82 --- /dev/null +++ b/03_QuestionAnswering/00_qa_w_bedrock_titan.ipynb @@ -0,0 +1,432 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Questions and answers with Bedrock\n", + "\n", + "Question Answering (QA) is an important task that involves extracting answers to factual queries posed in natural language. Typically, a QA system processes a query against a knowledge base containing structured or unstructured data and generates a response with accurate information. Ensuring high accuracy is key to developing a useful, reliable and trustworthy question answering system, especially for enterprise use cases. \n", + "\n", + "Generative AI models like Titan and Claude use probability distributions to generate responses to questions. These models are trained on vast amounts of text data, which allows them to predict what comes next in a sequence or what word might follow a particular word. However, these models are not able to provide accurate or deterministic answers to every question because there is always some degree of uncertainty in the data. Enterprises need to query domain-specific and proprietary data, and use that information to answer questions; more generally, they need to work with data on which the model has not been trained. \n", + "\n", + "In this module, we will demonstrate how to use the Bedrock Titan model to provide informative responses to queries.\n", + "\n", + "In this example we will first run the model with no context and then manually provide the context. There is no `RAG` augmentation happening here. This approach works with short documents or singleton applications, but it might not scale to enterprise-level question answering, where large enterprise documents cannot all fit into the prompt sent to the model. \n", + "\n", + "### Challenges\n", + "- How to have the model return factual answers to a question\n", + "\n", + "### Proposal\n", + "To address the above challenges, this notebook proposes the following strategy:\n", + "#### Prepare documents\n", + "Before being able to answer the questions, the documents must be processed and stored in a document store index\n", + "- Here we will send the request with the full relevant context to the model and expect the response back\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Section 1: Q&A with the knowledge of the model\n", + "In this section we try to use the models provided by the Bedrock service to answer questions based on the knowledge they gained during the training phase." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook we will be using the `invoke_model()` method of the Amazon Bedrock client. The mandatory parameters required to use this method are `modelId`, which represents the Amazon Bedrock model ARN, and `body`, which is the prompt for our task. The `body` prompt changes depending on the foundation model provider selected. We walk through this in detail below.\n", + "\n", + "```\n", + "{\n", + " modelId= model_id,\n", + " contentType= \"application/json\",\n", + " accept= \"application/json\",\n", + " body=body\n", + "}\n", + "\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Scenario\n", + "\n", + "We are trying to model a situation where we ask the model to provide information on how to change a flat tire. We will first ask the model, based only on its training data, to provide an answer for our specific make and model of car. This technique is called `Zero Shot`. We will soon realize that even though the model returns answers which seem relevant to our specific car, it is actually hallucinating. We can tell because we ask the same question about a fake car and get back an almost identical answer.\n", + "\n", + "This implies we need to augment the model's prompt with additional data about our specific make and model of car, so that the model returns a very specific answer. In this notebook we will not use any external sources to augment the data, but will simulate how a RAG-based augmentation system would work.
\n", + "\n", + "To run our final test we provide a full detailed section from our manual which goes and explains for our specific car how the tire changes work and then we will test to get a curated response back from the model\n", + "\n", + "## Task\n", + "\n", + "To start the process, you select one of the models provided by Bedrock. For this use case you select Titan. These models are able to answer generic questions about cars.\n", + "\n", + "For example you ask the Titan model to tell you how to change a flat tire on your Audi.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompt_data = \"\"\"You are an helpful assistant. Answer questions in a concise way. If you are unsure about the\n", + "answer say 'I am unsure'\n", + "\n", + "Question: How can I fix a flat tire on my Audi A8?\n", + "Answer:\"\"\"\n", + "parameters = {\n", + " \"maxTokenCount\":512,\n", + " \"stopSequences\":[],\n", + " \"temperature\":0,\n", + " \"topP\":0.9\n", + " }" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Let's invoke of the model passing in the JSON body to generate the response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "body = json.dumps({\"inputText\": prompt_data, \"textGenerationConfig\": parameters})\n", + "modelId = \"amazon.titan-tg1-large\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "answer = response_body.get(\"results\")[0].get(\"outputText\")\n", + "print_ww(answer.strip())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model is able to give you an answer outlining the process of changing the car flat tire, but the same explanation could be valid for any car. Unfortunately this is not the right answer for an Audi A8, which does not have a spare tire. This is because the model has been trained on data that contains instructions about changing tires on cars.\n", + "\n", + "Another example of this issue can be seen by trying to ask the same question for a completely fake car brand and model, say a Amazon Tirana." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompt_data = \"How can I fix a flat tire on my Amazon Tirana?\"\n", + "body = json.dumps({\"inputText\": prompt_data, \"textGenerationConfig\": parameters})\n", + "modelId = \"amazon.titan-tg1-large\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "answer = response_body.get(\"results\")[0].get(\"outputText\")\n", + "print_ww(answer.strip())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see the answer that the model provides is plausible but it is for a bike and not a car. The model assumed that the Amazon Tirana was a bike. 
The model is _hallucinating_.\n", + "\n", + "How can we fix this issue and have the model provide answers based on the specific instructions valid for my car model?\n", + "\n", + "Research by Facebook in 2020 found that LLM knowledge could be augmented on the fly by providing the additional knowledge base as part of the prompt. This approach is called Retrieval Augmented Generation, or RAG.\n", + "\n", + "Let's see how we can use this to improve our application.\n", + "\n", + "The following is an excerpt of the manual of the Audi A8 (in reality it is not the real manual, but let's assume so). This document is also conveniently short enough to fit entirely in the prompt of Titan Large. \n", + "\n", + "```\n", + "Tires and tire pressure:\n", + "\n", + "Tires are made of black rubber and are mounted on the wheels of your vehicle. They provide the necessary grip for driving, cornering, and braking. Two important factors to consider are tire pressure and tire wear, as they can affect the performance and handling of your car.\n", + "\n", + "Where to find recommended tire pressure:\n", + "\n", + "You can find the recommended tire pressure specifications on the inflation label located on the driver's side B-pillar of your vehicle. Alternatively, you can refer to your vehicle's manual for this information. The recommended tire pressure may vary depending on the speed and the number of occupants or maximum load in the vehicle.\n", + "\n", + "Reinflating the tires:\n", + "\n", + "When checking tire pressure, it is important to do so when the tires are cold. This means allowing the vehicle to sit for at least three hours to ensure the tires are at the same temperature as the ambient temperature.\n", + "\n", + "To reinflate the tires:\n", + "\n", + " Check the recommended tire pressure for your vehicle.\n", + " Follow the instructions provided on the air pump and inflate the tire(s) to the correct pressure.\n", + " In the center display of your vehicle, open the \"Car status\" app.\n", + " Navigate to the \"Tire pressure\" tab.\n", + " Press the \"Calibrate pressure\" option and confirm the action.\n", + " Drive the car for a few minutes at a speed above 30 km/h to calibrate the tire pressure.\n", + "\n", + "Note: In some cases, it may be necessary to drive for more than 15 minutes to clear any warning symbols or messages related to tire pressure. If the warnings persist, allow the tires to cool down and repeat the above steps.\n", + "\n", + "Flat Tire:\n", + "\n", + "If you encounter a flat tire while driving, you can temporarily seal the puncture and reinflate the tire using a tire mobility kit. This kit is typically stored under the lining of the luggage area in your vehicle.\n", + "\n", + "Instructions for using the tire mobility kit:\n", + "\n", + " Open the tailgate or trunk of your vehicle.\n", + " Lift up the lining of the luggage area to access the tire mobility kit.\n", + " Follow the instructions provided with the tire mobility kit to seal the puncture in the tire.\n", + " After using the kit, make sure to securely put it back in its original location.\n", + " Contact Rivesla or an appropriate service for assistance with disposing of and replacing the used sealant bottle.\n", + "\n", + "Please note that the tire mobility kit is a temporary solution and is designed to allow you to drive for a maximum of 10 minutes or 8 km (whichever comes first) at a maximum speed of 80 km/h. 
It is advisable to replace the punctured tire or have it repaired by a professional as soon as possible.\n", + "```\n", + "\n", + " \n", + "Next, we take this text and \"embed\" it in the prompt together with the original question. The prompt is also build in such a way as to try to hint the model to only look at the information provided as context." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "context = \"\"\"Tires and tire pressure:\n", + "\n", + "Tires are made of black rubber and are mounted on the wheels of your vehicle. They provide the necessary grip for driving, cornering, and braking. Two important factors to consider are tire pressure and tire wear, as they can affect the performance and handling of your car.\n", + "\n", + "Where to find recommended tire pressure:\n", + "\n", + "You can find the recommended tire pressure specifications on the inflation label located on the driver's side B-pillar of your vehicle. Alternatively, you can refer to your vehicle's manual for this information. The recommended tire pressure may vary depending on the speed and the number of occupants or maximum load in the vehicle.\n", + "\n", + "Reinflating the tires:\n", + "\n", + "When checking tire pressure, it is important to do so when the tires are cold. This means allowing the vehicle to sit for at least three hours to ensure the tires are at the same temperature as the ambient temperature.\n", + "\n", + "To reinflate the tires:\n", + "\n", + " Check the recommended tire pressure for your vehicle.\n", + " Follow the instructions provided on the air pump and inflate the tire(s) to the correct pressure.\n", + " In the center display of your vehicle, open the \"Car status\" app.\n", + " Navigate to the \"Tire pressure\" tab.\n", + " Press the \"Calibrate pressure\" option and confirm the action.\n", + " Drive the car for a few minutes at a speed above 30 km/h to calibrate the tire pressure.\n", + "\n", + "Note: In some cases, it may be necessary to drive for more than 15 minutes to clear any warning symbols or messages related to tire pressure. If the warnings persist, allow the tires to cool down and repeat the above steps.\n", + "\n", + "Flat Tire:\n", + "\n", + "If you encounter a flat tire while driving, you can temporarily seal the puncture and reinflate the tire using a tire mobility kit. This kit is typically stored under the lining of the luggage area in your vehicle.\n", + "\n", + "Instructions for using the tire mobility kit:\n", + "\n", + " Open the tailgate or trunk of your vehicle.\n", + " Lift up the lining of the luggage area to access the tire mobility kit.\n", + " Follow the instructions provided with the tire mobility kit to seal the puncture in the tire.\n", + " After using the kit, make sure to securely put it back in its original location.\n", + " Contact Audi or an appropriate service for assistance with disposing of and replacing the used sealant bottle.\n", + "\n", + "Please note that the tire mobility kit is a temporary solution and is designed to allow you to drive for a maximum of 10 minutes or 8 km (whichever comes first) at a maximum speed of 80 km/h. It is advisable to replace the punctured tire or have it repaired by a professional as soon as possible.\"\"\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Let's take the whole excerpt and pass it to the model together with the question." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "question = \"How can I fix a flat tire on my Audi A8?\"\n", + "prompt_data = f\"\"\"Answer the question based only on the information provided between ## and give step by step guide.\n", + "#\n", + "{context}\n", + "#\n", + "\n", + "Question: {question}\n", + "Answer:\"\"\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Invoke the model via boto3 to generate the response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "body = json.dumps({\"inputText\": prompt_data, \"textGenerationConfig\": parameters})\n", + "modelId = \"amazon.titan-tg1-large\" # change this to use a different version from the model provider\n", + "accept = \"application/json\"\n", + "contentType = \"application/json\"\n", + "\n", + "response = boto3_bedrock.invoke_model(\n", + " body=body, modelId=modelId, accept=accept, contentType=contentType\n", + ")\n", + "response_body = json.loads(response.get(\"body\").read())\n", + "answer = response_body.get(\"results\")[0].get(\"outputText\")\n", + "print_ww(answer.strip())" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the model takes a while to understand the context and generate relevant answer for you, this might lead to poor experience for the user since they have to wait for a response for some seconds.\n", + "\n", + "Bedrock also supports streaming capability where the service generates an output as the model is generating tokens. Here is an example of how you can do that." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display_markdown,Markdown,clear_output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)\n", + "stream = response.get('body')\n", + "output = []\n", + "i = 1\n", + "if stream:\n", + " for event in stream:\n", + " chunk = event.get('chunk')\n", + " if chunk:\n", + " chunk_obj = json.loads(chunk.get('bytes').decode())\n", + " text = chunk_obj['outputText']\n", + " clear_output(wait=True)\n", + " output.append(text)\n", + " display_markdown(Markdown(''.join(output)))\n", + " i+=1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Summary\n", + "\n", + "We see the response is a summarized and step by step instruction of how to change the tires . 
This simple example shows how you can leverage the `RAG` (Retrieval Augmented Generation) approach to generate a curated response." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/03_QuestionAnswering/01_qa_w_rag_claude.ipynb b/03_QuestionAnswering/01_qa_w_rag_claude.ipynb new file mode 100644 index 00000000..d13895eb --- /dev/null +++ b/03_QuestionAnswering/01_qa_w_rag_claude.ipynb @@ -0,0 +1,535 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Retrieval Augmented Question & Answering with Amazon Bedrock using LangChain\n", + "\n", + "### Context\n", + "Previously we saw that the model told us how to change the tire, but we had to manually find the relevant data and provide the context ourselves. We explored the approach of leveraging the models available under Bedrock and asking questions based both on the knowledge learned during training and on manually provided context. While that approach works with short documents or one-off applications, it fails to scale to enterprise-level question answering, where large enterprise documents cannot all fit into the prompt sent to the model. \n", + "\n", + "### Pattern\n", + "We can improve upon this process by implementing an architecture called Retrieval Augmented Generation (RAG). RAG retrieves data from outside the language model (non-parametric) and augments the prompts by adding the relevant retrieved data in context. \n", + "\n", + "In this notebook we explain how to approach the Question Answering pattern of finding and leveraging documents to provide answers to user questions.\n", + "\n", + "### Challenges\n", + "- How to manage large document(s) that exceed the token limit\n", + "- How to find the document(s) relevant to the question being asked\n", + "\n", + "### Proposal\n", + "To address the above challenges, this notebook proposes the following strategy:\n", + "#### Prepare documents\n", + "![Embeddings](./images/Embeddings_lang.png)\n", + "\n", + "Before being able to answer the questions, the documents must be processed and stored in a document store index\n", + "- Load the documents\n", + "- Process and split them into smaller chunks\n", + "- Create a numerical vector representation of each chunk using the Amazon Bedrock Titan Embeddings model\n", + "- Create an index using the chunks and the corresponding embeddings\n", + "#### Ask question\n", + "![Question](./images/Chatbot_lang.png)\n", + "\n", + "When the document index is prepared, you are ready to ask questions, and relevant documents will be fetched based on the question being asked.
Following steps will be executed.\n", + "- Create an embedding of the input question\n", + "- Compare the question embedding with the embeddings in the index\n", + "- Fetch the (top N) relevant document chunks\n", + "- Add those chunks as part of the context in the prompt\n", + "- Send the prompt to the model under Amazon Bedrock\n", + "- Get the contextual answer based on the documents retrieved" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Usecase\n", + "#### Dataset\n", + "To explain this architecture pattern we are using the documents from IRS. These documents explain topics such as:\n", + "- Original Issue Discount (OID) Instruments\n", + "- Reporting Cash Payments of Over $10,000 to IRS\n", + "- Employer's Tax Guide\n", + "\n", + "#### Persona\n", + "Let's assume a persona of a layman who doesn't have an understanding of how IRS works and if some actions have implications or not.\n", + "\n", + "The model will try to answer from the documents in easy language.\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Implementation\n", + "In order to follow the RAG approach this notebook is using the LangChain framework where it has integrations with different services and tools that allow efficient building of patterns such as RAG. We will be using the following tools:\n", + "\n", + "- **LLM (Large Language Model)**: Anthropic Claude V1 available through Amazon Bedrock\n", + "\n", + " This model will be used to understand the document chunks and provide an answer in human friendly manner.\n", + "- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock\n", + "\n", + " This model will be used to generate a numerical representation of the textual documents\n", + "- **Document Loader**: PDF Loader available through LangChain\n", + "\n", + " This is the loader that can load the documents from a source, for the sake of this notebook we are loading the sample files from a local path. This could easily be replaced with a loader to load documents from enterprise internal systems.\n", + "\n", + "- **Vector Store**: FAISS available through LangChain\n", + "\n", + " In this notebook we are using this in-memory vector-store to store both the embeddings and the documents. In an enterprise context this could be replaced with a persistent store such as AWS OpenSearch, RDS Postgres with pgVector, ChromaDB, Pinecone or Weaviate.\n", + "- **Index**: VectorIndex\n", + "\n", + " The index helps to compare the input embedding and the document embeddings to find relevant document\n", + "- **Wrapper**: wraps index, vector store, embeddings model and the LLM to abstract away the logic from the user.\n", + "\n", + "### Setup\n", + "To run this notebook you would need to install 2 more dependencies, [PyPDF](https://pypi.org/project/pypdf/) and [FAISS vector store](https://github.com/facebookresearch/faiss).\n", + "\n", + "\n", + "\n", + "Then begin with instantiating the LLM and the Embeddings model. Here we are using Anthropic Claude to demonstrate the use case.\n", + "\n", + "Note: It is possible to choose other models available with Bedrock. 
You can replace the `model_id` as follows to change the model.\n", + "\n", + "`llm = Bedrock(model_id=\"amazon.titan-tg1-large\")`\n", + "\n", + "Available models under Bedrock have the following IDs:\n", + "- `amazon.titan-tg1-large`\n", + "- `ai21.j2-grande-instruct`\n", + "- `ai21.j2-jumbo-instruct`\n", + "- `anthropic.claude-instant-v1`\n", + "- `anthropic.claude-v1`" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet\n", + "%pip install pypdf==3.8.1 faiss-cpu==1.7.4 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup langchain\n", + "\n", + "We create an instance of the Bedrock classes for the LLM and the embedding models. At the time of writing, Bedrock supports one embedding model and therefore we do not need to specify any model id." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# We will be using the Titan Embeddings Model to generate our Embeddings.\n", + "from langchain.embeddings import BedrockEmbeddings\n", + "from langchain.llms.bedrock import Bedrock\n", + "\n", + "# - create the Anthropic Model\n", + "llm = Bedrock(model_id=\"anthropic.claude-v1\", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})\n", + "bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data Preparation\n", + "Let's first download some of the files to build our document store. For this example we will be using public IRS documents from [here](https://www.irs.gov/publications)." 
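The cell below saves the PDFs into a local `./data/` folder. `urlretrieve` does not create missing directories, so if your copy of the repository does not already contain that folder you may want to create it first; a minimal, optional snippet for that:

```python
# Optional: make sure the local download folder exists before running the
# download cell below (urlretrieve fails if the target directory is missing).
import os

os.makedirs("./data", exist_ok=True)
```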
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from urllib.request import urlretrieve\n", + "files = [\n", + " 'https://www.irs.gov/pub/irs-pdf/p1544.pdf',\n", + " 'https://www.irs.gov/pub/irs-pdf/p15.pdf',\n", + " 'https://www.irs.gov/pub/irs-pdf/p1212.pdf'\n", + "]\n", + "for url in files:\n", + " file_path = './data/' + url.split('/')[-1]\n", + " urlretrieve(url, file_path)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After downloading we can load the documents with the help of [DirectoryLoader from PyPDF available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and splitting them into smaller chunks.\n", + "\n", + "Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt. Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters. For the sake of this use-case we are creating chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", + "from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader\n", + "\n", + "loader = PyPDFDirectoryLoader(\"./data/\")\n", + "\n", + "documents = loader.load()\n", + "# - in our testing Character split works better with this PDF data set\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " # Set a really small chunk size, just to show.\n", + " chunk_size = 1000,\n", + " chunk_overlap = 100,\n", + ")\n", + "docs = text_splitter.split_documents(documents)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)\n", + "avg_char_count_pre = avg_doc_length(documents)\n", + "avg_char_count_post = avg_doc_length(docs)\n", + "print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')\n", + "print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')\n", + "print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We had 3 PDF documents which have been split into smaller ~500 chunks.\n", + "\n", + "Now we can see how a sample embedding would look like for one of those chunks" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))\n", + "print(\"Sample embedding of a document chunk: \", sample_embedding)\n", + "print(\"Size of the embedding: \", sample_embedding.shape)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Following the similar pattern embeddings could be generated for the entire corpus and stored in a vector store.\n", + "\n", + "This can be 
easily done using [FAISS](https://github.com/facebookresearch/faiss) implementation inside [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html) which takes input the embeddings model and the documents to create the entire vector store. Using the Index Wrapper we can abstract away most of the heavy lifting such as creating the prompt, getting embeddings of the query, sampling the relevant documents and calling the LLM. [VectorStoreIndexWrapper](https://python.langchain.com/en/latest/modules/indexes/getting_started.html#one-line-index-creation) helps us with that.\n", + "\n", + "**⚠️⚠️⚠️ NOTE: it might take few minutes to run the following cell ⚠️⚠️⚠️**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.chains.question_answering import load_qa_chain\n", + "from langchain.vectorstores import FAISS\n", + "from langchain.indexes import VectorstoreIndexCreator\n", + "from langchain.indexes.vectorstore import VectorStoreIndexWrapper\n", + "\n", + "vectorstore_faiss = FAISS.from_documents(\n", + " docs,\n", + " bedrock_embeddings,\n", + ")\n", + "\n", + "wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Question Answering\n", + "\n", + "Now that we have our vector store in place, we can start asking questions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Is it possible that I get sentenced to jail due to failure in filings?\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first step would be to create an embedding of the query such that it could be compared with the documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query_embedding = vectorstore_faiss.embedding_function(query)\n", + "np.array(query_embedding)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use this embedding of the query to then fetch relevant documents.\n", + "Now our query is represented as embeddings we can do a similarity search of our query against our data store providing us with the most relevant information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "relevant_documents = vectorstore_faiss.similarity_search_by_vector(query_embedding)\n", + "print(f'{len(relevant_documents)} documents are fetched which are relevant to the query.')\n", + "print('----')\n", + "for i, rel_doc in enumerate(relevant_documents):\n", + " print_ww(f'## Document {i+1}: {rel_doc.page_content}.......')\n", + " print('---')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we have the relevant documents, it's time to use the LLM to generate an answer based on these documents. \n", + "\n", + "We will take our inital prompt, together with our relevant documents which were retreived based on the results of our similarity search. We then by combining these create a prompt that we feed back to the model to get our result. 
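To make the "combining" step concrete before handing it over to LangChain, here is a minimal sketch of what the helpers in the next cells do behind the scenes. It assumes the `relevant_documents`, `query`, `llm` and `print_ww` objects defined in the cells above; it is an illustration, not the notebook's own implementation.

```python
# Sketch: stuff the retrieved chunks and the question into a single
# Claude-style prompt and call the LLM directly (assumes objects from above).
context_text = "\n\n".join(doc.page_content for doc in relevant_documents)

stuffed_prompt = f"""Human: Use only the following context to answer the question.

{context_text}

Question: {query}

Assistant:"""

print_ww(llm(stuffed_prompt))
```

The wrappers shown next automate exactly this prompt stuffing, together with the embedding and retrieval steps.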
At this point our model should give us a well-grounded answer to our question about failure in filings, based on the retrieved IRS document chunks rather than on its training data alone.\n", + "\n", + "LangChain provides an abstraction that makes this easy." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Quick way\n", + "You can use the wrapper provided by LangChain, which wraps around the vector store and takes the LLM as input.\n", + "This wrapper performs the following steps behind the scenes:\n", + "- Takes the question as input\n", + "- Creates the question embedding\n", + "- Fetches the relevant documents\n", + "- Stuffs the documents and the question into a prompt\n", + "- Invokes the model with the prompt and generates the answer in a human-readable manner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "answer = wrapper_store_faiss.query(question=query, llm=llm)\n", + "print_ww(answer)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's ask a different question:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query_2 = \"What is the difference between market discount and qualified stated interest\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "answer_2 = wrapper_store_faiss.query(question=query_2, llm=llm)\n", + "print_ww(answer_2)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Customisable option\n", + "In the above scenario you explored the quick and easy way to get a context-aware answer to your question. Now let's have a look at a more customizable option with the help of [RetrievalQA](https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa.html), where you can customize how the fetched documents are added to the prompt using the `chain_type` parameter. If you want to control how many relevant documents are retrieved, change the `k` parameter in the cell below to see different outputs. In many scenarios you might want to know which source documents the LLM used to generate the answer; you can get those documents in the output using `return_source_documents`, which returns the documents that were added to the context of the LLM prompt. `RetrievalQA` also allows you to provide a custom [prompt template](https://python.langchain.com/en/latest/modules/prompts/prompt_templates/getting_started.html) which can be specific to the model.\n", + "\n", + "Note: In this example we are using Anthropic Claude as the LLM under Amazon Bedrock. This particular model performs best when the input is provided after `Human:` and the model is asked to generate an output after `Assistant:`. In the cell below you see an example of how to control the prompt so that the LLM stays grounded and doesn't answer outside the context." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "from langchain.chains import RetrievalQA\n", + "from langchain.prompts import PromptTemplate\n", + "\n", + "prompt_template = \"\"\"Human: Use the following pieces of context to provide a concise answer to the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", + "\n", + "{context}\n", + "\n", + "Question: {question}\n", + "Assistant:\"\"\"\n", + "PROMPT = PromptTemplate(\n", + " template=prompt_template, input_variables=[\"context\", \"question\"]\n", + ")\n", + "\n", + "qa = RetrievalQA.from_chain_type(\n", + " llm=llm,\n", + " chain_type=\"stuff\",\n", + " retriever=vectorstore_faiss.as_retriever(\n", + " search_type=\"similarity\", search_kwargs={\"k\": 3}\n", + " ),\n", + " return_source_documents=True,\n", + " chain_type_kwargs={\"prompt\": PROMPT}\n", + ")\n", + "query = \"Is it possible that I get sentenced to jail due to failure in filings?\"\n", + "result = qa({\"query\": query})\n", + "print_ww(result['result'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result['source_documents']" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "Congratulations on completing this moduel on retrieval augmented generation! This is an important technique that combines the power of large language models with the precision of retrieval methods. By augmenting generation with relevant retrieved examples, the responses we recieved become more coherent, consistent and grounded. You should feel proud of learning this innovative approach. I'm sure the knowledge you've gained will be very useful for building creative and engaging language generation systems. Well done!\n", + "\n", + "In the above implementation of RAG based Question Answering we have explored the following concepts and how to implement them using Amazon Bedrock and it's LangChain integration.\n", + "\n", + "- Loading documents and generating embeddings to create a vector store\n", + "- Retrieving documents to the question\n", + "- Preparing a prompt which goes as input to the LLM\n", + "- Present an answer in a human friendly manner\n", + "\n", + "### Take-aways\n", + "- Experiment with different Vector Stores\n", + "- Leverage various models available under Amazon Bedrock to see alternate outputs\n", + "- Explore options such as persistent storage of embeddings and document chunks\n", + "- Integration with enterprise data stores\n", + "\n", + "# Thank You" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/03_QuestionAnswering/README.md b/03_QuestionAnswering/README.md new file mode 100644 index 00000000..0641c565 --- /dev/null +++ b/03_QuestionAnswering/README.md @@ -0,0 +1,55 @@ +# Question Answering +## Introduction + +Question Answering (QA) is an important task that involves extracting answers to factual queries posed in natural language. Typically, a QA system processes a query against a knowledge base containing structured or unstructured data and generates a response with accurate information. Ensuring high accuracy is key to developing a useful, reliable and trustworthy question answering system, especially for enterprise use cases. 
+ +Generative AI models like Amazon Titan, Anthropic Claude and AI21 Jurassic 2 use probability distributions to generate responses to questions. These models are trained on vast amounts of text data, which allows them to predict what comes next in a sequence or what word might follow a particular word. However, these models are not able to provide accurate or deterministic answers to every question because there is always some degree of uncertainty in the data. + +Enterprises need to query domain-specific and proprietary data, and more generally data on which the model has not been trained, and use that information to answer questions. + +## Patterns + +In these labs we will explore two QA patterns: + +1. In the first pattern, questions are sent directly to the model and we get answers based on the base model with no modifications. +This poses a challenge: +outputs are generic to common world knowledge, not specific to a customer's business, and there is no source of information. + + ![Q&A](./images/51-simple-rag.png) + +2. In the second pattern, we improve upon the first by concatenating our questions with as much relevant context as possible, context that is likely to contain the answers or information we are looking for. +The challenge here is that the amount of contextual information that can be used is limited by the token limit of the model. + ![RAG Q&A](./images/52-rag-with-external-data.png) + +This limitation can be overcome by using Retrieval Augmented Generation (RAG). + +## How Retrieval Augmented Generation (RAG) works + +RAG combines the use of embeddings to index the corpus of documents and build a knowledge base, with the use of an LLM to extract the information from a subset of the documents in the knowledge base. + + +As a preparation step for RAG, the documents building up the knowledge base are split into chunks of a fixed size (matching the maximum input size of the selected embedding model) and are then passed to the model to obtain the embedding vector. The embedding, together with the original chunk of the document and additional metadata, is stored in a vector database. The vector database is optimized to efficiently perform similarity search between vectors. + +## Target audience +Customers with data stores that may be private or frequently changing. The RAG approach solves two problems; customers facing the following challenges can benefit from this lab. +- Freshness of data: the data is continuously changing and the model must only provide the latest information. +- Actuality of knowledge: there is domain-specific knowledge that the model might not understand, and the model must answer based on the domain data. + +## Objective + +After this module you should have a good understanding of: + +1. What the QA pattern is and how it leverages Retrieval Augmented Generation (RAG) +2. How to use Bedrock to implement a Q&A RAG solution + + +In this module we will walk you through how to implement the QA pattern with Bedrock. +Additionally, we have prepared the embeddings to be loaded in the vector database for you. + +Note that you can use Titan Embeddings to obtain the embedding of the user question, then use that embedding to retrieve the most relevant documents from the vector database, build a prompt concatenating the top 3 documents, and invoke the LLM via Bedrock. + +## Notebooks + +1. [Q&A with model knowledge and small context](./00_qa_w_bedrock_titan.ipynb) + +2.
[Q&A with RAG](./01_qa_w_rag_claude.ipynb) \ No newline at end of file diff --git a/03_QuestionAnswering/images/51-simple-rag.png b/03_QuestionAnswering/images/51-simple-rag.png new file mode 100644 index 00000000..6a7c0bae Binary files /dev/null and b/03_QuestionAnswering/images/51-simple-rag.png differ diff --git a/03_QuestionAnswering/images/52-rag-with-external-data.png b/03_QuestionAnswering/images/52-rag-with-external-data.png new file mode 100644 index 00000000..6bbb9a7f Binary files /dev/null and b/03_QuestionAnswering/images/52-rag-with-external-data.png differ diff --git a/03_QuestionAnswering/images/Chatbot_lang.png b/03_QuestionAnswering/images/Chatbot_lang.png new file mode 100644 index 00000000..6c73f602 Binary files /dev/null and b/03_QuestionAnswering/images/Chatbot_lang.png differ diff --git a/03_QuestionAnswering/images/Embeddings_lang.png b/03_QuestionAnswering/images/Embeddings_lang.png new file mode 100644 index 00000000..11dae65b Binary files /dev/null and b/03_QuestionAnswering/images/Embeddings_lang.png differ diff --git a/04_Chatbot/00_Chatbot_Claude.ipynb b/04_Chatbot/00_Chatbot_Claude.ipynb new file mode 100644 index 00000000..c5ab8653 --- /dev/null +++ b/04_Chatbot/00_Chatbot_Claude.ipynb @@ -0,0 +1,1434 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conversational Interface - Chatbot with Claude LLM\n", + "\n", + "In this notebook, we will build a chatbot using the Foundational Models (FMs) in Amazon Bedrock. For our use-case we use Claude as our FM for building the chatbot.\n", + "\n", + "Amazon Bedrock currently supports the following Claude models:\n", + "| Provider | Model Name | Versions | `id` |\n", + "| --- | --- | --- | --- |\n", + "| Anthropic | Claude | V1, Instant | `anthropic.claude-v1`, `anthropic.claude-instant-v1` |\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers.Chatbots uses natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. Chatbots can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users. They can be accessed through various channels such as websites, social media platforms, and messaging apps.\n", + "\n", + "\n", + "## Chatbot using Amazon Bedrock\n", + "\n", + "![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)\n", + "\n", + "\n", + "## Use Cases\n", + "\n", + "1. **Chatbot (Basic)** - Zero Shot chatbot with a FM model\n", + "2. **Chatbot using prompt** - template(Langchain) - Chatbot with some context provided in the prompt template\n", + "3. **Chatbot with persona** - Chatbot with defined roles. i.e. Career Coach and Human interactions\n", + "4. **Contextual-aware chatbot** - Passing in context through an external file by generating embeddings.\n", + "\n", + "## Langchain framework for building Chatbot with Amazon Bedrock\n", + "In Conversational interfaces such as chatbots, it is highly important to remember previous interactions, both at a short term but also at a long term level.\n", + "\n", + "LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. 
These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains.\n", + "It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots.\n", + "\n", + "## Building Chatbot with Context - Key Elements\n", + "\n", + "The first process in a building a contextual-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process which will run through your embedding model and generate the embeddings which will be stored in a sort of a vector store. In this example we are using a GPT-J embeddings model for this\n", + "\n", + "![Embeddings](./images/embeddings_lang.png)\n", + "\n", + "Second process is the user request orchestration , interaction, invoking and returing the results\n", + "\n", + "![Chatbot](./images/chatbot_lang.png)\n", + "\n", + "## Architecture [Context Aware Chatbot]\n", + "![4](./images/context-aware-chatbot.png)\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "jupyter": { + "outputs_hidden": false + } + }, + "source": [ + "### Installing the dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install faiss-cpu==1.7.4 --quiet\n", + "%pip install pypdf==3.8.1 --quiet\n", + "%pip install langchain==0.0.190 --quiet\n", + "%pip install ipywidgets==7.7.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot (Basic - without context)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "#### We use [CoversationChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/bedrock.html?highlight=ConversationChain#using-in-a-conversation-chain) from LangChain to 
start the conversation. We also use the [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html) for storing the messages. We can also get the history as a list of messages (this is very useful in a chat model).\n", + "Chatbots need to remember previous interactions, and conversational memory allows us to do that. There are several ways to implement conversational memory; in the context of LangChain, they are all built on top of the ConversationChain.\n", + "\n", + "**Note:** The model outputs are non-deterministic" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.chains import ConversationChain\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain import PromptTemplate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2023-06-15T20:35:32.414119Z", + "start_time": "2023-06-15T20:35:31.605208Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [ + "from langchain.llms.bedrock import Bedrock\n", + "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\", client=boto3_bedrock, model_kwargs={\"max_tokens_to_sample\": 1000})\n", + "memory = ConversationBufferMemory()\n", + "conversation = ConversationChain(\n", + " llm=cl_llm, verbose=True, memory=memory\n", + ") \n", + "print_ww(conversation.predict(input=\"Hi there!\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens here? We said \"Hi there!\" and the model spat out an entire multi-turn conversation. This is because the default prompt used by the LangChain ConversationChain is not well designed for Claude. An [effective Claude prompt](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design) should place the user turn after `\\n\\nHuman:` and end with `\\n\\nAssistant:`. Let's fix this.\n", + "\n", + "To learn more about how to write prompts for Claude, check the [Anthropic documentation](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design)." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot using a prompt template (LangChain)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "LangChain provides several classes and functions to make constructing and working with prompts easy. We are going to use the [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html) class to construct the prompt from an f-string template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.memory import ConversationBufferMemory\n", + "from langchain import PromptTemplate\n", + "\n", + "# turn verbose to true to see the full logs and documents\n", + "conversation= ConversationChain(\n", + " llm=cl_llm, verbose=False, memory=ConversationBufferMemory() #memory_chain\n", + ")\n", + "\n", + "# langchain prompts do not always work with all the models. This prompt is tuned for Claude\n", + "claude_prompt = PromptTemplate.from_template(\"\"\"The following is a friendly conversation between a human and an AI.\n", + "The AI is talkative and provides lots of specific details from its context.
If the AI does not know\n", + "the answer to a question, it truthfully says it does not know.\n", + "\n", + "Current conversation:\n", + "{history}\n", + "\n", + "\n", + "Human: {input}\n", + "\n", + "\n", + "Assistant:\n", + "\"\"\")\n", + " \n", + "conversation.prompt = claude_prompt\n", + "\n", + "print_ww(conversation.predict(input=\"Hi there!\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### New Questions\n", + "\n", + "Model has responded with intial message, let's ask few questions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"Give me a few tips on how to start a new garden.\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Build on the questions\n", + "\n", + "Let's ask a question without mentioning the word garden to see if model can understand previous conversation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"Cool. Will that work with tomatoes?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Finishing this conversation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"That's all, thank you!\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Claude is still really talkative. Try changing the prompt to make Claude provide shorter answers." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "collapsed": false + }, + "source": [ + "### Interactive session using ipywidgets\n", + "\n", + "The following utility class allows us to interact with Claude in a more natural way. We write out question in an input box, and get Claude answer. We can then continue our conversation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipywidgets as ipw\n", + "from IPython.display import display, clear_output\n", + "\n", + "class ChatUX:\n", + " \"\"\" A chat UX using IPWidgets\n", + " \"\"\"\n", + " def __init__(self, qa, retrievalChain = False):\n", + " self.qa = qa\n", + " self.name = None\n", + " self.b=None\n", + " self.retrievalChain = retrievalChain\n", + " self.out = ipw.Output()\n", + "\n", + "\n", + " def start_chat(self):\n", + " print(\"Starting chat bot\")\n", + " display(self.out)\n", + " self.chat(None)\n", + "\n", + "\n", + " def chat(self, _):\n", + " if self.name is None:\n", + " prompt = \"\"\n", + " else: \n", + " prompt = self.name.value\n", + " if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:\n", + " print(\"Thank you , that was a nice chat !!\")\n", + " return\n", + " elif len(prompt) > 0:\n", + " with self.out:\n", + " thinking = ipw.Label(value=\"Thinking...\")\n", + " display(thinking)\n", + " try:\n", + " if self.retrievalChain:\n", + " result = self.qa.run({'question': prompt })\n", + " else:\n", + " result = self.qa.run({'input': prompt }) #, 'history':chat_history})\n", + " except:\n", + " result = \"No answer\"\n", + " thinking.value=\"\"\n", + " print_ww(f\"AI:{result}\")\n", + " self.name.disabled = True\n", + " self.b.disabled = True\n", + " self.name = None\n", + " \n", + " if self.name is None:\n", + " with self.out:\n", + " self.name = ipw.Text(description=\"You:\", placeholder='q to quit')\n", + " self.b = ipw.Button(description=\"Send\")\n", + " self.b.on_click(self.chat)\n", + " display(ipw.Box(children=(self.name, self.b)))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start a chat. You can also test the following questions:\n", + "1. tell me a joke\n", + "2. tell me another joke\n", + "3. what was the first joke about\n", + "4. can you make another joke on the same topic of the first joke" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat = ChatUX(conversation)\n", + "chat.start_chat()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Chatbot with persona" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "AI assistant will play the role of a career coach. Role Play Dialogue requires user message to be set in before starting the chat. ConversationBufferMemory is used to pre-populate the dialog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# store previous interactions using ConversationalBufferMemory and add custom prompts to the chat.\n", + "memory = ConversationBufferMemory()\n", + "memory.chat_memory.add_user_message(\"You will be acting as a career coach. 
Your goal is to give career advice to users\")\n", + "memory.chat_memory.add_ai_message(\"I am career coach and give career advice\")\n", + "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\",client=boto3_bedrock)\n", + "conversation = ConversationChain(\n", + " llm=cl_llm, verbose=True, memory=memory\n", + ")\n", + "\n", + "conversation.prompt = claude_prompt\n", + "\n", + "print_ww(conversation.predict(input=\"What are the career options in AI?\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"What these people really do? Is it fun?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Let's ask a question that is not specialty of this Persona and the model shouldn't answer that question and give a reason for that" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "conversation.verbose = False\n", + "print_ww(conversation.predict(input=\"How to fix my car?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot with Context \n", + "In this use case we will ask the Chatbot to answer question from some external corpus it has likely never seen before. To do this we apply a pattern called RAG (Retrieval Augmented Generation): the idea is to index the corpus in chunks, then lookup which sections of the corpus might be relevant to provide an answer by using semantic similarity between the chunks and the question. Finally the most relevant chunks are aggregated and passed as context to the ConversationChain, similar to providing an history.\n", + "\n", + "We will take a csv file and use **Titan Embeddings Model** to create vectors for each line of the csv. This vector is then stored in FAISS, an open source library providing an in-memory vector datastore. When the chatbot is asked a question, we query FAISS with the question and retrieve the text which is semantically closest. This will be our answer. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Titan embeddings Model\n", + "\n", + "Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.\n", + "\n", + "Embeddings are for example used for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/) \n", + "\n", + "Other possible use for embeddings can be found here. [LangChain Embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "from langchain.embeddings import BedrockEmbeddings\n", + "\n", + "br_embeddings = BedrockEmbeddings(client=boto3_bedrock)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### FAISS as VectorStore\n", + "\n", + "In order to be able to use embeddings for search, we need a store that can efficiently perform vector similarity searches. In this notebook we use FAISS, which is an in memory store. 
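As a side note, a LangChain FAISS index can also be written to local disk and reloaded later, which can be a simple middle ground before adopting a managed vector database. This is an optional, hedged sketch: `docs` stands for document chunks like the ones created in the next cell, and `br_embeddings` is the embeddings client created above.

```python
# Optional sketch: persist the in-memory FAISS index to disk and reload it later.
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(docs, br_embeddings)  # docs: your chunked documents
vectorstore.save_local("faiss_index")

# In a later session, rebuild the store from disk with the same embeddings client.
reloaded_store = FAISS.load_local("faiss_index", br_embeddings)
```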
For permanently store vectors, one can use pgVector, Pinecone or Chroma.\n", + "\n", + "The langchain VectorStore Api's are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html)\n", + "\n", + "To know more about the FAISS vector store please refer to this [document](https://arxiv.org/pdf/1702.08734.pdf)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.document_loaders import CSVLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.indexes.vectorstore import VectorStoreIndexWrapper\n", + "from langchain.vectorstores import FAISS\n", + "\n", + "s3_path = f\"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv\"\n", + "!aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv\n", + "\n", + "loader = CSVLoader(\"./rag_data/Amazon_SageMaker_FAQs.csv\") # --- > 219 docs with 400 chars, each row consists in a question column and an answer column\n", + "documents_aws = loader.load() #\n", + "print(f\"Number of documents={len(documents_aws)}\")\n", + "\n", + "docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=\",\").split_documents(documents_aws)\n", + "\n", + "print(f\"Number of documents after split and chunking={len(docs)}\")\n", + "\n", + "vectorstore_faiss_aws = FAISS.from_documents(\n", + " documents=docs,\n", + " embedding = br_embeddings\n", + ")\n", + "\n", + "print(f\"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Semantic search\n", + "\n", + "We can use a Wrapper class provided by LangChain to query the vector data base store and return to us the relevant documents. Behind the scenes this is only going to run a RetrievalQA chain." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)\n", + "print_ww(wrapper_store_faiss.query(\"R in SageMaker\", llm=cl_llm))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see how the semantic search works:\n", + "1. First we calculate the embeddings vector for the query, and\n", + "2. then we use this vector to do a similarity search on the store" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v = br_embeddings.embed_query(\"R in SageMaker\")\n", + "print(v[0:10])\n", + "results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)\n", + "for r in results:\n", + " print_ww(r.page_content)\n", + " print('----')" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Memory\n", + "In any chatbot we will need a QA Chain with various options which are customized by the use case. But in a chatbot we will always need to keep the history of the conversation so the model can take it into consideration to provide the answer. 
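As a quick, self-contained illustration of what "keeping the history" means here (an addition for clarity, not part of the original notebook), a conversation buffer simply accumulates the turns and hands them back to the chain on the next question:

```python
# Minimal illustration of a conversation buffer accumulating turns.
from langchain.memory import ConversationBufferMemory

demo_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
demo_memory.chat_memory.add_user_message("What is SageMaker?")
demo_memory.chat_memory.add_ai_message("Amazon SageMaker is a managed machine learning service.")

# The stored messages are what the chain sees as chat_history on the next turn.
print(demo_memory.load_memory_variables({}))
```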
In this example we use the [ConversationalRetrievalChain](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) from LangChain, together with a ConversationBufferMemory to keep the history of the conversation.\n", + "\n", + "Source: https://python.langchain.com/docs/modules/chains/popular/chat_vector_db\n", + "\n", + "Set `verbose` to `True` to see all the what is going on behind the scenes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "\n", + "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT\n", + "\n", + "print_ww(CONDENSE_QUESTION_PROMPT.template)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Parameters used for ConversationRetrievalChain\n", + "* **retriever**: We used `VectorStoreRetriever`, which is backed by a `VectorStore`. To retrieve text, there are two search types you can choose: `\"similarity\"` or `\"mmr\"`. `search_type=\"similarity\"` uses similarity search in the retriever object where it selects text chunk vectors that are most similar to the question vector.\n", + "\n", + "* **memory**: Memory Chain to store the history \n", + "\n", + "* **condense_question_prompt**: Given a question from the user, we use the previous conversation and that question to make up a standalone question\n", + "\n", + "* **chain_type**: If the chat history is long and doesn't fit the context you use this parameter and the options are `stuff`, `refine`, `map_reduce`, `map-rerank`\n", + "\n", + "If the question asked is outside the scope of context, then the model will reply it doesn't know the answer\n", + "\n", + "**Note**: if you are curious how the chain works, uncomment the `verbose=True` line." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# turn verbose to true to see the full logs and documents\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.memory import ConversationBufferMemory\n", + "\n", + "memory_chain = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", + "qa = ConversationalRetrievalChain.from_llm(\n", + " llm=cl_llm, \n", + " retriever=vectorstore_faiss_aws.as_retriever(), \n", + " memory=memory_chain,\n", + " condense_question_prompt=CONDENSE_QUESTION_PROMPT,\n", + " #verbose=True, \n", + " chain_type='stuff', # 'refine',\n", + " #max_tokens_limit=300\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's chat! ask the chatbot some questions about SageMaker, like:\n", + "1. What is SageMaker?\n", + "2. What is canvas?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat = ChatUX(qa, retrievalChain=True)\n", + "chat.start_chat()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your mileage might vary, but after 2 or 3 questions you will start to get some weird answers. In some cases, even in other languages.\n", + "This is happening for the same reasons outlined at the beginning of this notebook: the default langchain prompts are not optimal for Claude. \n", + "In the following cell we are going to set two new prompts: one for the question rephrasing, and one to get the answer from that rephrased question." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# turn verbose to true to see the full logs and documents\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.schema import BaseMessage\n", + "\n", + "\n", + "# We are also providing a different chat history retriever which outputs the history as a Claude chat (ie including the \\n\\n)\n", + "_ROLE_MAP = {\"human\": \"\\n\\nHuman: \", \"ai\": \"\\n\\nAssistant: \"}\n", + "def _get_chat_history(chat_history):\n", + " buffer = \"\"\n", + " for dialogue_turn in chat_history:\n", + " if isinstance(dialogue_turn, BaseMessage):\n", + " role_prefix = _ROLE_MAP.get(dialogue_turn.type, f\"{dialogue_turn.type}: \")\n", + " buffer += f\"\\n{role_prefix}{dialogue_turn.content}\"\n", + " elif isinstance(dialogue_turn, tuple):\n", + " human = \"\\n\\nHuman: \" + dialogue_turn[0]\n", + " ai = \"\\n\\nAssistant: \" + dialogue_turn[1]\n", + " buffer += \"\\n\" + \"\\n\".join([human, ai])\n", + " else:\n", + " raise ValueError(\n", + " f\"Unsupported chat history format: {type(dialogue_turn)}.\"\n", + " f\" Full chat history: {chat_history} \"\n", + " )\n", + " return buffer\n", + "\n", + "# the condense prompt for Claude\n", + "condense_prompt_claude = PromptTemplate.from_template(\"\"\"{chat_history}\n", + "\n", + "Answer only with the new question.\n", + "\n", + "\n", + "Human: How would you ask the question considering the previous conversation: {question}\n", + "\n", + "\n", + "Assistant: Question:\"\"\")\n", + "\n", + "# recreate the Claude LLM with more tokens to sample - this provide longer responses but introduces some latency\n", + "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\", client=boto3_bedrock, model_kwargs={\"max_tokens_to_sample\": 500})\n", + "memory_chain = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", + "qa = ConversationalRetrievalChain.from_llm(\n", + " llm=cl_llm, \n", + " retriever=vectorstore_faiss_aws.as_retriever(), \n", + " #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={\"k\": 8}),\n", + " memory=memory_chain,\n", + " get_chat_history=_get_chat_history,\n", + " #verbose=True,\n", + " condense_question_prompt=condense_prompt_claude, \n", + " chain_type='stuff', # 'refine',\n", + " #max_tokens_limit=300\n", + ")\n", + "\n", + "# the LLMChain prompt to get the answer. the ConversationalRetrievalChange does not expose this parameter in the constructor\n", + "qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template(\"\"\"\n", + "{context}\n", + "\n", + "\n", + "Human: Use at maximum 3 sentences to answer the question inside the XML tags. \n", + "\n", + "{question}\n", + "\n", + "Do not use any XML tags in the answer. If the answer is not in the context say \"Sorry, I don't know as the answer was not found in the context\"\n", + "\n", + "Assistant:\"\"\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start another chat. Feel free to ask the following questions:\n", + "\n", + "1. What is SageMaker?\n", + "2. what is canvas?" 
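If you are curious what the question-rephrasing step produces, the following optional snippet peeks at it directly, using the `condense_prompt_claude`, `cl_llm` and `print_ww` objects defined above together with a made-up two-turn history (the exact output will vary):

```python
# Optional: inspect the condense step on its own. Given a short, made-up chat
# history and a follow-up question, the LLM rewrites it as a standalone question.
example_history = (
    "\n\nHuman: What is SageMaker?"
    "\n\nAssistant: Amazon SageMaker is a managed machine learning service."
)

standalone_question = cl_llm(
    condense_prompt_claude.format(chat_history=example_history, question="what is canvas?")
)
print_ww(standalone_question)  # typically something like "What is SageMaker Canvas?"
```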
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat = ChatUX(qa, retrievalChain=True)\n", + "chat.start_chat()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Do some prompt engineering\n", + "\n", + "You can \"tune\" your prompt to get more or less verbose answers. For example, try to change the number of sentences, or remove that instruction all-together. You might also need to change the number of `max_tokens_to_sample` (eg 1000 or 2000) to get the full answer." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### In this demo we used Claude LLM to create conversational interface with following patterns:\n", + "\n", + "1. Chatbot (Basic - without context)\n", + "\n", + "2. Chatbot using prompt template(Langchain)\n", + "\n", + "3. Chatbot with personas\n", + "\n", + "4. Chatbot with context" + ] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": 
"ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, 
+ "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + 
"name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/04_Chatbot/00_Chatbot_Titan.ipynb b/04_Chatbot/00_Chatbot_Titan.ipynb new file mode 100644 index 00000000..772e6327 --- /dev/null +++ b/04_Chatbot/00_Chatbot_Titan.ipynb @@ -0,0 +1,1277 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conversational Interface - Chatbot with Titan LLM\n", + "\n", + "In this notebook, we will build a chatbot using the Foundational Models (FMs) in Amazon Bedrock. For our use-case we use Titan as our FM for building the chatbot.\n", + "\n", + "Amazon Bedrock currently supports the following Claude models:\n", + "| Provider | Model Name | Versions | `id` |\n", + "| --- | --- | --- | --- |\n", + "| Amazon | Titan Text | Large | `amazon.titan-tg1-large` |\n", + "|" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers.Chatbots uses natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. 
Chatbots can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users. They can be accessed through various channels such as websites, social media platforms, and messaging apps.\n", + "\n", + "\n", + "## Chatbot using Amazon Bedrock\n", + "\n", + "![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)\n", + "\n", + "\n", + "## Use Cases\n", + "\n", + "1. **Chatbot (Basic)** - Zero-shot chatbot with an FM\n", + "2. **Chatbot using prompt template (LangChain)** - Chatbot with some context provided in the prompt template\n", + "3. **Chatbot with persona** - Chatbot with defined roles, i.e. Career Coach and Human interactions\n", + "4. **Contextual-aware chatbot** - Passing in context through an external file by generating embeddings.\n", + "\n", + "## Langchain framework for building Chatbot with Amazon Bedrock\n", + "In conversational interfaces such as chatbots, it is highly important to remember previous interactions, both at a short-term and at a long-term level.\n", + "\n", + "LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains.\n", + "It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots.\n", + "\n", + "## Building Chatbot with Context - Key Elements\n", + "\n", + "The first process in building a contextual-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process which will run through your embedding model and generate the embeddings, which will then be stored in a vector store. In this example we are using the Amazon Titan embeddings model for this.\n", + "\n", + "![Embeddings](./images/embeddings_lang.png)\n", + "\n", + "The second process is the user request orchestration: interaction, invoking the model and returning the results.\n", + "\n", + "![Chatbot](./images/chatbot_lang.png)\n", + "\n", + "## Architecture [Context Aware Chatbot]\n", + "![4](./images/context-aware-chatbot.png)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installing the dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install faiss-cpu==1.7.4 --quiet\n", + "%pip install pypdf==3.8.1 --quiet\n", + "%pip install langchain==0.0.190 --quiet\n", + "%pip install ipywidgets==7.7.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock, print_ww\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.chains import ConversationChain\n", + "from langchain.memory import ConversationBufferMemory" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot (Basic - without context)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "#### Using CoversationChain from LangChain to start the conversation\n", + "Chatbots needs to remember the previous interactions. Conversational memory allows us to do that.There are several ways that we can implement conversational memory. 
In the context of LangChain, they are all built on top of the ConversationChain.\n", + "\n", + "Note: The model outputs are non-deterministic" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.llms.bedrock import Bedrock\n", + "\n", + "titan_llm = Bedrock(model_id=\"amazon.titan-tg1-large\", client=boto3_bedrock)\n", + "memory = ConversationBufferMemory()\n", + "conversation = ConversationChain(\n", + " llm=titan_llm, verbose=True, memory=memory\n", + ")\n", + "\n", + "print_ww(conversation.predict(input=\"Hi there!\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### New Questions\n", + "\n", + "The model has responded with an initial message; let's ask a few questions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"Give me a few tips on how to start a new garden.\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Build on the questions\n", + "\n", + "Let's ask a question without mentioning the word garden to see if the model can understand the previous conversation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"Cool. Will that work with tomatoes?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Finishing this conversation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print_ww(conversation.predict(input=\"That's all, thank you!\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot using prompt template (LangChain)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "PromptTemplate is responsible for the construction of this input. LangChain provides several classes and functions to make constructing and working with prompts easy. We will use the default Prompt Template here. 
[PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.memory import ConversationBufferMemory\n", + "from langchain import PromptTemplate\n", + "\n", + "chat_history = []\n", + "\n", + "# turn verbose to true to see the full logs and documents\n", + "qa= ConversationChain(\n", + " llm=titan_llm, verbose=False, memory=ConversationBufferMemory() #memory_chain\n", + ")\n", + "\n", + "print(f\"ChatBot:DEFAULT:PROMPT:TEMPLATE: is ={qa.prompt.template}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ipywidgets as ipw\n", + "from IPython.display import display, clear_output\n", + "\n", + "class ChatUX:\n", + " \"\"\" A chat UX using IPWidgets\n", + " \"\"\"\n", + " def __init__(self, qa, retrievalChain = False):\n", + " self.qa = qa\n", + " self.name = None\n", + " self.b=None\n", + " self.retrievalChain = retrievalChain\n", + " self.out = ipw.Output()\n", + "\n", + "\n", + " def start_chat(self):\n", + " print(\"Starting chat bot\")\n", + " display(self.out)\n", + " self.chat(None)\n", + "\n", + "\n", + " def chat(self, _):\n", + " if self.name is None:\n", + " prompt = \"\"\n", + " else: \n", + " prompt = self.name.value\n", + " if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:\n", + " print(\"Thank you , that was a nice chat !!\")\n", + " return\n", + " elif len(prompt) > 0:\n", + " with self.out:\n", + " thinking = ipw.Label(value=\"Thinking...\")\n", + " display(thinking)\n", + " try:\n", + " if self.retrievalChain:\n", + " result = self.qa.run({'question': prompt })\n", + " else:\n", + " result = self.qa.run({'input': prompt }) #, 'history':chat_history})\n", + " except:\n", + " result = \"No answer\"\n", + " thinking.value=\"\"\n", + " print_ww(f\"AI:{result}\")\n", + " self.name.disabled = True\n", + " self.b.disabled = True\n", + " self.name = None\n", + " \n", + " if self.name is None:\n", + " with self.out:\n", + " self.name = ipw.Text(description=\"You:\", placeholder='q to quit')\n", + " self.b = ipw.Button(description=\"Send\")\n", + " self.b.on_click(self.chat)\n", + " display(ipw.Box(children=(self.name, self.b)))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start a chat" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat = ChatUX(qa)\n", + "chat.start_chat()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "## Chatbot with persona" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "AI assistant will play the role of a career coach. Role Play Dialogue requires user message to be set in before starting the chat. ConversationBufferMemory is used to pre-populate the dialog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "memory = ConversationBufferMemory()\n", + "memory.chat_memory.add_user_message(\"You will be acting as a career coach. 
Your goal is to give career advice to users\")\n", + "memory.chat_memory.add_ai_message(\"I am a career coach and give career advice\")\n", + "titan_llm = Bedrock(model_id=\"amazon.titan-tg1-large\",client=boto3_bedrock)\n", + "conversation = ConversationChain(\n", + " llm=titan_llm, verbose=True, memory=memory\n", + ")\n", + "\n", + "print_ww(conversation.predict(input=\"What are the career options in AI?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Let's ask a question that is not the speciality of this persona; the model shouldn't answer that question and should give a reason for that" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "conversation.verbose = False\n", + "print_ww(conversation.predict(input=\"How to fix my car?\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chatbot with Context \n", + "In this use case we will ask the chatbot to answer questions from the context that it was passed. We will take a CSV file and use the Titan embeddings model to create the vectors. These vectors are stored in FAISS. When the chatbot is asked a question, we query this vector store and retrieve the relevant context for the answer. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use a Titan embeddings Model - so we can use that to generate the embeddings for the documents\n", + "\n", + "Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.\n", + "\n", + "\n", + "This will be used for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/) \n", + "\n", + "Other possible embeddings are listed here: [LangChain Embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.embeddings import BedrockEmbeddings\n", + "from langchain.vectorstores import FAISS\n", + "from langchain import PromptTemplate\n", + "\n", + "br_embeddings = BedrockEmbeddings(client=boto3_bedrock)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create the embeddings for document search" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Vector store indexer. \n", + "\n", + "This is what stores and matches the embeddings. This notebook showcases Chroma and FAISS, which are transient and in-memory. The VectorStore APIs are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html)\n", + "\n", + "We use the Bedrock (Titan) embeddings through the `BedrockEmbeddings` client created above, which calls the embeddings model and returns the vectors. These will be stored by FAISS or Chroma in memory and used whenever the user runs a query" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### VectorStore as FAISS \n", + "\n", + "You can read up on the [FAISS](https://arxiv.org/pdf/1702.08734.pdf) in-memory vector store here. 
However for our example it will be the same \n", + "\n", + "Chroma\n", + "\n", + "[Chroma](https://www.trychroma.com/) is a super simple vector search database. The core-API consists of just four functions, allowing users to build an in-memory document-vector store. By default Chroma uses the Hugging Face transformers library to vectorize documents.\n", + "\n", + "Weaviate\n", + "\n", + "[Weaviate](https://github.com/weaviate/weaviate) is a very posh looking tool - not only does Weaviate offer a GraphQL API with support for vector search. It also allows users to vectorize their content using Weaviate's inbuilt modules or custom modules." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.document_loaders import CSVLoader\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.indexes.vectorstore import VectorStoreIndexWrapper\n", + "\n", + "s3_path = f\"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv\"\n", + "!aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv\n", + "\n", + "loader = CSVLoader(\"./rag_data/Amazon_SageMaker_FAQs.csv\") # --- > 219 docs with 400 chars\n", + "documents_aws = loader.load() #\n", + "print(f\"documents:loaded:size={len(documents_aws)}\")\n", + "\n", + "docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=\",\").split_documents(documents_aws)\n", + "\n", + "print(f\"Documents:after split and chunking size={len(docs)}\")\n", + "\n", + "vectorstore_faiss_aws = FAISS.from_documents(\n", + " documents=docs,\n", + " embedding = br_embeddings, \n", + " #**k_args\n", + ")\n", + "\n", + "print(f\"vectorstore_faiss_aws:created={vectorstore_faiss_aws}::\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### To run a quick low code test \n", + "\n", + "We can use a Wrapper class provided by LangChain to query the vector data base store and return to us the relevant documents. Behind the scenes this is only going to run a QA Chain with all default values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)\n", + "print_ww(wrapper_store_faiss.query(\"R in SageMaker\", llm=titan_llm))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Chatbot application\n", + "\n", + "For the chatbot we need context management, history, vector stores, and many other things. 
We will start with a ConversationalRetrievalChain\n", + "\n", + "This uses conversation memory and a RetrievalQAChain, which allows passing in chat history that can be used for follow-up questions. Source: https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html\n", + "\n", + "Set verbose to True to see everything that is going on behind the scenes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationChain\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT\n", + "\n", + "\n", + "def create_prompt_template():\n", + " _template = \"\"\"{chat_history}\n", + "\n", + "Answer only with the new question.\n", + "How would you ask the question considering the previous conversation: {question}\n", + "Question:\"\"\"\n", + " CONVO_QUESTION_PROMPT = PromptTemplate.from_template(_template)\n", + " return CONVO_QUESTION_PROMPT\n", + "\n", + "memory_chain = ConversationBufferMemory(memory_key=\"chat_history\", input_key=\"question\", return_messages=True)\n", + "chat_history=[]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Parameters used for ConversationalRetrievalChain\n", + "retriever: We used VectorStoreRetriever, which is backed by a VectorStore. To retrieve text, there are two search types you can choose: search_type: “similarity” or “mmr”. search_type=\"similarity\" uses similarity search in the retriever object where it selects text chunk vectors that are most similar to the question vector.\n", + "\n", + "memory: Memory chain to store the history \n", + "\n", + "condense_question_prompt: Given a question from the user, we use the previous conversation and that question to make up a standalone question\n", + "\n", + "chain_type: If the chat history is long and doesn't fit the context, you use this parameter; the options are \"stuff\", \"refine\", \"map_reduce\", \"map-rerank\"\n", + "\n", + "Note: If the question asked is outside the scope of the context passed, then the model will reply that it doesn't know the answer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# turn verbose to true to see the full logs and documents\n", + "from langchain.memory import ConversationBufferMemory\n", + "from langchain.chains import ConversationChain\n", + "from langchain.chains import ConversationalRetrievalChain\n", + "qa = ConversationalRetrievalChain.from_llm(\n", + " llm=titan_llm, \n", + " retriever=vectorstore_faiss_aws.as_retriever(), \n", + " #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={\"k\": 8}),\n", + " memory=memory_chain,\n", + " #verbose=True,\n", + " #condense_question_prompt=CONDENSE_QUESTION_PROMPT, # create_prompt_template(), \n", + " chain_type='stuff', # 'refine',\n", + " #max_tokens_limit=100\n", + ")\n", + "\n", + "qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template(\"\"\"\n", + "{context}\n", + "\n", + "Use at maximum 3 sentences to answer the question inside the XML tags. \n", + "\n", + "{question}\n", + "\n", + "Do not use any XML tags in the answer. 
If the answer is not in the context say \"Sorry, I don't know, as the answer was not found in the context.\"\n", + "\n", + "Answer:\"\"\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start a chat" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "chat = ChatUX(qa, retrievalChain=True)\n", + "chat.start_chat()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### In this demo we used Titan LLM to create conversational interface with following patterns:\n", + "\n", + "1. Chatbot (Basic - without context)\n", + "\n", + "2. Chatbot using prompt template(Langchain)\n", + "\n", + "3. Chatbot with personas\n", + "\n", + "4. Chatbot with context" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + 
"_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + 
"hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + 
"_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/04_Chatbot/README.md b/04_Chatbot/README.md new file mode 100644 index 00000000..29cc40d3 --- /dev/null +++ b/04_Chatbot/README.md @@ -0,0 +1,49 @@ +# Conversational Interface - Chatbots + +## Overview + +Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers.Chatbots uses natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. Chatbots can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users. They can be accessed through various channels such as websites, social media platforms, and messaging apps. + + +## Chatbot using Amazon Bedrock + +![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png) + +## Use Cases + +1. **Chatbot (Basic)** - Zero Shot chatbot with a FM model +2. **Chatbot using prompt** - template(Langchain) - Chatbot with some context provided in the prompt template +3. **Chatbot with persona** - Chatbot with defined roles. i.e. Career Coach and Human interactions +4. **Contextual-aware chatbot** - Passing in context through an external file by generating embeddings. 
+ +## Langchain framework for building Chatbot with Amazon Bedrock +In conversational interfaces such as chatbots, it is highly important to remember previous interactions, both at a short-term and at a long-term level. + +LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains. +It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots. + +## Building Chatbot with Context - Key Elements + +The first process in building a contextual-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process which will run through your embedding model and generate the embeddings, which will then be stored in a vector store. In this example we are using the Amazon Titan embeddings model for this. + +![Embeddings](./images/embeddings_lang.png) + +The second process is the user request orchestration: interaction, invoking the model and returning the results. + +![Chatbot](./images/chatbot_lang.png) + +## Architecture [Context Aware Chatbot] +![4](./images/context-aware-chatbot.png) + +In this architecture: + +1. The question asked of the LLM is run through the embeddings model. +2. The context documents are embedded using the [Amazon Titan Embeddings Model](https://aws.amazon.com/bedrock/titan/) and stored in the vector database. +3. The embedded text is then input to the FM for contextual search, together with the chat history. +4. The FM then gives you the results based on the context. + +## Notebooks +This module provides two notebooks for the same pattern, so you can experience the conversational power of both Anthropic Claude and Amazon Titan Text Large. + +1. [Chatbot using Claude](./00_Chatbot_Claude.ipynb) +2. [Chatbot using Titan](./00_Chatbot_Titan.ipynb) diff --git a/04_Chatbot/images/chatbot_bedrock.png b/04_Chatbot/images/chatbot_bedrock.png new file mode 100644 index 00000000..0350c8e1 Binary files /dev/null and b/04_Chatbot/images/chatbot_bedrock.png differ diff --git a/04_Chatbot/images/chatbot_lang.png b/04_Chatbot/images/chatbot_lang.png new file mode 100644 index 00000000..6c73f602 Binary files /dev/null and b/04_Chatbot/images/chatbot_lang.png differ diff --git a/04_Chatbot/images/context-aware-chatbot.png b/04_Chatbot/images/context-aware-chatbot.png new file mode 100644 index 00000000..70b40982 Binary files /dev/null and b/04_Chatbot/images/context-aware-chatbot.png differ diff --git a/04_Chatbot/images/embeddings_lang.png b/04_Chatbot/images/embeddings_lang.png new file mode 100644 index 00000000..11dae65b Binary files /dev/null and b/04_Chatbot/images/embeddings_lang.png differ diff --git a/05_Image/Bedrock Stable Diffusion XL.ipynb b/05_Image/Bedrock Stable Diffusion XL.ipynb new file mode 100644 index 00000000..8d59b93d --- /dev/null +++ b/05_Image/Bedrock Stable Diffusion XL.ipynb @@ -0,0 +1,314 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction to Bedrock - Generating images using Stable Diffusion\n", + "---\n", + "In this demo notebook, we demonstrate how to use the Bedrock SDK for an image generation task. We show how to use the Stable Diffusion foundational model to create images:\n", + "1. 
Text to Image\n", + "2. Image to Image\n", + "\n", + "Images in Stable Diffusion are generated by the 4 main models below:\n", + "1. The CLIP text encoder;\n", + "2. The VAE decoder;\n", + "3. The UNet, and\n", + "4. The VAE_post_quant_conv\n", + "\n", + "These blocks are chosen because they represent the bulk of the compute in the pipeline.\n", + "\n", + "See the diagram below:\n", + "\n", + "![SD Architecture](./images/sd.png)\n", + "\n", + "#### Image prompting\n", + "\n", + "Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be roughly broken down into three pieces: (i) type of image (photograph/sketch/painting etc.), (ii) description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more details helps in the generation process.\n", + "\n", + "To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images by artists you can use phrases like “by Pablo Picasso” or “oil painting by Rembrandt” or “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists as well. To generate an artistic image by category, you can add the art category in the prompt such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or camera lens such as 35mm wide lens or 85mm wide lens and details about the framing (portrait/landscape/close up etc.).\n", + "\n", + "Note that the model generates different images even if the same prompt is given multiple times. So, you can generate multiple images and select the image that suits your application best." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", + "\n", + "For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", + "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall\n", + "%pip install langchain==0.0.190 --quiet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", + "\n", + "#import os\n", + "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", + "#os.environ['AWS_PROFILE'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import json\n", + "import os\n", + "import sys\n", + "\n", + "module_path = \"..\"\n", + "sys.path.append(os.path.abspath(module_path))\n", + "from utils import bedrock\n", + "\n", + "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", + "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Install additional dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install pillow==9.5.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import io, base64\n", + "from PIL import Image" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text to Image\n", + "In order to generate an image, a description of what needs to be generated is needed. This is called `prompt`.\n", + "\n", + "You can also provide some negative prompts to guide the model to avoid certain type of outputs.\n", + "\n", + "Prompt acts as the input to the model and steers the model to generate a relevant output. With Stable Diffusion XL you have the option to choose certain [style presets](https://platform.stability.ai/docs/release-notes#style-presets) as well" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompt = \"Dog in a forest\"\n", + "negative_prompts = [\n", + " \"poorly rendered\", \n", + " \"poor background details\", \n", + " \"poorly drawn dog\", \n", + " \"disfigured dog features\"\n", + " ]\n", + "style_preset = \"photographic\" # (photographic, digital-art, cinematic, ...)\n", + "#prompt = \"photo taken from above of an italian landscape. cloud is clear with few clouds. Green hills and few villages, a lake\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Bedrock` class implements a method `generate_image`. This method takes input a prompt and prepares a payload to be sent over to Bedrock API.\n", + "You can provide the following model inference parameters to control the repetitiveness of responses:\n", + "- prompt (string): Input text prompt for the model\n", + "- seed (int): Determines initial noise. Using same seed with same settings will create similar images.\n", + "- cfg_scale (float): Presence strength - Determines how much final image portrays prompts.\n", + "- steps (int): Generation step - How many times image is sampled. 
More steps may be more accurate." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As output, Bedrock generates a `base64`-encoded string representation of the image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = bedrock.Bedrock(boto3_bedrock)\n", + "base_64_img_str = model.generate_image(prompt, cfg_scale=5, seed=5450, steps=70, style_preset=style_preset)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can convert the `base64` image to a PIL image to be displayed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_1" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image to Image\n", + "\n", + "Stable Diffusion lets us do some interesting things with our images, like adding new characters or modifying scenery. Let's give it a try.\n", + "\n", + "You can use the previously generated image or use a different one to create a base64 string to be passed on as an initial image to the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from io import BytesIO\n", + "from base64 import b64encode\n", + "\n", + "buffer = BytesIO()\n", + "image_1.save(buffer, format=\"JPEG\")\n", + "img_bytes = buffer.getvalue()\n", + "init_image = b64encode(img_bytes).decode()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A new guiding prompt can then help the model to act on the initial image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "change_prompt = \"add some leaves around the dog\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `generate_image` method also accepts an additional parameter `init_image` which can be used to pass the initial image to the Stable Diffusion model on Bedrock." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "base_64_img_str = model.generate_image(change_prompt, init_image=init_image, seed=321, start_schedule=0.6)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_2 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_2" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary\n", + "\n", + "Play around with different prompts to see amazing results." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "bedrock", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/05_Image/README.md b/05_Image/README.md new file mode 100644 index 00000000..7e9ac12d --- /dev/null +++ b/05_Image/README.md @@ -0,0 +1,45 @@ +# Image Generation +### Overview + +Image generation can be a tedious task for artists, designers and content creators who illustrate their thoughts with the help of images. With the help of Foundation Models (FMs), this tedious task can be streamlined to just a single line of text that expresses the artist's thought. FMs can be used for creating realistic and artistic images of various subjects, environments, and scenes from language prompts. + +In this lab, we will explore how to use a foundation model available with Amazon Bedrock to generate images as well as modify existing images. + + +### Image prompting + +Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be broken down into three pieces: (i) type of image (photograph/sketch/painting etc.), (ii) description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more detail helps the generation process. To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. + +To generate images in the style of particular artists you can use phrases like “by Pablo Picasso” or “oil painting by Rembrandt” or “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists. To generate an artistic image by category, you can add the art category to the prompt, such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or camera lens, such as a 35mm or 85mm wide lens, and details about the framing (portrait/landscape/close up etc.). + +Note that the model generates different images even if the same prompt is given multiple times. So, you can generate multiple images and select the image that suits your application best. + +## Foundation Model + +To provide this capability, Amazon Bedrock supports a proprietary foundation model from Stability AI, [Stable Diffusion XL](https://stability.ai/stablediffusion), for image generation. Stable Diffusion works on the principle of diffusion and is composed of multiple models, each having a different purpose: + +1. The CLIP text encoder; +2. The VAE decoder; +3. The UNet, and +4.
The VAE_post_quant_conv + +The workings can be explained with this architecture: +![Stable Diffusion Architecture](./images/sd.png) + +## Target Audience + +Marketing companies, agencies, web-designers, and general companies can take advantage on this feature to generate brand new images, from scratch. + +## Patterns + +In this workshop, you will be able to learn following patterns on Image Generation using Amazon Bedrock: + +1. [Text to Image](./Bedrock%20Stable%20Diffusion%20XL.ipynb) + ![Text to Image](./images/71-txt-2-img.png) +2. [Image to Image (In-paiting)](./Bedrock%20Stable%20Diffusion%20XL.ipynb) + ![Text to Image](./images/72-img-2-img.png) + +## Helper +To facilitate image generation there is a utility class `Bedrock` implementation under `/utils/bedrock.py`. This helps you to generate images easily. + +You can also explore different `style_preset` options [here](https://platform.stability.ai/docs/features/animation/parameters#available-styles). \ No newline at end of file diff --git a/05_Image/images/71-txt-2-img.png b/05_Image/images/71-txt-2-img.png new file mode 100644 index 00000000..bbc6f7f3 Binary files /dev/null and b/05_Image/images/71-txt-2-img.png differ diff --git a/05_Image/images/72-img-2-img.png b/05_Image/images/72-img-2-img.png new file mode 100644 index 00000000..462a6795 Binary files /dev/null and b/05_Image/images/72-img-2-img.png differ diff --git a/05_Image/images/sd.png b/05_Image/images/sd.png new file mode 100644 index 00000000..01fc9fd3 Binary files /dev/null and b/05_Image/images/sd.png differ diff --git a/10-overview.png b/10-overview.png new file mode 100644 index 00000000..85b64457 Binary files /dev/null and b/10-overview.png differ diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..5b627cfa --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,4 @@ +## Code of Conduct +This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). +For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact +opensource-codeofconduct@amazon.com with any additional questions or comments. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..c4b6a1c5 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,59 @@ +# Contributing Guidelines + +Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional +documentation, we greatly value feedback and contributions from our community. + +Please read through this document before submitting any issues or pull requests to ensure we have all the necessary +information to effectively respond to your bug report or contribution. + + +## Reporting Bugs/Feature Requests + +We welcome you to use the GitHub issue tracker to report bugs or suggest features. + +When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already +reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: + +* A reproducible test case or series of steps +* The version of our code being used +* Any modifications you've made relevant to the bug +* Anything unusual about your environment or deployment + + +## Contributing via Pull Requests +Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: + +1. You are working against the latest source on the *main* branch. +2. 
You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. +3. You open an issue to discuss any significant work - we would hate for your time to be wasted. + +To send us a pull request, please: + +1. Fork the repository. +2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. +3. Ensure local tests pass. +4. Commit to your fork using clear commit messages. +5. Send us a pull request, answering any default questions in the pull request interface. +6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. + +GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and +[creating a pull request](https://help.github.com/articles/creating-a-pull-request/). + + +## Finding contributions to work on +Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. + + +## Code of Conduct +This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). +For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact +opensource-codeofconduct@amazon.com with any additional questions or comments. + + +## Security issue notifications +If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. + + +## Licensing + +See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..09951d9f --- /dev/null +++ b/LICENSE @@ -0,0 +1,17 @@ +MIT No Attribution + +Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of +the Software, and to permit persons to whom the Software is furnished to do so. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS +FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR +COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER +IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + diff --git a/README.md b/README.md new file mode 100644 index 00000000..5ed826e7 --- /dev/null +++ b/README.md @@ -0,0 +1,126 @@ +# Amazon Bedrock Workshop + +## Introduction to the Repository and Workshop + +The goal of this workshop is to give you hands-on experience leveraging foundation models (FMs) through Amazon Bedrock. Amazon Bedrock is a fully managed service that provides access to FMs from third-party providers and Amazon; available via an API. 
With Bedrock, you can choose from a variety of models to find the one that’s best suited for your use case. + +Within this series of labs, you will be taken through some of the most common usage patterns we are seeing with our customers for Generative AI. We will explore techniques for generating text and images, creating value for organizations by improving productivity. This is achieved by leveraging foundation models to help in composing emails, summarizing text, answering questions, building chatbots, and creating images. You will gain hands-on experience using Bedrock APIs, SDKs, and open-source software for example LangChain and FAISS to implement these usage patterns. + +This workshop is intended for developers and solution builders. + +What’s included in this workshop: + +- Text Generation \[Estimated time to complete - 30 mins\] +- Text Summarization \[Estimated time to complete - 30 mins\] +- Questions Answering \[Estimated time to complete - 45 mins\] +- Chatbot \[Estimated time to complete - 45 mins\] +- Image Generation \[Estimated time to complete - 30 mins\] + +
+ +![10-overview](10-overview.png) + +
+ +Workshop Link: [https://catalog.us-east-1.prod.workshops.aws/workshops/a4bdb007-5600-4368-81c5-ff5b4154f518/en-US/](https://catalog.us-east-1.prod.workshops.aws/workshops/a4bdb007-5600-4368-81c5-ff5b4154f518/en-US) + + + + + + +## Using these notebooks + +Start by cloning the workshop repo: + +```sh +git clone https://github.com/aws-samples/amazon-bedrock-workshop.git +cd amazon-bedrock-workshop +``` + +The Bedrock SDK is not yet part of the standard boto3 release. To download the additional Python wheels, run the following script: +```sh +bash ./download-dependencies.sh +``` +This script will create a `dependencies` folder and download the relevant SDKs needed to use Amazon Bedrock, which can then be installed as follows: + +```bash +pip install ./dependencies/botocore-1.29.162-py3-none-any.whl --force-reinstall +pip install ./dependencies/boto3-1.26.162-py3-none-any.whl --force-reinstall +pip install ./dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall +``` + +Following this, a Bedrock client can be created as follows: + +```python +import boto3 +bedrock = boto3.client("bedrock", region_name="us-east-1") +``` + +If you need to use a specific role to access Bedrock, you can do so using a session as follows: + +```python +import boto3 +session = boto3.session.Session(profile_name='bedrock') +boto3_bedrock = session.client("bedrock", region_name="us-east-1") +``` + +## Content + +This repository contains notebook examples for the Bedrock Architecture Patterns workshop. The notebooks are organised by module as follows: + +### Intro + +[Simple Bedrock Usage](./00_Intro/bedrock_boto3_setup.ipynb) + +This notebook shows how to set up the boto3 client and some basic usage of Bedrock. + +### Generation + +[Simple use case with boto3](./01_Generation/00_generate_w_bedrock.ipynb) + +In this notebook, you generate text using Amazon Bedrock. We demonstrate consuming the Amazon Titan model directly with boto3. + +[Simple use case with LangChain](./01_Generation/01_zero_shot_generation.ipynb) + +We then perform the same task using the popular framework LangChain. + +[Generation with additional context](./01_Generation/02_contextual_generation.ipynb) + +We then take this further by enhancing the prompt with additional context in order to improve the response. + +### Summarization + +[Small text summarization](./02_Summarization/01.small-text-summarization-claude.ipynb) + +In this notebook, you use Bedrock to perform the simple task of summarising a small piece of text. + +[Long text summarization](./02_Summarization/02.long-text-summarization-titan.ipynb) + +The above approach may not work as the content to be summarized gets larger and exceeds the max tokens of the model. In this notebook we show an approach of breaking the file up into smaller chunks, summarizing each chunk, and then summarizing the summaries. + +### Question Answering + +[Simple questions with context](./03_QuestionAnswering/00_qa_w_bedrock_titan.ipynb) + +This notebook shows a simple example of answering a question with given context by calling the model directly. + +[Answering questions with Retrieval Augmented Generation](./03_QuestionAnswering/01_qa_w_rag_claude.ipynb) + +We can improve the above process by implementing an architecture called Retrieval Augmented Generation (RAG). RAG retrieves data from outside the language model (non-parametric) and augments the prompts by adding the relevant retrieved data to the context; a minimal sketch of this flow is shown below.
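Below is a minimal, illustrative sketch of that RAG flow using LangChain and FAISS with Bedrock. It is not the exact notebook code: it assumes the pinned LangChain release exposes the `Bedrock` LLM, `BedrockEmbeddings` and `FAISS` integrations used in the QA notebook, that `faiss-cpu` is installed, and that the file name, model ID and question are placeholders you would replace with your own.

```python
# Minimal RAG sketch (illustrative only; file name, model ID and question are placeholders).
from langchain.llms.bedrock import Bedrock
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# 1. Load and chunk the external (non-parametric) documents.
docs = TextLoader("my_documents.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks with a Bedrock embedding model and index them with FAISS.
embeddings = BedrockEmbeddings(client=boto3_bedrock)
vectorstore = FAISS.from_documents(chunks, embeddings)

# 3. At query time, retrieve the most relevant chunks and add them to the prompt as context.
llm = Bedrock(model_id="anthropic.claude-v1", client=boto3_bedrock)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
print(qa.run("What is covered in the chatbot module?"))
```

Here `boto3_bedrock` is the Bedrock client created in the snippet above; the notebook in `03_QuestionAnswering` walks through the full version of this pattern.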
+ +### Chatbot + +[Chatbot using Claude](./04_Chatbot/00_Chatbot_Claude.ipynb) + +This notebook shows a chatbot using Claude + +[Chatbot using Titan](./04_Chatbot/00_Chatbot_Titan.ipynb) + +This notebook shows a chatbot using Titan + +### Text to Image + +[Image Generation with Stable Diffusion](./05_Image/Bedrock%20Stable%20Diffusion%20XL.ipynb) + +This notebook demonstrates image generation with using the Stable Diffusion model diff --git a/download-dependencies.sh b/download-dependencies.sh new file mode 100644 index 00000000..fb3c313f --- /dev/null +++ b/download-dependencies.sh @@ -0,0 +1,10 @@ +#!/bin/sh + +echo "Creating directory" +mkdir -p ./dependencies && \ +cd ./dependencies && \ +echo "Downloading dependencies" +curl -sS https://preview.documentation.bedrock.aws.dev/Documentation/SDK/bedrock-python-sdk.zip > sdk.zip && \ +echo "Unpacking dependencies" +unzip sdk.zip && \ +rm sdk.zip \ No newline at end of file diff --git a/utils/__init__.py b/utils/__init__.py new file mode 100644 index 00000000..177af15b --- /dev/null +++ b/utils/__init__.py @@ -0,0 +1,23 @@ +import textwrap +from io import StringIO +import sys + +def print_ww(*args, **kwargs): + buffer = StringIO() + try: + _stdout = sys.stdout + sys.stdout = buffer + width = 100 + if 'width' in kwargs: + width = kwargs['width'] + del kwargs['width'] + print(*args, **kwargs) + output = buffer.getvalue() + finally: + sys.stdout = _stdout + for line in output.splitlines(): + print("\n".join(textwrap.wrap(line, width=width))) + + + + diff --git a/utils/bedrock.py b/utils/bedrock.py new file mode 100644 index 00000000..482f856a --- /dev/null +++ b/utils/bedrock.py @@ -0,0 +1,185 @@ +import json +import boto3 +import os +from typing import Any, Dict, List, Optional +from pydantic import root_validator +from time import sleep +from enum import Enum +import boto3 +from botocore.config import Config + +def get_bedrock_client(assumed_role=None, region='us-east-1', url_override = None): + boto3_kwargs = {} + session = boto3.Session() + + target_region = os.environ.get('AWS_DEFAULT_REGION',region) + + print(f"Create new client\n Using region: {target_region}") + if 'AWS_PROFILE' in os.environ: + print(f" Using profile: {os.environ['AWS_PROFILE']}") + + retry_config = Config( + region_name = target_region, + retries = { + 'max_attempts': 10, + 'mode': 'standard' + } + ) + + boto3_kwargs = {} + + if assumed_role: + print(f" Using role: {assumed_role}", end='') + sts = session.client("sts") + response = sts.assume_role( + RoleArn=str(assumed_role), # + RoleSessionName="langchain-llm-1" + ) + print(" ... 
successful!") + boto3_kwargs['aws_access_key_id']=response['Credentials']['AccessKeyId'] + boto3_kwargs['aws_secret_access_key']=response['Credentials']['SecretAccessKey'] + boto3_kwargs['aws_session_token']=response['Credentials']['SessionToken'] + + if url_override: + boto3_kwargs['endpoint_url']=url_override + + bedrock_client = session.client( + service_name='bedrock', + config=retry_config, + region_name= target_region, + **boto3_kwargs + ) + + print("boto3 Bedrock client successfully created!") + print(bedrock_client._endpoint) + return bedrock_client + + +class BedrockMode(Enum): + IMAGE = "image" + + +class BedrockModel(Enum): + STABLE_DIFFUSION = "stability.stable-diffusion-xl" + + +class Bedrock: + __DEFAULT_EMPTY_EMBEDDING = [ + 0.0 + ] * 4096 # - we need to return an array of floats 4096 in size + __RETRY_BACKOFF_SEC = 3 + __RETRY_ATTEMPTS = 3 + + def __init__(self, client=None) -> None: + if client is None: + self.client = get_bedrock_client(assumed_role=None) + else: + assert str(type(client)) == "", f"The client passed in not a valid boto3 bedrock client, got {type(client)}" + self.client = client + + @root_validator() + def validate_environment(cls, values: Dict) -> Dict: + bedrock_client = get_bedrock_client(assumed_role=None) #boto3.client("bedrock") + values["client"] = bedrock_client + return values + + def generate_image(self, prompt: str, init_image: Optional[str] = None, **kwargs): + """ + Invoke Bedrock model to generate embeddings. + Args: + text (str): Input text + """ + mode = BedrockMode.IMAGE + model_type = BedrockModel.STABLE_DIFFUSION + payload = self.prepare_input( + prompt, init_image=init_image, mode=mode, model_type=model_type, **kwargs + ) + response = self._invoke_model(model_id=model_type, body_string=payload) + _, _, img_base_64 = self.extract_results(response, model_type) + return img_base_64 + + @staticmethod + def extract_results(response, model_type: BedrockModel, verbose=False): + response = response["body"].read() + if verbose: + print(f"response body readlines() returns: {response}") + + json_obj = json.loads(response) + if model_type == BedrockModel.STABLE_DIFFUSION: + in_token_count, out_token_count = None, None + if json_obj["result"] == "success": + model_output = json_obj["artifacts"][0]["base64"] + else: + model_output = None + else: + raise Exception(f" This class is for Stable Diffusion ONLY::model_type={model_type}") + + return in_token_count, out_token_count, model_output + + @staticmethod + def prepare_input( + prompt_text, + negative_prompts=[], + stop_sequences=[], + cfg_scale=10, + seed=1, + steps=50, + start_schedule=0.5, + init_image=None, + style_preset='photographic', + mode=BedrockMode.IMAGE, + model_type=BedrockModel.STABLE_DIFFUSION, + **kwargs, + ): + stop_sequences = stop_sequences[ + :1 + ] # Temporary addition as Bedrock models can't take multiple stop_sequences yet. Will change later. + if mode == BedrockMode.IMAGE: + if model_type in [BedrockModel.STABLE_DIFFUSION]: + positives = [{"text": prompt_text, "weight": 1}] + negatives = [{"text": prompt, "weight": -1} for prompt in negative_prompts] + json_obj = { + "text_prompts": positives + negatives, + "cfg_scale": cfg_scale, + "seed": seed, + "steps": steps, + "style_preset": style_preset + } + if init_image is not None: + json_obj["init_image"] = init_image + json_obj["start_schedule"] = start_schedule + else: + raise Exception( + 'Unsupported model_type, only "STABLE_DIFFUSION" model_type is supported.' 
+ ) + + return json.dumps(json_obj) + + def list_models(self): + response = self.client.list_foundation_models() + if response["ResponseMetadata"]["HTTPStatusCode"] == 200: + return response["modelSummaries"] + else: + raise Exception("Invalid response") + + def _invoke_model(self, model_id: BedrockModel, body_string: str): + body = bytes(body_string, "utf-8") + response = None + for attempt_no in range(self.__RETRY_ATTEMPTS): + try: + response = self.client.invoke_model( + modelId=model_id.value, + contentType="application/json", + accept="application/json", + body=body, + ) + break + except: + print( + f"bedrock:invoke_model: Attempt no. {attempt_no+1} failed:: Retrying after {self.__RETRY_BACKOFF_SEC} seconds!" + ) + sleep(self.__RETRY_BACKOFF_SEC) + continue + return response + +
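Putting the helper together, a notebook can use it end-to-end roughly as shown below. This is an illustrative sketch rather than part of the module: it assumes the module is importable as `utils.bedrock` (as in the notebooks), that the account has access to the Stable Diffusion XL model, and that the prompt and output file name are placeholders.

```python
# Illustrative usage of the Bedrock helper above (prompt and file name are placeholders).
import base64
import io

from PIL import Image

from utils.bedrock import Bedrock

# With no client passed in, the helper builds one itself via get_bedrock_client().
model = Bedrock()

# prepare_input() assembles the Stable Diffusion payload; _invoke_model() sends it,
# making up to three attempts and sleeping three seconds between failures.
b64_image = model.generate_image(
    "Dog in a forest",
    cfg_scale=5,
    seed=42,
    steps=50,
    style_preset="photographic",
)

# The model returns a base64-encoded image, which can be decoded and saved with PIL.
Image.open(io.BytesIO(base64.b64decode(b64_image))).save("dog.png")
```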