refactor(img): Replace Bedrock util with boto3
Replace the custom `utils.bedrock.Bedrock` class (only used in the
image workshop) with native boto3 SDK calls, to avoid the dependency
on pydantic and hopefully simplify use of the newer Data Science 3.0
kernel in SageMaker Studio. Show how to save generated images to
(gitignored) files, and load img2img inputs from files. Clarify
commentary, remove mentions of old util class, and extend the summary
athewsey committed Aug 1, 2023
1 parent de1c800 commit ad0f57f
Showing 3 changed files with 139 additions and 201 deletions.
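As context for the diff below: the notebook's image-generation calls now use the boto3 `invoke_model` API directly. A minimal sketch of that pattern, mirroring the cells shown in the diff (it assumes `boto3_bedrock` is a Bedrock client created earlier in the notebook):

```python
# Minimal sketch of the native boto3 call that replaces utils.bedrock.Bedrock.
# Assumes `boto3_bedrock` is a Bedrock client created earlier in the notebook.
import base64
import json
import os

request = json.dumps({
    "text_prompts": [{"text": "Dog in a forest", "weight": 1.0}],
    "cfg_scale": 5,
    "seed": 5450,
    "steps": 70,
})

response = boto3_bedrock.invoke_model(body=request, modelId="stability.stable-diffusion-xl")
response_body = json.loads(response["body"].read())

# The generated PNG is returned as a Base64-encoded string in the JSON response
image_b64 = response_body["artifacts"][0]["base64"]
os.makedirs("data", exist_ok=True)  # data/ is gitignored by this commit
with open("data/image.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```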
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ __pycache__

# Files generated by the workshop:
/dependencies
data/
205 changes: 137 additions & 68 deletions 05_Image/Bedrock Stable Diffusion XL.ipynb
@@ -4,14 +4,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to Bedrock - Generating images using Stable Diffusion\n",
"# Generating images using Stable Diffusion\n",
"\n",
"> *This notebook should work well with the **`Data Science 2.0`** kernel in SageMaker Studio*\n",
"\n",
"---\n",
"In this demo notebook, we demonstrate how to use the Bedrock SDK for an image generation task. We show how to use the Stable Diffusion foundational model to create images\n",
"1. Text to Image\n",
"2. Image to Image\n",
"\n",
"In this demo notebook, we show how to use [Stable Diffusion XL](https://stability.ai/stablediffusion) (SDXL) on [Amazon Bedrock](https://aws.amazon.com/bedrock/) for image generation (text-to-image) and image editing (image-to-image).\n",
"\n",
"Images in Stable Diffusion are generated by these 4 main models below\n",
"1. The CLIP text encoder;\n",
Expand All @@ -23,11 +22,22 @@
"\n",
"see this diagram below\n",
"\n",
"![SD Architecture](./images/sd.png)\n",
"![SD Architecture](./images/sd.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Image prompting\n",
"\n",
"Writing a good prompt can be somewhat of an art. It's often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be roughly broken down into three pieces:\n",
"\n",
"#### Image prompting\n",
"1. **Type** of image (photograph/sketch/painting etc.)\n",
"2. **Description** of the content (subject/object/environment/scene etc.), and\n",
"3. **Style** of the image (realistic/artistic/type of art etc.).\n",
"\n",
"Writing a good prompt can sometime be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be roughly broken down into three pieces: (i) type of image (photograph/sketch/painting etc.), (ii) description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more details help in the generation process.\n",
"You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more details help in the generation process.\n",
"\n",
"To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images by artists you can use phrases like “by Pablo Piccaso” or “oil painting by Rembrandt” or “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists as well. To generate artistic image by category, you can add the art category in the prompt such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or camera lens such as 35mm wide lens or 85mm wide lens and details about the framing (portrait/landscape/close up etc.).\n",
"\n",
@@ -50,7 +60,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Make sure you ran `download-dependencies.sh` from the root of the repository first!\n",
@@ -104,81 +116,92 @@
"metadata": {},
"source": [
"## Text to Image\n",
"In order to generate an image, a description of what needs to be generated is needed. This is called `prompt`.\n",
"\n",
"You can also provide some negative prompts to guide the model to avoid certain type of outputs.\n",
"In text-to-image mode, we'll provide a text description of what image **should** be generated, called a `prompt`.\n",
"\n",
"With Stable Diffusion XL (SDXL) we can also specify certain [style presets](https://platform.stability.ai/docs/release-notes#style-presets) to help influence the generation.\n",
"\n",
"But what if we want to nudge the model to ***avoid*** specific content or style choices? Because image generation models are typically trained from *image descriptions*, trying to directly specify what you **don't** want in the prompt (for example `man without a beard`) doesn't usually work well: It would be very unusual to describe an image by the things it isn't!\n",
"\n",
"Prompt acts as the input to the model and steers the model to generate a relevant output. With Stable Diffusion XL you have the option to choose certain [style presets](https://platform.stability.ai/docs/release-notes#style-presets) as well"
"Instead, SDXL lets us specify a `weight` for each prompt, which can be negative. We'll use this to provide `negative_prompts` as shown below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"prompt = \"Dog in a forest\"\n",
"negative_prompts = [\n",
" \"poorly rendered\", \n",
" \"poor background details\", \n",
" \"poorly drawn dog\", \n",
" \"disfigured dog features\"\n",
" ]\n",
"style_preset = \"photographic\" # (photographic, digital-art, cinematic, ...)\n",
" \"poorly rendered\",\n",
" \"poor background details\",\n",
" \"poorly drawn dog\",\n",
" \"disfigured dog features\",\n",
"]\n",
"style_preset = \"photographic\" # (e.g. photographic, digital-art, cinematic, ...)\n",
"#prompt = \"photo taken from above of an italian landscape. cloud is clear with few clouds. Green hills and few villages, a lake\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Bedrock` class implements a method `generate_image`. This method takes input a prompt and prepares a payload to be sent over to Bedrock API.\n",
"You can provide the following model inference parameters to control the repetitiveness of responses:\n",
"- prompt (string): Input text prompt for the model\n",
"- seed (int): Determines initial noise. Using same seed with same settings will create similar images.\n",
"- cfg_scale (float): Presence strength - Determines how much final image portrays prompts.\n",
"- steps (int): Generation step - How many times image is sampled. More steps may be more accurate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an output the Bedrock generates a `base64` encoded string respresentation of the image."
"The Amazon Bedrock `InvokeModel` provides access to SDXL by setting the right model ID, and returns a JSON response including a [Base64 encoded string](https://en.wikipedia.org/wiki/Base64) that represents the (PNG) image.\n",
"\n",
"For more information on available input parameters for the model, refer to the [Stability AI docs](https://platform.stability.ai/docs/api-reference#tag/v1generation/operation/textToImage).\n",
"\n",
"The cell below invokes the SDXL model through Amazon Bedrock to create an initial image string:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model = bedrock.Bedrock(boto3_bedrock)\n",
"base_64_img_str = model.generate_image(prompt, cfg_scale=5, seed=5450, steps=70, style_preset=style_preset)"
"request = json.dumps({\n",
" \"text_prompts\": (\n",
" [{\"text\": prompt, \"weight\": 1.0}]\n",
" + [{\"text\": negprompt, \"weight\": -1.0} for negprompt in negative_prompts]\n",
" ),\n",
" \"cfg_scale\": 5,\n",
" \"seed\": 5450,\n",
" \"steps\": 70,\n",
" \"style_preset\": style_preset,\n",
"})\n",
"modelId = \"stability.stable-diffusion-xl\"\n",
"\n",
"response = boto3_bedrock.invoke_model(body=request, modelId=modelId)\n",
"response_body = json.loads(response.get(\"body\").read())\n",
"\n",
"print(response_body[\"result\"])\n",
"base_64_img_str = response_body[\"artifacts\"][0].get(\"base64\")\n",
"print(f\"{base_64_img_str[0:80]}...\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can convert the `base64` image to a PIL image to be displayed"
"By decoding our Base64 string to binary, and loading it with an image processing library like [Pillow](https://pillow.readthedocs.io/en/stable/) that can read PNG files, we can display and manipulate the image here in the notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"os.makedirs(\"data\", exist_ok=True)\n",
"image_1 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))\n",
"image_1.save(\"data/image_1.png\")\n",
"image_1"
]
},
@@ -188,21 +211,41 @@
"source": [
"## Image to Image\n",
"\n",
"Stable Diffusion let's us do some interesting stuff with our images like adding new characters or modifying scenery let's give it a try.\n",
"Generating images from text is powerful, but in some cases could need many rounds of prompt refinement to get an image \"just right\".\n",
"\n",
"Rather than starting from scratch with text each time, image-to-image generation lets us **modify an existing image** to make the specific changes we'd like.\n",
"\n",
"You can use the previously generated image or use a different one to create a base64 string to be passed on as an initial image to the model."
"We'll have to pass our initial image in to the API in base64 encoding, so first let's prepare that. You can use either the initial image from the previous section, or a different one if you'd prefer:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"buffer = io.BytesIO()\n",
"image_1.save(buffer, format=\"JPEG\")\n",
"img_bytes = buffer.getvalue()\n",
"init_image = base64.b64encode(img_bytes).decode()"
"def image_to_base64(img) -> str:\n",
" \"\"\"Convert a PIL Image or local image file path to a base64 string for Amazon Bedrock\"\"\"\n",
" if isinstance(img, str):\n",
" if os.path.isfile(img):\n",
" print(f\"Reading image from file: {img}\")\n",
" with open(img, \"rb\") as f:\n",
" return base64.b64encode(f.read()).decode(\"utf-8\")\n",
" else:\n",
" raise FileNotFoundError(f\"File {img} does not exist\")\n",
" elif isinstance(img, Image.Image):\n",
" print(\"Converting PIL Image to base64 string\")\n",
" buffer = io.BytesIO()\n",
" img.save(buffer, format=\"PNG\")\n",
" return base64.b64encode(buffer.getvalue()).decode(\"utf-8\")\n",
" else:\n",
" raise ValueError(f\"Expected str (filename) or PIL Image. Got {type(img)}\")\n",
"\n",
"\n",
"init_image_b64 = image_to_base64(image_1)\n",
"print(init_image_b64[:80] + \"...\")"
]
},
{
@@ -215,7 +258,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"change_prompt = \"add some leaves around the dog\""
@@ -225,43 +270,67 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `generate_image` method also accepts an additional paramter `init_image` which can be used to pass the initial image to the Stable Diffusion model on Bedrock."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"base_64_img_str = model.generate_image(change_prompt, init_image=init_image, seed=321, start_schedule=0.6)"
"The existing image is then passed through to the Stable Diffusion model via the `init_image` parameter.\n",
"\n",
"Again, you can refer to the [Stable Diffusion API docs](https://platform.stability.ai/docs/api-reference#tag/v1generation/operation/imageToImage) for more tips on how to use the different parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"image_2 = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))"
"request = json.dumps({\n",
" \"text_prompts\": (\n",
" [{\"text\": change_prompt, \"weight\": 1.0}]\n",
" + [{\"text\": negprompt, \"weight\": -1.0} for negprompt in negative_prompts]\n",
" ),\n",
" \"cfg_scale\": 10,\n",
" \"init_image\": init_image_b64,\n",
" \"seed\": 321,\n",
" \"start_schedule\": 0.6,\n",
" \"steps\": 50,\n",
" \"style_preset\": style_preset,\n",
"})\n",
"modelId = \"stability.stable-diffusion-xl\"\n",
"\n",
"response = boto3_bedrock.invoke_model(body=request, modelId=modelId)\n",
"response_body = json.loads(response.get(\"body\").read())\n",
"\n",
"print(response_body[\"result\"])\n",
"image_2_b64_str = response_body[\"artifacts\"][0].get(\"base64\")\n",
"print(f\"{image_2_b64_str[0:80]}...\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"image_2 = Image.open(io.BytesIO(base64.decodebytes(bytes(image_2_b64_str, \"utf-8\"))))\n",
"image_2.save(\"data/image_2.png\")\n",
"image_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summary\n",
"## Summary\n",
"\n",
"In this lab we demonstrated how to generate new images from text, and transform existing images with text instructions - using [Stable Diffusion XL](https://stability.ai/stablediffusion) on [Amazon Bedrock](https://aws.amazon.com/bedrock/).\n",
"\n",
"Through the Bedrock API, we can provide a range of parameters to influence image generation which generally correspond to those listed in the [Stable Diffusion API docs](https://platform.stability.ai/docs/api-reference#tag/v1generation).\n",
"\n",
"One key point to note when using Bedrock is that output image PNG/JPEG data is returned as a [Base64 encoded string](https://en.wikipedia.org/wiki/Base64) within the JSON API response: You can use the Python built-in [base64 library](https://docs.python.org/3/library/base64.html) to decode this image data - for example to save a `.png` file. We also showed that image processing libraries like [Pillow](https://pillow.readthedocs.io/en/stable/) can be used to load (and perhaps edit) the images within Python.\n",
"\n",
"And play around with different prompts to see amazing results."
"From here you can explore more advanced image generation options - or combine GenAI with traditional image processing tools - to build the best creative workflow for your use-case."
]
}
],
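As a recap of the decode-and-save step the summary describes, a minimal sketch (assuming `base_64_img_str` holds the Base64 image string returned by Bedrock, as in the cells above):

```python
# Minimal sketch: decode a Base64 image string from Bedrock and save it as a PNG.
# Assumes `base_64_img_str` was obtained from a previous invoke_model response.
import base64
import io
import os

from PIL import Image

os.makedirs("data", exist_ok=True)  # data/ is gitignored, as added in this commit
image = Image.open(io.BytesIO(base64.b64decode(base_64_img_str)))
image.save("data/generated.png")
```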