Merge pull request #551 from GoogleCloudPlatform/fix_description_im_caption

Removed unnecessary description from image_captioning.ipynb
takumiohym authored Dec 6, 2024
2 parents 8e50cd1 + d438821 commit c2bc48e
Showing 2 changed files with 1 addition and 3 deletions.
3 changes: 1 addition & 2 deletions notebooks/multi_modal/labs/image_captioning.ipynb
@@ -13,7 +13,7 @@
"Image captioning models take an image as input, and output text. Ideally, we want the output of the model to accurately describe the events/things in the image, similar to a caption a human might provide. <br>\n",
"For example, given an image like the example below, the model is expected to generate a caption such as *\"some people are playing baseball.\"*.\n",
"\n",
"<div><img src=\"./sample_images/baseball.jpeg\" width=\"500\"></div>\n",
"<div><img src=\"../sample_images/baseball.jpeg\" width=\"500\"></div>\n",
"\n",
"In order to generate text, we will build an encoder-decoder model, where the encoder output embedding of an input image, and the decoder output text from the image embedding<br>\n",
"\n",
@@ -512,7 +512,6 @@
"* $h_s$ is the sequence of encoder outputs being attended to (the attention \"key\" and \"value\" in transformer terminology).\n",
"* $h_t$ is the decoder state attending to the sequence (the attention \"query\" in transformer terminology).\n",
"* $c_t$ is the resulting context vector.\n",
"* $a_t$ is the final output combining the \"context\" and \"query\".\n",
"\n",
"The equations:\n",
"\n",
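The hunk above ends at "The equations:" before the equations themselves are shown. For reference, the standard Luong-style attention formulas that match this notation (an assumption based on the bullet definitions, not text taken from the notebook) are:

```latex
\begin{aligned}
\mathrm{score}(h_t, h_s) &= h_t^\top W h_s \\
\alpha_{ts} &= \frac{\exp\big(\mathrm{score}(h_t, h_s)\big)}
                   {\sum_{s'} \exp\big(\mathrm{score}(h_t, h_{s'})\big)} \\
c_t &= \sum_s \alpha_{ts}\, h_s \\
a_t &= \tanh\!\big(W_c\,[\,c_t\,;\,h_t\,]\big)
\end{aligned}
```

The last line corresponds to the $a_t$ bullet that this commit deletes from both notebooks.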
1 change: 0 additions & 1 deletion notebooks/multi_modal/solutions/image_captioning.ipynb
@@ -668,7 +668,6 @@
"* $h_s$ is the sequence of encoder outputs being attended to (the attention \"key\" and \"value\" in transformer terminology).\n",
"* $h_t$ is the decoder state attending to the sequence (the attention \"query\" in transformer terminology).\n",
"* $c_t$ is the resulting context vector.\n",
"* $a_t$ is the final output combining the \"context\" and \"query\".\n",
"\n",
"The equations:\n",
"\n",
