diff --git a/notebooks/multi_modal/labs/image_captioning.ipynb b/notebooks/multi_modal/labs/image_captioning.ipynb
index f4200b52..0b26ec89 100644
--- a/notebooks/multi_modal/labs/image_captioning.ipynb
+++ b/notebooks/multi_modal/labs/image_captioning.ipynb
@@ -13,7 +13,7 @@
"Image captioning models take an image as input, and output text. Ideally, we want the output of the model to accurately describe the events/things in the image, similar to a caption a human might provide.
\n",
"For example, given an image like the example below, the model is expected to generate a caption such as *\"some people are playing baseball.\"*.\n",
"\n",
- "
\n",
+ "\n",
"\n",
"In order to generate text, we will build an encoder-decoder model, where the encoder output embedding of an input image, and the decoder output text from the image embedding
\n",
"\n",
@@ -512,7 +512,6 @@
"* $h_s$ is the sequence of encoder outputs being attended to (the attention \"key\" and \"value\" in transformer terminology).\n",
"* $h_t$ is the decoder state attending to the sequence (the attention \"query\" in transformer terminology).\n",
"* $c_t$ is the resulting context vector.\n",
- "* $a_t$ is the final output combining the \"context\" and \"query\".\n",
"\n",
"The equations:\n",
"\n",
diff --git a/notebooks/multi_modal/solutions/image_captioning.ipynb b/notebooks/multi_modal/solutions/image_captioning.ipynb
index 3ead05b7..6836088f 100644
--- a/notebooks/multi_modal/solutions/image_captioning.ipynb
+++ b/notebooks/multi_modal/solutions/image_captioning.ipynb
@@ -668,7 +668,6 @@
"* $h_s$ is the sequence of encoder outputs being attended to (the attention \"key\" and \"value\" in transformer terminology).\n",
"* $h_t$ is the decoder state attending to the sequence (the attention \"query\" in transformer terminology).\n",
"* $c_t$ is the resulting context vector.\n",
- "* $a_t$ is the final output combining the \"context\" and \"query\".\n",
"\n",
"The equations:\n",
"\n",