From 2226f6ece37b623d3751f414ede2ffab1645fee5 Mon Sep 17 00:00:00 2001
From: nothingg24 <41933005+nothingg24@users.noreply.github.com>
Date: Wed, 4 Oct 2023 10:01:11 +0700
Subject: [PATCH] add a paper

---
 README_multimodal.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README_multimodal.md b/README_multimodal.md
index 4c0fa704..4add684d 100644
--- a/README_multimodal.md
+++ b/README_multimodal.md
@@ -32,7 +32,8 @@ If you find this repository useful, please consider citing this list:
 ## Multi-Modality
 ### Visual Captioning
 * General:
-    * **SAT**: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML, 2015. [[paper](https://arxiv.org/abs/1502.03044)]
+    * **SAT**: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML, 2015 (*Université de Montréal*). [[Paper](https://arxiv.org/abs/1502.03044)]
+    * **SCA-CNN**: "SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning", CVPR, 2017 (*Zhejiang University*). [[Paper](https://arxiv.org/abs/1611.05594v2)][[Code](https://github.com/zjuchenlong/sca-cnn.cvpr17)]
     * **ETA-Transformer**: "Entangled Transformer for Image Captioning", ICCV, 2019 (*UTS*). [[Paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Li_Entangled_Transformer_for_Image_Captioning_ICCV_2019_paper.html)]
     * **M2-Transformer**: "Meshed-Memory Transformer for Image Captioning", CVPR, 2020 (*UniMoRE*). [[Paper](https://arxiv.org/abs/1912.08226)][[PyTorch](https://github.com/aimagelab/meshed-memory-transformer)]
     * **MCCFormers**: "Describing and Localizing Multiple Changes with Transformers", ICCV, 2021 (*AIST*). [[Paper](https://arxiv.org/abs/2103.14146)][[Website](https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/)]