From 2226f6ece37b623d3751f414ede2ffab1645fee5 Mon Sep 17 00:00:00 2001
From: nothingg24 <41933005+nothingg24@users.noreply.github.com>
Date: Wed, 4 Oct 2023 10:01:11 +0700
Subject: [PATCH] add a paper

---
 README_multimodal.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README_multimodal.md b/README_multimodal.md
index 4c0fa704..4add684d 100644
--- a/README_multimodal.md
+++ b/README_multimodal.md
@@ -32,7 +32,8 @@ If you find this repository useful, please consider citing this list:
 ## Multi-Modality
 ### Visual Captioning
 * General:
-    * **SAT**: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML, 2015. [[paper](https://arxiv.org/abs/1502.03044)]
+    * **SAT**: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML, 2015 (*Université de Montréal*). [[Paper](https://arxiv.org/abs/1502.03044)]
+    * **SCA-CNN**: "SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning", CVPR, 2017 (*Zhejiang University*). [[Paper](https://arxiv.org/abs/1611.05594v2)][[Code](https://github.com/zjuchenlong/sca-cnn.cvpr17)]
     * **ETA-Transformer**: "Entangled Transformer for Image Captioning", ICCV, 2019 (*UTS*). [[Paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Li_Entangled_Transformer_for_Image_Captioning_ICCV_2019_paper.html)]
     * **M2-Transformer**: "Meshed-Memory Transformer for Image Captioning", CVPR, 2020 (*UniMoRE*). [[Paper](https://arxiv.org/abs/1912.08226)][[PyTorch](https://github.com/aimagelab/meshed-memory-transformer)]
     * **MCCFormers**: "Describing and Localizing Multiple Changes with Transformers", ICCV, 2021 (*AIST*). [[Paper](https://arxiv.org/abs/2103.14146)][[Website](https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/)]