- [2015 ICCV] VQA: Visual Question Answering, [paper], [bibtex], [homepage].
- [2016 NIPS] Hierarchical Question-Image Co-Attention for Visual Question Answering, [paper], [bibtex], sources: [karunraju/VQA], [jiasenlu/HieCoAttenVQA].
- [2016 ICML] Dynamic Memory Networks for Visual and Textual Question Answering, [paper], [bibtex], [blog], sources: [therne/dmn-tensorflow], [barronalex/Dynamic-Memory-Networks-in-TensorFlow], [ethancaballero/Improved-Dynamic-Memory-Networks-DMN-plus], [dandelin/Dynamic-memory-networks-plus-Pytorch], [DeepRNN/visual_question_answering].
- [2016 CVPR] Stacked Attention Networks for Image Question Answering, [paper], [bibtex], sources: [zcyang/imageqa-san]. (A minimal sketch of the attention hop appears after this list.)
- [2016 CVPR] Neural Module Networks, [paper], [bibtex].
- [2016 EMNLP] Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, [paper], [bibtex], sources: [akirafukui/vqa-mcb], [Cadene/vqa.pytorch], [MarcBS/keras]. (A minimal sketch of count-sketch fusion appears after this list.)
- [2017 CVPR] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, [paper], [bibtex], sources: [facebookresearch/clevr-dataset-gen].
- [2018 ECCV] Visual Question Answering as a Meta Learning Task, [paper], [bibtex].
- [2018 CVPR] Visual Grounding via Accumulated Attention, [paper], [bibtex].
- [2018 CVPR] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, [paper], [bibtex], sources: [peteanderson80/bottom-up-attention], [hengyuan-hu/bottom-up-attention-vqa], [LeeDoYup/bottom-up-attention-tf].
- [2019 AAAI] Dynamic Capsule Attention for Visual Question Answering, [paper], [bibtex].
- [2019 AAAI] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection, [paper], [bibtex], sources: [Cadene/block.bootstrap.pytorch].
- [2019 ACL] Multi-grained Attention with Object-level Grounding for Visual Question Answering, [paper], [bibtex].
- [2019 EMNLP] B2T2: Fusion of Detected Objects in Text for Visual Question Answering, [paper], [bibtex], sources: [google-research/language/language/question_answering/b2t2/].
- [2019 ICCV] Multi-modality Latent Interaction Network for Visual Question Answering, [paper], [bibtex].
- [2019 CVPR] Towards VQA Models That Can Read, [paper], [bibtex], sources: [facebookresearch/pythia].
- [2019 CVPR] Learning to Compose Dynamic Tree Structures for Visual Contexts, [paper], [bibtex], sources: [KaihuaTang/VCTree-Scene-Graph-Generation].
- [2019 CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, [paper], [bibtex], [homepage].
- [2020 AAAI] ManyModalQA: Modality Disambiguation and QA over Diverse Inputs, [paper], [bibtex], sources: [hannandarryl/ManyModalQA].
- [2020 ACL] Multimodal Neural Graph Memory Networks for Visual Question Answering, [paper], [bibtex].
- [2020 CVPR] VQA with No Questions-Answers Training, [paper], [bibtex], sources: [benyv/uncord].
- [2017 CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, [paper], [bibtex], [homepage].
- [2017 ICCV] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, [paper], [bibtex], sources: [ramprs/grad-cam].
- [2018 CVPR] Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, [paper], [bibtex], [homepage], sources: [AishwaryaAgrawal/GVQA].
- [2018 NeurIPS] Overcoming Language Priors in Visual Question Answering with Adversarial Regularization, [paper], [bibtex].
- [2019 NAACL] Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects, [paper], [bibtex].
- [2019 SIGIR] Quantifying and Alleviating the Language Prior Problem in Visual Question Answering, [paper], [bibtex], sources: [guoyang9/vqa-prior].
- [2019 EMNLP] Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases, [paper], [bibtex], sources: [chrisc36/debias].
- [2019 CVPR] Explicit Bias Discovery in Visual Question Answering Models, [paper], [bibtex].
- [2019 NeurIPS] RUBi: Reducing Unimodal Biases for Visual Question Answering, [paper], [bibtex], sources: [cdancette/rubi.bootstrap.pytorch]. (A minimal sketch of the RUBi objective appears after this list.)
- [2019 NeurIPS] Self-Critical Reasoning for Robust Visual Question Answering, [paper], [bibtex], sources: [jialinwu17/self_critical_vqa].
- [2019 ICCV] Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded, [paper], [bibtex].
- [2021 AAAI] Regularizing Attention Networks for Anomaly Detection in Visual Question Answering, [paper], [bibtex], sources: [LeeDoYup/Anomaly_Detection_VQA].
- [2020 CVPR] Counterfactual Samples Synthesizing for Robust Visual Question Answering, [paper], [bibtex], sources: [yanxinzju/CSS-VQA].
- [2020 CVPR] Counterfactual Vision and Language Learning, [paper], [bibtex].
- [2020 CVPR] Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing, [paper], [bibtex], [homepage], sources: [AgarwalVedika/CausalVQA].
- [2020 ECCV] Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision, [paper], [bibtex].
- [2020 EMNLP] Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering, [paper], [bibtex].
- [2021 CVPR] Counterfactual VQA: A Cause-Effect Look at Language Bias, [paper], [bibtex].
- [2021 CVPR] Causal Attention for Vision-Language Tasks, [paper], [bibtex], [supplementary], sources: [yangxuntu/lxmertcatt].
- [2018 CVPR] Referring Relationships, [paper], [bibtex], [homepage], sources: [StanfordVL/ReferringRelationships].
- [2019 ACL] A Corpus for Reasoning About Natural Language Grounded in Photographs, [paper], [bibtex], [homepage].
- [2019 ICCV] Dynamic Graph Attention for Referring Expression Comprehension, [paper], [bibtex], sources: [sibeiyang/sgmn].
- [2019 CVPR] From Recognition to Cognition: Visual Commonsense Reasoning, [paper], [bibtex], [homepage], [leaderboard], [dataset], sources: [rowanz/r2c].
- [2019 NeurIPS] TAB-VCR: Tags and Attributes based VCR Baselines, [paper], [bibtex], [slides], [homepage], sources: [Deanplayerljx/tab-vcr].
- [2020 CVPR] Graph-Structured Referring Expression Reasoning in the Wild, [paper], [bibtex], sources: [sibeiyang/sgmn].
- [2021 WACV] Meta Module Network for Compositional Visual Reasoning, [paper], [bibtex], sources: [wenhuchen/Meta-Module-Network].
- [2021 AAAI] Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding, [paper], [bibtex], sources: [ChopinSharp/ref-nms].
- [2021 arXiv] VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching, [paper], [bibtex].
- [2020 CVPR] Two Causal Principles for Improving Visual Dialog, [paper], [bibtex], sources: [simpleshinobu/visdial-principles].
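
A few of the recurring mechanisms above are compact enough to sketch in code. First, the attention hop at the core of Stacked Attention Networks (referenced in the list): the question vector scores each image region, and the attended image feature refines the query for the next hop. This is a minimal PyTorch sketch with toy dimensions of my own choosing, not the authors' released configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    """One attention hop over image regions, in the spirit of SAN."""
    def __init__(self, d, d_att):
        super().__init__()
        self.proj_v = nn.Linear(d, d_att, bias=False)  # image-side projection
        self.proj_q = nn.Linear(d, d_att)              # query-side projection
        self.score = nn.Linear(d_att, 1)               # per-region attention score

    def forward(self, v, q):
        # v: (B, R, d) region features; q: (B, d) question/query vector
        h = torch.tanh(self.proj_v(v) + self.proj_q(q).unsqueeze(1))  # (B, R, d_att)
        p = F.softmax(self.score(h).squeeze(-1), dim=1)               # (B, R)
        v_att = (p.unsqueeze(-1) * v).sum(dim=1)                      # (B, d)
        return q + v_att  # attended image feature refines the query

# Two stacked hops, mirroring the paper's two-layer setup (sizes are toy values).
hops = nn.ModuleList([AttentionHop(512, 256) for _ in range(2)])
v = torch.randn(8, 36, 512)  # 36 region features per image
q = torch.randn(8, 512)      # encoded question
for hop in hops:
    q = hop(v, q)
```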
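Second, the fusion in Multimodal Compact Bilinear pooling (referenced in the list) approximates the outer product of the image and question features by count-sketching each vector and convolving the sketches in the frequency domain. A minimal sketch assuming single-vector (non-spatial) inputs; for brevity the random hashes are regenerated from a fixed seed on each call, whereas a real module would store them as buffers:

```python
import torch

def count_sketch(x, h, s, d):
    # Project x (B, n) to d dims using fixed hash indices h (n,) and signs s (n,).
    B, n = x.shape
    return x.new_zeros(B, d).scatter_add_(1, h.unsqueeze(0).expand(B, n), x * s)

def mcb(v, q, d=16000, seed=0):
    # Elementwise product in FFT space equals circular convolution of the
    # sketches, which approximates a count sketch of the outer product of v and q.
    g = torch.Generator().manual_seed(seed)
    hv = torch.randint(0, d, (v.size(1),), generator=g)
    hq = torch.randint(0, d, (q.size(1),), generator=g)
    sv = torch.randint(0, 2, (v.size(1),), generator=g).float() * 2 - 1
    sq = torch.randint(0, 2, (q.size(1),), generator=g).float() * 2 - 1
    fv = torch.fft.rfft(count_sketch(v, hv, sv, d))
    fq = torch.fft.rfft(count_sketch(q, hq, sq, d))
    return torch.fft.irfft(fv * fq, n=d)  # (B, d) fused feature

fused = mcb(torch.randn(8, 2048), torch.randn(8, 1024))  # modalities may differ in size
```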
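Third, RUBi (referenced in the list) reduces language bias by letting a question-only branch absorb the easy question-answer correlations: its sigmoid output masks the base model's logits during training. This sketches the objective only; which parameters each loss actually updates, and all other details, follow the paper and official repo rather than this snippet:

```python
import torch
import torch.nn.functional as F

def rubi_loss(logits_vqa, logits_q, labels):
    # Mask the full model's logits with the question-only branch's confidence,
    # then supervise both the fused prediction and the question-only branch.
    fused = logits_vqa * torch.sigmoid(logits_q)
    return F.cross_entropy(fused, labels) + F.cross_entropy(logits_q, labels)
```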