- Reading Resources: [TheShadow29/awesome-grounding].
- Reading Resources: [chingyaoc/awesome-vqa].
- [2019 ArXiv] Multimodal Intelligence: Representation Learning, Information Fusion, and Applications, [paper], [bibtex].
- [2020 JAIR] Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods, [paper], [bibtex].
- [2019 IJCAI] Adapting BERT for Target-Oriented Multimodal Sentiment Classification, [paper], [bibtex], sources: [jefferyYu/TomBERT].
- [2019 CVPR] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression, [paper], [bibtex], [homepage], sources: [generalized-iou].
- [2020 CVPR] Visual Grounding in Video for Unsupervised Word Translation, [paper], [bibtex], sources: [gsig/visual-grounding].