This repository contains the implementation of COVE, introduced in the NAACL 2025 paper "COVE: COntext and VEracity prediction for out-of-context images". The code is released under the Apache 2.0 license.
Contact person: Jonathan Tonglet
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
- Our paper has been accepted to the NAACL 2025 Main Conference! See you in Albuquerque!
Images taken out of their context are the most prevalent form of multimodal misinformation. Debunking them requires (1) providing the true context of the image and (2) checking the veracity of the image's caption. However, existing automated fact-checking methods fail to tackle both objectives explicitly. In this work, we introduce COVE, a new method that first predicts the true COntext of the image and then uses it to predict the VEracity of the caption. COVE beats the SOTA context prediction model on all context items, often by more than five percentage points. It is competitive with the best veracity prediction models on synthetic data and outperforms them on real-world data, showing that it is beneficial to combine the two tasks sequentially. Finally, we conduct a human study that reveals that the predicted context is a reusable and interpretable artifact to verify new out-of-context captions for the same image.
Follow these instructions to recreate the environment used for all our experiments.
$ conda create --name COVE python=3.9
$ conda activate COVE
$ pip install -r requirements.txt
$ python -m spacy download en_core_web_lg
5Pils-OOC is a test dataset for out-of-context misinformation detection, derived from the 5Pils dataset. It contains 624 images, each paired with an accurate caption and an out-of-context caption. The dataset can be accessed in data/5pils-ooc/test.json. Instructions to download the images are the same as for 5Pils and are explained in the next section. Like 5Pils, the dataset is released under a CC-BY-SA-4.0 license.
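As a quick sanity check, the test file can be inspected with a few lines of Python. This is a minimal sketch that only prints the number of entries and the available fields; it assumes the file holds a list of records rather than a specific schema.

```python
import json

# Minimal sketch: load the 5Pils-OOC test set and inspect its structure.
# We assume the file contains a list of records; print the keys to see the actual fields.
with open("data/5pils-ooc/test.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))        # number of entries
print(records[0].keys())   # available fields of the first entry
```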
- Follow the instructions on the NewsCLIPpings repo to download the dataset. Place the data under data/newsclippings/
- Follow the instructions on the CCN repo to download the reverse image search and direct search evidence. Place the reverse image search results under data/newsclippings/evidence/reverse_image_search/ and place the direct search results under data/newsclippings/evidence/direct_search/
- Finally, run the following script to prepare the data
$ python src/prepare_newsclippings.py
- Follow the instructions on the 5Pils repo to download the images of 5Pils. Place the images under data/5pils-ooc/processed_img/
- If you face issues downloading the images, please contact [email protected]
- Download the evidence images and the corresponding webpage content using the following script
$ python src/prepare_direct_search_5pils_ooc.py
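For illustration, fetching a single piece of evidence boils down to a plain HTTP request, as sketched below. The URL and output path are hypothetical placeholders; the script above performs the actual download for the whole dataset.

```python
import requests

# Illustrative sketch only: fetch a single evidence image.
# The URL and output path are hypothetical placeholders.
url = "https://example.com/evidence_image.jpg"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("evidence_image.jpg", "wb") as f:
    f.write(response.content)
```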
COVE consists of six steps, the first three focusing on the collection of a diverse set of evidence using the Google Vision API, an index of Wikipedia entities, and captions generated by the MLLM LLaVA-NeXT.
The results of the first three steps for the 5Pils-OOC dataset are stored in results/intermediate/context_input_5pils_ooc_test.csv. Below, we show how to perform context prediction and veracity prediction based on the gathered evidence. The scripts to collect the evidence yourself are explained further down, in the Evidence collection section.
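If you want to inspect the pre-computed evidence before running context prediction, the CSV can be loaded with pandas. This is a minimal sketch that only prints the shape and column names rather than assuming a specific schema.

```python
import pandas as pd

# Minimal sketch: inspect the pre-computed evidence (steps 1-3) without
# assuming a specific schema.
evidence = pd.read_csv("results/intermediate/context_input_5pils_ooc_test.csv")
print(evidence.shape)
print(list(evidence.columns))
```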
$ python src/context_llama3.py --dataset 5pils_ooc --split test
$ python src/knowledge_gap_completion.py --dataset 5pils_ooc --split test
To perform knowledge gap completion, you first need to launch WikiChat in the background, as explained on the WikiChat GitHub.
$ python src/veracity_llama3.py --dataset 5pils_ooc --split test --knowledge_gap_completion 1
Finally, evaluate the performance on context and veracity prediction.
$ python src/evaluate.py --results_file 5pils_ooc_test.csv --dataset 5pils_ooc --split test --geonames_username your_user_name
Evaluation of the Location item requires GeoNames. You will need to create a (free) account and provide your account name as input.
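To check that your GeoNames account works before running the evaluation, you can query the free searchJSON endpoint directly, as sketched below. This is only a connectivity check, not the matching logic used by src/evaluate.py.

```python
import requests

# Connectivity check for GeoNames using the free searchJSON endpoint.
# Replace "your_user_name" with the account name you pass to src/evaluate.py.
params = {"q": "Albuquerque", "maxRows": 1, "username": "your_user_name"}
response = requests.get("http://api.geonames.org/searchJSON", params=params, timeout=30)
response.raise_for_status()
top_hit = response.json()["geonames"][0]
print(top_hit["name"], top_hit["countryName"], top_hit["lat"], top_hit["lng"])
```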
The following scripts allow you to collect the evidence as explained in the paper. Some of these steps require the use of external APIs or downloading files from other repos.
$ python src/object_detection.py --google_vision_api_key your_api_key --dataset newsclippings --split val
COVE relies on object detection using the Google Vision API. This step requires a Google Cloud account.
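As an illustration, object localization with the official Python client looks roughly as follows. This is a minimal sketch assuming service-account credentials and a hypothetical image path; src/object_detection.py (which takes an API key) is the script actually used by COVE.

```python
from google.cloud import vision

# Minimal sketch of object localization with the Google Vision Python client,
# assuming service-account credentials (GOOGLE_APPLICATION_CREDENTIALS) and a
# hypothetical image path.
client = vision.ImageAnnotatorClient()
with open("data/newsclippings/example.jpg", "rb") as f:
    image = vision.Image(content=f.read())
response = client.object_localization(image=image)
for obj in response.localized_object_annotations:
    print(obj.name, round(obj.score, 3))
```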
Compute embeddings for the images of the dataset and the web images retrieved as evidence
$ python src/get_image_embeddings.py --dataset newsclippings --split val
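For reference, image embeddings can be computed with CLIP as sketched below. The checkpoint and image path are assumptions and may differ from what src/get_image_embeddings.py uses.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Minimal sketch of image embedding with CLIP. The checkpoint and image path
# are assumptions, not necessarily what the repository's script uses.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
image = Image.open("data/newsclippings/example.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    embedding = model.get_image_features(**inputs)  # shape (1, 768)
print(embedding.shape)
```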
Compute the similarity scores between the images of the dataset and the web images retrieved as evidence. These scores are used later in the veracity rules.
$ python src/compute_direct_search_similarity_scores.py --dataset newsclippings --split val
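Conceptually, this step reduces to comparing embedding vectors, for example with cosine similarity; the tensors in the sketch below are random placeholders for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: cosine similarity between a dataset image embedding and
# the embeddings of retrieved web images. The tensors are random placeholders.
query = torch.randn(1, 768)      # embedding of the dataset image
evidence = torch.randn(5, 768)   # embeddings of 5 retrieved web images
scores = F.cosine_similarity(query, evidence, dim=-1)
print(scores)  # one similarity score per evidence image
```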
Start by downloading the list of 6M Wikipedia entities from the OVEN GitHub ("6 Million Wikipedia Text Information (Text Only 419M)")
Then, create the index (it takes about 10GB of storage space)
$ python src/generate_oven_index.py
Finally, retrieve the top k entities in the index matching an input image
$ python src/get_oven_entities.py --dataset newsclippings --split val
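The retrieval step can be pictured as nearest-neighbor search over entity embeddings. The sketch below uses FAISS with random placeholder vectors; the actual index built by src/generate_oven_index.py may use a different implementation.

```python
import faiss
import numpy as np

# Illustrative sketch of top-k entity retrieval with FAISS over normalized
# embeddings. Dimensions and vectors are random placeholders.
dim = 768
entity_embeddings = np.random.rand(1000, dim).astype("float32")  # stand-in for the 6M entities
faiss.normalize_L2(entity_embeddings)
index = faiss.IndexFlatIP(dim)
index.add(entity_embeddings)

query = np.random.rand(1, dim).astype("float32")  # stand-in for an image embedding
faiss.normalize_L2(query)
scores, entity_ids = index.search(query, 5)  # top-5 matching entities
print(entity_ids, scores)
```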
Generate embeddings for the Wikipedia entities detected in the caption or in the OVEN index
$ python src/get_wiki_image_embeddings.py --dataset newsclippings --split val
Then, pre-compute similarity scores for the FAC and PRODUCT entities
$ python src/compute_wikipedia_entity_similarity_scores.py --dataset newsclippings --split val
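As an example of how FAC and PRODUCT entities can be detected in a caption, the spaCy model installed during setup (en_core_web_lg) can be used as sketched below; the caption is a made-up example, and the repository's entity-detection logic may differ.

```python
import spacy

# Minimal sketch: detect FAC and PRODUCT entities in a caption with the spaCy
# model installed during setup (en_core_web_lg). The caption is a made-up example.
nlp = spacy.load("en_core_web_lg")
caption = "Tourists gather in front of the Eiffel Tower for the launch of a new smartphone."
doc = nlp(caption)
entities = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in {"FAC", "PRODUCT"}]
print(entities)
```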
Generate captions describing the entire image, and captions specific to the detected objects
$ python src/automated_captioning.py --dataset newsclippings --split val
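A minimal captioning sketch with a LLaVA-NeXT checkpoint from Hugging Face is shown below; the checkpoint, prompt, and image path are assumptions, and src/automated_captioning.py contains the prompts actually used by COVE.

```python
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

# Minimal captioning sketch with a LLaVA-NeXT checkpoint. The checkpoint, prompt,
# and image path are assumptions, not necessarily what COVE uses.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

image = Image.open("data/newsclippings/example.jpg")
prompt = "[INST] <image>\nDescribe this image in one sentence. [/INST]"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```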
Assemble all the evidence and save the results as a CSV file ready for context prediction
$ python src/prepare_evidence.py --dataset newsclippings --split val
If you find this work relevant to your research or use the 5Pils-OOC dataset or this code in your work, please cite our paper as follows:
@article{tonglet2025cove,
  title = "COVE: COntext and VEracity prediction for out-of-context images",
  author = "Tonglet, Jonathan and Thiem, Gabriel and Gurevych, Iryna",
  journal = "arXiv preprint arXiv:2502.01194",
  year = "2025",
  url = "https://arxiv.org/abs/2502.01194"
}
For 5Pils-OOC, please also cite:
@inproceedings{tonglet-etal-2024-image,
title = "{``}Image, Tell me your story!{''} Predicting the original meta-context of visual misinformation",
author = "Tonglet, Jonathan and
Moens, Marie-Francine and
Gurevych, Iryna",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.448",
pages = "7845--7864",
}
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.