This is the official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu
In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments and show that multimodal LLMs can expose AI-generated images through careful experimental design and prompt engineering. This is notable because LLMs are not inherently tailored for media forensic tasks and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.
Two multimodal LLMs have been evaluated: GPT4V and Gemini 1.0 Pro.
The dataset used in this study can be downloaded from the following link. It contains 1,000 StyleGAN2-generated face images, 1,000 Latent Diffusion-generated images, and 1,000 real faces from the FFHQ dataset, all derived from the DF^3 dataset. Both raw data and post-processed (pped) data are provided.
The test data has the following structure:
Test_data
|--Real_512Size
|--StyleGAN_raw_512size
|--StyleGAN_pped_256size
|--LD_raw_256Size
|--LD_pped_512Size
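As a rough illustration of how this layout could be consumed, the sketch below walks the five folders and pairs each image with a real/fake label. The folder names are taken from the structure above; the function name `index_test_data`, the 0/1 label convention, and the list of image extensions are assumptions made for this example and are not part of the repository.

```python
from pathlib import Path

# Folder names copied from the structure above; the 0/1 label convention
# (0 = real, 1 = AI-generated) is an assumption made for this sketch.
FOLDER_LABELS = {
    "Real_512Size": 0,
    "StyleGAN_raw_512size": 1,
    "StyleGAN_pped_256size": 1,
    "LD_raw_256Size": 1,
    "LD_pped_512Size": 1,
}

# Assumed image extensions; adjust to match the downloaded files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_test_data(root):
    """Return (image_path, folder_name, label) tuples, grouped by folder."""
    root = Path(root)
    samples = []
    for folder, label in FOLDER_LABELS.items():
        folder_path = root / folder
        if not folder_path.is_dir():
            continue  # skip subsets that are not present locally
        for path in sorted(folder_path.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, folder, label))
    return samples


if __name__ == "__main__":
    samples = index_test_data("Test_data")
    print(f"{len(samples)} images indexed")
```

Keeping the folder name alongside the label makes it straightforward to report results separately for raw and post-processed subsets.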
We will release all responses from the two multimodal LLMs upon the paper's acceptance.
Method | Raw SG2 | Raw LD | Pped SG2 | Pped LD |
---|---|---|---|---|
CNN-aug | 96.5 | 58.6 | 53.2 | 52.4 |
GAN-DCT | 53.4 | 75.4 | 44.4 | 56.0 |
Nodown | 99.6 | 97.1 | 47.4 | 44.9 |
BeyondtheSpectrum | 98.1 | 77.3 | 45.4 | 46.9 |
PSM | 99.2 | 82.5 | 73.1 | 71.3 |
GLFF | 97.5 | 86.7 | 80.6 | 79.4 |
Gemini 1.0 (zero-shot) | 76.6 | 75.1 | 77.5 | 81.5 |
GPT4V (zero-shot) | 77.2 | 79.5 | 88.7 | 89.8 |
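To make the raw versus post-processed pattern in the table easier to read off, the sketch below averages each method's scores over the two raw columns and the two post-processed columns. The numbers are copied from the table; the averaging and the `summarize` helper are an illustrative summary introduced here, not a metric reported in the paper.

```python
# Scores copied from the table above: (Raw SG2, Raw LD, Pped SG2, Pped LD).
SCORES = {
    "CNN-aug": (96.5, 58.6, 53.2, 52.4),
    "GAN-DCT": (53.4, 75.4, 44.4, 56.0),
    "Nodown": (99.6, 97.1, 47.4, 44.9),
    "BeyondtheSpectrum": (98.1, 77.3, 45.4, 46.9),
    "PSM": (99.2, 82.5, 73.1, 71.3),
    "GLFF": (97.5, 86.7, 80.6, 79.4),
    "Gemini 1.0 (zero-shot)": (76.6, 75.1, 77.5, 81.5),
    "GPT4V (zero-shot)": (77.2, 79.5, 88.7, 89.8),
}


def summarize(scores):
    """Return {method: (raw_mean, pped_mean, pped_mean - raw_mean)}."""
    summary = {}
    for method, (raw_sg2, raw_ld, pped_sg2, pped_ld) in scores.items():
        raw_mean = (raw_sg2 + raw_ld) / 2
        pped_mean = (pped_sg2 + pped_ld) / 2
        summary[method] = (raw_mean, pped_mean, pped_mean - raw_mean)
    return summary


if __name__ == "__main__":
    for method, (raw, pped, delta) in summarize(SCORES).items():
        print(f"{method:<24} raw={raw:5.1f}  pped={pped:5.1f}  delta={delta:+5.1f}")
```

On the table's numbers, only the two zero-shot LLM rows have a positive delta; every other method scores lower on post-processed images than on raw ones.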
The following figure shows examples of GPT4V responses for DeepFake face detection. Left: results for AI-generated images. Right: results for real faces. Responses for AI-generated faces are labeled in pink, and responses for real faces are labeled in green. Both success cases (with check marks) and failure cases (with crosses) are shown. See the paper for details.
@misc{jia2024chatgpt,
title={Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics},
author={Shan Jia and Reilin Lyu and Kangran Zhao and Yize Chen and Zhiyuan Yan and Yan Ju and Chuanbo Hu and Xin Li and Baoyuan Wu and Siwei Lyu},
year={2024},
eprint={2403.14077},
archivePrefix={arXiv},
}