DIRE for Diffusion-Generated Image Detection

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li. ICCV 2023.

Summary

This paper seeks to build a detector that tells real images apart from diffusion-generated images. It proposes a novel image representation called DIffusion Reconstruction Error (DIRE), which measures the error between an input image and its reconstruction by a pre-trained diffusion model. DIRE is motivated by the observation that a pre-trained diffusion model can reconstruct images produced by diffusion processes more accurately than it can reconstruct real images.

Contributions

Proposed a novel image representation called DIRE for detecting diffusion-generated images.
Built a new dataset, DiffusionForensics, containing images from three domains (LSUN-Bedroom, ImageNet, and CelebA-HQ) generated by eleven different diffusion models, for benchmarking diffusion-generated image detectors.

Method

To judge whether an input image x₀ was generated by a diffusion model, we take a pre-trained diffusion model and apply the DDIM inversion process to gradually add Gaussian noise to x₀. The DDIM generation process is then used to reconstruct the input image, producing a recovered version x'₀. DIRE is defined as:

$$ DIRE(x_{0}) = |x_{0} - x'_{0}| $$
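The inversion and reconstruction steps above can be sketched with the deterministic DDIM update (η = 0). This is a minimal, dependency-free sketch and not the authors' implementation. Images are flattened lists of floats. `eps_model` is a hypothetical stand-in for the pre-trained noise predictor ε_θ(x_t, t). `alpha_bars` is an assumed cumulative noise schedule with `alpha_bars[0] = 1.0` (the clean image) and strictly positive, decreasing values after it.

```python
import math
from typing import Callable, List

Image = List[float]                       # flattened image, pixels roughly in [-1, 1]
EpsModel = Callable[[Image, int], Image]  # hypothetical noise predictor eps_theta(x_t, t)


def ddim_step(x: Image, eps: Image, ab_from: float, ab_to: float) -> Image:
    """Deterministic DDIM update (eta = 0) from noise level ab_from to ab_to."""
    out = []
    for xi, ei in zip(x, eps):
        # Predict the clean pixel, then re-noise it to the target level.
        x0_pred = (xi - math.sqrt(1.0 - ab_from) * ei) / math.sqrt(ab_from)
        out.append(math.sqrt(ab_to) * x0_pred + math.sqrt(1.0 - ab_to) * ei)
    return out


def ddim_invert(x0: Image, eps_model: EpsModel, alpha_bars: List[float]) -> Image:
    """Map x_0 to x_T by running DDIM steps toward higher noise levels."""
    x = list(x0)
    for t in range(len(alpha_bars) - 1):
        # Common approximation: reuse the eps predicted at x_t for the t -> t+1 step.
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t + 1])
    return x


def ddim_generate(xT: Image, eps_model: EpsModel, alpha_bars: List[float]) -> Image:
    """Map x_T back to a reconstruction x'_0 with ordinary DDIM sampling steps."""
    x = list(xT)
    for t in range(len(alpha_bars) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t - 1])
    return x


def dire(x0: Image, eps_model: EpsModel, alpha_bars: List[float]) -> Image:
    """Per-pixel DIRE(x_0) = |x_0 - x'_0|."""
    recon = ddim_generate(ddim_invert(x0, eps_model, alpha_bars), eps_model, alpha_bars)
    return [abs(a - b) for a, b in zip(x0, recon)]
```

If the noise predictor ignores its input (for example, `lambda x, t: [0.0] * len(x)`), inversion and generation cancel exactly and DIRE is zero up to rounding. With a real network, the ε used on the way up differs from the ε used on the way down, which is one source of nonzero reconstruction error.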

Illustration of the difference between a real sample and a generated sample

p_g(x) denotes the distribution of generated images, and p_r(x) denotes the distribution of real images. x_g and x_r denote a generated sample and a real sample, respectively. After DDIM inversion and reconstruction, x_g and x_r become x'_g and x'_r, respectively.

Because a generated sample x_g and its reconstruction x'_g both belong to the generated distribution p_g(x), the DIRE value for x_g is relatively low. A real image x_r, in contrast, does not come from p_g(x), so its reconstruction x'_r is likely to differ significantly from x_r, which produces a high-amplitude DIRE.
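The intuition can be checked on a toy analogy, which is an illustrative assumption and not the paper's model. Let p_g be supported on a line in 2-D, and replace the diffusion reconstruction with orthogonal projection onto that line. Samples drawn from p_g are then reconstructed exactly. "Real" points, which have an off-line component, are not, so their toy DIRE is larger.

```python
import math
import random
from typing import List

Vec = List[float]


def project(x: Vec, u: Vec) -> Vec:
    """Orthogonal projection of x onto the line spanned by the unit vector u."""
    dot = sum(a * b for a, b in zip(x, u))
    return [dot * b for b in u]


def toy_dire(x: Vec, u: Vec) -> Vec:
    """|x - recon(x)|, where the toy 'reconstruction' is projection onto p_g's support."""
    return [abs(a - b) for a, b in zip(x, project(x, u))]


def mean_dire(samples: List[Vec], u: Vec) -> float:
    """Average per-coordinate toy DIRE over a set of samples."""
    return sum(sum(toy_dire(x, u)) / len(x) for x in samples) / len(samples)


random.seed(0)
u = [1 / math.sqrt(2), 1 / math.sqrt(2)]              # support of the toy p_g: a line in 2-D
coeffs = [random.gauss(0.0, 1.0) for _ in range(200)]
generated = [[c * u[0], c * u[1]] for c in coeffs]    # points lying exactly on p_g's support
real = [[a + random.gauss(0.0, 0.3), b + random.gauss(0.0, 0.3)]
        for a, b in generated]                         # same points pushed off the support

print(f"generated: {mean_dire(generated, u):.4f}  real: {mean_dire(real, u):.4f}")
```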

We therefore compute DIRE representations for both real and diffusion-generated images and train a binary classifier on these DIREs using a binary cross-entropy loss.
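The classification stage can be sketched as follows. The summary does not specify the classifier architecture, so this hypothetical stand-in trains a logistic regression on one scalar feature per image (its mean DIRE), using full-batch gradient descent on the binary cross-entropy loss. The feature values are synthetic. Label 1 means diffusion-generated and label 0 means real. A practical detector would more likely train an image classifier on the full DIRE maps.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def standardize(xs: List[float]) -> Tuple[List[float], float, float]:
    """Return z-scores plus the mean and std used, so new inputs can be mapped the same way."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mean) / std for x in xs], mean, std


def train_logistic(xs: List[float], ys: List[int], lr: float = 1.0, epochs: int = 500) -> Tuple[float, float]:
    """Full-batch gradient descent on the mean BCE of sigmoid(w * x + b); returns (w, b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of BCE w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def mean_loss(xs: List[float], ys: List[int], w: float, b: float) -> float:
    return sum(bce(sigmoid(w * x + b), y) for x, y in zip(xs, ys)) / len(xs)


def prob_generated(raw_dire: float, w: float, b: float, mu: float, sd: float) -> float:
    """Probability that an image with this mean DIRE is diffusion-generated."""
    return sigmoid(w * (raw_dire - mu) / sd + b)


# Synthetic mean-DIRE values (hypothetical): generated images sit low, real images high.
gen = [0.02, 0.03, 0.05, 0.04]
real = [0.20, 0.25, 0.18, 0.30]
xs, mu, sd = standardize(gen + real)
ys = [1] * len(gen) + [0] * len(real)
w, b = train_logistic(xs, ys)
```

Standardizing the feature keeps gradient descent well conditioned. The learned weight `w` should come out negative, matching the hypothesis that a lower DIRE indicates a generated image.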

Results

DIRE with a binary classifier significantly outperformed existing detectors, including CNNDetection, GANDetection, SBI, PatchForensics, and F3Net, at detecting:
- Diffusion-generated bedroom images
- Diffusion-generated face images
- Generated ImageNet images
- GAN-generated bedroom images
Robustness was evaluated under two classes of degradation, Gaussian blur and JPEG compression. DIRE achieved perfect performance under both, with no performance drop.
DIRE was also compared against other input representations: RGB images, reconstructed images (REC), and the combination of RGB and DIRE (RGB&DIRE). Using DIRE alone as input achieved significantly higher accuracy.
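The four input variants compared above can be assembled from an image and its reconstruction, as sketched below. Images are channels-first lists of flattened channels, and `build_inputs` is a hypothetical helper. The summary does not say how RGB and DIRE are combined, so channel-wise concatenation (3 + 3 = 6 channels) is an assumption.

```python
from typing import Dict, List

Channel = List[float]
Image = List[Channel]  # channels-first: [R, G, B], each a flattened list of pixels


def dire_map(x: Image, recon: Image) -> Image:
    """Per-channel, per-pixel |x - x'|."""
    return [[abs(a - b) for a, b in zip(cx, cr)] for cx, cr in zip(x, recon)]


def build_inputs(x: Image, recon: Image) -> Dict[str, Image]:
    """Hypothetical helper returning each input representation keyed by its name."""
    d = dire_map(x, recon)
    return {
        "RGB": x,            # the original image
        "REC": recon,        # its DDIM reconstruction
        "DIRE": d,           # reconstruction error
        "RGB&DIRE": x + d,   # assumed channel-wise concatenation: 6 channels
    }
```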

Two-Cents

The proposed DIRE representation yields a novel, accurate, and robust detector that outperforms current SOTA detection models by a wide margin.

Resources

Paper
Implementation
