In this project, we use PyTorch to implement a Convolutional Neural Network (CNN) that extracts features for image forgery detection. The approach is inspired by Y. Rao et al., "A Deep Learning Approach to Detection of Splicing and Copy-Move Forgeries in Images". Following the feature fusion proposed in the same paper, the extracted features are fed to an SVM that performs the final binary classification. The SVM implementation comes from scikit-learn. The datasets used in this project are CASIA2 and NC2016. This study was conducted as the final project of TU Delft's course CS4180 Deep Learning 2019 by Group 10.
The pipeline of the system is:
- Train the CNN with image patches close to the distribution of the images that the network will work on. The training patches contain both tampered and untampered regions from the corresponding images.
- Extract features from unseen images by breaking them into patches (see the patch-extraction sketch after this list) and applying feature fusion after the final convolutional layer of the network.
- Train an SVM classifier on the 400-D fused feature vectors from the previous step for the final classification.
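To make the patch-based steps concrete, here is a minimal sliding-window patch extractor. The function name, patch size, and stride below are illustrative assumptions; the code we actually used lives in `src/patch_extraction`.

```python
import numpy as np

def extract_patches(image, patch_size=128, stride=64):
    """Collect fixed-size patches from an image with a sliding window.

    image is an (H, W, C) array; returns a (k, patch_size, patch_size, C)
    array of k patches. Patch size and stride are assumed values here.
    """
    h, w = image.shape[:2]
    patches = [
        image[y:y + patch_size, x:x + patch_size]
        for y in range(0, h - patch_size + 1, stride)
        for x in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches)
```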
The high-level pipeline is shown in the following image:
The CNN architecture of this project is shown in the image below and is influenced by the work of Y. Rao et al. The network consists of 2 convolutional layers, max pooling, 4 convolutional layers, max pooling, and finally 3 convolutional layers. During training, a fully connected layer with softmax follows the final convolution. During testing, the 400-D output of the final convolutional layer is passed to the subsequent feature fusion step, which creates the feature vectors.
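As a concrete illustration, a PyTorch sketch of this structure could look as follows. The channel counts, kernel sizes, and the adaptive pooling at the end are our assumptions, chosen so the sketch runs for any patch size and still emits a 400-D feature (16 maps of 5×5); see `src/cnn` for the network we actually trained.

```python
import torch.nn as nn

class PatchCNN(nn.Module):
    """Sketch of the 2-conv / pool / 4-conv / pool / 3-conv structure."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # block 1: two convolutions followed by max pooling
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            # block 2: four convolutions followed by max pooling
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            # block 3: three convolutions; the last emits 16 feature maps
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            # pin the spatial size so 16 * 5 * 5 = 400 features come out
            nn.AdaptiveAvgPool2d((5, 5)),
        )
        # fully connected layer used only during training; the softmax is
        # folded into nn.CrossEntropyLoss
        self.classifier = nn.Linear(400, num_classes)

    def forward(self, x, extract_features=False):
        feats = self.features(x).flatten(1)  # (batch, 400)
        if extract_features:  # test phase: hand features to fusion + SVM
            return feats
        return self.classifier(feats)
```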
To create a feature representation of an image during the test phase, k patches are extracted from it and passed through the network, yielding k 400-D feature vectors. These are then fused into a single feature vector per image using either max or mean fusion.
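A minimal sketch of the fusion step, assuming the k per-patch features are stacked into a (k, 400) array (the function name and signature are ours for illustration; the code we actually used is in `src/feature_fusion`):

```python
import numpy as np

def fuse_features(patch_features: np.ndarray, mode: str = "max") -> np.ndarray:
    """Fuse k per-patch 400-D feature vectors into one image-level vector.

    patch_features has shape (k, 400); the result has shape (400,).
    """
    if mode == "max":
        return patch_features.max(axis=0)   # element-wise maximum over patches
    if mode == "mean":
        return patch_features.mean(axis=0)  # element-wise average over patches
    raise ValueError(f"unknown fusion mode: {mode}")
```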
For the final part of the pipeline, an SVM classifier is trained and tested on the 400-D representations from the previous step. We use stratified 10-fold cross-validation to obtain an unbiased error estimate.
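For illustration, this evaluation could be set up with scikit-learn as below; the RBF kernel and the feature scaling are assumptions on our part, and the code we actually used is in `src/classification`.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_svm(X, y):
    """Stratified 10-fold cross-validation of an SVM on fused features.

    X: (n_images, 400) fused feature vectors, y: binary labels (0/1).
    Returns the mean and standard deviation of the fold accuracies.
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)  # accuracy per fold
    return scores.mean(), scores.std()
```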
The accuracy and cross-entropy loss per epoch during CNN training on the two datasets are shown below:
The SVM classification accuracy on both datasets after 10-fold cross-validation is presented in the table below:
| Dataset | Accuracy       |
|---------|----------------|
| CASIA2  | 96.82% ± 1.19% |
| NC2016  | 84.89% ± 6.06% |
For more detailed information feel free to take a look at our project report.
The structure of the project is:
- `data`: All the data files related to the project. The `CASIA2` and `NC16` folders are empty because GitHub does not allow files of that size.
  - `output`: All the outputs of the pipeline.
    - `accuracy`: CSVs containing the accuracy per epoch for all our runs.
    - `features`: CSVs containing the final feature representation of every image after the feature fusion step. To minimize the repository size, we kept only two feature files (one per dataset) as an example.
    - `loss_function`: CSVs containing the loss per epoch for all our runs.
    - `pre_trained_cnn`: `.pt` files containing the trained CNNs of all our runs.
- `reports`: The final report of the project, which contains more details on the implementation.
- `src`: The source folder of the project. Here we give examples of how to run every part of the pipeline.
  - `classification`: The SVM code.
  - `cnn`: The CNN code.
  - `feature_fusion`: The code used for the feature fusion.
  - `patch_extraction`: The code used for the patch extraction.
  - `plots`: The code used to generate the plots.