
Is there a way to use this framework for identifying image manipulation, without the treatment and control setting? #2

Closed
codepujan opened this issue Mar 27, 2023 · 4 comments
Comments

@codepujan

Great work with the overall framework and setting up the benchmark.
The control and treatment setting confused me a lot.
Let's say I have a bunch of source images and a pool of images that are manipulated versions of those sources, with the corresponding ground truth for each source image.
For every source image, I want to benchmark the different algorithms on identifying the correct set of manipulated images. Is there a proper way to go about this?
Thanks

@chiffa
Collaborator

chiffa commented Mar 28, 2023

Yes - set up the database, run the perturbation generator, and then replace the generated folder with the images from your manipulated dataset. At that point you can follow the steps outlined in the README to get the statistics and the figures.

Hope that helps!

@Cyrilvallez
Owner

Cyrilvallez commented Mar 28, 2023

Sure, in your case you would first need to make sure that the images in your pool of manipulated versions of the source images are named with the following convention: source-name_attackID.extension. That is, if your source images are named ("img1.extension", "img2.extension", ...), use your ground truths to rename the manipulated pool of images to ("img1_manipulation1.extension", "img1_manipulation2.extension", "img2_manipulation1.extension", "img3_manipulation14.extension", ...). Note that the names themselves should never contain underscores ("_"), as the underscore is used to separate the original name from the manipulation name.
It does not matter what you call the manipulations, or even whether the numbering is consistent; the important part is that the name of the original image appears before the underscore (e.g. "img1_nxozhe.extension" will still work, since the name of the original is before the underscore).
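
For example, here is a minimal sketch of that renaming step, assuming you already have a ground-truth mapping from each file in the manipulated pool to its source image. The folder paths and the ground_truth dictionary below are hypothetical placeholders, not part of the framework:

from pathlib import Path
import shutil

pool = Path('path/to/pool/of/manipulated/images')
renamed = Path('path/to/renamed/pool')
renamed.mkdir(parents=True, exist_ok=True)

# Hypothetical ground truth: maps each manipulated filename to the stem of its source image
ground_truth = {
    'poolimg0001.png': 'img1',
    'poolimg0002.png': 'img1',
    'poolimg0003.png': 'img2',
}

counters = {}
for filename, source in ground_truth.items():
    counters[source] = counters.get(source, 0) + 1
    extension = Path(filename).suffix
    # Follow the convention source-name_attackID.extension (no extra underscores)
    new_name = f"{source}_manipulation{counters[source]}{extension}"
    shutil.copy(pool / filename, renamed / new_name)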

Then you would do:

import numpy as np
import hashing

path_database = 'path/to/source/images/'
path_dataset = 'path/to/pool/of/manipulated/images/'

dataset = hashing.create_dataset(path_dataset, existing_attacks=True)

algos = [
        # declare all algos you want
        hashing.ClassicalAlgorithm('Phash', hash_size=8),
        hashing.FeatureAlgorithm('ORB', n_features=30),
        hashing.NeuralAlgorithm('SimCLR v1 ResNet50 2x', device='cuda', distance='Jensen-Shannon')
        ]

thresholds = [
       # declare all thresholds you want
       np.linspace(0, 0.4, 20),
       np.linspace(0, 0.3, 20),
       np.linspace(0.3, 0.8, 20),
       ]

# Create the database for each algorithm and record the time it took
databases, time_database = hashing.create_databases(algos, path_database)

general_output, image_wise_output, running_time = hashing.hashing(
    algos, thresholds, databases, dataset, artificial_attacks=False)

The general_output will give you the overall number of images in your pool of manipulated images that triggered a match in the database, while image_wise_output will give you a more detailed breakdown, with the number of correct/incorrect detections for each image in the database (i.e. for each source image).

However, this version will only provide you with true positives/false negatives (as I think this is what you are interested in). If you are also interested in true negatives/false positives, you will still need to split your images into experimental and control groups, as we did in our benchmarks.

@codepujan
Author

Thank you for the quick response @Cyrilvallez. The setup works with the snippet above. I am also interested in identifying true negatives and false positives. In that case, I am still finding it difficult to understand the experimental/control group split used in the benchmark (I even read the paper).
Revisiting the setup: I have source images (s1, ..., s10), a bunch of images that are manipulations of each source, and a set of noise images that are not related to any of the source images at all. How would I go about dividing the experimental and control groups in this setting?
Thank you for the help.

@Cyrilvallez
Owner

Cyrilvallez commented Mar 29, 2023

Well, then the experimental group would be the manipulations of the source images (images that are supposed to be detected), and the control group would be the noise images (images that are not supposed to be detected). However, if the two groups do not contain the exact same number of images, be aware that the computed statistics (accuracy, precision, etc.) are NOT normalized against the number of images in each group (we always used the same number of images in each group). Thus they can be misleading in the case of a (large) imbalance between the two groups; see the sketch after the code below for one way to recompute rate-based metrics from raw counts.

Using this setup, you can simply follow all steps of the README using the experimental and control groups defined above as positive_dataset and negative_dataset respectively:

positive_dataset = hashing.create_dataset('path/to/experimental', existing_attacks=True)
negative_dataset = hashing.create_dataset('path/to/control', existing_attacks=True)
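
If your two groups do end up imbalanced, here is a minimal sketch, not part of the framework, of how rate-based metrics could be recomputed from raw detection counts so that the group sizes do not distort them. The four counts are hypothetical and would come from your own tally of the outputs:

# Hypothetical helper, independent of the hashing framework
def balanced_metrics(tp, fn, tn, fp):
    """Rate-based metrics that do not depend on the size of each group."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall on the experimental group
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # specificity on the control group
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {'recall': tpr, 'specificity': tnr, 'precision': precision,
            'balanced accuracy': (tpr + tnr) / 2}

# Example with an imbalanced split: 900 manipulated images vs 100 noise images
print(balanced_metrics(tp=850, fn=50, tn=90, fp=10))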

@Cyrilvallez Cyrilvallez added the good first issue Good for newcomers label Apr 1, 2023
@Cyrilvallez Cyrilvallez pinned this issue Apr 1, 2023