CIFAR Duplicates
================

The files in this directory contain lists of duplicate image pairs found in the [CIFAR-10 and CIFAR-100 datasets][1].

[`duplicates_cifar10.csv`](duplicates_cifar10.csv) and [`duplicates_cifar100.csv`](duplicates_cifar100.csv) list images from the test sets that have near-duplicates in the training set.
The columns of these CSV files have the following meaning:

- `TestID`: Index of the test image in the original CIFAR dataset (counting from 0).
- `TrainID`: Index of the training image in the original CIFAR dataset (counting from 0).
- `Distance`: Euclidean distance between the two images in the L2-normalized CNN feature space.
- `Judgment`: Type of duplicate, assigned by manual annotation:
    - `0` = **exact duplicate**: Almost all pixels in the two images are approximately identical.
    - `1` = **near-duplicate**: The content of the images is exactly the same, i.e., both originated from the same camera shot. However, different post-processing may have been applied to this original scene, e.g., color shifts, translations, or scaling.
    - `2` = **very similar**: The contents of the two images differ but are so similar that the difference can only be spotted at second glance.

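
The column layout above can be parsed with Python's standard `csv` module. This is a minimal sketch, not a loader shipped with this repository: it assumes each file starts with a header row naming the four columns (pass `has_header=False` otherwise), and the names `load_duplicates`, `DuplicatePair`, `group_by_judgment` and `JUDGMENT_LABELS` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, NamedTuple, TextIO

# Column names as documented above; whether the files carry a header row is an assumption.
COLUMNS = ["TestID", "TrainID", "Distance", "Judgment"]

# Hypothetical mapping from the numeric Judgment codes to readable labels.
JUDGMENT_LABELS = {0: "exact duplicate", 1: "near-duplicate", 2: "very similar"}


class DuplicatePair(NamedTuple):
    test_id: int     # zero-based index into the CIFAR test set
    train_id: int    # zero-based index into the training set (the test set for *_test.csv)
    distance: float  # Euclidean distance in the L2-normalized CNN feature space
    judgment: int    # 0, 1 or 2, see JUDGMENT_LABELS


def load_duplicates(f: TextIO, has_header: bool = True) -> List[DuplicatePair]:
    """Parse one duplicates CSV file into typed records."""
    # With fieldnames=None, DictReader takes the column names from the first row.
    reader = csv.DictReader(f, fieldnames=None if has_header else COLUMNS)
    return [
        DuplicatePair(
            test_id=int(row["TestID"]),
            train_id=int(row["TrainID"]),
            distance=float(row["Distance"]),
            judgment=int(row["Judgment"]),
        )
        for row in reader
    ]


def group_by_judgment(pairs: Iterable[DuplicatePair]) -> Dict[str, List[DuplicatePair]]:
    """Bucket pairs by their human-readable duplicate type."""
    groups: Dict[str, List[DuplicatePair]] = {label: [] for label in JUDGMENT_LABELS.values()}
    for pair in pairs:
        groups[JUDGMENT_LABELS[pair.judgment]].append(pair)
    return groups


# Example usage (path relative to this directory):
#   with open("duplicates_cifar10.csv", newline="") as f:
#       groups = group_by_judgment(load_duplicates(f))
```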
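
As described, `Distance` amounts to normalizing each image's CNN feature vector to unit length and taking the Euclidean distance between the results. The following is a minimal sketch in plain Python, assuming the feature vectors are already available (the CNN used to extract them is not specified here); `l2_normalize` and `normalized_distance` are hypothetical helper names. Because both normalized vectors have unit length, the result always lies between 0 (same direction) and 2 (opposite directions), and it equals `sqrt(2 - 2 * cos(theta))`, where `theta` is the angle between the two original feature vectors.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def normalized_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between the L2-normalized versions of u and v."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    u_hat, v_hat = l2_normalize(u), l2_normalize(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_hat, v_hat)))


# Vectors pointing in the same direction are at distance 0 regardless of magnitude,
# e.g. normalized_distance([1.0, 0.0], [2.0, 0.0]) is 0.0.
```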
In contrast, [`duplicates_cifar10_test.csv`](duplicates_cifar10_test.csv) and [`duplicates_cifar100_test.csv`](duplicates_cifar100_test.csv) list duplicate image pairs within the test set.
The structure is identical to that of the other two files, but the column `TrainID` now also refers to images in the test set.

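
Because the `*_test.csv` files pair test images with other test images, the pairs can be merged into groups of mutually duplicated test images, for example to keep a single representative per group. The following is a minimal sketch using a union-find (disjoint-set) structure; it assumes the `(TestID, TrainID)` index pairs have already been read from one of these files, and the names `DisjointSet` and `duplicate_groups` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class DisjointSet:
    """Union-find over integer indices, with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one, so the smallest
            # index in each group always ends up as its representative.
            lo, hi = sorted((root_a, root_b))
            self.parent[hi] = lo


def duplicate_groups(pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Return sorted groups of test indices connected by duplicate pairs."""
    ds = DisjointSet()
    for a, b in pairs:
        ds.union(a, b)
    groups: Dict[int, List[int]] = {}
    for item in ds.parent:
        groups.setdefault(ds.find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values())


# Example: keep groups[i][0] from each group and treat the other members as redundant.
```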
[1]: https://www.cs.toronto.edu/~kriz/cifar.html