Logo Dataset (2022-01-21)
This dataset contains annotated logos detected by the universal logo detector model. It can be used to evaluate the performance of models in metric learning settings.
The dataset contains 374 categories and 64489 images. All logos were manually annotated by contributors using Hunger Games.
Annotated logos were first extracted from the PostgreSQL database using the query in `extract_logo.sql`. Then, logos were filtered and grouped into categories using the `create_logo_dataset.py` script.
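The grouping step can be pictured with the minimal Python sketch below. It is not the code of `create_logo_dataset.py`. The `(logo_id, logo_type, value)` row layout, the `<type>_<value>` category naming (inferred from names such as `brand_Carrefour`) and the `min_count` filter are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: the real filtering criteria of
# create_logo_dataset.py are not documented here.
MIN_LOGOS_PER_CATEGORY = 5


def group_logos(rows, min_count=MIN_LOGOS_PER_CATEGORY):
    """Group (logo_id, logo_type, value) rows into '<type>_<value>' categories.

    Assumed behaviour: rows without an annotated value are skipped, and
    categories with fewer than `min_count` logos are dropped.
    """
    categories = defaultdict(list)
    for logo_id, logo_type, value in rows:
        if not value:
            continue  # no annotated value, cannot be assigned to a category
        categories[f"{logo_type}_{value}"].append(logo_id)
    return {
        name: sorted(ids)
        for name, ids in categories.items()
        if len(ids) >= min_count
    }
```

For example, `group_logos([(1, "brand", "Carrefour"), (2, "brand", "Carrefour"), (3, "label", "organic")], min_count=2)` keeps only `brand_Carrefour`.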
As a post-processing step, the `clean_logo_dataset.py` script was run to rename categories and delete inconsistent ones.
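A post-processing pass of this kind can be sketched as follows. The rename mapping, the deletion list and the choice to merge categories that end up with the same name are hypothetical; the actual rules applied by `clean_logo_dataset.py` are not described here.

```python
# Hypothetical rules, for illustration only.
RENAMES = {"brand_carrefour": "brand_Carrefour"}
TO_DELETE = {"label_unknown"}


def clean_categories(categories, renames=RENAMES, to_delete=TO_DELETE):
    """Rename categories and drop those flagged as inconsistent.

    Deletion is checked on the original name. If two categories are
    renamed to the same name, their logos are merged (an assumption).
    """
    cleaned = {}
    for name, logo_ids in categories.items():
        if name in to_delete:
            continue  # inconsistent category: drop it entirely
        new_name = renames.get(name, name)
        cleaned.setdefault(new_name, []).extend(logo_ids)
    return cleaned
```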
A brief manual inspection was then done to remove inconsistent logos (especially logos of the `label` type). `brand_Carrefour` logos were manually split between `brand_Carrefour` and `brand_Carrefour_text` (text-only logos).
As the `logo_dataset.tar.gz` file exceeds the 2GB file size limit, the archive was split into several parts using the `split` command.
To merge the parts back into a single archive, run: `cat logo_dataset.tar.gz.* > logo_dataset.tar.gz`
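On systems without `cat` (for example Windows without a POSIX shell), the parts can also be merged with a short Python sketch. It assumes the parts use the default alphabetical suffixes of `split` (`aa`, `ab`, ...), so sorting the file names restores the original order; `merge_parts` is a hypothetical helper name.

```python
import glob
import shutil


def merge_parts(prefix="logo_dataset.tar.gz", output=None):
    """Concatenate '<prefix>.*' parts, in sorted order, into one file.

    Assumes the default `split` suffixes (aa, ab, ...), which sort in the
    same order the parts were written, like the shell glob used with `cat`.
    """
    output = output or prefix
    parts = sorted(glob.glob(glob.escape(prefix) + ".*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}.*")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streamed, not loaded into memory
    return output
```

After merging, the archive can be extracted as usual, for example with `tar -xzf logo_dataset.tar.gz`.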
Finally, train, val and test splits were generated with the `split_train_test.py` script, using the following ratios: 0.8 for train, 0.1 for val and 0.1 for test. The split procedure was as follows:
- If a category contains at least 50 logos, its logos are split between train, val and test according to the split ratios.
- Otherwise, the whole category is assigned to a single split, chosen according to the split ratios.
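The split procedure above can be sketched in Python as follows. The 50-logo threshold and the 0.8/0.1/0.1 ratios come from the description; the rest is an assumption, not the behaviour of `split_train_test.py`: the input format (a dict mapping category names to logo IDs), shuffling with a fixed seed, flooring the train and val counts with the remainder going to test, and drawing the split of a small category at random with probabilities equal to the ratios.

```python
import random

# Values taken from the description above.
RATIOS = {"train": 0.8, "val": 0.1, "test": 0.1}
MIN_LOGOS_TO_SPLIT = 50


def split_dataset(categories, ratios=RATIOS, min_logos=MIN_LOGOS_TO_SPLIT, seed=42):
    """Split {category: [logo_id, ...]} into {split: {category: [logo_id, ...]}}.

    Assumed details: fixed seed, floored counts for every split but the
    last (which takes the remainder), weighted draw for small categories.
    """
    rng = random.Random(seed)
    names = list(ratios)
    weights = [ratios[name] for name in names]
    splits = {name: {} for name in names}

    for category, logo_ids in sorted(categories.items()):
        if len(logo_ids) >= min_logos:
            # Large category: spread its logos across all splits.
            ids = list(logo_ids)
            rng.shuffle(ids)
            start = 0
            for i, name in enumerate(names):
                if i == len(names) - 1:
                    end = len(ids)  # the last split takes the remainder
                else:
                    end = start + int(len(ids) * ratios[name])
                splits[name][category] = ids[start:end]
                start = end
        else:
            # Small category: keep it whole and send it to a single split.
            name = rng.choices(names, weights=weights, k=1)[0]
            splits[name][category] = list(logo_ids)

    return splits
```

With a 100-logo category and a 10-logo category, the first is split 80/10/10 while the second lands entirely in one split. One consequence of this design is that small categories placed in val or test are never seen during training, which fits a metric learning evaluation.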
The generated splits are in `train.txt`, `val.txt` and `test.txt`.
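The exact format of these files is not documented here. The following sketch assumes one entry (for example an image path) per line, loads the three files, and checks that no entry appears in more than one split; `load_splits` and `check_disjoint` are hypothetical helpers.

```python
from pathlib import Path


def load_splits(root=".", names=("train", "val", "test")):
    """Read <name>.txt files, assuming one entry per line (blank lines ignored)."""
    splits = {}
    for name in names:
        lines = Path(root, f"{name}.txt").read_text(encoding="utf-8").splitlines()
        splits[name] = [line.strip() for line in lines if line.strip()]
    return splits


def check_disjoint(splits):
    """Raise ValueError if an entry appears in more than one split."""
    seen = {}
    for name, entries in splits.items():
        for entry in entries:
            if entry in seen and seen[entry] != name:
                raise ValueError(f"{entry!r} is in both {seen[entry]} and {name}")
            seen[entry] = name
```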