Logo Dataset (2022-01-21)

@mithridatea mithridatea released this 21 Jan 15:54
· 227 commits to develop since this release
969751e

This dataset contains annotated logos detected by the universal logo detector model. It can be used to evaluate the performance of models in metric learning settings.

The dataset contains 374 categories and 64489 images. All logos were manually annotated by contributors using Hunger Games.

Annotated logos were first extracted from the PostgreSQL DB using the query in extract_logo.sql. The logos were then filtered and grouped into categories using the create_logo_dataset.py script.
As a postprocessing step, the clean_logo_dataset.py script was run to rename categories and delete inconsistent ones.
Finally, a brief manual inspection was done to remove inconsistent logos (especially logos of the label type). brand_Carrefour logos were manually split between brand_Carrefour and brand_Carrefour_text (text-only logos).

Because the logo_dataset.tar.gz file exceeds the 2 GB file size limit, the archive was split using the split command.
To merge the splits, use:

cat logo_dataset.tar.gz.* > logo_dataset.tar.gz
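
Where cat is not available (for example on Windows), the same merge can be sketched in Python. This is only an illustrative equivalent: the merge_parts helper is hypothetical and not part of this release, and it assumes the parts carry the default suffixes produced by split (aa, ab, ...), so that sorting the file names restores the original order.

```python
import shutil
from pathlib import Path


def merge_parts(directory, archive_name="logo_dataset.tar.gz"):
    """Concatenate archive_name.* parts (in sorted order) into archive_name.

    Hypothetical helper, equivalent to: cat archive_name.* > archive_name
    """
    directory = Path(directory)
    target = directory / archive_name
    # Assumption: part names sort lexicographically in the right order,
    # which holds for the default suffixes of `split` (aa, ab, ...).
    parts = sorted(p for p in directory.glob(archive_name + ".*") if p.is_file())
    if not parts:
        raise FileNotFoundError(f"no parts matching {archive_name}.* in {directory}")
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so the whole archive never sits in memory.
                shutil.copyfileobj(src, out)
    return target
```

Calling merge_parts(".") in the directory holding the downloaded parts should produce logo_dataset.tar.gz next to them.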

Finally, train, val and test splits were generated (with the split_train_test.py script) with the following ratios: 0.8 for train, 0.1 for val and 0.1 for test. The split procedure went as follows:

  • If the category contains >= 50 logos, split its logos between train, val and test according to the split ratios.
  • Otherwise, assign the whole category to a single split, chosen according to the split ratios.
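
The two rules above can be sketched as follows. This is not the code of split_train_test.py, only a minimal illustration under stated assumptions: the split_dataset function, the seed, the shuffling, and the interpretation that a small category is drawn into one split with probability equal to the ratios are all hypothetical.

```python
import random


def split_dataset(logos_by_category, ratios=(0.8, 0.1, 0.1), min_count=50, seed=42):
    """Return {"train": [...], "val": [...], "test": [...]} of (category, logo) pairs.

    Hypothetical sketch of the split rules, not the released script.
    """
    names = ("train", "val", "test")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    splits = {name: [] for name in names}
    for category in sorted(logos_by_category):
        logos = list(logos_by_category[category])
        if len(logos) >= min_count:
            # Large category: shuffle, then cut according to the ratios.
            rng.shuffle(logos)
            n_train = int(len(logos) * ratios[0])
            n_val = int(len(logos) * ratios[1])
            chunks = (
                logos[:n_train],
                logos[n_train:n_train + n_val],
                logos[n_train + n_val:],  # remainder goes to test
            )
            for name, chunk in zip(names, chunks):
                splits[name].extend((category, logo) for logo in chunk)
        else:
            # Small category: the whole category lands in one split,
            # drawn with probability equal to the ratios (assumption).
            name = rng.choices(names, weights=ratios, k=1)[0]
            splits[name].extend((category, logo) for logo in logos)
    return splits
```

Keeping a small category entirely inside one split means val and test contain categories never seen in train, which fits the metric learning evaluation this dataset targets.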

The generated splits are in train.txt, val.txt, test.txt.