This document explains how the code in this repository can be used to produce the results reported in the following paper:
Deep Learning on Small Datasets without Pre-Training using Cosine Loss.
Björn Barz and Joachim Denzler.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
According to Table 2 in the paper:
Loss Function | CUB | NAB | Cars | Flowers | MIT 67 Scenes | CIFAR-100 |
---|---|---|---|---|---|---|
cross entropy | 51.9% | 59.4% | 78.2% | 67.3% | 44.3% | 77.0% |
cross entropy + label smoothing | 55.9% | 68.3% | 78.1% | 66.8% | 38.7% | 77.5% |
cosine loss | 67.6% | 71.7% | 84.3% | 71.1% | 51.5% | 75.3% |
cosine loss + cross entropy | 68.0% | 71.9% | 85.0% | 70.6% | 52.7% | 76.4% |
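The cosine loss referred to in the table maximizes the cosine similarity between the L2-normalized output of the network and the target vector (a one-hot vector or a semantic class embedding), instead of applying softmax and cross-entropy. The following is a minimal NumPy sketch of that loss for illustration only; the actual implementation used for training lives in the scripts of this repository:

```python
import numpy as np

def cosine_loss(pred, target, eps=1e-9):
    """Cosine loss: 1 - cosine similarity between the L2-normalized
    network output and the target embedding (one-hot or semantic).

    pred, target: arrays of shape (batch_size, embedding_dim).
    """
    pred = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    target = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(pred * target, axis=-1)))

# Example with a one-hot target for a 3-class problem:
# cosine_loss(np.array([[2.0, 0.5, 0.1]]), np.array([[1.0, 0.0, 0.0]]))
```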
The code has the following dependencies:

- Python >= 3.5
- numpy
- numexpr
- keras >= 2.2.0
- tensorflow (we used v1.8)
- sklearn
- scipy
- pillow
The following datasets have been used in the paper:
- Caltech UCSD Birds-200-2011 (CUB)
- North American Birds (NAB-large)
- Stanford Cars (Cars)
- Oxford Flowers-102 (Flowers)
- MIT 67 Indoor Scenes (MIT67Scenes)
- CIFAR-100 (CIFAR-100)
The names in parentheses specify the dataset names that can be passed to the scripts mentioned below.
In the following example script calls, replace `$DS` with the name of the dataset (see above), `$DSROOT` with the path to that dataset, and `$LR` with the maximum learning rate for SGDR. To save the model after training has completed, add `--model_dump` followed by the filename to which the model definition and weights should be written.
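`--sgdr_max_lr` is the upper bound of the learning-rate schedule of SGDR (stochastic gradient descent with warm restarts), which anneals the learning rate from that maximum towards a minimum along a cosine curve and restarts periodically. A rough sketch of such a schedule is shown below; the minimum learning rate, cycle length, and cycle multiplier are illustrative assumptions and not necessarily the values used by the training scripts:

```python
import math

def sgdr_learning_rate(epoch, max_lr, min_lr=1e-6, cycle_len=12, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    max_lr corresponds to --sgdr_max_lr; min_lr, cycle_len, and
    cycle_mult are illustrative defaults, not the scripts' settings.
    """
    # Locate the current epoch within its restart cycle.
    t, length = epoch, cycle_len
    while t >= length:
        t -= length
        length *= cycle_mult
    # Anneal from max_lr down to min_lr over the course of the cycle.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t / length))
```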
To train a classifier with the standard softmax + cross-entropy loss, use `learn_classifier.py`:

```bash
python learn_classifier.py \
    --dataset $DS --data_root $DSROOT --sgdr_max_lr $LR \
    --architecture resnet-50 --batch_size 96 \
    --gpus 4 --read_workers 16 --queue_size 32 --gpu_merge
```
For label smoothing, add `--label_smoothing 0.1`.
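Label smoothing with a factor of 0.1 blends the hard one-hot targets with a uniform distribution over all classes before the cross-entropy is computed. A small illustrative sketch (not the script's actual code):

```python
import numpy as np

def smooth_labels(onehot, smoothing=0.1):
    """Blend one-hot targets with a uniform distribution over classes."""
    num_classes = onehot.shape[-1]
    return onehot * (1.0 - smoothing) + smoothing / num_classes

# With 4 classes and smoothing=0.1, [0, 1, 0, 0] becomes
# [0.025, 0.925, 0.025, 0.025].
```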
To train with the cosine loss, use `learn_image_embeddings.py`:

```bash
python learn_image_embeddings.py \
    --dataset $DS --data_root $DSROOT --sgdr_max_lr $LR \
    --embedding onehot --architecture resnet-50 --batch_size 96 \
    --gpus 4 --read_workers 16 --queue_size 32 --gpu_merge
```
For the combined cosine + cross-entropy loss, add `--cls_weight 0.1`.
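Conceptually, `--cls_weight` adds a weighted categorical cross-entropy term, computed on an additional classification layer, on top of the cosine loss. A rough NumPy sketch of such a combined objective (an illustration of the idea only, not the repository's network or loss code):

```python
import numpy as np

def cosine_plus_xent(embedding, cls_logits, target_embedding, target_onehot,
                     cls_weight=0.1, eps=1e-9):
    """Cosine loss on the embedding plus cls_weight times the softmax
    cross-entropy on separate classification logits."""
    # Cosine loss between the L2-normalized prediction and the target embedding.
    emb = embedding / (np.linalg.norm(embedding, axis=-1, keepdims=True) + eps)
    tgt = target_embedding / (np.linalg.norm(target_embedding, axis=-1, keepdims=True) + eps)
    cos_loss = 1.0 - np.sum(emb * tgt, axis=-1)
    # Numerically stable softmax cross-entropy on the classification logits.
    logits = cls_logits - cls_logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    xent = -np.sum(target_onehot * log_probs, axis=-1)
    return float(np.mean(cos_loss + cls_weight * xent))
```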
To use semantic embeddings instead of one-hot vectors, pass the path to one of the embedding files in the `embeddings` directory to `--embedding` instead of `onehot`.
For the CIFAR-100 dataset, use the following parameters:
Cross-entropy loss:

```bash
python learn_classifier.py \
    --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
    --architecture resnet-110-wfc --batch_size 100
```

Cosine loss:

```bash
python learn_image_embeddings.py \
    --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
    --embedding onehot --architecture resnet-110-wfc --batch_size 100
```
For each dataset and loss function, we fine-tuned the learning rate individually by wrapping the training script calls into a bash loop like the following (here shown for training with the cosine loss on CIFAR-100 as an example):
```bash
for LR in 2.5 1.0 0.5 0.1 0.05 0.01 0.005 0.001; do
    echo $LR
    python learn_image_embeddings.py \
        --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
        --embedding onehot --architecture resnet-110-wfc --batch_size 100 \
        2>/dev/null | grep -oP "val_(prob_)?acc: \K([0-9.]+)" | sort -n | tail -n 1
done
```
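The `grep | sort | tail` pipeline simply extracts the highest validation accuracy printed by Keras during training (the metric is logged as `val_acc` or `val_prob_acc`, hence the pattern). If you adapt the training code and have direct access to the Keras `History` object, the equivalent would be:

```python
def best_val_accuracy(history):
    """Return the best validation accuracy from a Keras History object,
    checking both metric names matched by the grep pattern above."""
    for key in ('val_acc', 'val_prob_acc'):
        if key in history.history:
            return max(history.history[key])
    raise KeyError('no validation accuracy metric found in ' + str(list(history.history)))
```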
The following table lists the values for `--sgdr_max_lr` that led to the best results.
Loss | CUB | NAB | Cars | Flowers | MIT 67 Scenes | CIFAR-100 |
---|---|---|---|---|---|---|
cross entropy | 0.05 | 0.05 | 1.0 | 1.0 | 0.05 | 0.1 |
cross entropy + label smoothing | 0.05 | 0.1 | 1.0 | 0.1 | 1.0 | 0.1 |
cosine loss (one-hot) | 0.5 | 0.5 | 1.0 | 0.5 | 2.5 | 0.05 |
cosine loss + cross entropy (one-hot) | 0.5 | 0.5 | 0.5 | 0.5 | 2.5 | 0.1 |
To experiment with differently sized variants of the CUB dataset, download the modified image list files and unzip the obtained archive into the root directory of your CUB dataset. For training, specify the dataset name as `CUB-subX`, where `X` is the number of samples per class.
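If you want to create additional subset sizes yourself instead of using the provided archive, a subset list can in principle be generated by randomly keeping X training images per class. The sketch below assumes a plain text list with one whitespace-separated `label image` pair per line, which may not match the exact list format expected by the data loaders in this repository:

```python
import random
from collections import defaultdict

def subsample_image_list(src_file, dst_file, samples_per_class, seed=0):
    """Randomly keep at most `samples_per_class` lines per class label.

    Assumes one whitespace-separated 'label image' pair per line; the
    actual list format used by this repository may differ.
    """
    per_class = defaultdict(list)
    with open(src_file) as f:
        for line in f:
            if line.strip():
                per_class[line.split()[0]].append(line)
    rng = random.Random(seed)
    with open(dst_file, 'w') as f:
        for label in sorted(per_class):
            lines = per_class[label]
            for line in rng.sample(lines, min(samples_per_class, len(lines))):
                f.write(line)
```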