Deep Learning on Small Datasets without Pre-Training using Cosine Loss

This document explains how the code in this repository can be used to produce the results reported in the following paper:

Deep Learning on Small Datasets without Pre-Training using Cosine Loss.
Björn Barz and Joachim Denzler.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.

1. Results

According to Table 2 in the paper:

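| Loss Function                   | CUB   | NAB   | Cars  | Flowers | MIT 67 Scenes | CIFAR-100 |
|---------------------------------|-------|-------|-------|---------|---------------|-----------|
| cross entropy                   | 51.9% | 59.4% | 78.2% | 67.3%   | 44.3%         | 77.0%     |
| cross entropy + label smoothing | 55.9% | 68.3% | 78.1% | 66.8%   | 38.7%         | 77.5%     |
| cosine loss                     | 67.6% | 71.7% | 84.3% | 71.1%   | 51.5%         | 75.3%     |
| cosine loss + cross entropy     | 68.0% | 71.9% | 85.0% | 70.6%   | 52.7%         | 76.4%     |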

2. Requirements

  • Python >= 3.5
  • numpy
  • numexpr
  • keras >= 2.2.0
  • tensorflow (we used v1.8)
  • sklearn
  • scipy
  • pillow

3. Datasets

The following datasets have been used in the paper:

The names in parentheses specify the dataset names that can be passed to the scripts mentioned below.

4. Training with different loss functions

In the example script calls below, replace $DS with the name of the dataset (see above), $DSROOT with the path to that dataset, and $LR with the maximum learning rate for SGDR (stochastic gradient descent with warm restarts).
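To make the role of the maximum learning rate concrete, here is a minimal sketch of an SGDR schedule: the rate starts at the maximum, follows a cosine curve down to a minimum within each cycle, and jumps back to the maximum at every restart. The function name `sgdr_lr` and the defaults for `min_lr`, `base_cycle` and `cycle_mult` are illustrative assumptions, not values read from the training scripts.

```python
import math


def sgdr_lr(epoch, max_lr, min_lr=0.0, base_cycle=12, cycle_mult=2):
    """Learning rate at a (possibly fractional) epoch under SGDR.

    base_cycle and cycle_mult are assumed placeholder values; check the
    training scripts for the settings they actually use.
    """
    cycle_len = base_cycle
    t = epoch
    # Skip over all completed cycles; each new cycle is cycle_mult times longer.
    while t >= cycle_len:
        t -= cycle_len
        cycle_len *= cycle_mult
    # Cosine annealing from max_lr (t = 0) towards min_lr (t = cycle_len).
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t / cycle_len))


if __name__ == "__main__":
    for epoch in (0, 6, 11, 12, 24, 35):
        print(epoch, round(sgdr_lr(epoch, max_lr=0.5), 4))
```

Under these assumptions, the value passed as $LR is the rate used at the start of every cycle.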

To save the model after training has completed, add --model_dump followed by the name of the file to which the model definition and weights should be written.

4.1 Softmax + Cross Entropy

python \
    --dataset $DS --data_root $DSROOT --sgdr_max_lr $LR \
    --architecture resnet-50 --batch_size 96 \
    --gpus 4 --read_workers 16 --queue_size 32 --gpu_merge

For label smoothing, add --label_smoothing 0.1.
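As a rough illustration of what a smoothing factor of 0.1 does to the training targets, the sketch below uses the common formulation in which a fraction `epsilon` of the probability mass is spread uniformly over all classes. The helper `smooth_labels` is hypothetical, and whether the training script uses exactly this variant is an assumption.

```python
import numpy as np


def smooth_labels(labels, num_classes, epsilon=0.1):
    """Turn integer class labels into smoothed one-hot target vectors."""
    onehot = np.eye(num_classes)[labels]
    # Keep 1 - epsilon on the true class and spread epsilon over all classes.
    return onehot * (1.0 - epsilon) + epsilon / num_classes


if __name__ == "__main__":
    print(smooth_labels(np.array([1, 3]), num_classes=4))
```

Each row still sums to 1, but the true class receives slightly less than full confidence.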

4.2 Cosine Loss

python \
    --dataset $DS --data_root $DSROOT --sgdr_max_lr $LR \
    --embedding onehot --architecture resnet-50 --batch_size 96 \
    --gpus 4 --read_workers 16 --queue_size 32 --gpu_merge

For the combined cosine + cross-entropy loss, add --cls_weight 0.1.

To use semantic embeddings instead of one-hot vectors, pass a path to one of the embedding files in the embeddings directory to --embedding instead of onehot.
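For orientation, the cosine loss can be sketched as one minus the cosine similarity between the network output and the target vector (a one-hot vector or a semantic class embedding). The NumPy code below is a minimal sketch, not the Keras implementation used by the scripts; the function names are illustrative, and it assumes that --cls_weight scales an additional softmax cross-entropy term.

```python
import numpy as np


def cosine_loss(features, targets, eps=1e-12):
    """Per-sample 1 - cosine similarity between feature rows and target rows."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    t = targets / (np.linalg.norm(targets, axis=1, keepdims=True) + eps)
    return 1.0 - np.sum(f * t, axis=1)


def combined_loss(features, logits, labels, cls_weight=0.1):
    """Cosine loss on one-hot targets plus weighted softmax cross entropy.

    Assumption: cls_weight plays the role of the --cls_weight flag.
    """
    targets = np.eye(logits.shape[1])[labels]
    # Numerically stable log-softmax over the class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    xent = -log_probs[np.arange(len(labels)), labels]
    return cosine_loss(features, targets) + cls_weight * xent


if __name__ == "__main__":
    feats = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    onehot = np.eye(3)[[0, 0]]
    print(cosine_loss(feats, onehot))
```

The loss is 0 when the output points in the same direction as the target, 1 when the two are orthogonal, and 2 when they point in opposite directions. Only the direction of the output matters, not its magnitude.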

4.3 CIFAR-100

For the CIFAR-100 dataset, use the following parameters (the first call trains with cross entropy, the second with the cosine loss):

python \
    --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
    --architecture resnet-110-wfc --batch_size 100

python \
    --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
    --embedding onehot --architecture resnet-110-wfc --batch_size 100

4.4 Determining the best performance across different learning rates

For each dataset and loss function, we tuned the learning rate individually by wrapping the training script call in a bash loop like the following (shown here for training with the cosine loss on CIFAR-100):

# For each candidate learning rate, print it followed by the best validation accuracy reached.
for LR in 2.5 1.0 0.5 0.1 0.05 0.01 0.005 0.001; do
    echo $LR
    python \
        --dataset CIFAR-100 --data_root $DSROOT --sgdr_max_lr $LR \
        --embedding onehot --architecture resnet-110-wfc --batch_size 100 \
        2>/dev/null | grep -oP "val_(prob_)?acc: \K([0-9.]+)" | sort -n | tail -n 1
done
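The grep, sort and tail pipeline in the loop extracts every validation accuracy printed during training and keeps the largest one. If the training output has been saved to a file, the same extraction can be sketched in Python as follows. The helper `best_val_acc` and the sample log lines are hypothetical; the pattern is modelled on the one passed to grep.

```python
import re

# Matches "val_acc: 0.1234" as well as "val_prob_acc: 0.1234".
VAL_ACC = re.compile(r"val_(?:prob_)?acc: ([0-9]+(?:\.[0-9]+)?)")


def best_val_acc(log_text):
    """Return the highest validation accuracy found in the log, or None."""
    values = [float(v) for v in VAL_ACC.findall(log_text)]
    return max(values) if values else None


if __name__ == "__main__":
    # Hypothetical Keras-style log excerpt, for illustration only.
    sample = (
        "Epoch 1/2\n - loss: 0.9 - val_loss: 1.2 - val_acc: 0.5100\n"
        "Epoch 2/2\n - loss: 0.7 - val_loss: 1.0 - val_acc: 0.6350\n"
    )
    print(best_val_acc(sample))
```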

The following table lists the values for --sgdr_max_lr that led to the best results.

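| Loss Function                         | CUB  | NAB  | Cars | Flowers | MIT 67 Scenes | CIFAR-100 |
|---------------------------------------|------|------|------|---------|---------------|-----------|
| cross entropy                         | 0.05 | 0.05 | 1.0  | 1.0     | 0.05          | 0.1       |
| cross entropy + label smoothing       | 0.05 | 0.1  | 1.0  | 0.1     | 1.0           | 0.1       |
| cosine loss (one-hot)                 | 0.5  | 0.5  | 1.0  | 0.5     | 2.5           | 0.05      |
| cosine loss + cross entropy (one-hot) | 0.5  | 0.5  | 0.5  | 0.5     | 2.5           | 0.1       |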

5. Sub-sampling CUB

To experiment with differently sized variants of the CUB dataset, download the modified image list files and unzip the obtained archive into the root directory of your CUB dataset. For training, specify the dataset name as CUB-subX, where X is the number of samples per class.
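The provided image list files should be used to reproduce the reported numbers. Purely to illustrate what "X samples per class" means, the sketch below draws a fixed number of images per class from a list of (path, label) pairs. The function `subsample_per_class`, the input format and the seed are assumptions for illustration and do not describe how the provided lists were generated.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, per_class, seed=0):
    """Keep at most per_class randomly chosen (path, label) pairs per label."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_class):
        paths = by_class[label]
        chosen = rng.sample(paths, min(per_class, len(paths)))
        subset.extend((p, label) for p in chosen)
    return subset


if __name__ == "__main__":
    # Hypothetical file names and labels.
    data = [("img_%d_%d.jpg" % (c, i), c) for c in range(2) for i in range(5)]
    print(subsample_per_class(data, per_class=3))
```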

Figure: Performance comparison for differently sub-sampled variants of the CUB dataset.