From-scratch implementation of the `RetinaNet` algorithm for face detection. Training and validation on the `Wider_easy` dataset.

jkwiatk1/retinanet-face-detection


Intro

This project was created by a two-person team. It implements the RetinaNet architecture for a face detection task, with training and evaluation on the Wider dataset.

Results

Images

  • normal:
    (images)

  • with augmentation:
    (images)

Training:

  • training loss:
    (images)

  • COCO metrics:
    (images)

RetinaNet for Face Detection

RetinaNet architecture

(images)

ResNet50

The backbone is built from the PyTorch ResNet-50 model with pre-trained weights.

ResNet50 architecture

(images)

Feature Pyramid Network

Created based on:

Low-resolution feature maps capture more global information about the image and carry richer semantic meaning, while high-resolution feature maps focus on local information and provide more accurate spatial detail. The goal of FPN is to combine the high- and low-resolution feature maps to produce features with both accurate spatial information and rich semantic meaning. FPN extracts the feature maps and later feeds them into a detector, such as an RPN.
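The merging scheme described above can be sketched in plain PyTorch: 1×1 lateral convs project C3–C5 to a common width, the top-down path upsamples and adds, and a 3×3 conv smooths each merged map. The input channel sizes follow ResNet-50 (512/1024/2048) and `out_channels=256` follows the RetinaNet paper; this is an illustrative sketch, not code from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal top-down FPN over the C3-C5 backbone maps."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs bring every level to the same channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convs smooth each merged map to reduce upsampling aliasing
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return [s(p) for s, p in zip(self.smooth, (p3, p4, p5))]

fpn = SimpleFPN()
feats = fpn(torch.randn(1, 512, 32, 32),
            torch.randn(1, 1024, 16, 16),
            torch.randn(1, 2048, 8, 8))
```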

Generating Anchor Boxes (RPN)

RPN applies a sliding window over the feature maps to predict, at each location, an objectness score (object vs. no object) and the object's bounding box.

For each scale level (say P4), a 3 × 3 convolution filter is applied over the feature maps, followed by separate 1 × 1 convolutions for objectness prediction and bounding-box regression. These 3 × 3 and 1 × 1 convolutional layers are called the RPN head. The same head is applied to all scale levels of the feature maps.
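Anchor generation for one pyramid level can be sketched as follows: 9 anchors per location (3 scales × 3 aspect ratios), centred on each cell of the feature grid. The `base_size`, `stride`, and scale/ratio values below follow common RetinaNet defaults and are illustrative, not taken from the repository.

```python
import itertools
import torch

def make_anchors(feat_h, feat_w, stride, base_size=32,
                 scales=(1.0, 2 ** (1 / 3), 2 ** (2 / 3)),
                 ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) boxes."""
    # 9 anchor shapes: every scale/ratio combination at this level
    shapes = []
    for scale, ratio in itertools.product(scales, ratios):
        area = (base_size * scale) ** 2
        w = (area / ratio) ** 0.5
        h = w * ratio
        shapes.append((w, h))
    shapes = torch.tensor(shapes)                              # (9, 2)

    # Anchor centres: one per feature-map cell, in input-image coordinates
    ys = (torch.arange(feat_h) + 0.5) * stride
    xs = (torch.arange(feat_w) + 0.5) * stride
    cy, cx = torch.meshgrid(ys, xs, indexing="ij")
    centres = torch.stack([cx, cy], dim=-1).reshape(-1, 1, 2)  # (H*W, 1, 2)

    half = shapes.reshape(1, -1, 2) / 2
    boxes = torch.cat([centres - half, centres + half], dim=-1)
    return boxes.reshape(-1, 4)

anchors = make_anchors(8, 8, stride=16)   # 8 * 8 * 9 = 576 anchors
```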

Box Regression and Classification Heads

Classification Subnet

The classification subnet predicts the probability of object presence at each spatial position, for each of the A anchors and K object classes.
The subnet is an FCN that applies four 3×3 conv layers, each with C filters and followed by ReLU activation, and then a final 3×3 conv layer with KA filters (K classes, A = 9 anchors, C = 256 filters).
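The subnet described above can be sketched directly: four 3×3 convs with C = 256 filters and ReLU, then a 3×3 conv with K·A output channels. For face detection K = 1; the builder name is illustrative.

```python
import torch
import torch.nn as nn

def make_cls_subnet(num_classes=1, num_anchors=9, channels=256):
    """Classification subnet: four 3x3 conv+ReLU blocks, then K*A logits."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers.append(nn.Conv2d(channels, num_classes * num_anchors, 3, padding=1))
    return nn.Sequential(*layers)

cls_head = make_cls_subnet()
# One logit per (class, anchor) at every spatial position of a pyramid level
logits = cls_head(torch.randn(1, 256, 16, 16))
```

The same head (with shared weights) is applied to every pyramid level, which is why it is fully convolutional and independent of the feature-map size.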

Box Regression Subnet

The regression subnet is an FCN attached to each pyramid level that regresses the offset from each anchor box to a nearby ground-truth object, if one exists.
It is identical to the classification subnet except that it terminates in 4A linear outputs per spatial location.
It is a class-agnostic bounding-box regressor, which uses fewer parameters and is found to be equally effective.
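Since the regression subnet shares the classification subnet's structure, only the final layer changes: 4·A outputs per location, one (dx, dy, dw, dh) offset per anchor. A sketch with A = 9:

```python
import torch
import torch.nn as nn

def make_box_subnet(num_anchors=9, channels=256):
    """Box regression subnet: same trunk as the cls subnet, 4*A outputs."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    # Class-agnostic: 4 offsets per anchor, regardless of the number of classes
    layers.append(nn.Conv2d(channels, 4 * num_anchors, 3, padding=1))
    return nn.Sequential(*layers)

box_head = make_box_subnet()
deltas = box_head(torch.randn(1, 256, 16, 16))   # (1, 4*A, H, W)
```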

Focal loss

During training, the total focal loss for an image is computed as the sum of the focal loss over all ~100k anchors, normalized by the number of anchors assigned to a ground-truth box.
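A sketch of this loss with the normalization described above: the per-anchor focal terms are summed and divided by the number of anchors assigned to a ground-truth box. The `alpha=0.25`, `gamma=2.0` defaults follow the RetinaNet paper; the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, num_assigned, alpha=0.25, gamma=2.0):
    """Binary focal loss summed over anchors, normalized by assigned anchors."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)           # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = alpha_t * (1 - p_t) ** gamma * ce              # down-weights easy anchors
    return loss.sum() / max(num_assigned, 1)

logits = torch.randn(100)
targets = (torch.rand(100) > 0.9).float()                 # sparse positives
loss = focal_loss(logits, targets, num_assigned=int(targets.sum()))
```

With `gamma=0` and `alpha=0.5` the modulating factor vanishes and the loss reduces to (half of) the summed binary cross-entropy, which makes a convenient sanity check.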
