Paper: [DEAL: Difficulty-aware Active Learning for Semantic Segmentation](https://arxiv.org/abs/2010.08705)
In this work, we propose a semantic Difficulty-awarE Active Learning (DEAL) method that takes semantic difficulty into consideration. To capture semantic difficulty, we adopt a two-branch framework and utilize the incorrectly predicted result, termed the error mask: a binary map in which correctly and incorrectly predicted pixels take the values 0 and 1, respectively. A pixel-wise probability attention module [2] is then introduced to aggregate similar pixels into areas and learn the proportion of error pixels as the difficulty score for each area. Finally, we obtain the semantic difficulty map, where the difficulty score of each area is closely connected with the standard IoU metric.
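As a concrete illustration, the error mask can be derived by comparing the hard predictions with the ground truth. Below is a minimal PyTorch sketch, assuming `logits` of shape `(N, C, H, W)` and an integer label map `target` of shape `(N, H, W)`; the function name and the `ignore_index` handling are ours for illustration, not taken from the paper's code.

```python
import torch

def error_mask(logits: torch.Tensor, target: torch.Tensor, ignore_index: int = 255) -> torch.Tensor:
    """Binary error mask: 1 at wrongly predicted pixels, 0 elsewhere."""
    pred = logits.argmax(dim=1)           # (N, H, W) hard class predictions
    mask = (pred != target).float()       # 1 where the prediction is wrong
    mask[target == ignore_index] = 0.0    # do not count void/ignored pixels
    return mask
```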
Overview of our framework. The first branch is a common semantic segmentation network. The second branch is composed of a probability attention module and a 1×1 convolution. Da and Du are the annotated and unlabeled data; Ds is a subset selected from Du. P and Q are the probability maps before and after attention. Lseg and Ldif are the two loss functions, and DS and DE are the two acquisition functions.
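The second branch can be sketched as follows. This is a hedged reading of the overview, in the spirit of the attention of Fu et al. [2]: the residual connection and the sigmoid output are our assumptions, and the HW×HW attention is only tractable at a reduced resolution in practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilityAttention(nn.Module):
    """Sketch of the second branch: pixel-wise probability attention
    followed by a 1x1 convolution that produces a one-channel map."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.fuse = nn.Conv2d(num_classes, 1, kernel_size=1)  # 1x1 conv -> difficulty map

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        # p: probability map P of shape (N, C, H, W), e.g. softmax over class logits
        n, c, h, w = p.shape
        flat = p.view(n, c, h * w)                                  # (N, C, HW)
        energy = torch.bmm(flat.transpose(1, 2), flat)              # (N, HW, HW) pixel similarity
        attn = F.softmax(energy, dim=-1)                            # each pixel attends to similar pixels
        q = torch.bmm(flat, attn.transpose(1, 2)).view(n, c, h, w)  # aggregated map Q
        q = q + p                                                   # residual connection (assumption)
        return torch.sigmoid(self.fuse(q))                          # (N, 1, H, W) difficulty map
```

The one-channel output would then be trained against the error mask via Ldif, so that each aggregated area learns its proportion of error pixels.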
Below are qualitative results on Cityscapes. It can be observed that objects with high difficulty scores exhibit the following characteristics:
- Slender or tiny objects:
  - poles and traffic signs in the 1st and 5th rows;
  - bicycles far away in the 4th row.
- Under-represented classes:
  - in the 2nd row, rider has a higher score than pedestrian;
  - in the 3rd row, bus has a higher score than car.
*(Image grid omitted: each of the six rows shows the input image, the ground truth, and the predicted semantic difficulty map for one Cityscapes example.)*
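For completeness, here is a hedged sketch of how the two acquisition functions named in the overview could score unlabeled images: DS as the mean of the semantic difficulty map, and DE as difficulty-weighted pixel entropy. These forms are assumptions for illustration; the exact definitions are given in the paper.

```python
import torch

def ds_score(diff_map: torch.Tensor) -> torch.Tensor:
    """DS (assumed form): mean of the semantic difficulty map per image.
    diff_map: (N, 1, H, W) difficulty map from the second branch."""
    return diff_map.mean(dim=(1, 2, 3))

def de_score(prob_map: torch.Tensor, diff_map: torch.Tensor) -> torch.Tensor:
    """DE (assumed form): per-pixel entropy weighted by difficulty.
    prob_map: (N, C, H, W) class probabilities from the first branch."""
    ent = -(prob_map * prob_map.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)  # (N, 1, H, W)
    return (diff_map * ent).mean(dim=(1, 2, 3))
```

Images with the highest scores would be moved from Du into the selected subset Ds for annotation.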
- [1] Yoo D, Kweon I S. Learning loss for active learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 93-102.
- [2] Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3146-3154.
- ViewAL: https://github.com/nihalsid/ViewAL