Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation
This repository contains the code for our CVPR'2024 L3D-IVU Workshop paper, Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation.
Abstract: In Generalized Few-shot Segmentation (GFSS), a model is trained on a large corpus of base-class samples and then adapted on limited samples of novel classes. This paper focuses on the relevance between base and novel classes and improves GFSS in two aspects: 1) mining the similarity between base and novel classes to promote the learning of novel classes, and 2) mitigating the class-imbalance issue caused by the volume difference between the support set and the training set. Specifically, we first propose a similarity transition matrix to guide the learning of novel classes with base-class knowledge. Then, we apply the Label-Distribution-Aware Margin (LDAM) loss and transductive inference to the GFSS task to address class imbalance as well as overfitting to the support set. In addition, by extending the probability transition matrix, the proposed method can mitigate catastrophic forgetting of base classes when learning novel classes. With a simple training phase, our proposed method can be applied to any segmentation network trained on base classes. We validated our method on an adapted version of OpenEarthMap. Compared to existing GFSS baselines, our method outperforms them all by 3% to 7% and ranked second in the OpenEarthMap Land Cover Mapping Few-Shot Challenge at the time of writing.
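For intuition, here is a minimal PyTorch sketch of the core idea: novel-class probabilities are obtained by transitioning base-class predictions through a learnable base-to-novel similarity matrix. The names and shapes below are illustrative assumptions, not the exact implementation in `src/`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityTransition(nn.Module):
    """Illustrative sketch: map base-class probabilities to novel classes
    through a learnable base->novel similarity (transition) matrix."""

    def __init__(self, num_base: int, num_novel: int):
        super().__init__()
        # transition[i, j] ~ similarity between base class i and novel class j
        self.transition = nn.Parameter(torch.zeros(num_base, num_novel))

    def forward(self, base_logits: torch.Tensor) -> torch.Tensor:
        # base_logits: (B, num_base, H, W) from a segmenter trained on base classes
        base_probs = F.softmax(base_logits, dim=1)
        # Row-normalize so each base class distributes its mass over novel classes
        T = F.softmax(self.transition, dim=1)  # (num_base, num_novel)
        # Novel-class probabilities as a transition from base predictions
        novel_probs = torch.einsum("bchw,cn->bnhw", base_probs, T)
        return novel_probs
```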
We used Python 3.9 in our experiments, and the list of packages is available in `requirements.txt`. You can install them using `pip install -r requirements.txt`.
We use an adapted version of the OpenEarthMap dataset. You can download the full .zip and extract it directly into the `data/` folder.
Alternatively, you can prepare the datasets yourself. Here is the expected structure of the data folder:
```
data
├── trainset
│   ├── images
│   └── labels
├── valset
│   ├── images
│   └── labels
├── testset
│   ├── images
│   └── labels
├── train.txt
├── stage1_val.txt
├── test.json
└── val.json
```
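If you assemble the folder yourself, a quick sanity check helps. Below is a hypothetical helper that pairs image and label paths for a split; it assumes the split file lists one filename per line and that `images/` and `labels/` share filenames, which may differ from the actual loaders in `src/`:

```python
from pathlib import Path

def load_split(data_root: str, split_file: str, subset: str):
    """Hypothetical helper: pair image/label paths for a split.

    Assumes the split file lists one filename per line and that
    images/ and labels/ share filenames -- adapt to the actual layout.
    """
    root = Path(data_root)
    names = [ln.strip() for ln in (root / split_file).read_text().splitlines() if ln.strip()]
    pairs = [(root / subset / "images" / n, root / subset / "labels" / n) for n in names]
    missing = [p for pair in pairs for p in pair if not p.exists()]
    if missing:
        raise FileNotFoundError(f"{len(missing)} files from {split_file} not found, e.g. {missing[0]}")
    return pairs

# e.g. pairs = load_split("data", "train.txt", "trainset")
```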
Default configuration files can be found in `config/`. Data are located in `data/`, which contains the train/val datasets. All the code is provided in `src/`. The testing script is located at the root of the repo.
We use ClassTrans-Train to train models on base classes. We suggest skipping this step and directly using this checkpoint to reproduce our results.
```bash
# Create a soft link from `ClassTrans-Train/segmentation_models_pytorch` to `ClassTrans/segmentation_models_pytorch`
ln -s /your/path/ClassTrans-Train/segmentation_models_pytorch /your/path/ClassTrans
# Create a soft link from `ClassTrans-Train/weight` to `ClassTrans/weight`
ln -s /your/path/ClassTrans-Train/weight /your/path/ClassTrans
# Run the testing script
bash test.sh
```
In `test.py`, you can find some post-processing of the prediction masks with extra input files, which are obtained via a vision-language model, APE, and a class-agnostic mask refinement model, CascadePSP. We provide these files in the `ClassTrans/post-process` directory. If you want to reproduce our results step by step, you can refer to the following:
APE is a vision-language model that can conduct open-vocabulary detection and segmentation. We directly use the released checkpoint APE-D to infer the base class `sea, lake, & pond` and the novel classes `vehicle & cargo-trailer` and `sports field`, using the following commands:
```bash
# sea, lake, & pond
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_ori_png/*.png --output output/cvpr2024_oem_ori_thres-0.12_water/ --confidence-threshold 0.12 --text-prompt 'water' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True
# vehicle & cargo-trailer
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_crop_256-128/*.png --output output/cvpr2024_oem_crop-256-128_thres-0.1_car/ --confidence-threshold 0.1 --text-prompt 'car' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True
# sports field
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_crop_256-128/*.png --output output/cvpr2024_oem_crop-256-128_thres-0.2_sportfield/ --confidence-threshold 0.2 --text-prompt 'sports field,basketball field,soccer field,tennis field,badminton field' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True
```
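The exact post-processing rules live in `test.py`. Purely as an illustration, a class-specific binary mask produced by APE could be pasted over the base prediction like this (the class index and array shapes are assumptions):

```python
import numpy as np

def overlay_class_mask(pred: np.ndarray, ape_mask: np.ndarray, class_id: int) -> np.ndarray:
    """Illustrative only: overwrite pixels where the APE binary mask fires
    with the corresponding class index. The actual per-class rules are
    in test.py; class_id is a hypothetical parameter."""
    out = pred.copy()
    out[ape_mask > 0] = class_id
    return out
```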
Before executing the above commands, please make sure that you have successfully built the APE environment and sliced the original images into appropriate tiles:

- Please refer here to build APE's inference environment; we highly recommend using Docker to build it.
- Convert the RGB images from `.tif` to `.png` format and use the `image2patch.py` script to generate image tiles (a tiling sketch follows this list).
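The folder names above (`cvpr2024_oem_crop_256-128`) suggest 256-pixel tiles with a 128-pixel stride; the sketch below works under that assumption, and `image2patch.py` holds the actual logic:

```python
import numpy as np

def image_to_patches(img: np.ndarray, tile: int = 256, stride: int = 128):
    """Yield (y, x, patch) for overlapping tiles. A sketch only --
    see image2patch.py for the script actually used."""
    h, w = img.shape[:2]
    ys = list(range(0, max(h - tile, 0) + 1, stride))
    xs = list(range(0, max(w - tile, 0) + 1, stride))
    # Make sure the bottom and right borders are fully covered
    if ys[-1] != max(h - tile, 0):
        ys.append(max(h - tile, 0))
    if xs[-1] != max(w - tile, 0):
        xs.append(max(w - tile, 0))
    for y in ys:
        for x in xs:
            yield y, x, img[y:y + tile, x:x + tile]
```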
After running inference with APE, use the following commands to compose the per-tile results into whole images:
```bash
# Get semantic masks from instance masks
python tools/get_mask_from_instance.py
# Get the complete result for the whole image
python tools/patch2image.py
```
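For intuition, overlapping tile predictions can be merged back into a full-size label map by per-pixel majority vote. The sketch below is an assumption about the approach; the actual logic is in `tools/patch2image.py`:

```python
import numpy as np

def patches_to_image(patches, h: int, w: int, num_classes: int) -> np.ndarray:
    """Merge overlapping tile predictions by per-pixel majority vote.
    `patches` yields (y, x, mask) with integer class labels per pixel.
    A sketch only -- tools/patch2image.py holds the actual logic."""
    votes = np.zeros((num_classes, h, w), dtype=np.int32)
    for y, x, mask in patches:
        th, tw = mask.shape
        yy, xx = np.mgrid[y:y + th, x:x + tw]
        np.add.at(votes, (mask, yy, xx), 1)  # one vote per tile pixel
    return votes.argmax(axis=0).astype(np.uint8)
```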
Note: We have confirmed that using the foundation model is consistent with the challenge rules.
We use CascadePSP to refine the masks of `building type 1 & 2`:
```bash
# Install segmentation_refinement
pip install segmentation_refinement
# Get refined masks of building type 1 & 2
python tools/mask_refinement.py
```
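For reference, a minimal usage sketch of the `segmentation_refinement` package (file names here are placeholders; the per-class logic we actually use is in `tools/mask_refinement.py`):

```python
import cv2
import segmentation_refinement as refine

# File names are placeholders; tools/mask_refinement.py holds the actual logic.
image = cv2.imread("tile.png")                                # H x W x 3 input image
mask = cv2.imread("building_mask.png", cv2.IMREAD_GRAYSCALE)  # binary mask (0/255)

refiner = refine.Refiner(device="cuda:0")  # model weights are downloaded on first use
# fast=False runs the full refinement cascade; L bounds the global-step resolution
refined = refiner.refine(image, mask, fast=False, L=900)
cv2.imwrite("building_mask_refined.png", refined)
```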
| Class | IoU |
| --- | --- |
| Tree | 68.94964 |
| Rangeland | 49.81997 |
| Bareland | 32.84904 |
| Agric land type 1 | 53.61771 |
| Road type 1 | 57.60924 |
| Sea, lake, & pond | 53.97921 |
| Building type 1 | 55.54934 |
| *(base classes above, novel classes below)* | |
| Vehicle & cargo-trailer | 37.24685 |
| Parking space | 32.26357 |
| Sports field | 49.98770 |
| Building type 2 | 52.10971 |
| mIoU for base classes | 53.19631 |
| mIoU for novel classes | 42.90196 |
| Weighted average of mIoU scores for base and novel classes | 47.01970 |
The weighted average is calculated with weights of 0.4 for base classes and 0.6 for novel classes (base:novel = 0.4:0.6), following the state-of-the-art GFSS baseline.
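Concretely: 0.4 × 53.19631 + 0.6 × 42.90196 = 47.01970, matching the last row of the table.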
We gratefully thank the authors of BAM, DIAM, APE, CascadePSP, and PyTorch Semantic Segmentation, from which parts of our code are inspired.
If you find this project useful, please consider citing:
```bibtex
@article{wang2024class,
  title={Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation},
  author={Wang, Shihong and Liu, Ruixun and Li, Kaiyu and Jiang, Jiawei and Cao, Xiangyong},
  journal={arXiv preprint arXiv:2404.05111},
  year={2024}
}
```