arXiv | IEEE Xplore | Website | Video
This repository is the official implementation of the paper:
A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
Niclas Vödisch*, Kürsat Petek*, Markus Käppeler*, Abhinav Valada, and Wolfram Burgard.
*Equal contribution.
IEEE Robotics and Automation Letters, vol. 10, issue 1, pp. 216-223, January 2025
If you find our work useful, please consider citing our paper:
@article{voedisch2025pastel,
author={Vödisch, Niclas and Petek, Kürsat and Käppeler, Markus and Valada, Abhinav and Burgard, Wolfram},
journal={IEEE Robotics and Automation Letters},
title={A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation},
year={2025},
volume={10},
number={1},
pages={216-223},
}
Make sure to also check out our previous work on this topic: SPINO.
A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We underline the relevance of our approach by applying PASTEL to important robot perception use cases from autonomous driving and agricultural robotics. In extensive experiments, we demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
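To make the normalized cut criterion mentioned above more concrete, here is a minimal sketch in plain Python. It is not the fusion module from this repository: the affinity matrix is a hand-made toy, the function names `ncut_value` and `best_bipartition` are hypothetical, and the brute-force search is only feasible for a handful of nodes. It assumes a symmetric affinity matrix in which every node has a positive total affinity.

```python
from itertools import combinations


def ncut_value(affinity, group_a):
    """Normalized cut cost of splitting the nodes into group_a and the rest.

    Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
    """
    nodes = range(len(affinity))
    a = set(group_a)
    b = [j for j in nodes if j not in a]
    # cut(A, B): total affinity of the edges that cross the partition.
    cut = sum(affinity[i][j] for i in a for j in b)
    # assoc(X, V): total affinity from the nodes in X to all nodes.
    assoc_a = sum(affinity[i][j] for i in a for j in nodes)
    assoc_b = sum(affinity[i][j] for i in b for j in nodes)
    return cut / assoc_a + cut / assoc_b


def best_bipartition(affinity):
    """Brute-force the two-way split with the lowest normalized cut cost."""
    n = len(affinity)
    best = None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            cost = ncut_value(affinity, group)
            if best is None or cost < best[0]:
                best = (cost, set(group))
    return best


# Toy graph (assumption): nodes 0-1 and nodes 2-3 form two tight clusters.
TOY_AFFINITY = [
    [0, 9, 1, 1],
    [9, 0, 1, 1],
    [1, 1, 0, 9],
    [1, 1, 9, 0],
]

if __name__ == "__main__":
    cost, group = best_bipartition(TOY_AFFINITY)
    print(f"best split: {sorted(group)} vs. rest, Ncut = {cost:.4f}")
```

Practical normalized cut implementations avoid the exhaustive search and instead relax the problem to a generalized eigenvalue problem; the sketch only illustrates the cost being minimized.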
- Create conda environment:
conda create --name pastel python=3.8
- Activate environment:
conda activate pastel
- Install dependencies:
pip install -r requirements.txt
- Install torch, torchvision and cuda:
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
- Install pre-commit githook scripts:
pre-commit install
- Upgrade isort to 5.12.0:
pip install isort==5.12.0
- Update the pre-commit hooks:
pre-commit autoupdate
Linter (pylint) and formatter (yapf, isort) settings are configured in pyproject.toml.
Generating pseudo-labels with PASTEL involves three steps:
- Train the semantic segmentation module.
- Train the boundary estimation module.
- Generate pseudo-labels using the fusion module.
For Cityscapes, an example run looks like this:
conda activate pastel
python semantic_fine_tuning.py fit --trainer.devices [0] --config configs/cityscapes_semantics.yaml
python boundary_fine_tuning.py fit --trainer.devices [0] --config configs/cityscapes_boundary.yaml
python instance_clustering.py test --trainer.devices [0,1,2,3] --config configs/cityscapes_instance_ncut.yaml
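The three commands above can also be chained from a small driver script. The following is a sketch rather than part of the repository: the helper names `build_command` and `run_pipeline` are hypothetical, and the script names, subcommands, device lists, and config paths are copied from the Cityscapes example. By default it only prints the commands (dry run).

```python
import subprocess
import sys

# (script, subcommand, devices, config) taken from the Cityscapes example.
STEPS = [
    ("semantic_fine_tuning.py", "fit", "[0]",
     "configs/cityscapes_semantics.yaml"),
    ("boundary_fine_tuning.py", "fit", "[0]",
     "configs/cityscapes_boundary.yaml"),
    ("instance_clustering.py", "test", "[0,1,2,3]",
     "configs/cityscapes_instance_ncut.yaml"),
]


def build_command(script, subcommand, devices, config):
    """Assemble the argument list for one pipeline step."""
    return [sys.executable, script, subcommand,
            "--trainer.devices", devices, "--config", config]


def run_pipeline(steps=STEPS, dry_run=True):
    """Print each step and, unless dry_run is set, execute the steps in order."""
    commands = [build_command(*step) for step in steps]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Call `run_pipeline(dry_run=False)` from the repository root with the `pastel` environment activated to actually launch the steps.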
We provide configuration files for each step of all datasets in the configs folder. Please make sure to double-check the paths to the datasets and the pretrained weights.
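One way to double-check the paths is to scan a parsed config for path-like strings and report those that do not exist. The sketch below works on a nested dict and makes no assumption about the actual key names in the provided configs; the helper names `iter_strings` and `find_missing_paths`, the path heuristic, and the checkpoint suffixes are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed checkpoint suffixes; adjust to the files you actually use.
CHECKPOINT_SUFFIXES = (".ckpt", ".pth", ".pt")


def iter_strings(node, prefix=""):
    """Yield (dotted_key, value) for every string leaf in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_strings(value, child)
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def find_missing_paths(config):
    """Return (key, value) pairs that look like local paths but do not exist."""
    missing = []
    for key, value in iter_strings(config):
        # Heuristic: contains a slash or ends like a checkpoint, and is no URL.
        looks_like_path = (
            ("/" in value or value.endswith(CHECKPOINT_SUFFIXES))
            and "://" not in value
        )
        if looks_like_path and not Path(value).expanduser().exists():
            missing.append((key, value))
    return missing
```

With PyYAML installed, the dict can be obtained via `yaml.safe_load` on one of the files in the configs folder; an empty result from `find_missing_paths` means every path-like entry was found on disk.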
We provide the following pre-trained weights:
- Cityscapes:
- PASCAL VOC:
- PhenoBench:
⚠️ If your browser blocks the download, right-click on the link and copy the address to download the file manually.
Download the following files:
- leftImg8bit_sequence_trainvaltest.zip (324GB)
- gtFine_trainvaltest.zip (241MB)
- camera_trainvaltest.zip (2MB)
After extraction, one should obtain the following file structure:
── data/cityscapes
   ├── camera
   │   └── ...
   ├── gtFine
   │   └── ...
   └── leftImg8bit_sequence
       └── ...
- We use the 2012 challenge plus the SBD extension.
- Upon execution, the files should be downloaded automatically via torchvision.
Afterward, one should obtain the following file structure:
── data/pascal_voc
   ├── SBD
   │   └── ...
   └── VOCdevkit/VOC2012
       └── ...
- We use the leaf instance segmentation challenge.
- Please download the dataset from the official website.
After extraction, one should obtain the following file structure:
── data/phenobench
   ├── test
   │   └── images
   ├── train
   │   ├── images
   │   ├── leaf_instances
   │   ├── leaf_visibility
   │   ├── plant_instances
   │   ├── plant_visibility
   │   └── semantics
   └── val
       ├── images
       ├── leaf_instances
       ├── leaf_visibility
       ├── plant_instances
       ├── plant_visibility
       └── semantics
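The expected layout can be verified before training with a few lines of Python. This is a sketch, not a script shipped with the repository: `PHENOBENCH_LAYOUT` simply mirrors the tree above, and the helper name `missing_dirs` is hypothetical. The same function can be reused for the other datasets by passing a different layout dict.

```python
from pathlib import Path

# Sub-folders that the train and val splits share, per the tree above.
_LABELED_SPLIT = [
    "images",
    "leaf_instances",
    "leaf_visibility",
    "plant_instances",
    "plant_visibility",
    "semantics",
]

PHENOBENCH_LAYOUT = {
    "test": ["images"],
    "train": list(_LABELED_SPLIT),
    "val": list(_LABELED_SPLIT),
}


def missing_dirs(root, layout):
    """List the 'split/subfolder' entries that are not directories under root."""
    root = Path(root)
    return [
        f"{split}/{sub}"
        for split, subs in layout.items()
        for sub in subs
        if not (root / split / sub).is_dir()
    ]


if __name__ == "__main__":
    problems = missing_dirs("data/phenobench", PHENOBENCH_LAYOUT)
    print("layout OK" if not problems else f"missing: {problems}")
```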
For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.
This work was funded by the German Research Foundation (DFG) Emmy Noether Program grant No 468878300.