📣 We have published our new survey on OOD detection and related tasks in the Vision Language Model era! Check out our new paper!
This repository contains the PyTorch implementation of our paper: LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning
We introduce a novel OOD detection approach called Local regularized Context Optimization (LoCoOp), which performs OOD regularization that uses portions of CLIP's local features as OOD features during training. CLIP's local features contain many ID-irrelevant nuisances (e.g., backgrounds); by learning to push them away from the ID class text embeddings, we remove these nuisances from the ID class text embeddings and enhance the separation between ID and OOD. Experiments on the large-scale ImageNet OOD detection benchmarks demonstrate the superiority of LoCoOp over zero-shot, fully supervised detection methods and prompt learning methods. Notably, even in the one-shot setting (just one label per class), LoCoOp outperforms existing zero-shot and fully supervised detection methods.
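To make the regularization described above more concrete, here is a minimal sketch. It is not the repository's implementation: it uses plain Python lists instead of PyTorch tensors, and all function names are hypothetical. It assumes that an "OOD region" is a local feature whose top-K predicted classes do not include the ground-truth label, and that such regions are pushed away from the ID text embeddings by maximizing the entropy of their class probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ood_region_indices(local_logits, label, top_k):
    """Indices of local regions whose top-k classes exclude the label.

    local_logits: one list of class logits per local region (assumed layout).
    """
    selected = []
    for idx, logits in enumerate(local_logits):
        ranked = sorted(range(len(logits)), key=lambda c: logits[c], reverse=True)
        if label not in ranked[:top_k]:
            selected.append(idx)
    return selected


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ood_regularization_loss(local_logits, label, top_k):
    """Negative mean entropy over the selected regions.

    Minimizing this value flattens the class distribution of the selected
    regions, i.e. pushes them away from every ID class (assumed objective).
    """
    regions = ood_region_indices(local_logits, label, top_k)
    if not regions:
        return 0.0
    mean_entropy = sum(entropy(softmax(local_logits[i])) for i in regions) / len(regions)
    return -mean_entropy


# Toy example: 3 local regions, 3 classes, ground-truth label 0.
toy_logits = [[5.0, 1.0, 0.0], [0.0, 1.0, 5.0], [0.1, 0.2, 3.0]]
print(ood_region_indices(toy_logits, label=0, top_k=1))
print(ood_regularization_loss(toy_logits, label=0, top_k=1))
```

Under these assumptions, the full training objective would be the usual classification loss plus this term multiplied by a weight, with K and the weight presumably corresponding to the "-topk" and "-lambda" training arguments.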
We kindly ask those who build on this work to observe the following two points:
- Clarify whether MCM or GL-MCM was used at inference time. This is very important for assessing the performance of LoCoOp alone.
- When testing on benchmarks other than the ImageNet OOD benchmark, change the values of the training "-topk" and "-lambda" arguments and report them in the paper. The current config is for ImageNet-1K.
Let's build a better Few-Shot OOD Detection community together!
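The difference between the two inference-time scores mentioned above can be sketched as follows. This is a simplified illustration rather than the evaluation code in this repository: it assumes MCM is the maximum softmax probability over image-text cosine similarities, and that GL-MCM adds the largest such probability found among the local features. The function names and the default temperature are assumptions.

```python
import math


def softmax(sims, temperature=1.0):
    """Softmax over similarities scaled by a temperature."""
    scaled = [s / temperature for s in sims]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mcm_score(global_sims, temperature=1.0):
    """Global score: max softmax probability over the ID classes."""
    return max(softmax(global_sims, temperature))


def gl_mcm_score(global_sims, local_sims, temperature=1.0):
    """Global score plus the best max-softmax probability over local regions.

    local_sims: one list of class similarities per local region (assumed layout).
    """
    local_score = max(max(softmax(s, temperature)) for s in local_sims)
    return mcm_score(global_sims, temperature) + local_score


# Toy similarities for two ID classes and two local regions.
global_sims = [0.3, 0.3]
local_sims = [[0.1, 0.9], [0.5, 0.5]]
print(mcm_score(global_sims))
print(gl_mcm_score(global_sims, local_sims))
```

Because the two scores live on different scales and use different information, results obtained with one are not directly comparable to results obtained with the other, which is why reporting the choice matters.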
- 2024/04/14: We added related work on CLIP-based parameter-efficient OOD detection so that readers can easily follow this research area!
- 2023/09/22: We published the code for training and evaluation.
- 2023/06/02: We made this repository public.
Our experiments are conducted with Python 3.8 and PyTorch 1.8.1.
All required packages are based on CoOp (for training) and MCM (for evaluation).
This code is built on top of the awesome toolbox Dassl.pytorch, so you need to install the dassl environment first. Simply follow the instructions described here to install dassl as well as PyTorch. After that, run pip install -r requirements.txt under LoCoOp/ to install a few more packages required by CLIP and MCM (this should be done when dassl is activated).
Please create a data folder and download the following ID and OOD datasets to data.
We use ImageNet-1K as the ID dataset.
- Create a folder named imagenet/ under the data folder.
- Create images/ under imagenet/.
- Download the dataset from the official website and extract the training and validation sets to $DATA/imagenet/images.
In addition, place imagenet-classes.txt under the data/imagenet/ folder. This .txt file can be downloaded from https://drive.google.com/file/d/1-61f_ol79pViBFDG_IDlUQSwoLcn2XXF/view
We use the large-scale OOD datasets iNaturalist, SUN, Places, and Texture curated by Huang et al. 2021. We follow instructions from this repository to download the subsampled datasets.
The overall file structure is as follows:
LoCoOp
|-- data
    |-- imagenet
        |-- imagenet-classes.txt
        |-- images/
            |-- train/ # contains 1,000 folders like n01440764, n01443537, etc.
            |-- val/ # contains 1,000 folders like n01440764, n01443537, etc.
    |-- iNaturalist
    |-- SUN
    |-- Places
    |-- Texture
    ...
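Before training, it can help to confirm that the layout above is in place. The following is a small, optional sketch using only the Python standard library; it is not part of this repository, and the function name missing_entries is hypothetical. It only checks the entries listed in the tree above.

```python
from pathlib import Path

# Entries taken from the file structure shown above, relative to the data folder.
EXPECTED = [
    "imagenet/imagenet-classes.txt",
    "imagenet/images/train",
    "imagenet/images/val",
    "iNaturalist",
    "SUN",
    "Places",
    "Texture",
]


def missing_entries(data_root):
    """Return the expected files/folders that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" assumes the script is run from the LoCoOp/ directory.
    missing = missing_entries("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  data/" + rel)
    else:
        print("Data layout looks complete.")
```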
We share the 16-shot pre-trained models for LoCoOp. Please download them via the url.
The training script is in LoCoOp/scripts/locoop/train.sh.
e.g., 1-shot training with ViT-B/16
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/train.sh data imagenet vit_b16_ep50 end 16 1 False 0.25 200
e.g., 16-shot training with ViT-B/16
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/train.sh data imagenet vit_b16_ep50 end 16 16 False 0.25 200
The inference script is in LoCoOp/scripts/locoop/eval.sh.
To evaluate the seed1 model produced by the 16-shot training command above, run the following command.
CUDA_VISIBLE_DEVICES=0 bash scripts/locoop/eval.sh data imagenet vit_b16_ep50 1 output/imagenet/LoCoOp/vit_b16_ep50_16shots/nctx16_cscFalse_ctpend/seed1
The average scores of three seeds (1,2,3) are reported in the paper.
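To evaluate all three seeds, you need one model directory per seed. The helper below is a hypothetical convenience, not part of this repository: it reproduces the directory pattern seen in the evaluation command above, and the parameter names (shots, nctx, csc, ctp) are our assumed interpretation of the path components.

```python
def checkpoint_dir(dataset, cfg, shots, nctx, csc, ctp, seed,
                   trainer="LoCoOp", root="output"):
    """Build a model directory following the pattern in the eval example.

    The pattern is inferred from the example path only; check train.sh
    if your output directories look different.
    """
    return (
        f"{root}/{dataset}/{trainer}/{cfg}_{shots}shots/"
        f"nctx{nctx}_csc{csc}_ctp{ctp}/seed{seed}"
    )


# Directories for the three seeds used in the paper's averaged results.
for seed in (1, 2, 3):
    print(checkpoint_dir("imagenet", "vit_b16_ep50", 16, 16, False, "end", seed))
```

Each printed path can then be passed as the last argument of eval.sh, and the resulting scores averaged by hand or with a script of your choice.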
The code for the visualization of extracted OOD regions is in LoCoOp/scripts/locoop/demo_visualization.sh.
e.g., image_path=data/imagenet/images/train/n04325704/n04325704_219.JPEG, label=824
sh scripts/locoop/demo_visualization.sh /home/miyai/LoCoOp/data imagenet vit_b16_ep50 output/imagenet/LoCoOp/vit_b16_ep50_16shots/nctx16_cscFalse_ctpend/seed1 data/imagenet/images/train/n04325704/n04325704_219.JPEG 824
The visualization result is saved in visualization/.
The visualization examples are below:
This repository builds on code from the following works:
- Conditional Prompt Learning for Vision-Language Models, in CVPR, 2022.
- Learning to Prompt for Vision-Language Models, IJCV, 2022.
- Delving into Out-of-Distribution Detection with Vision-Language Representations, in NeurIPS, 2022
- Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models, arXiv, 2023
Parameter-efficient OOD detection is a promising research direction, and LoCoOp can be a baseline approach for this field.
To catch up with this field, we summarized the subsequent work for CLIP-based efficient OOD detection methods. (Last update: 2024.04.14)
- , code
This paper proposes PEFT-MCM, demonstrating the effectiveness of combining parameter-efficient tuning methods with MCM. To implement it, you can use the LoCoOp code with a minor change.
-
LSN learns negative prompts for OOD detection, which is an orthogonal approach to LoCoOp and can be combined with LoCoOp.
- , code
IDPrompt leverages ID-like outliers in the ID image to further leverage the capabilities of CLIP for OOD detection, which is a similar concept to LoCoOp.
- , code
LSA first tackled full-spectrum OOD detection in the context of CLIP-based parameter-efficient OOD detection.
- , code
NegPrompt learns a set of negative prompts with only ID data. Also, this paper tackled a novel promising problem setting called Open-vocabulary OOD detection.
If I missed some work, feel free to contact me by opening an issue!
If you find our work interesting or use our code/models, please consider citing:
@inproceedings{miyai2023locoop,
title={LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
booktitle={Thirty-Seventh Conference on Neural Information Processing Systems},
year={2023}
}
In addition, if you use GL-MCM (a test-time detection method), please also consider citing:
@article{miyai2023zero,
title={Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models},
author={Miyai, Atsuyuki and Yu, Qing and Irie, Go and Aizawa, Kiyoharu},
journal={arXiv preprint arXiv:2304.04521},
year={2023}
}