CRIS: CLIP-Driven Referring Image Segmentation (CVPR2022)

Created by Zhaoqing Wang*, Yu Lu*, Qiang Li*, Xunqiang Tao, Yandong Guo, Mingming Gong and Tongliang Liu

This is the official PyTorch implementation of CRIS.

The CLIP-Driven Referring Image Segmentation (CRIS) framework transfers the image-level semantic knowledge of the CLIP model to dense pixel-level referring image segmentation. More specifically, we design a vision-language decoder that propagates fine-grained semantic information from textual representations to each pixel-level activation, promoting consistency between the two modalities. In addition, we present text-to-pixel contrastive learning, which explicitly encourages the text feature to be similar to the related pixel-level features and dissimilar to irrelevant ones.
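To make the text-to-pixel contrastive idea concrete, here is a minimal pure-Python sketch. It assumes a per-pixel sigmoid / binary cross-entropy formulation: each pixel feature is scored against the text feature by a dot product, pixels inside the referred mask are pulled toward the text feature and all others are pushed away. The function names and the assumption that both features already share one embedding space are ours; this is an illustration, not the repository's code, and a real implementation would operate on batched PyTorch tensors.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def text_to_pixel_contrastive_loss(
    text_feat: Sequence[float],
    pixel_feats: Sequence[Sequence[float]],
    mask: Sequence[int],
) -> float:
    """Average per-pixel binary loss between one text embedding and all pixel embeddings.

    Assumes text_feat and every pixel feature live in the same D-dimensional space.
    mask[i] == 1 marks pixel i as part of the referred object (positive), 0 as background.
    """
    if not pixel_feats or len(pixel_feats) != len(mask):
        raise ValueError("need a non-empty pixel list with one mask label per pixel")
    total = 0.0
    for z_v, label in zip(pixel_feats, mask):
        score = sum(t * v for t, v in zip(text_feat, z_v))  # dot-product similarity
        # -log(sigmoid(s)) == softplus(-s): pulls positive pixels toward the text feature.
        # -log(1 - sigmoid(s)) == softplus(s): pushes negative pixels away from it.
        total += softplus(-score) if label else softplus(score)
    return total / len(pixel_feats)


text = [1.0, 0.0]
pixels = [[2.0, 0.0], [-2.0, 0.0]]
print(text_to_pixel_contrastive_loss(text, pixels, [1, 0]))  # mask agrees with scores: low loss
print(text_to_pixel_contrastive_loss(text, pixels, [0, 1]))  # flipped mask: high loss
```

Treating every pixel as an independent binary decision keeps the loss well defined no matter how many pixels the referred object covers.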

🍻CRIS achieves new state-of-the-art performance on RefCOCO, RefCOCO+ and G-Ref with a simple framework!

Demo

Framework

News

🔧 [Jun 6, 2022] The PyTorch implementation of CRIS is released.
☀️ [Mar 2, 2022] Our paper was accepted to CVPR 2022.

Main Results

Main results on RefCOCO

Backbone	val	test A	test B
ResNet50	69.52	72.72	64.70
ResNet101	70.47	73.18	66.10

Main results on RefCOCO+

Backbone	val	test A	test B
ResNet50	61.39	67.10	52.48
ResNet101	62.27	68.08	53.68

Main results on G-Ref

Backbone	val	test
ResNet50	59.35	59.39
ResNet101	59.87	60.36
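The tables above do not name their metric. Referring-segmentation benchmarks commonly report intersection-over-union (IoU) scaled to a percentage, so, assuming that is the metric here, the sketch below shows how IoU is computed for one predicted mask and two common ways of aggregating it over a split (averaging per-sample IoU versus dividing cumulative intersection by cumulative union). Which aggregation the numbers above use is not stated in this README, and the helper names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask: 1 = object pixel, 0 = background


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in both masks (intersection) and in either mask (union)."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def iou(pred: Mask, gt: Mask) -> float:
    inter, union = intersection_union(pred, gt)
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average of per-sample IoU scores."""
    scores = [iou(p, g) for p, g in pairs]
    if not scores:
        raise ValueError("no samples")
    return sum(scores) / len(scores)


def overall_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Cumulative intersection divided by cumulative union over the whole split."""
    total_inter = total_union = 0
    for p, g in pairs:
        inter, union = intersection_union(p, g)
        total_inter += inter
        total_union += union
    return 1.0 if total_union == 0 else total_inter / total_union


pairs = [([1, 1, 0, 0], [1, 0, 1, 0]), ([1, 0, 1, 0], [1, 0, 1, 0])]
print(100 * mean_iou(pairs), 100 * overall_iou(pairs))  # percentages, as in the tables
```

The two aggregations can differ noticeably, because overall IoU weights large objects more heavily than small ones.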

Preparation

Environment
- PyTorch (e.g. 1.10.0)
- Other dependencies in requirement.txt
Datasets
- Detailed instructions are in prepare_datasets.md

Quick Start

This implementation supports only multi-GPU training with DistributedDataParallel, which is faster and simpler; single-GPU and DataParallel training are not supported. Evaluation, by contrast, supports only single-GPU mode.
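With DistributedDataParallel, each GPU runs its own process and reads only its own shard of the dataset. The sketch below illustrates how that sharding typically works; it is our reconstruction of the default behaviour of PyTorch's DistributedSampler (no shuffling, no dropped samples), written in plain Python. The helper name shard_indices is hypothetical and is not part of this repository.

```python
import math
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Indices one DDP process would read, mimicking an unshuffled DistributedSampler."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    num_samples = math.ceil(dataset_len / num_replicas)  # per-process share, rounded up
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so every process gets the same number of samples.
    padding = total_size - dataset_len
    indices += [i % dataset_len for i in range(padding)]
    # Interleaved split: rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]


for rank in range(4):
    print(rank, shard_indices(10, 4, rank))
```

With 10 samples on 4 processes, the wrap-around padding means samples 0 and 1 are read twice. That is harmless for training but would bias metrics if evaluation were sharded the same way; we do not claim this is the authors' reason for evaluating on a single GPU.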

Before training, log in to wandb with wandb login or wandb login --anonymously. To train CRIS with 8 GPUs, run:

# e.g., Training on the RefCOCO dataset with the ResNet-50 config
python -u train.py --config config/refcoco/cris_r50.yaml

To evaluate CRIS with 1 GPU, run:

# e.g., Evaluation on the val-set of the RefCOCO dataset
CUDA_VISIBLE_DEVICES=0 python -u test.py \
      --config config/refcoco/cris_r50.yaml \
      --opts TEST.test_split val-test \
             TEST.test_lmdb datasets/lmdb/refcoco/val.lmdb
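The --opts arguments above override values from the YAML config as alternating dotted KEY VALUE pairs. We have not checked exactly how test.py parses them; the hypothetical helper apply_opts below sketches the common yacs-style convention, where each dotted key must already exist in the loaded config and each value is parsed as a Python literal when possible and kept as a string otherwise. The base config dictionary in the usage line is a placeholder, not the contents of cris_r50.yaml.

```python
import ast
import copy
from typing import Any, Dict, List


def apply_opts(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Return a copy of cfg with dotted KEY VALUE overrides applied (hypothetical helper)."""
    if len(opts) % 2 != 0:
        raise ValueError("--opts expects alternating KEY VALUE pairs")
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    for key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = new_cfg
        for part in parents:
            node = node[part]  # raises KeyError for an unknown section such as "TSET"
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw_value)  # "8" -> 8, "True" -> True
        except (ValueError, SyntaxError):
            value = raw_value  # plain strings such as split names and paths stay as text
        node[leaf] = value
    return new_cfg


base = {"TEST": {"test_split": "val-test", "test_lmdb": "path/to/default.lmdb"}}  # placeholder
print(apply_opts(base, ["TEST.test_lmdb", "path/to/other.lmdb"]))
```

Rejecting unknown keys means a mistyped option fails loudly instead of being silently ignored.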

License

This project is under the MIT license. See LICENSE for details.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{wang2021cris,
  title={CRIS: CLIP-Driven Referring Image Segmentation},
  author={Wang, Zhaoqing and Lu, Yu and Li, Qiang and Tao, Xunqiang and Guo, Yandong and Gong, Mingming and Liu, Tongliang},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2022}
}
