
Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models

This repository is the official PyTorch implementation of the TMLR 2026 paper: Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models

Abstract: Recent research has shown that aligning fine-grained text descriptions with localized image patches can significantly improve the zero-shot performance of pre-trained vision-language models (e.g., CLIP). However, we find that both fine-grained text descriptions and localized image patches often contain redundant information, making text-visual alignment less effective. In this paper, we tackle this issue from two perspectives: view refinement and description refinement, termed Bi-refinement for Fine-grained Text-visual Alignment (BiFTA). View refinement removes redundant image patches with high Intersection over Union (IoU) ratios, yielding more distinctive visual samples. Description refinement removes redundant text descriptions with high pairwise cosine similarity, ensuring greater diversity among the remaining descriptions. BiFTA achieves superior zero-shot performance on 6 benchmark datasets for both ViT-based and ResNet-based CLIP, justifying the necessity of removing redundant information in text-visual alignment.
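The two refinement steps described above can be sketched as greedy de-duplication filters. The following is an illustrative sketch, not the paper's exact implementation; the function names and thresholds are hypothetical, and the text embeddings are assumed to be L2-normalized (as CLIP embeddings typically are).

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns Intersection over Union.
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def refine_views(boxes, iou_thresh=0.5):
    # View refinement: greedily keep a patch only if its IoU with
    # every already-kept patch is below the threshold.
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept

def refine_descriptions(embs, sim_thresh=0.9):
    # Description refinement: embs is an (n, d) array of L2-normalized
    # text embeddings; keep a description only if its cosine similarity
    # to every already-kept description is below the threshold.
    kept = []
    for i, e in enumerate(embs):
        if all(float(e @ embs[j]) < sim_thresh for j in kept):
            kept.append(i)
    return kept
```

For example, of two patches that overlap almost entirely, only the first is retained, while a duplicated description is filtered out by its near-unit cosine similarity to the kept one.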

Environment Setup

  • Python 3.12.3
  • CUDA 12.2.0
  • PyTorch 2.3.0
conda create -n bifta python=3.12
conda activate bifta
pip install -r requirements.txt

Dataset Preparation

Modify data_path in the config files under configs/ to point to your data location.

Supported datasets:

  • ImageNet
  • ImageNet-V2
  • CUB-200-2011
  • Oxford Pets
  • DTD
  • Food-101
  • Places365
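As a reference point, a dataset config entry might look like the fragment below; the file name and any field other than data_path are hypothetical, so check the actual files under configs/ for the exact schema.

```yaml
# configs/imagenet.yaml (illustrative): point data_path at your local copy
data_path: /path/to/datasets/imagenet
```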

Usage

Run Main Evaluation

python main.py --dataset [dataset] --seed [seed] --model_size [model_size]

Acknowledgements

This repo builds upon:

Citation
