
Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

Naman Deep Singh, Francesco Croce and Matthias Hein

Paper: https://arxiv.org/abs/2412.00727

Abstract

Vision-Language models like CLIP have been shown to be highly effective at linking visual perception and natural language understanding, enabling sophisticated image-text capabilities, including strong retrieval and zero-shot classification performance. Their widespread use, as well as the fact that CLIP models are trained on image-text pairs from the web, make them both a worthwhile and relatively easy target for backdoor attacks. As training foundational models, such as CLIP, from scratch is very expensive, this paper focuses on cleaning potentially poisoned models via fine-tuning. We first show that existing cleaning techniques are not effective against simple structured triggers used in Blended or BadNet backdoor attacks, exposing a critical vulnerability for potential real-world deployment of these models. Then, we introduce PAR, Perturb and Recover, a surprisingly simple yet effective mechanism to remove backdoors from CLIP models. Through extensive experiments across different encoders and types of backdoor attacks, we show that PAR achieves high backdoor removal rate while preserving good standard performance. Finally, we illustrate that our approach is effective even only with synthetic text-image pairs, i.e. without access to real training data.


Proposed Triggers


The triggers can be used as-is in either BadNet or Blended attacks.
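As a loose illustration (not the repository's poisoning code), the sketch below shows the two standard ways a trigger image is applied: pasted as a small corner patch (BadNet) or alpha-blended over the whole image (Blended). The file names, patch size, and blending weight are illustrative assumptions.

    from PIL import Image

    def apply_badnet(img, trigger, size=16):
        # BadNet-style: paste a small trigger patch into the bottom-right corner.
        poisoned = img.copy()
        patch = trigger.resize((size, size))
        poisoned.paste(patch, (img.width - size, img.height - size))
        return poisoned

    def apply_blended(img, trigger, alpha=0.2):
        # Blended-style: alpha-blend the resized trigger over the whole image.
        overlay = trigger.resize(img.size).convert(img.mode)
        return Image.blend(img, overlay, alpha)

    img = Image.open("example.jpg").convert("RGB")           # any clean image
    trig = Image.open("trigger_stripes.png").convert("RGB")  # hypothetical trigger file
    apply_badnet(img, trig).save("badnet_poisoned.jpg")
    apply_blended(img, trig).save("blended_poisoned.jpg")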


Clean Accuracy vs. Attack Success Rate Trade-off

The proposed cleaning method, PAR, yields a better ASR (attack success rate) vs. CA (clean accuracy) trade-off curve than the baselines. PAR works with both real (CC3M) and synthetic (SynC) data.

Train with PAR

  • Install the dependencies listed in requirements.txt

  • Run bash trigger.sh

Instructions:

  • Add the path to the (image, caption) paired CSV file in the variable trainData (see the sketch after this list).

  • Set imgRoot: the directory where the training images are located; it is prepended to each image filename in the CSV at trainData.

  • Add the path to the poisoned checkpoint in modelPath and the encoder model name in modeln.

  • Set outDIR: the directory where checkpoints and logs will be saved.
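To make these variables concrete, here is a minimal sketch of how they fit together; the CSV column names, paths, and file layout are illustrative assumptions, not the repository's exact schema.

    import os
    import pandas as pd

    trainData = "/data/pairs.csv"    # CSV of (image, caption) pairs; assumed columns: image, caption
    imgRoot = "/data/train_images"   # root directory holding the training images

    df = pd.read_csv(trainData)
    # imgRoot is joined with each image filename from the CSV:
    paths = [os.path.join(imgRoot, fname) for fname in df["image"]]
    print(paths[0], "->", df["caption"][0])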

Evaluate

  • Run bash trigger_validate.sh checkpoint_path attack_name encoder_name target_label (a sketch of the metrics this reports follows the list below)

attack_name can be one of the following:

  • badnet_rs (BadNet-Stripes)
  • blended_rs (Blended-Stripes)
  • tri_patt (Blended-Triangles)
  • water_patt (Blended-Text)
  • random (BadNet)
  • badclip (BadCLIP)
  • blended (Blended)
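As a rough picture of what the evaluation measures (this is not the repository's script): clean accuracy (CA) is the fraction of clean images classified correctly, and the attack success rate (ASR) is the fraction of triggered images classified as the target label. The sketch below uses the open_clip zero-shot API; the model name, checkpoint path, and class list are assumptions.

    import torch
    import open_clip

    # Load a (possibly poisoned) checkpoint; path and model name are illustrative.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="/path/to/checkpoint.pt")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    classnames = ["banana", "dog", "car"]  # stand-in for the real label set
    text = tokenizer([f"a photo of a {c}" for c in classnames])

    @torch.no_grad()
    def predict(images):
        # Zero-shot classification: pick the class whose text embedding
        # is most similar to the image embedding.
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        return (img_feat @ txt_feat.T).argmax(dim=-1)

    # CA: accuracy of predict() on clean images; ASR: fraction of triggered
    # images for which predict() returns the index of the target label.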

Some poisoned/PAR-cleaned CLIP model checkpoints

Encoder name   Backdoor attack     Poisoned model   PAR-cleaned
ViT-L/14-336   BadNet-Stripes      Link             --
ViT-L/14-336   Blended-Text        Link             --
ViT-B/32       BadNet-Stripes      Link             Link
ViT-B/32       Blended-Triangles   Link             Link
ViT-B/32       Blended-Text        Link             Link

Note: all of the above poisoned models use target_label banana
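To use a downloaded checkpoint, a hedged loading sketch with open_clip follows; the exact checkpoint format (a plain state dict vs. a wrapped training dict with a "module." prefix) is an assumption and may need adjusting.

    import torch
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
    ckpt = torch.load("vitb32_poisoned.pt", map_location="cpu")  # hypothetical filename
    state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    # Strip a possible DistributedDataParallel "module." prefix before loading.
    state = {k.replace("module.", "", 1): v for k, v in state.items()}
    model.load_state_dict(state, strict=False)
    model.eval()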


The code in this repository is partially based on the following publicly available codebases:

  1. https://github.com/nishadsinghi/CleanCLIP
  2. https://github.com/LiangSiyuan21/BadCLIP

Citation

If you use our code/models, please cite our work using the following BibTeX entry:

@article{singh2024PAR,
      title={Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP},
      author={Naman Deep Singh and Francesco Croce and Matthias Hein},
      journal={arXiv preprint arXiv:2412.00727},
      year={2024},
      url={https://arxiv.org/abs/2412.00727},
}
