
Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

Naman Deep Singh, Francesco Croce and Matthias Hein

Paper: https://arxiv.org/abs/2412.00727

Abstract

Vision-Language models like CLIP have been shown to be highly effective at linking visual perception and natural language understanding, enabling sophisticated image-text capabilities, including strong retrieval and zero-shot classification performance. Their widespread use, as well as the fact that CLIP models are trained on image-text pairs from the web, make them both a worthwhile and relatively easy target for backdoor attacks. As training foundational models, such as CLIP, from scratch is very expensive, this paper focuses on cleaning potentially poisoned models via fine-tuning. We first show that existing cleaning techniques are not effective against simple structured triggers used in Blended or BadNet backdoor attacks, exposing a critical vulnerability for potential real-world deployment of these models. Then, we introduce PAR, Perturb and Recover, a surprisingly simple yet effective mechanism to remove backdoors from CLIP models. Through extensive experiments across different encoders and types of backdoor attacks, we show that PAR achieves high backdoor removal rate while preserving good standard performance. Finally, we illustrate that our approach is effective even only with synthetic text-image pairs, i.e. without access to real training data.


Proposed Triggers


The triggers can be used as-is in either BadNet or Blended attacks.
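As a loose illustration (not the repository's poisoning code), the sketch below shows the two standard ways a trigger image is applied: pasted as a small corner patch (BadNet) or alpha-blended over the whole image (Blended). The file names, patch size, and blending weight are illustrative assumptions.

    from PIL import Image

    def apply_badnet(img, trigger, size=16):
        # BadNet-style: paste a small trigger patch into the bottom-right corner.
        poisoned = img.copy()
        patch = trigger.resize((size, size))
        poisoned.paste(patch, (img.width - size, img.height - size))
        return poisoned

    def apply_blended(img, trigger, alpha=0.2):
        # Blended-style: alpha-blend the resized trigger over the whole image.
        overlay = trigger.resize(img.size).convert(img.mode)
        return Image.blend(img, overlay, alpha)

    img = Image.open("example.jpg").convert("RGB")           # any clean image
    trig = Image.open("trigger_stripes.png").convert("RGB")  # hypothetical trigger file
    apply_badnet(img, trig).save("badnet_poisoned.jpg")
    apply_blended(img, trig).save("blended_poisoned.jpg")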


Clean Accuracy vs. Attack Success Rate Trade-off

The proposed cleaning method, PAR, yields a better ASR (attack success rate) vs. CA (clean accuracy) trade-off curve than the baselines. PAR works with both real (CC3M) and synthetic (SynC) data.

Train with PAR

  • Install the dependencies listed in requirements.txt

  • Run bash trigger.sh

Instructions:

  • Add the path to the (image, caption) paired CSV file in the variable trainData (see the sketch after this list).

  • Set imgRoot: the directory where the training images are located; it is prepended to each image filename in the CSV at trainData.

  • Add the path to the poisoned checkpoint in modelPath and the encoder model name in modeln.

  • Set outDIR: the directory where checkpoints and logs will be saved.
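To make these variables concrete, here is a minimal sketch of how they fit together; the CSV column names, paths, and file layout are illustrative assumptions, not the repository's exact schema.

    import os
    import pandas as pd

    trainData = "/data/pairs.csv"    # CSV of (image, caption) pairs; assumed columns: image, caption
    imgRoot = "/data/train_images"   # root directory holding the training images

    df = pd.read_csv(trainData)
    # imgRoot is joined with each image filename from the CSV:
    paths = [os.path.join(imgRoot, fname) for fname in df["image"]]
    print(paths[0], "->", df["caption"][0])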

Evaluate

  • Run bash trigger_validate.sh checkpoint_path attack_name encoder_name target_label (a sketch of the metrics this reports follows the list below)

attack_name can be one of the following:

  • badnet_rs (BadNet-Stripes)
  • blended_rs (Blended-Stripes)
  • tri_patt (Blended-Triangles)
  • water_patt (Blended-Text)
  • random (BadNet)
  • badclip (BadCLIP)
  • blended (Blended)
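As a rough picture of what the evaluation measures (this is not the repository's script): clean accuracy (CA) is the fraction of clean images classified correctly, and the attack success rate (ASR) is the fraction of triggered images classified as the target label. The sketch below uses the open_clip zero-shot API; the model name, checkpoint path, and class list are assumptions.

    import torch
    import open_clip

    # Load a (possibly poisoned) checkpoint; path and model name are illustrative.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="/path/to/checkpoint.pt")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    classnames = ["banana", "dog", "car"]  # stand-in for the real label set
    text = tokenizer([f"a photo of a {c}" for c in classnames])

    @torch.no_grad()
    def predict(images):
        # Zero-shot classification: pick the class whose text embedding
        # is most similar to the image embedding.
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        return (img_feat @ txt_feat.T).argmax(dim=-1)

    # CA: accuracy of predict() on clean images; ASR: fraction of triggered
    # images for which predict() returns the index of the target label.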

Some poisoned/PAR-cleaned CLIP model checkpoints

Encoder name   Backdoor attack     Poisoned model   PAR-cleaned
ViT-L/14-336   BadNet-Stripes      Link             --
ViT-L/14-336   Blended-Text        Link             --
ViT-B/32       BadNet-Stripes      Link             Link
ViT-B/32       Blended-Triangles   Link             Link
ViT-B/32       Blended-Text        Link             Link

Note: all of the above poisoned models use target_label banana
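To use a downloaded checkpoint, a hedged loading sketch with open_clip follows; the exact checkpoint format (a plain state dict vs. a wrapped training dict with a "module." prefix) is an assumption and may need adjusting.

    import torch
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
    ckpt = torch.load("vitb32_poisoned.pt", map_location="cpu")  # hypothetical filename
    state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    # Strip a possible DistributedDataParallel "module." prefix before loading.
    state = {k.replace("module.", "", 1): v for k, v in state.items()}
    model.load_state_dict(state, strict=False)
    model.eval()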


The code in this repository is partially based on the following publicly available codebases:

  1. https://github.com/nishadsinghi/CleanCLIP
  2. https://github.com/LiangSiyuan21/BadCLIP

Citation

If you use our code/models, please cite our work using the following BibTeX entry:

@article{singh2024PAR,
      title={Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP},
      author={Naman Deep Singh and Francesco Croce and Matthias Hein},
      journal={arXiv preprint arXiv:2412.00727},
      year={2024},
      url={https://arxiv.org/abs/2412.00727},
}
