Skip to content

shreya1313/backdoor-detector

Repository files navigation

Backdoor-Detector for BadNets

Dataset:

Input:

  • B, a backdoored neural network classifier with N classes
  • validation dataset of clean, labelled images.

Output:

  • Goodnet a “repaired” BadNet
    • Output the correct class if the test input is clean. The correct class will be in [1,N]
    • Output class N+1 if the input is backdoored

Approach:

  • Pruning Defense
    • Prune the last pooling layer of BadNet B by removing one channel at a time from that layer
    • Channels should be removed in decreasing order of average activation values over the entire validation set
    • Save the model when the accuracy drops by at least {2%, 4%, 10%}

Challenges:

  • Since pooling layer doesn't have any weights, we make use of convolution layer (conv_3, in our case)
  • Didn't have resources to run pruning of all 60 channels

How to run the code:

  • Run all the cells of the ipynb notebook attached.
  • For pruned models, please look at these models attached -- directly access it while running the notebook.

Results:

  • The results are as follows
    • The pruned model as a function of the fraction of channels pruned Plot
    • Final comparison of Pruned and Good model Plot

Thanks!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published