Code for the paper: "Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing" (EMNLP BlackboxNLP - 2021)
Pre-calculated candidates and interpretations are available on Google drive here. The results can be replicated by running the results-metric.py
script. The exact commmands are detailed in Step-5.
We strongly recommend using conda
to manage dependencies.
Run conda create -n frag-exp python=3.6
and subsequently conda activate frag-exp
.
Run pip install -r requirements.txt
Following steps re-run the candidate generation process and re-calculate interpretations.
-
Install
Textattack
from theTextAttack
folder'sdist
folder by installing the wheel:pip install Textattack/dist/textattack-0.2.14-py3-none-any.whl
-
Run
python generate_candidates.py --model=distilbert --dataset=sst2 --number=500 --split=validation
. All options can be edited for different datasets and models. By default save paths are./candidates
. -
Run
python calculate_interpretations.py --model=distilbert --dataset=sst2 --interpretmethod=IG --number=500 --split=validation
. All options can be edited for different datasets and models. By default save paths are./interpretations
. -
Once all interpretations have been calculated, run
python results-metrics.py --model=distilbert --dataset=sst2 --interpretmethod=IG --number=500 --split=validation --metric=rkc
.
The available metrics are rkc (Rank Correlation)
, topk (Top-K Intersection)
,ppl (Perplexity)
, grm (Grammar errors)
and conf (Model Confidence)
. Results are stored in ./results
.