Code for our EMNLP 2022 paper. The framework and metrics are adapted from https://github.com/bangawayoo/adversarial-examples-in-text-classification
Python >= 3.6
PyTorch == 1.8.1
Install the Python requirements via the requirements file: pip install -r requirements.txt
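For example, a fresh environment could be set up roughly as follows (the environment name and Python version are illustrative; any Python >= 3.6 should work):

```bash
# Illustrative setup; adjust the environment name and Python version as needed.
conda create -n adv-detect python=3.8
conda activate adv-detect
pip install torch==1.8.1
pip install -r requirements.txt
```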
We use TextAttack to generate adversarial data for the four attacks: TextFooler, BAE, Pruthi, and TextBugger. If you want to generate your own adversarial data, please refer to their repos. We also provide some of the data we generated, including both regular and far-boundary data. Please download the whole folder and put it under the main directory.
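As a rough illustration, adversarial data can be generated with the TextAttack CLI along the lines below; the model name, example count, and output path are placeholders rather than the exact settings we used:

```bash
# Sketch only: generate TextFooler adversarial examples against an SST-2 model.
# Model checkpoint, number of examples, and output file are placeholders.
textattack attack \
  --recipe textfooler \
  --model bert-base-uncased-sst2 \
  --num-examples 1000 \
  --log-to-csv textfooler_sst2.csv
```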
The experiments can be reproduced by simply running the following shell script:
bash run_test_sst2.sh
This is the example script for sst2. Change the dataset, attack type, and detector with the following options; a sketch of the edited variables is given after the list.
- Options for the dataset are sst2, imdb, ag-news, and snli, set with the DATASET variable.
- Options for the attack type are textfooler, bae, pruthi, and textbugger, set with the RECIPE variable. Use the *_high_confidence_0.9 variants, such as textfooler_high_confidence_0.9, for the far-boundary versions of the attacks.
- Options for the detector are our proposed method ue and two baselines, ppl and rde.
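As a sketch, evaluating the rde baseline against far-boundary TextFooler attacks on imdb would mean editing the variables at the top of the script roughly as follows (DATASET and RECIPE are the variable names mentioned above; the detector variable name here is an assumption, so check the script for the exact name):

```bash
# Sketch of the variables to edit in run_test_sst2.sh (or a copy of it).
DATASET="imdb"                            # one of: sst2, imdb, ag-news, snli
RECIPE="textfooler_high_confidence_0.9"   # far-boundary TextFooler
DETECTOR="rde"                            # assumed variable name; one of: ue, ppl, rde
```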