Skip to content

Commit a6ffde1

Browse files
committed
add codes and datasets
1 parent dda615b commit a6ffde1

File tree

719 files changed

+436269
-1
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

719 files changed

+436269
-1
lines changed

.idea/.gitignore

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/CognitiveOverload.iml

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/deployment.xml

Lines changed: 15 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/inspectionProfiles/Project_Default.xml

Lines changed: 18 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/inspectionProfiles/profiles_settings.xml

Lines changed: 6 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/misc.xml

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/modules.xml

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

README.md

Lines changed: 147 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,155 @@
11
# CognitiveOverload
22
Code for our NAACL 2024 Paper "Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking"
33

4-
coming soon...
4+
## Datasets
5+
We adopt the following two datasets with malicious prompts:
6+
1. **_AdvBench_** from [Universal and transferable adversarial attacks on aligned language models](https://arxiv.org/abs/2307.15043)
7+
2. **_MasterKey_** from [Jailbreaker: Automated jailbreak across multiple large language model chatbots.](https://arxiv.org/abs/2307.08715)
58

9+
We translate the original English prompts to 52 other languages with Google Cloud API. You can find the original and translated prompts in folder` ./datasets`
10+
### Jailbreaking with Multilingual Cognitive Overload
11+
Run the following script to get baseline performance, i.e., LLMs prompted with English prompts
12+
```shell
13+
export dataset="AdvBench"
14+
export model_name="vicuna-7b"
15+
CUDA_VISIBLE_DEVICES=0 python perform_multilingual_attack.py \
16+
--dataset $dataset \
17+
--model-name $model_name \
18+
--max-tokens 128 \
19+
--attack en \
20+
--max-batch-size 16
21+
```
22+
#### Harmful Prompting in Various Languages
23+
Run the following script to get results when prompting LLMs with all other languages that the LLM supports:
24+
```shell
25+
export dataset="AdvBench"
26+
export model_name="vicuna-7b"
27+
CUDA_VISIBLE_DEVICES=0 python perform_multilingual_attack.py \
28+
--dataset $dataset \
29+
--model-name $model_name \
30+
--max-tokens 128 \
31+
--attack monolingual \
32+
--max-batch-size 16
33+
```
34+
#### Language Switching: from English to Lan X vs. from Lan X to English
35+
1. Extract keywords from prompts
36+
```shell
37+
export dataset="AdvBench"
38+
CUDA_VISIBLE_DEVICES=0 python prepare_multilingual.py \
39+
--stage extract_keywords \
40+
--dataset $dataset
41+
```
42+
2. retrieve definition of keywords from wikipedia
43+
```shell
44+
CUDA_VISIBLE_DEVICES=0 python prepare_multilingual.py \
45+
--stage wiki_definition \
46+
--dataset $dataset
47+
```
48+
3. translate definitions into 52 other languages
49+
```shell
50+
CUDA_VISIBLE_DEVICES=0 python prepare_multilingual.py \
51+
--stage translate_context \
52+
--dataset $dataset
53+
```
54+
4. attack LLMs in 2-turn, first english then other language or in reverse order
55+
```shell
56+
export model_name="vicuna-7b"
57+
echo "English first then other language"
58+
CUDA_VISIBLE_DEVICES=0 python perform_multilingual_attack.py \
59+
--dataset $dataset \
60+
--model-name $model_name \
61+
--max-tokens 128 \
62+
--attack multilingual \
63+
--max-batch-size 16 \
64+
--en-first
65+
echo "other language first then English"
66+
CUDA_VISIBLE_DEVICES=0 python perform_multilingual_attack.py \
67+
--dataset $dataset \
68+
--model-name $model_name \
69+
--max-tokens 128 \
70+
--attack multilingual \
71+
--max-batch-size 16
72+
```
73+
### Jailbreaking with Veiled Expressions
74+
1. extract sensitive words
75+
```shell
76+
export dataset="AdvBench"
77+
CUDA_VISIBLE_DEVICES=0 python perform_veiled_attack.py \
78+
--dataset $dataset \
79+
--stage extract_sensitive \
80+
--max-batch-size 16 \
81+
--model-name Mistral-7B-Instruct-v0.1
82+
```
83+
2. clean extracted sensitive words
84+
```shell
85+
CUDA_VISIBLE_DEVICES=0 python perform_veiled_attack.py \
86+
--dataset $dataset \
87+
--stage clean_word \
88+
--model-name Mistral-7B-Instruct-v0.1
89+
```
90+
3. replace sensitive words with veiled expressions
91+
```shell
92+
CUDA_VISIBLE_DEVICES=0 python perform_veiled_attack.py \
93+
--dataset $dataset \
94+
--stage replace_sensitive \
95+
--max-batch-size 16 \
96+
--model-name Mistral-7B-Instruct-v0.1
97+
```
98+
4. attack LLMs with veiled expressions
99+
```shell
100+
export model_name="vicuna-7b"
101+
CUDA_VISIBLE_DEVICES=0 python perform_veiled_attack.py \
102+
--dataset $dataset \
103+
--stage sensitive_attack \
104+
--max-batch-size 16 \
105+
--model-name $model_name \
106+
--num-demo 8
107+
```
108+
### Jailbreaking with Effect-to-Cause Cognitive Overload
109+
1. extract events with in-context learning
110+
```shell
111+
export dataset="AdvBench"
112+
for model_name in 'Mistral-7B-v0.1' 'llama2-13b'
113+
do
114+
CUDA_VISIBLE_DEVICES=0 python perform_effect_to_cause.py \
115+
--dataset $dataset \
116+
--model-name $model_name \
117+
--stage extract_event \
118+
--max-batch-size 16
119+
done
120+
```
121+
2. filter extracted events
122+
```shell
123+
python perform_effect_to_cause.py \
124+
--dataset $dataset \
125+
--stage filter_event
126+
```
127+
3. perform effect-to-cause attacks
128+
```shell
129+
export model_name="vicuna-7b"
130+
CUDA_VISIBLE_DEVICES=0 python perform_effect_to_cause.py \
131+
--dataset $dataset \
132+
--model-name $model_name \
133+
--stage attack \
134+
--max-batch-size 16
135+
```
6136
### Citation
137+
We appreciate data contribution from these two papers:
138+
```
139+
@article{zou2023universal,
140+
title={Universal and transferable adversarial attacks on aligned language models},
141+
author={Zou, Andy and Wang, Zifan and Carlini, Nicholas and Nasr, Milad and Kolter, J Zico and Fredrikson, Matt},
142+
journal={arXiv preprint arXiv:2307.15043},
143+
year={2023}
144+
}
145+
@article{deng2023jailbreaker,
146+
title={Jailbreaker: Automated jailbreak across multiple large language model chatbots},
147+
author={Deng, Gelei and Liu, Yi and Li, Yuekang and Wang, Kailong and Zhang, Ying and Li, Zefeng and Wang, Haoyu and Zhang, Tianwei and Liu, Yang},
148+
journal={arXiv preprint arXiv:2307.08715},
149+
year={2023}
150+
}
151+
```
152+
If you find our work helpful, please consider cite our work:
7153
```
8154
@inproceedings{xu2024cognitive,
9155
title={Cognitive overload: Jailbreaking large language models with overloaded logical thinking},

0 commit comments

Comments
 (0)