UW-NSL

SafeDecoding Public

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Jupyter Notebook 129 11

ArtPrompt Public

[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`

Python 67 14

ChatBug Public

[AAAI25] Official Repo of Paper `ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates`

Python 7

CleanGen Public

[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

Python 14 2

safechain Public

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

Python 12 2
