A list of T2I safety papers, updated daily. Feel free to start a conversation in Discussions.

SaFoLab-WISC/Awesome-T2I-safety-Papers

Awesome-T2I/T2V-safety-Papers

A continually updated collection of papers on the safety of Text-to-Image and Text-to-Video models (T2I/T2V safety).

🚀 The scope of our collection

💡 Topic 1: Jailbreak Attack/Defense Methods for T2I/T2V Models.

Here, safety means preventing models from following malicious instructions and generating toxic content, such as violence, NSFW and sexual content, privacy violations, animal abuse, child abuse, and misinformation (see [the system card of DALL·E 3]).

💡 Topic 2: Digital Watermarking for T2I Safety.

Digital watermarking is widely used to verify the authenticity of images and to trace their origin. It can also be used to trace the copyright and ownership of T2I models.

💡 Topic 3: Attribution Methods for T2I Models.

To determine whether an image was generated by an AIGC method and, if so, by which model (e.g., Stable Diffusion or DALL-E), we can attribute the image to its generator, either proactively (e.g., by embedding a watermark at generation time) or passively (by analyzing the image itself).


Text-to-Image Models

💡 Jailbreak Attack on Text-to-Image Models

[0] SneakyPrompt: Jailbreaking Text-to-Image Generative Models

  • 🧑‍🔬 Author: Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao
  • 🏫 Affiliation: Johns Hopkins University, Duke University
  • 🔗 Link: [Code] [arXiv:2305.12082]
  • 📝 Note: 🔥 (S&P 2024)

[1] MMA-Diffusion: MultiModal Attack on Diffusion Models

  • 🧑‍🔬 Author: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu
  • 🏫 Affiliation: The Chinese University of Hong Kong, Huawei Singular Security Lab, Institute of Automation, Chinese Academy of Sciences, Beijing Wenge Technology Co. Ltd
  • 🔗 Link: [Code] [arXiv:2311.17516]
  • 📝 Note: 🔥 (CVPR 2024)

[2] RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts

  • 🧑‍🔬 Author: Han Liu, Yuhao Wu, Shixuan Zhai, Bo Yuan, Ning Zhang
  • 🏫 Affiliation: Washington University in St. Louis, Rutgers University
  • 🔗 Link: [Code] [CVPR 2023]
  • 📝 Note: 🔥 (CVPR 2023)

[3] Prompt Stealing Attacks Against Text-to-Image Generation Models

  • 🧑‍🔬 Author: Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang
  • 🏫 Affiliation: CISPA Helmholtz Center for Information Security
  • 🔗 Link: [Code] [arXiv:2302.09923]
  • 📝 Note: 🔥 (USENIX Security 2024)

[4] Divide-and-Conquer Attack: Harnessing the Power of LLM to Bypass the Censorship of Text-to-Image Generation Model

  • 🧑‍🔬 Author: Yimo Deng, Huangxun Chen
  • 🏫 Affiliation: The Hong Kong University of Science and Technology, Northeastern University
  • 🔗 Link: [Code] [arXiv:2312.07130]
  • 📝 Note:

[5] SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution

  • 🧑‍🔬 Author: Zhongjie Ba, Jieming Zhong, Jiachen Lei, Peng Cheng, Qinglong Wang, Zhan Qin, Zhibo Wang, Kui Ren
  • 🏫 Affiliation: Zhejiang University, ZJU-Hangzhou Global Scientific and Technological Innovation Center
  • 🔗 Link: [Code] [arXiv:2309.14122]
  • 📝 Note:

[6] Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective

  • 🧑‍🔬 Author: Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong
  • 🏫 Affiliation: University of Chinese Academy of Sciences
  • 🔗 Link: [Code] [arXiv:2404.19382]
  • 📝 Note: Adversarial Concept Restoration

[7] Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

  • 🧑‍🔬 Author: Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo
  • 🏫 Affiliation: Harvard University, Google Research, University of Zurich, Carnegie Mellon University, University of Oxford, University College London, Cohere, MLCommons
  • 🔗 Link: [Code] [arXiv:2403.12075]
  • 📝 Note: Adversarial Nibbler Dataset

[8] Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

  • 🧑‍🔬 Author: Jiachen Ma, Anda Cao, Zhiqing Xiao, Jie Zhang, Chao Ye, Junbo Zhao
  • 🏫 Affiliation: Zhejiang University, ETH Zurich
  • 🔗 Link: [Code] [arXiv:2404.02928]
  • 📝 Note:

[9] UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers

  • 🧑‍🔬 Author: Duo Peng, Qiuhong Ke, Jun Liu
  • 🏫 Affiliation: Singapore University of Technology and Design, Monash University
  • 🔗 Link: [Code] [arXiv:2405.11336]
  • 📝 Note: 🔥 (ICML 2024)

[10] Automatic Jailbreaking of the Text-to-Image Generative AI Systems

  • 🧑‍🔬 Author: Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang
  • 🏫 Affiliation: KAIST, Korea University, Peking University, DeepAuto.ai
  • 🔗 Link: [Code] [arXiv:2405.16567]
  • 📝 Note:

[11] Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

  • 🧑‍🔬 Author: Shawn Shan, Wenxin Ding, Josephine Passananti, Stanley Wu, Haitao Zheng, Ben Y. Zhao
  • 🏫 Affiliation: University of Chicago
  • 🔗 Link: [Code] [arXiv:2310.13828]
  • 📝 Note:

[12] Jailbreaking Text-to-Image Models with LLM-Based Agents

  • 🧑‍🔬 Author: Yingkai Dong, Zheng Li, Xiangtao Meng, Ning Yu, Shanqing Guo
  • 🏫 Affiliation: Shandong University, CISPA Helmholtz Center for Information Security, Netflix Eyeline Studios
  • 🔗 Link: [Code] [arXiv:2408.00523]
  • 📝 Note:

💡 Defenses & Alignment on Text-to-Image Models

[0] GuardT2I: Defending Text-to-Image Models from Adversarial Prompts

  • 🧑‍🔬 Author: Yijun Yang, Ruiyuan Gao, Xiao Yang, Jianyuan Zhong, Qiang Xu
  • 🏫 Affiliation: The Chinese University of Hong Kong, Hong Kong, Tsinghua University
  • 🔗 Link: [Code] [arXiv:2403.01446]
  • 📝 Note:

[1] Universal Prompt Optimizer for Safe Text-to-Image Generation

  • 🧑‍🔬 Author: Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, Suhang Wang
  • 🏫 Affiliation: The Pennsylvania State University, University of Chinese Academy of Sciences, Tianjin University
  • 🔗 Link: [Code] [arXiv:2402.10882]
  • 📝 Note:

[2] SAFEGEN: Mitigating Unsafe Content Generation in Text-to-Image Models

  • 🧑‍🔬 Author: Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu
  • 🏫 Affiliation: USSLAB, Zhejiang University, Johns Hopkins University
  • 🔗 Link: [Code] [arXiv:2404.06666]
  • 📝 Note: 🔥 (ACM CCS 2024)

[3] Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples

  • 🧑‍🔬 Author: Chumeng Liang, Xiaoyu Wu, Yang Hua, Jiaru Zhang, Yiming Xue, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • 🏫 Affiliation: USSLAB, Zhejiang University, Johns Hopkins University
  • 🔗 Link: [Code] [arXiv:2302.04578]
  • 📝 Note: ICML 2023 (Oral)

[4] Anti-DreamBooth: Protecting Users from Personalized Text-to-Image Synthesis

  • 🧑‍🔬 Author: Thanh Van Le, Hao Phung, Thuan Hoang Nguyen, Quan Dao, Ngoc Tran, Anh Tran
  • 🏫 Affiliation: VinAI Research, Vanderbilt University
  • 🔗 Link: [Code] [arXiv:2303.15433]
  • 📝 Note: ICCV 2023

[5] Latent Guard: a Safety Framework for Text-to-Image Generation

  • 🧑‍🔬 Author: Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati
  • 🏫 Affiliation: Hong Kong University of Science and Technology, University of Oxford
  • 🔗 Link: [Code] [arXiv:2404.08031]
  • 📝 Note: ECCV 2024

[6] Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

  • 🧑‍🔬 Author: Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu
  • 🏫 Affiliation: Michigan State University, Applied ML, Intel, University of Minnesota
  • 🔗 Link: [Code] [arXiv:2405.15234]
  • 📝 Note:

[7] Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

  • 🧑‍🔬 Author: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong
  • 🏫 Affiliation: KAIST AI, Hugging Face, Korea University
  • 🔗 Link: [Code] [arXiv:2406.06424]
  • 📝 Note:

[8] Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

  • 🧑‍🔬 Author: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee
  • 🏫 Affiliation: Seoul National University, Korea University, NAVER AI Lab, NAVER Cloud, Korea Institute for Advanced Study (KIAS), AI Institute of Seoul National University (SNU AIIS)
  • 🔗 Link: [Code] [arXiv:2407.21035]
  • 📝 Note:

[9] Erasing Concepts from Diffusion Models

  • 🧑‍🔬 Author: Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau
  • 🏫 Affiliation: Northeastern University, Massachusetts Institute of Technology
  • 🔗 Link: [Code] [arXiv:2303.07345] [Project]
  • 📝 Note: ICCV 2023 (Oral)

💡 Safety Evaluation of Text-to-Image Models

[0] UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

  • 🧑‍🔬 Author: Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
  • 🏫 Affiliation: CISPA Helmholtz Center for Information Security, TU Delft
  • 🔗 Link: [Code] [arXiv:2405.03486]
  • 📝 Note: arXiv, 30 Apr 2024

[1] Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

  • 🧑‍🔬 Author: Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang
  • 🏫 Affiliation: CISPA Helmholtz Center for Information Security, Delft University of Technology
  • 🔗 Link: [Code] [arXiv:2305.13873]
  • 📝 Note: 🔥 (ACM CCS 2023)

💡 Digital Watermarking for T2I Safety

[0] The Stable Signature: Rooting Watermarks in Latent Diffusion Models

  • 🧑‍🔬 Author: Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, Teddy Furon
  • 🏫 Affiliation: Meta AI, Centre Inria de l’Université de Rennes, Sorbonne University
  • 🔗 Link: [Code] [arXiv:2303.15435]
  • 📝 Note: ICCV 2023

[1] Tree-Rings Watermarks: Invisible Fingerprints for Diffusion Images

  • 🧑‍🔬 Author: Yuxin Wen, John Kirchenbauer, Jonas Geiping, Tom Goldstein
  • 🏫 Affiliation: University of Maryland
  • 🔗 Link: [Code] [NeurIPS 2023]
  • 📝 Note: NeurIPS 2023

[2] Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

  • 🧑‍🔬 Author: Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu
  • 🏫 Affiliation: University of Science and Technology of China, National University of Singapore
  • 🔗 Link: [Code] [arXiv:2404.04956]
  • 📝 Note: CVPR 2024

[3] EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

  • 🧑‍🔬 Author: Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang
  • 🏫 Affiliation: Peking University
  • 🔗 Link: [Code] [arXiv:2312.08883]
  • 📝 Note: CVPR 2024

[4] Purified and Unified Steganographic Network

  • 🧑‍🔬 Author: Guobiao Li, Sheng Li, Zicong Luo, Zhenxing Qian, Xinpeng Zhang
  • 🏫 Affiliation: Fudan University
  • 🔗 Link: [Code] [arXiv:2402.17210]
  • 📝 Note: CVPR 2024

[5] ProMark: Proactive Diffusion Watermarking for Causal Attribution

  • 🧑‍🔬 Author: Vishal Asnani, John Collomosse, Tu Bui, Xiaoming Liu, Shruti Agarwal
  • 🏫 Affiliation: Adobe Research, Michigan State University, University of Surrey
  • 🔗 Link: [Code] [arXiv:2403.09914]
  • 📝 Note: CVPR 2024

[6] Performance-lossless Black-box Model Watermarking

  • 🧑‍🔬 Author: Na Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu
  • 🏫 Affiliation: University of Science and Technology of China
  • 🔗 Link: [Code] [arXiv:2312.06488]
  • 📝 Note: IEEE TDSC 2023

[7] Detecting Voice Cloning Attacks via Timbre Watermarking

  • 🧑‍🔬 Author: Chang Liu, Jie Zhang, Tianwei Zhang, Xi Yang, Weiming Zhang, Nenghai Yu
  • 🏫 Affiliation: University of Science and Technology of China, Nanyang Technological University
  • 🔗 Link: [Code] [arXiv:2312.03410]
  • 📝 Note: NDSS 2024

[8] WavMark: Watermarking for Audio Generation

  • 🧑‍🔬 Author: Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei
  • 🏫 Affiliation: Microsoft Research Asia, Renmin University of China
  • 🔗 Link: [Code] [arXiv:2308.12770]
  • 📝 Note:

[9] Steal My Artworks for Fine-tuning? A Watermarking Framework for Detecting Art Theft Mimicry in Text-to-Image Models

  • 🧑‍🔬 Author: Ge Luo, Junqiang Huang, Manman Zhang, Zhenxing Qian, Sheng Li, Xinpeng Zhang
  • 🏫 Affiliation: Fudan University
  • 🔗 Link: [Code] [arXiv:2311.13619]
  • 📝 Note:

[10] Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

  • 🧑‍🔬 Author: Runyi Hu, Jie Zhang, Ting Xu, Tianwei Zhang, Jiwei Li
  • 🏫 Affiliation: Zhejiang University, Nanyang Technological University, University of Science and Technology of China
  • 🔗 Link: [Code] [arXiv:2402.12688]
  • 📝 Note:

[11] Robust Image Watermarking using Stable Diffusion

  • 🧑‍🔬 Author: Lijun Zhang, Xiao Liu, Antoni Viros Martin, Cindy Xiong Bearfield, Yuriy Brun, Hui Guan
  • 🏫 Affiliation: University of Massachusetts, IBM
  • 🔗 Link: [Code] [arXiv:2401.04247]
  • 📝 Note:

[12] Proactive Detection of Voice Cloning with Localized Watermarking

  • 🧑‍🔬 Author: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar
  • 🏫 Affiliation: FAIR, Meta
  • 🔗 Link: [Code] [arXiv:2401.17264]
  • 📝 Note:

[13] A Watermark-Conditioned Diffusion Model for IP Protection

  • 🧑‍🔬 Author: Rui Min, Sen Li, Hongyang Chen, Minhao Cheng
  • 🏫 Affiliation: Hong Kong University of Science and Technology, Zhejiang Lab, Pennsylvania State University
  • 🔗 Link: [Code] [arXiv:2403.10893]
  • 📝 Note:

💡 Attribution for T2I AIGC Methods

[0] Watermark-based Detection and Attribution of AI-Generated Content

  • 🧑‍🔬 Author: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong
  • 🏫 Affiliation: Duke University
  • 🔗 Link: [Code] [arXiv:2404.04254]
  • 📝 Note:

[1] Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond

  • 🧑‍🔬 Author: Katherine Xu, Lingzhi Zhang, Jianbo Shi
  • 🏫 Affiliation: University of Pennsylvania, Adobe Inc.
  • 🔗 Link: [Code] [arXiv:2403.19653]
  • 📝 Note:

[2] Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models

  • 🧑‍🔬 Author: Meiling Li, Zhenxing Qian, Xinpeng Zhang
  • 🏫 Affiliation: Fudan University
  • 🔗 Link: [Code] [arXiv:2403.01489]
  • 📝 Note:

[3] Where Did I Come From? Origin Attribution of AI-Generated Images

  • 🧑‍🔬 Author: Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma
  • 🏫 Affiliation: Rutgers University, Sony AI, Virginia Tech, University of Massachusetts Amherst
  • 🔗 Link: [Code] [NeurIPS 2023]
  • 📝 Note: NeurIPS 2023

Text-to-Video Models

💡 Jailbreak Attack on Text-to-Video Models

[0] T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

  • 🧑‍🔬 Author: Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao
  • 🏫 Affiliation: Chinese Academy of Sciences, Tsinghua University
  • 🔗 Link: [arXiv:2407.05965]
  • 📝 Note:

💡 Defenses & Alignment on Text-to-Video Models

[0] SAFESORA: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

  • 🧑‍🔬 Author: Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
  • 🏫 Affiliation: Peking University
  • 🔗 Link: [Homepage] [Code] [arXiv:2406.14477]
  • 📝 Note:

[1] Towards Understanding Unsafe Video Generation

  • 🧑‍🔬 Author: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang
  • 🏫 Affiliation: University of Virginia, Penn State University, CISPA Helmholtz Center for Information Security
  • 🔗 Link: [Code] [arXiv:2407.12581]
  • 📝 Note:

👍 Acknowledgement

Thanks to the 3D-Gaussian-Splatting-Papers.
