Title | Link | Abstract |
---|---|---|
Robust Image Denoising through Adversarial Frequency Mixup | paper | Image denoising approaches based on deep neural networks often struggle with overfitting to specific noise distributions present in training data. This challenge persists in existing real-world denoising networks which are trained using a limited spectrum of real noise distributions and thus show poor robustness to out-of-distribution real noise types. To alleviate this issue we develop a novel training framework called Adversarial Frequency Mixup (AFM). AFM leverages mixup in the frequency domain to generate noisy images with distinctive and challenging noise characteristics all the while preserving the properties of authentic real-world noise. Subsequently incorporating these noisy images into the training pipeline enhances the denoising network's robustness to variations in noise distributions. Extensive experiments and analyses conducted on a wide range of real noise benchmarks demonstrate that denoising networks trained with our proposed framework exhibit significant improvements in robustness to unseen noise distributions. The code is available at https://github.com/dhryougit/AFM. |
Towards Robust 3D Pose Transfer with Adversarial Learning | paper | 3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks. Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources. However, obtaining such clean pose sources requires cumbersome but necessary pre-processing pipelines, hindering real-time applications. This work is driven by the intuition that the robustness of the model can be enhanced by introducing adversarial samples into the training, leading to a model that is less vulnerable to noisy inputs and can even be extended to directly handling real-world data like raw point clouds/scans without intermediate processing. Furthermore, we propose a novel 3D pose Masked Autoencoder (3D-PoseMAE), a customized MAE that effectively learns 3D extrinsic presentations (i.e., pose). 3D-PoseMAE facilitates learning from the aspect of extrinsic attributes by simultaneously generating adversarial samples that perturb the model and learning arbitrary raw noisy poses via a multi-scale masking strategy. Both qualitative and quantitative studies show that the transferred meshes produced by our network are of much better quality. Besides, we demonstrate the strong generalizability of our method on various poses, different domains, and even raw scans. Experimental results also show the meaningful insight that the intermediate adversarial samples generated during training can successfully attack existing pose transfer models. |
Structure-Guided Adversarial Training of Diffusion Models | paper | Diffusion models have demonstrated exceptional efficacy in various generative applications. While existing models focus on minimizing a weighted sum of denoising score matching losses for data distribution modeling their training primarily emphasizes instance-level optimization overlooking valuable structural information within each mini-batch indicative of pair-wise relationships among samples. To address this limitation we introduce Structure-guided Adversarial training of Diffusion Models (SADM). In this pioneering approach we compel the model to learn manifold structures between samples in each training batch. To ensure the model captures authentic manifold structures in the data distribution we advocate adversarial training of the diffusion generator against a novel structure discriminator in a minimax game distinguishing real manifold structures from the generated ones. SADM substantially outperforms existing methods in image generation and cross-domain fine-tuning tasks across 12 datasets establishing a new state-of-the-art FID of 1.58 and 2.11 on ImageNet for class-conditional image generation at resolutions of 256x256 and 512x512 respectively. |
Adversarial Text to Continuous Image Generation | paper | Existing GAN-based text-to-image models treat images as 2D pixel arrays. In this paper we approach the text-to-image task from a different perspective where a 2D image is represented as an implicit neural representation (INR). We show that straightforward conditioning of the unconditional INR-based GAN method on text inputs is not enough to achieve good performance. We propose a word-level attention-based weight modulation operator that controls the generation process of INR-GAN based on hypernetworks. Our experiments on benchmark datasets show that HyperCGAN achieves competitive performance to existing pixel-based methods and retains the properties of continuous generative models. |
ASAM: Boosting Segment Anything Model with Adversarial Tuning | paper | In the evolving landscape of computer vision foundation models have emerged as pivotal tools exhibiting exceptional adaptability to a myriad of tasks. Among these the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However SAM like its counterparts encounters limitations in specific niche applications prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model we augment a subset (1%) of the SA-1B dataset generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks thereby contributing to the advancement of foundational models in computer vision. Our project page is in https://asam2024.github.io/. |
Adversarial Score Distillation: When score distillation meets GAN | paper | Existing score distillation methods are sensitive to classifier-free guidance (CFG) scale manifested as over-smoothness or instability at small CFG scales while over-saturation at large ones. To explain and analyze these issues we revisit the derivation of Score Distillation Sampling (SDS) and decipher existing score distillation with the Wasserstein Generative Adversarial Network (WGAN) paradigm. With the WGAN paradigm we find that existing score distillation either employs a fixed sub-optimal discriminator or conducts incomplete discriminator optimization resulting in the scale-sensitive issue. We propose the Adversarial Score Distillation (ASD) which maintains an optimizable discriminator and updates it using the complete optimization objective. Experiments show that the proposed ASD performs favorably in 2D distillation and text-to-3D tasks against existing methods. Furthermore to explore the generalization ability of our paradigm we extend ASD to the image editing task which achieves competitive results. The project page and code are at https://github.com/2y7c3/ASD |
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models | paper | Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing the consistency training loss minimizes the Wasserstein distance between the target and generated distributions. As the timestep increases, the upper bound accumulates previous consistency training losses; therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet 64x64, and LSUN Cat 256x256 datasets, retains zero-shot image inpainting capabilities, and uses less than 1/6 of the original batch size and fewer than 1/2 of the model parameters and training steps compared to the baseline method, leading to a substantial reduction in resource consumption. Our code is available: https://github.com/kong13661/ACT |
CAD: Photorealistic 3D Generation via Adversarial Distillation | paper | The increased demand for 3D data in AR/VR robotics and gaming applications gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However this distillation process involves finding a correct mode in the high-dimensional and large-variance distribution produced by the diffusion model. This task is challenging and often leads to issues such as over-saturation over-smoothing and Janus-like artifacts in the 3D generation. In this paper we propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models. Instead of focusing on mode-seeking our method directly models the distribution discrepancy between multi-view renderings and diffusion priors in an adversarial manner which unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt. Moreover by harnessing the latent space of GANs and expressive diffusion model priors our method enables a wide variety of 3D applications including single-view reconstruction high diversity generation and continuous 3D interpolation in open domain. Our experiments demonstrate the superiority of our pipeline compared to previous works in terms of generation quality and diversity. |
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving | paper | In autonomous driving, behavior prediction is fundamental for safe motion planning; hence the security and robustness of prediction models against adversarial attacks are of paramount importance. We propose a novel adversarial backdoor attack against trajectory prediction models as a means of studying their potential vulnerabilities. Our attack affects the victim at training time via naturalistic, hence stealthy, poisoned samples crafted using a novel two-step approach. First, the triggers are crafted by perturbing the trajectory of the attacking vehicle and are then disguised by transforming the scene using a bi-level optimization technique. The proposed attack does not depend on a particular model architecture and operates in a black-box manner, and thus can be effective without any knowledge of the victim model. We conduct extensive empirical studies using state-of-the-art prediction models on two benchmark datasets, using metrics customized for trajectory prediction. We show that the proposed attack is highly effective, as it can significantly hinder the performance of prediction models; unnoticeable by the victims; and efficient, as it forces the victim to generate malicious behavior even under constrained conditions. Via ablative studies, we analyze the impact of different attack design choices, followed by an evaluation of existing defence mechanisms against the proposed attack. |
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training | paper | Gradient-based saliency maps have been widely used to explain the decisions of deep neural network classifiers. However standard gradient-based interpretation maps including the simple gradient and integrated gradient algorithms often lack desired structures such as sparsity and connectedness in their application to real-world computer vision models. A common approach to induce sparsity-based structures into gradient-based saliency maps is to modify the simple gradient scheme using sparsification or norm-based regularization. However one drawback with such post-processing approaches is the potentially significant loss in fidelity to the original simple gradient map. In this work we propose to apply adversarial training as an in-processing scheme to train neural networks with structured simple gradient maps. We demonstrate an existing duality between the regularized norms of the adversarial perturbations and gradient-based maps whereby we design adversarial training schemes promoting sparsity and group-sparsity properties in simple gradient maps. We present comprehensive numerical results to show the influence of our proposed norm-based adversarial training methods on the standard gradient-based maps of standard neural network architectures on benchmark image datasets. |
Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training | paper | Adversarial training is often formulated as a min-max problem; however, concentrating only on the worst adversarial examples causes alternating, repetitive confusion of the model, i.e., previously defended or correctly classified samples are no longer defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden high-risk regions within the secure area obtained through adversarial training and prevent the model from finding the real worst cases. We require the model to guard against hiders when defending against adversarial examples, improving accuracy and robustness simultaneously. By rethinking and redefining the min-max optimization problem for adversarial training, we propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT). HFAT introduces an iterative evolution optimization strategy to simplify the optimization problem and employs an auxiliary model to reveal hiders, effectively combining the optimization directions of standard adversarial training and hider prevention. Furthermore, we introduce an adaptive weighting mechanism that facilitates the model in adaptively adjusting its focus between adversarial examples and hiders during different training periods. We demonstrate the effectiveness of our method through extensive experiments and show that HFAT provides higher robustness and accuracy. We will release the source code upon publication. |
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay | paper | Deep neural networks have demonstrated susceptibility to adversarial attacks. Adversarial defense techniques often focus on one-shot setting to maintain robustness against attack. However new attacks can emerge in sequences in real-world deployment scenarios. As a result it is crucial for a defense model to constantly adapt to new attacks but the adaptation process can lead to catastrophic forgetting of previously defended against attacks. In this paper we discuss for the first time the concept of continual adversarial defense under a sequence of attacks and propose a lifelong defense baseline called Anisotropic & Isotropic Replay (AIR) which offers three advantages: (1) Isotropic replay ensures model consistency in the neighborhood distribution of new data indirectly aligning the output preference between old and new tasks. (2) Anisotropic replay enables the model to learn a compromise data manifold with fresh mixed semantics for further replay constraints and potential future attacks. (3) A straightforward regularizer mitigates the 'plasticity-stability' trade-off by aligning model output between new and old tasks. Experiment results demonstrate that AIR can approximate or even exceed the empirical performance upper bounds achieved by Joint Training. |
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training | paper | Deep neural networks (DNNs) are vulnerable to highly transferable adversarial attacks. In particular, many studies have shown that sparse attacks pose a significant threat to DNNs on account of their exceptional imperceptibility. Current sparse attack methods mostly limit only the magnitude and number of perturbations while generally overlooking their location, resulting in decreased attack transferability. A subset of studies indicates that perturbations existing in significant regions with rich classification-relevant features are more effective. Leveraging this insight, we introduce a structural sparsity constraint in the framework of generative models to limit the perturbation positions. To ensure that the perturbations are generated towards classification-relevant regions, we propose an exact group sparsity training method to learn pixel-level and group-level sparsity. To improve the effectiveness of sparse training, we further put forward a masked quantization network and a multi-stage optimization algorithm for the training process. Using CNNs as surrogate models, extensive experiments demonstrate that our method achieves higher transferability in image classification attacks compared to state-of-the-art methods at approximately the same sparsity levels. In cross-model ViT, object detection, and semantic segmentation attack tasks, we also achieve a better attack success rate. Code is available at https://github.com/MisterRpeng/EGS-TSSA. |
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | paper | Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map and fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D^2Fool), the first 3D texture-based adversarial attack against MDE models. 3D^2Fool is specifically optimized to generate 3D adversarial textures that are agnostic to vehicle model types and to have improved robustness in bad weather conditions, such as rain and fog. Experimental results validate the superior performance of our 3D^2Fool across various scenarios, including vehicles, MDE models, weather conditions, and viewpoints. Real-world experiments with printed 3D textures on physical vehicle models further demonstrate that our 3D^2Fool can cause an MDE error of over 10 meters. |
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement | paper | Various transfer attack methods have been proposed to evaluate the robustness of deep neural networks (DNNs). Although manifesting remarkable performance in generating untargeted adversarial perturbations existing proposals still fail to achieve high targeted transferability. In this work we discover that the adversarial perturbations' overfitting towards source models of mediocre generalization capability can hurt their targeted transferability. To address this issue we focus on enhancing the source model's generalization capability to improve its ability to conduct transferable targeted adversarial attacks. In pursuit of this goal we propose a novel model self-enhancement method that incorporates two major components: Sharpness-Aware Self-Distillation (SASD) and Weight Scaling (WS). Specifically SASD distills a fine-tuned auxiliary model which mirrors the source model's structure into the source model while flattening the source model's loss landscape. WS obtains an approximate ensemble of numerous pruned models to perform model augmentation which can be conveniently synergized with SASD to elevate the source model's generalization capability and thus improve the resultant targeted perturbations' transferability. Extensive experiments corroborate the effectiveness of the proposed method. Notably under the black-box setting our approach can outperform the state-of-the-art baselines by a significant margin of 12.2% on average in terms of the obtained targeted transferability. Code is available at https://github.com/g4alllf/SASD. |
Revisiting Adversarial Training at Scale | paper | The machine learning community has witnessed a drastic change in the training pipeline, pivoted by those "foundation models" with unprecedented scales. However, the field of adversarial training is lagging behind, predominantly centered around small model sizes like ResNet-50 and tiny, low-resolution datasets like CIFAR-10. To bridge this transformation gap, this paper provides a modern re-examination of adversarial training, investigating its potential benefits when applied at scale. Additionally, we introduce an efficient and effective training strategy to enable adversarial training with giant models and web-scale data at an affordable computing cost. We denote this newly introduced framework as AdvXL. Empirical results demonstrate that AdvXL establishes new state-of-the-art robust accuracy records under AutoAttack on ImageNet-1K. For example, by training on the DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of l_inf-, l_2-, and l_1-robust accuracy by margins of 11.4%, 14.2%, and 12.9%, respectively. This achievement posits AdvXL as a pioneering approach, charting a new trajectory for the efficient training of robust visual representations at significantly larger scales. Our code is available at https://github.com/UCSC-VLAA/AdvXL. |
Towards Fairness-Aware Adversarial Learning | paper | Although adversarial training (AT) has proven effective in enhancing the model's robustness the recently revealed issue of fairness in robustness has not been well addressed i.e. the robust accuracy varies significantly among different categories. In this paper instead of uniformly evaluating the model's average class performance we delve into the issue of robust fairness by considering the worst-case distribution across various classes. We propose a novel learning paradigm named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT we re-define the problem of adversarial training as a min-max-max framework to ensure both robustness and fairness of the trained model. Specifically by taking advantage of distributional robust optimization our method aims to find the worst distribution among different categories and the solution is guaranteed to obtain the upper bound performance with high probability. In particular FAAL can fine-tune an unfair robust model to be fair within only two epochs without compromising the overall clean and robust accuracies. Extensive experiments on various image datasets validate the superior performance and efficiency of the proposed FAAL compared to other state-of-the-art methods. |
Towards Understanding and Improving Adversarial Robustness of Vision Transformers | paper | Recent literature has demonstrated that vision transformers (ViTs) exhibit superior performance compared to convolutional neural networks (CNNs). However, the majority of recent research on adversarial robustness has predominantly focused on CNNs. In this work, we bridge this gap by analyzing the effectiveness of existing attacks on ViTs. We demonstrate that, due to the softmax computations in every attention block, ViTs are inherently vulnerable to floating-point underflow errors. This can lead to a gradient masking effect, resulting in suboptimal attack strength for well-known attacks like PGD, Carlini and Wagner (CW), GAMA, and Patch attacks. Motivated by this, we propose the Adaptive Attention Scaling (AAS) attack, which can automatically find the optimal scaling factors of pre-softmax outputs using gradient-based optimization. We show that this simple strategy can be incorporated with any existing adversarial attack as well as adversarial training methods to achieve improved performance. On ViT-B16, we demonstrate an improved attack strength of up to 2.2% on CIFAR10 and up to 2.9% on CIFAR100 by incorporating the proposed AAS attack with state-of-the-art single attack methods like the GAMA attack. Further, we utilise the proposed AAS attack every few epochs in existing adversarial training methods, which we term Adaptive Attention Scaling Adversarial Training (AAS-AT). On incorporating AAS-AT with existing methods, we outperform them on ViTs by 1.3-3.5% on CIFAR10. We observe improved performance on ImageNet-100 as well. |
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners | paper | Few-shot learning (FSL) facilitates a variety of computer vision tasks yet remains vulnerable to adversarial attacks. Existing adversarially robust FSL methods rely on either visual similarity learning or class concept learning. Our analysis reveals that these two learning paradigms are complementary exhibiting distinct robustness due to their unique decision boundary types (concepts clustering by the visual similarity label vs. classification by the class labels). To bridge this gap we propose a novel framework unifying adversarially robust similarity learning and class concept learning. Specifically we distill parameters from both network branches into a "unified embedding model" during robust optimization and redistribute them to individual network branches periodically. To capture generalizable robustness across diverse branches we initialize adversaries in each episode with cross-branch class-wise "global adversarial perturbations" instead of less informative random initialization. We also propose a branch robustness harmonization to modulate the optimization of similarity and class concept learners via their relative adversarial robustness. Extensive experiments demonstrate the state-of-the-art performance of our method in diverse few-shot scenarios. |
Revisiting Adversarial Training Under Long-Tailed Distributions | paper | Deep neural networks are vulnerable to adversarial attacks, which can lead to erroneous outputs. Adversarial training has been recognized as one of the most effective methods to counter such attacks. However, existing adversarial training techniques have predominantly been evaluated on balanced datasets, whereas real-world data often exhibit a long-tailed distribution, casting doubt on the efficacy of these methods in practical scenarios. In this paper, we delve into the performance of adversarial training under long-tailed distributions. Through an analysis of the prior method "RoBal" (Wu et al., CVPR'21), we discover that utilizing Balanced Softmax Loss (BSL) alone can obtain performance comparable to the complete RoBal approach while significantly reducing the training overhead. We then reveal that adversarial training under long-tailed distributions also suffers from robust overfitting, similar to uniform distributions. We explore utilizing data augmentation to mitigate this issue and unexpectedly discover that, unlike results obtained with balanced data, data augmentation not only effectively alleviates robust overfitting but also significantly improves robustness. We further identify that this improvement is attributed to the increased diversity of training data. Extensive experiments corroborate that data augmentation alone can significantly improve robustness. Finally, building on these findings, we demonstrate that, compared to RoBal, the combination of BSL and data augmentation leads to a +6.66% improvement in model robustness under AutoAttack on CIFAR-10-LT. Our code is available at: https://github.com/NISPLab/AT-BSL. |
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment | paper | Adversarial distillation (AD) is a highly effective method for enhancing the robustness of small models. Contrary to expectations a high-performing teacher model does not always result in a more robust student model. This is due to two main reasons. First when there are significant differences in predictions between the teacher model and the student model exact matching of predicted values using KL divergence interferes with training leading to poor performance of existing methods. Second matching solely based on the output prevents the student model from fully understanding the behavior of the teacher model. To address these challenges this paper proposes a novel AD method named SmaraAD. During the training process we facilitate the student model in better understanding the teacher model's behavior by aligning the attribution region that the student model focuses on with that of the teacher model. Concurrently we relax the condition of exact matching in KL divergence and replace it with a more flexible matching criterion thereby enhancing the model's robustness. Extensive experiments substantiate the effectiveness of our method in improving the robustness of small models outperforming previous SOTA methods. |
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples | paper | Adversarially robust knowledge distillation aims to compress large-scale models into lightweight models while preserving adversarial robustness and natural performance on a given dataset. Existing methods typically align probability distributions of natural and adversarial samples between teacher and student models, but they overlook the intermediate adversarial samples along the "adversarial path" formed by the multi-step gradient ascent of a sample towards the decision boundary. Such paths capture rich information about the decision boundary. In this paper, we propose a novel adversarially robust knowledge distillation approach by incorporating such adversarial paths into the alignment process. Recognizing the diverse impacts of intermediate adversarial samples (ranging from benign to noisy), we propose an adaptive weighting strategy to selectively emphasize informative adversarial samples, thus ensuring efficient utilization of lightweight model capacity. Moreover, we propose a dual-branch mechanism exploiting the following two insights: (i) the complementary dynamics of adversarial paths obtained by targeted and untargeted adversarial learning, and (ii) the inherent differences between the gradient ascent path from class c_i towards the nearest class boundary and the gradient descent path from a specific class c_j towards the decision region of c_i (i ≠ j). Comprehensive experiments demonstrate the effectiveness of our method on lightweight models under various settings. |
Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement | paper | Adversarial training (AT) is currently one of the most effective ways to obtain the robustness of deep neural networks against adversarial attacks. However, most AT methods suffer from robust overfitting, i.e., a significant generalization gap in adversarial robustness between the training and testing curves. In this paper, we first identify a connection between robust overfitting and the excessive memorization of noisy labels in AT from the view of gradient norm. As such label noise is mainly caused by a distribution mismatch and improper label assignments, we are motivated to propose a label refinement approach for AT. Specifically, our Self-Guided Label Refinement first self-refines a more accurate and informative label distribution from over-confident hard labels, and then calibrates the training by dynamically incorporating knowledge from self-distilled models into the current model, thus requiring no external teachers. Empirical results demonstrate that our method can simultaneously boost standard accuracy and robust performance across multiple benchmark datasets, attack types, and architectures. In addition, we provide a set of analyses from the perspective of information theory to dive into our method and suggest the importance of soft labels for robust generalization. |
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor | paper | Adversarial robustness of the neural network is a significant concern when it is applied to security-critical domains. In this situation adversarial distillation is a promising option which aims to distill the robustness of the teacher network to improve the robustness of a small student network. Previous works pretrain the teacher network to make it robust against the adversarial examples aimed at itself. However the adversarial examples are dependent on the parameters of the target network. The fixed teacher network inevitably degrades its robustness against the unseen transferred adversarial examples which target the parameters of the student network in the adversarial distillation process. We propose PeerAiD to make a peer network learn the adversarial examples of the student network instead of adversarial examples aimed at itself. PeerAiD is an adversarial distillation that trains the peer network and the student network simultaneously in order to specialize the peer network for defending the student network. We observe that such peer networks surpass the robustness of the pretrained robust teacher model against adversarial examples aimed at the student network. With this peer network and adversarial distillation PeerAiD achieves significantly higher robustness of the student network with AutoAttack (AA) accuracy by up to 1.66%p and improves the natural accuracy of the student network by up to 4.72%p with ResNet-18 on TinyImageNet dataset. Code is available at https://github.com/jaewonalive/PeerAiD. |
Towards Transferable Targeted 3D Adversarial Attack in the Physical World | paper | Compared with transferable untargeted attacks transferable targeted adversarial attacks could specify the misclassification categories of adversarial samples posing a greater threat to security-critical tasks. In the meanwhile 3D adversarial samples due to their potential of multi-view robustness can more comprehensively identify weaknesses in existing deep learning systems possessing great application value. However the field of transferable targeted 3D adversarial attacks remains vacant. The goal of this work is to develop a more effective technique that could generate transferable targeted 3D adversarial examples filling the gap in this field. To achieve this goal we design a novel framework named TT3D that could rapidly reconstruct from few multi-view images into Transferable Targeted 3D textured meshes. While existing mesh-based texture optimization methods compute gradients in the high-dimensional mesh space and easily fall into local optima leading to unsatisfactory transferability and distinct distortions TT3D innovatively performs dual optimization towards both feature grid and Multi-layer Perceptron (MLP) parameters in the grid-based NeRF space which significantly enhances black-box transferability while enjoying naturalness. Experimental results show that TT3D not only exhibits superior cross-model transferability but also maintains considerable adaptability across different renders and vision tasks. More importantly we produce 3D adversarial examples with 3D printing techniques in the real world and verify their robust performance under various scenarios. |
Random Entangled Tokens for Adversarially Robust Vision Transformer | paper | Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) in the realm of computer vision showcasing tremendous potential. However recent research has unveiled a susceptibility of ViTs to adversarial attacks akin to their CNN counterparts. Adversarial training and randomization are two representative effective defenses for CNNs. Some researchers have attempted to apply adversarial training to ViTs and achieved comparable robustness to CNNs while it is not easy to directly apply randomization to ViTs because of the architecture difference between CNNs and ViTs. In this paper we delve into the structural intricacies of ViTs and propose a novel defense mechanism termed Random entangled image Transformer (ReiT) which seamlessly integrates adversarial training and randomization to bolster the adversarial robustness of ViTs. Recognizing the challenge posed by the structural disparities between ViTs and CNNs we introduce a novel module input-independent random entangled self-attention (II-ReSA). This module optimizes random entangled tokens that lead to "dissimilar" self-attention outputs by leveraging model parameters and the sampled random tokens thereby synthesizing the self-attention module outputs and random entangled tokens to diminish adversarial similarity. ReiT incorporates two distinct random entangled tokens and employs dual randomization offering an effective countermeasure against adversarial examples while ensuring comprehensive deduction guarantees. Through extensive experiments conducted on various ViT variants and benchmarks we substantiate the superiority of our proposed method in enhancing the adversarial robustness of Vision Transformers. |
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models | paper | Diffusion Models (DMs) have shown remarkable capabilities in various image-generation tasks. However there are growing concerns that DMs could be used to imitate unauthorized creations and thus raise copyright issues. To address this issue we propose a novel framework that embeds personal watermarks in the generation of adversarial examples. Such examples can force DMs to generate images with visible watermarks and prevent DMs from imitating unauthorized images. We construct a generator based on conditional adversarial networks and design three losses (adversarial loss GAN loss and perturbation loss) to generate adversarial examples that have subtle perturbation but can effectively attack DMs to prevent copyright violations. Training a generator for a personal watermark by our method only requires 5-10 samples within 2-3 minutes and once the generator is trained it can generate adversarial examples with that watermark significantly fast (0.2s per image). We conduct extensive experiments in various conditional image-generation scenarios. Compared to existing methods that generate images with chaotic textures our method adds visible watermarks on the generated images which is a more straightforward way to indicate copyright violations. We also observe that our adversarial examples exhibit good transferability across unknown generative models. Therefore this work provides a simple yet powerful way to protect copyright from DM-based imitation. |
Boosting Adversarial Transferability by Block Shuffle and Rotation | paper | Adversarial examples mislead deep neural networks with imperceptible perturbations and have brought significant threats to deep learning. An important aspect is their transferability which refers to their ability to deceive other models thus enabling attacks in the black-box setting. Though various methods have been proposed to boost transferability the performance still falls short compared with white-box attacks. In this work we observe that existing input transformation based attacks one of the mainstream transfer-based attacks result in different attention heatmaps on various models which might limit the transferability. We also find that breaking the intrinsic relation of the image can disrupt the attention heatmap of the original image. Based on this finding we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically BSR splits the input image into several blocks then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR could achieve significantly better transferability than the existing input transformation based methods under single-model and ensemble-model settings. Combining BSR with the current input transformation method can further improve the transferability which significantly outperforms the state-of-the-art methods. Code is available at https://github.com/Trustworthy-AI-Group/BSR. |
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks | paper | Recent advances in instruction tuning have led to the development of state-of-the-art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, image captioning, and Visual Question Answering (VQA). We find that, in general, LMMs are not robust to visual adversarial inputs. However, our findings suggest that context provided to the model via prompts -- such as questions in a QA pair -- helps to mitigate the effects of visual adversarial inputs. Notably, the LMMs evaluated demonstrated remarkable resilience to such attacks on the ScienceQA task, with only an 8.10% drop in performance compared to their visual counterparts, which dropped 99.73%. We also propose a new approach to real-world image classification which we term query decomposition. By incorporating existence queries into our input prompt, we observe diminished attack effectiveness and improvements in image classification accuracy. This research highlights a previously underexplored facet of LMM robustness and sets the stage for future work aimed at strengthening the resilience of multimodal systems in adversarial environments. |
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models | paper | Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having remarkable generalization ability, are highly vulnerable to adversarial examples. This work studies the adversarial robustness of VLMs from the novel perspective of the text prompt, instead of the extensively studied model weights (frozen in this work). We first show that the effectiveness of both adversarial attack and defense is sensitive to the text prompt used. Inspired by this, we propose a method to improve resilience to adversarial attacks by learning a robust text prompt for VLMs. The proposed method, named Adversarial Prompt Tuning (APT), is effective while being both computationally and data efficient. Extensive experiments are conducted across 15 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show APT's superiority over hand-engineered prompts and other state-of-the-art adaptation methods. APT demonstrates excellent abilities in terms of in-distribution performance and generalization under input distribution shift and across datasets. Surprisingly, by simply adding one learned word to the prompts, APT can significantly boost accuracy and robustness (epsilon=4/255) over the hand-engineered prompts by +13% and +8.5% on average, respectively. The improvement further increases, in our most effective setting, to +26.4% for accuracy and +16.7% for robustness. Code is available at https://github.com/TreeLLi/APT. |
Language-Driven Anchors for Zero-Shot Adversarial Robustness | paper | Deep Neural Networks (DNNs) are known to be susceptible to adversarial attacks. Previous research has mainly focused on improving adversarial robustness in the fully supervised setting, leaving the challenging domain of zero-shot adversarial robustness an open question. In this work, we investigate this domain by leveraging recent advances in large vision-language models, such as CLIP, to introduce zero-shot adversarial robustness to DNNs. We propose LAAT, a Language-driven Anchor-based Adversarial Training strategy. LAAT utilizes the features of a text encoder as fixed anchors (normalized feature embeddings) for each category, which are then employed for adversarial training. By leveraging the semantic consistency of the text encoders, LAAT aims to enhance the adversarial robustness of the image model on novel categories. However, naively using text encoders leads to poor results. Through analysis, we identified the issue to be the high cosine similarity between text encoders. We then design an expansion algorithm and an alignment cross-entropy loss to alleviate the problem. Our experimental results demonstrate that LAAT significantly improves zero-shot adversarial robustness over state-of-the-art methods. LAAT has the potential to enhance adversarial robustness through large-scale multimodal models, especially when labeled data is unavailable during training. |
SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers | paper | Recently there has been a lot of progress in reducing the computation of deep models at inference time. These methods can reduce both the computational needs and power usage of deep models. Some of these approaches adaptively scale the compute based on the input instance. We show that such models can be vulnerable to a universal adversarial patch attack where the attacker optimizes for a patch that when pasted on any image can increase the compute and power consumption of the model. We run experiments with three different efficient vision transformer methods showing that in some cases the attacker can increase the computation to the maximum possible level by simply pasting a patch that occupies only 8% of the image area. We also show that a standard adversarial training defense method can reduce some of the attack's success. We believe adaptive efficient methods will be necessary for the future to lower the power usage of expensive deep models so we hope our paper encourages the community to study the robustness of these methods and develop better defense methods for the proposed attack. Code is available at: https://github.com/UCDvision/SlowFormer. |
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization | paper | The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry aiding in performance evaluation and optimization guidance. However these models are found to be vulnerable to adversarial attacks which introduce imperceptible perturbations to input images resulting in significant changes in predicted scores. In this paper we propose a defense method to mitigate the variability in predicted scores caused by small perturbations thus enhancing the adversarial robustness of NR-IQA models. To be specific we present theoretical evidence showing that the extent of score changes is related to the l_1 norm of the gradient of the predicted score with respect to the input image when adversarial perturbations are l_inf-bounded. Building on this theoretical foundation we propose a norm regularization training strategy aimed at reducing the l_1 norm of the gradient thereby boosting the adversarial robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area. |
NAPGuard: Towards Detecting Naturalistic Adversarial Patches | paper | Recently, the emergence of the naturalistic adversarial patch (NAP), which possesses a deceptive appearance and various representations, underscores the necessity of developing robust detection strategies. However, existing approaches fail to differentiate the deep-seated natures of adversarial patches, i.e., aggressiveness and naturalness, leading to unsatisfactory precision and generalization against NAPs. To tackle this issue, we propose NAPGuard, which provides strong detection capability against NAPs via an elaborated critical feature modulation framework. To improve precision, we propose aggressive feature aligned learning to enhance the model's capability in capturing accurate aggressive patterns. Considering the challenge of inaccurate model learning caused by deceptive appearance, we align the aggressive features with the proposed pattern alignment loss during training. Since the model can learn more accurate aggressive patterns, it is able to detect deceptive patches more precisely. To enhance generalization, we design natural feature suppressed inference to universally mitigate the disturbance from different NAPs. Since various representations arise in diverse disturbing forms to hinder generalization, we suppress the natural features in a unified approach via the feature shield module. Therefore, the model can recognize NAPs with less disturbance and activate its generalized detection ability. Extensive experiments show that our method surpasses state-of-the-art methods by large margins in detecting NAPs (improving AP@0.5 by 60.24% on average). |
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks | paper | Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, which are independent of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as a distinct image quality of the patch area that differs from the original clean image due to the independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training. PAD offers patch-agnostic defense against various adversarial patches and is compatible with any pre-trained object detector. Our comprehensive digital and physical experiments, involving diverse patch types such as localized-noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at https://github.com/Lihua-Jing/PAD. |
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning | paper | Strong adversarial examples are crucial for evaluating and enhancing the robustness of deep neural networks. However, the performance of popular attacks is usually sensitive, for instance, to minor image transformations, stemming from limited information -- typically only one input example, a handful of white-box source models, and undefined defense strategies. Hence, the crafted adversarial examples are prone to overfit the source model, which hampers their transferability to unknown architectures. In this paper, we propose an approach named Multiple Asymptotically Normal Distribution Attacks (MultiANDA), which explicitly characterizes adversarial perturbations from a learned distribution. Specifically, we approximate the posterior distribution over the perturbations by taking advantage of the asymptotic normality property of stochastic gradient ascent (SGA), and then employ the deep ensemble strategy as an effective proxy for Bayesian marginalization in this process, aiming to estimate a mixture of Gaussians that facilitates a more thorough exploration of the potential optimization space. The approximated posterior essentially describes the stationary distribution of SGA iterations, which captures the geometric information around the local optimum. Thus, MultiANDA allows drawing an unlimited number of adversarial perturbations for each input and reliably maintains transferability. Our proposed method outperforms ten state-of-the-art black-box attacks on deep learning models with or without defenses, as shown through extensive experiments on seven normally trained and seven defense models. |
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM | paper | Numerous studies have demonstrated the susceptibility of deep neural networks (DNNs) to subtle adversarial perturbations, prompting the development of many advanced adversarial defense methods aimed at mitigating adversarial attacks. Current defense strategies usually train DNNs against a specific adversarial attack method and can achieve good robustness against this type of attack. Nevertheless, when subjected to evaluations involving unfamiliar attack modalities, empirical evidence reveals a pronounced deterioration in the robustness of DNNs. Meanwhile, there is a trade-off between the classification accuracy of clean examples and adversarial examples: most defense methods sacrifice the accuracy of clean examples in order to improve the adversarial robustness of DNNs. To alleviate these problems and enhance the overall robust generalization of DNNs, we propose the Test-Time Pixel-Level Adversarial Purification (TPAP) method. This approach is based on the robust overfitting characteristic of DNNs to the fast gradient sign method (FGSM) on training and test datasets. It utilizes FGSM for adversarial purification, processing images at test time to purify unknown adversarial perturbations from pixels in a "counter changes with changelessness" manner, thereby enhancing the defense capability of DNNs against various unknown adversarial attacks. Extensive experimental results show that our method can effectively improve the overall robust generalization of DNNs, notably over previous methods. Code is available at https://github.com/tly18/TPAP. |
Semantic-Aware Multi-Label Adversarial Attacks | paper | Despite its importance, generating attacks for multi-label learning (MLL) models has received much less attention than multi-class recognition. Attacking an MLL model by optimizing a loss on the target set of labels often has the undesired consequence of changing the predictions for other labels. On the other hand, adding a loss on the remaining labels to keep them fixed leads to highly negatively correlated gradient directions, reducing attack effectiveness. In this paper, we develop a framework for crafting effective and semantic-aware adversarial attacks for MLL. First, to obtain an attack that leads to semantically consistent predictions across all labels, we find a minimal superset of the target labels, referred to as the consistent target set. To do so, we develop an efficient search algorithm over a knowledge graph that encodes label dependencies. Next, we propose an optimization that searches for an attack that modifies the predictions of labels in the consistent target set while ensuring other labels are not affected. This leads to an efficient algorithm that projects the gradient of the consistent target set loss onto the direction orthogonal to the gradient of the loss on other labels. Our framework can generate attacks on different target set sizes and for MLL with thousands of labels (as in OpenImages). Finally, through extensive experiments on three datasets and several MLL models, we show that our method generates both successful and semantically consistent attacks. |
Learning to Transform Dynamically for Better Adversarial Transferability | paper | Adversarial examples crafted by adding perturbations imperceptible to humans can deceive neural networks. Recent studies identify the adversarial transferability across various models i.e. the cross-model attack ability of adversarial samples. To enhance such adversarial transferability existing input transformation-based methods diversify input data with transformation augmentation. However their effectiveness is limited by the finite number of available transformations. In our study we introduce a novel approach named Learning to Transform (L2T). L2T increases the diversity of transformed images by selecting the optimal combination of operations from a pool of candidates consequently improving adversarial transferability. We conceptualize the selection of optimal transformation combinations as a trajectory optimization problem and employ a reinforcement learning strategy to effectively solve the problem. Comprehensive experiments on the ImageNet dataset as well as practical tests with Google Vision and GPT-4V reveal that L2T surpasses current methodologies in enhancing adversarial transferability thereby confirming its effectiveness and practical significance. |
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization | paper | Adversarial training is extensively utilized to improve the adversarial robustness of deep neural networks. Yet mitigating the degradation of standard generalization performance in adversarially trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First we leverage the Fisher-Rao norm a geometrically invariant metric for model complexity to establish the non-trivial bounds of the Cross-Entropy Loss-based Rademacher complexity for a ReLU-activated Multi-Layer Perceptron. Building upon this observation we propose a novel regularization framework called Logit-Oriented Adversarial Training (LOAT) which can mitigate the trade-off between robustness and accuracy while imposing only a negligible increase in computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of the prevalent adversarial training algorithms including PGD-AT TRADES TRADES (LSE) MART and DM-AT across various network architectures. Our code will be available at https://github.com/TrustAI/LOAT. |
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness | paper | Large-scale pre-trained vision-language models like CLIP have demonstrated impressive performance across various tasks and exhibit remarkable zero-shot generalization capability while they are also vulnerable to imperceptible adversarial examples. Existing works typically employ adversarial training (fine-tuning) as a defense method against adversarial examples. However direct application to the CLIP model may result in overfitting compromising the model's capacity for generalization. In this paper we propose the Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) method which leverages supervision from the original pre-trained model by carefully designing an auxiliary branch to enhance the model's zero-shot adversarial robustness. Specifically PMG-AFT minimizes the distance between the features of adversarial examples in the target model and those in the pre-trained model aiming to preserve the generalization features already captured by the pre-trained model. Extensive experiments on 15 zero-shot datasets demonstrate that PMG-AFT significantly outperforms the state-of-the-art method improving the top-1 robust accuracy by an average of 4.99%. Furthermore our approach consistently improves clean accuracy by an average of 8.72%. |
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model | paper | Deep neural networks (DNNs) are vulnerable to adversarial perturbation where an imperceptible perturbation is added to the image that can fool the DNNs. Diffusion-based adversarial purification uses the diffusion model to generate a clean image against such adversarial attacks. Unfortunately the generative process of the diffusion model is also inevitably affected by adversarial perturbation since the diffusion model is also a deep neural network whose input carries the adversarial perturbation. In this work we propose MimicDiffusion a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input. Concretely we analyze the differences between the guided terms using the clean image and the adversarial sample. After that we first implement MimicDiffusion based on Manhattan distance. Then we propose two guidance terms to purify the adversarial perturbation and approximate the clean diffusion model. Extensive experiments on three image datasets including CIFAR-10 CIFAR-100 and ImageNet with three classifier backbones including WideResNet-70-16 WideResNet-28-10 and ResNet-50 demonstrate that MimicDiffusion performs significantly better than the state-of-the-art baselines. On CIFAR-10 CIFAR-100 and ImageNet it achieves 92.67% 61.35% and 61.53% average robust accuracy which are 18.49% 13.23% and 17.64% higher respectively. The code is available at https://github.com/psky1111/MimicDiffusion. |
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning | paper | We propose a controllable visual enhancer named DDBF which is based on cross-modal conditional adversarial learning and aims to dispel darkness and achieve better fusion of the visible and infrared modalities. Specifically a guided restoration module (GRM) is first designed to enhance weakened information in the low-light visible modality. The GRM utilizes the light-invariant high-contrast characteristics of the infrared modality as the central target distribution and constructs a multi-level conditional adversarial sample set to enable continuous controlled brightness enhancement of visible images. Then we develop an information fusion module (IFM) to integrate the advantageous features of the enhanced visible image and the infrared image. Thanks to customized explicit information preservation and hue fidelity constraints the IFM produces visually pleasing results with rich textures significant contrast and vivid colors. The brightened visible image and the final fused image compose the dual output of our DDBF to meet the diverse visual preferences of users. We evaluate DDBF on public datasets achieving state-of-the-art performance in low-light enhancement and information integration for both day and night scenarios. The experiments also demonstrate that our DDBF is effective in improving decision accuracy for object detection and semantic segmentation. Moreover we offer a user-friendly interface for the convenient application of our model. The code is publicly available at https://github.com/HaoZhang1018/DDBF. |
Initialization Matters for Adversarial Transfer Learning | paper | With the prevalence of the Pretraining-Finetuning paradigm in transfer learning the robustness of downstream tasks has become a critical concern. In this work we delve into adversarial robustness in transfer learning and reveal the critical role of initialization including both the pretrained model and the linear head. First we discover the necessity of an adversarially robust pretrained model. Specifically we reveal that with a standard pretrained model Parameter-Efficient Finetuning (PEFT) methods either fail to be adversarially robust or continue to exhibit significantly degraded adversarial robustness on downstream tasks even with adversarial training during finetuning. Leveraging a robust pretrained model surprisingly we observe that a simple linear probing can outperform full finetuning and other PEFT methods with random initialization on certain datasets. We further identify that linear probing excels in preserving robustness from the robust pretraining. Based on this we propose Robust Linear Initialization (RoLI) for adversarial finetuning which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining. Across five different image classification datasets we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results. Our code is available at https://github.com/DongXzz/RoLI. |
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models | paper | Different from a unimodal model whose input is from a single modality the input (called multi-modal input) of a multi-modal model is from multiple modalities such as image 3D points audio text etc. Similar to unimodal models many existing studies show that a multi-modal model is also vulnerable to adversarial perturbation where an attacker could add small perturbation to all modalities of a multi-modal input such that the multi-modal model makes incorrect predictions for it. Existing certified defenses are mostly designed for unimodal models which achieve sub-optimal certified robustness guarantees when extended to multi-modal models as shown in our experimental results. In our work we propose MMCert the first certified defense against adversarial attacks to a multi-modal model. We derive a lower bound on the performance of our MMCert under arbitrary adversarial attacks with bounded perturbations to both modalities (e.g. in the context of auto-driving we bound the number of changed pixels in both RGB image and depth image). We evaluate our MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task. Moreover we compare our MMCert with a state-of-the-art certified defense extended from unimodal models. Our experimental results show that our MMCert outperforms the baseline. |
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models | paper | Poisoning (trojan/backdoor) attacks enable an adversary to train and deploy a corrupted machine learning (ML) model which typically works well and achieves good accuracy on clean input samples but behaves maliciously on poisoned samples containing specific trigger patterns. Using such poisoned ML models as the foundation to build real-world systems can compromise application safety. Hence there is a critical need for algorithms that detect whether a given target model has been poisoned. This work proposes a novel approach for detecting poisoned models called Attack To Defend (A2D) which is based on the observation that poisoned models are more sensitive to adversarial perturbations compared to benign models. We propose a metric called sensitivity to adversarial perturbations (SAP) to measure the sensitivity of an ML model to adversarial attacks at a specific perturbation bound. We then generate strong adversarial attacks against an unrelated reference model and estimate the SAP value of the target model by transferring the generated attacks (see the SAP sketch after this table). The target model is deemed to be a trojan if its SAP value exceeds a decision threshold. The A2D framework requires only black-box access to the target model and a small clean set while being computationally efficient. The A2D approach has been evaluated on four standard image datasets and its effectiveness under various types of poisoning attacks has been demonstrated. |
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds | paper | Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy shape-based attack can effectively eliminate outliers but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore we propose a novel shape-based adversarial attack method HiT-ADV which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally HiT-ADV is extendable to physical attack. We propose that by employing benign resampling and benign rigid transformations we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. |
Ensemble Diversity Facilitates Adversarial Transferability | paper | With the advent of ensemble-based attacks the transferability of generated adversarial examples is elevated by a noticeable margin even though many methods only employ superficial integration and ignore the diversity between ensemble models. Most of them overlook the latent value of the diversity between the perturbations generated by distinct models which we argue can also increase adversarial transferability especially for heterogeneous attacks. To address these issues we propose a novel Stochastic Mini-batch black-box attack with Ensemble Reweighing using reinforcement learning (SMER) to produce highly transferable adversarial examples. We emphasize the diversity between surrogate models by generating individual perturbations iteratively. To customize the individual effect of each surrogate ensemble reweighing is introduced to refine the ensemble weights by maximizing the attack loss with reinforcement learning which ultimately elevates transferability. Extensive experiments demonstrate our superiority over recent ensemble attacks by a significant margin across different black-box attack scenarios especially under heterogeneous conditions. |
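
For the TPAP row above, the following is a minimal sketch of FGSM-based test-time purification written in PyTorch. It assumes a pretrained classifier `model` and inputs in [0, 1]; the use of the model's own prediction as a pseudo-label, the single reversed FGSM step, and the ε value are illustrative assumptions rather than the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def fgsm_purify(model, x, eps=8 / 255):
    """Illustrative FGSM-based purification applied at test time.

    Hypothetical reading of TPAP: take one FGSM-style step that *decreases*
    the loss w.r.t. the model's own prediction, aiming to cancel unknown
    adversarial perturbations before the final classification.
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    y_pseudo = logits.argmax(dim=1)          # pseudo-label from the model itself
    loss = F.cross_entropy(logits, y_pseudo)
    grad = torch.autograd.grad(loss, x)[0]
    # Step against the gradient sign, i.e. the reverse of an FGSM attack step.
    return (x - eps * grad.sign()).clamp(0.0, 1.0).detach()

# usage: predictions = model(fgsm_purify(model, batch_of_images)).argmax(dim=1)
```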
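
For the semantic-aware multi-label attack row above, this sketch shows the core projection step: the gradient of the consistent-target-set loss is followed only along the component orthogonal to the gradient of the loss on the remaining labels. The gradient flattening, the loss callables, and the sign-based step are assumptions for illustration.

```python
import torch

def project_orthogonal(g_target: torch.Tensor, g_other: torch.Tensor) -> torch.Tensor:
    """Remove from g_target its component along g_other.

    g_target: flattened gradient of the loss on the consistent target set.
    g_other:  flattened gradient of the loss on the remaining labels.
    """
    denom = g_other.dot(g_other).clamp_min(1e-12)
    return g_target - (g_target.dot(g_other) / denom) * g_other

# One illustrative attack step on an image x (epsilon-ball projection omitted):
# g_t = torch.autograd.grad(loss_target(x), x, retain_graph=True)[0].flatten()
# g_o = torch.autograd.grad(loss_other(x), x)[0].flatten()
# x_adv = x + alpha * project_orthogonal(g_t, g_o).view_as(x).sign()
```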
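
For the A2D row above, here is a hedged sketch of the sensitivity-to-adversarial-perturbations (SAP) test: attacks crafted on an unrelated reference model are transferred to the target model, whose output change is compared against a threshold. The softmax-difference statistic, the `craft_attack` callable, and the threshold are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sap_score(target_model, x_clean, x_adv):
    """Hypothetical SAP estimate: mean L1 change of the target model's
    softmax outputs when transferred adversarial inputs replace clean ones."""
    p_clean = F.softmax(target_model(x_clean), dim=1)
    p_adv = F.softmax(target_model(x_adv), dim=1)
    return (p_adv - p_clean).abs().sum(dim=1).mean().item()

def attack_to_defend(target_model, reference_model, x_clean, craft_attack, threshold):
    """Flag the target model as poisoned if its sensitivity to attacks
    transferred from the reference model exceeds the decision threshold."""
    x_adv = craft_attack(reference_model, x_clean)  # e.g. a strong PGD attack
    return sap_score(target_model, x_clean, x_adv) > threshold
```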
Title | Link | Abstract |
---|---|---|
Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap | paper | "Adversarial robustness generally relies on large-scale architectures and datasets, hindering resource-efficient deployment. For scalable solutions, adversarially robust knowledge distillation has emerged as a principled strategy, facilitating the transfer of robustness from a large-scale teacher model to a lightweight student model. However, existing works focus solely on sample-to-sample alignment of features or predictions between the teacher and student models, overlooking the vital role of their statistical alignment. Thus, we propose a novel adversarially robust knowledge distillation method that integrates the alignment of feature distributions between the teacher and student backbones under adversarial and clean sample sets. To motivate our idea, for an adversarially trained model (i.e., student or teacher), we show that the robust accuracy (evaluated on testing adversarial samples under an increasing perturbation radius) correlates negatively with the gap between the feature variance evaluated on testing adversarial samples and testing clean samples. Such a negative correlation exhibits a strong linear trend, suggesting that aligning the feature covariance of the student model toward the feature covariance of the teacher model should improve the adversarial robustness of the student model by reducing the variance gap (see the variance-gap sketch after this table). A similar trend is observed by reducing the variance gap between the gram matrices of the student and teacher models. Extensive evaluations highlight the state-of-the-art adversarial robustness and natural performance of our method across diverse datasets and distillation scenarios." |
FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds | paper | "Adversarial attacks on point clouds play a vital role in assessing and enhancing the adversarial robustness of 3D deep learning models. While employing a variety of geometric constraints, existing adversarial attack solutions often display unsatisfactory imperceptibility due to inadequate consideration of uniformity changes. In this paper, we propose FLAT, a novel framework designed to generate imperceptible adversarial point clouds by addressing the issue from a flux perspective. Specifically, during adversarial attacks, we assess the extent of uniformity alterations by calculating the flux of the local perturbation vector field. Upon identifying a high flux, which signals potential disruption in uniformity, the directions of the perturbation vectors are adjusted to minimize these alterations, thereby improving imperceptibility. Extensive experiments validate the effectiveness of FLAT in generating imperceptible adversarial point clouds, and its superiority over the state-of-the-art methods." |
Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation | paper | "While the success of deep learning relies on large amounts of training datasets, data is often limited in privacy-sensitive domains. To address this challenge, generative model learning with differential privacy has emerged as a solution to train private generative models for desensitized data generation. However, the quality of the images generated by existing methods is limited due to the complexity of modeling data distribution. We build on the success of diffusion models and introduce DP-SAD, which trains a private diffusion model by a stochastic adversarial distillation method. Specifically, we first train a diffusion model as a teacher and then train a student by distillation, in which we achieve differential privacy by adding noise to the gradients from other models to the student. For better generation quality, we introduce a discriminator to distinguish whether an image is from the teacher or the student, which forms the adversarial training. Extensive experiments and analysis clearly demonstrate the effectiveness of our proposed method." |
High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding | paper | "3D vision is inherently characterized by sparse spatial structures, which propels the necessity for an efficient paradigm tailored to 3D generation. Another discrepancy is the amount of training data, which undeniably affects generalization if we only use limited 3D data. To solve these, we design a 3D generation framework that maintains most of the building blocks of StableDiffusion with minimal adaptations for textured shape generation. We design a Sparse Encoding Module for detail preservation and an Adversarial Decoding Module for better shape recovery. Moreover, we clean up data and build a benchmark on the biggest 3D dataset (Objaverse). We drop the concept of ‘specific class’ and treat the 3D Textured Shapes Generation as an open-vocabulary problem. We first validate our network design on ShapeNetV2 with 55K samples on single-class unconditional generation and multi-class conditional generation tasks. Then we report metrics on processed G-Objaverse with 200K samples on the image conditional generation task. Extensive experiments demonstrate our proposal outperforms SOTA methods and takes a further step towards open-vocabulary 3D generation. We release the processed data at https://aigc3d.github.io/gobjaverse/." |
Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection | paper | "Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose Generalized Adversarial attacKER (GAKer), which is able to construct adversarial examples to any target class. The core idea behind GAKer is to craft a latently infected representation during adversarial example generation. To this end, the extracted latent representations of the target object are first injected into intermediate features of an input image in an adversarial generator. Then, the generator is optimized to ensure visual consistency with the input image while being close to the target object in the feature space. Since the GAKer is class-agnostic yet model-agnostic, it can be regarded as a general tool that not only reveals the vulnerability of more DNNs but also identifies deficiencies of DNNs in a wider range of classes. Extensive experiments have demonstrated the effectiveness of our proposed method in generating adversarial examples for both known and unknown classes. Notably, compared with other generative methods, our method achieves an approximately 14.13% higher attack success rate for unknown classes and an approximately 4.23% higher success rate for known classes. Our code is available in https://github.com/VL-Group/GAKer." |
Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training | paper | "Learning a self-supervised Monocular Depth Estimation (MDE) model with great generalization remains significantly challenging. Despite the success of adversarial augmentation in improving supervised learning generalization, naively incorporating it into self-supervised MDE models potentially causes over-regularization, leading to severe performance degradation. In this paper, we conduct qualitative analysis and illuminate the main causes: (i) inherent sensitivity in the UNet-alike depth network and (ii) dual optimization conflict caused by over-regularization. To tackle these issues, we propose a general adversarial training framework, named Stabilized Conflict-optimization Adversarial Training (SCAT), integrating adversarial data augmentation into self-supervised MDE methods to achieve a balance between stability and generalization. Specifically, we devise an effective scaling depth network that tunes the coefficients of long skip connections and effectively stabilizes the training process. Then, we propose a conflict gradient surgery strategy, which progressively integrates the adversarial gradient and optimizes the model toward a conflict-free direction. Extensive experiments on five benchmarks demonstrate that SCAT can achieve state-of-the-art performance and significantly improve the generalization capability of existing self-supervised MDE methods." |
CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks | paper | "Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced single-target attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. Multi-target attacks address this by training only one class-conditional generator for multiple classes. However, the generator simply uses class labels as conditions, failing to leverage the rich semantic information of the target class. To this end, we design a CLIP-guided Generative Network with Cross-attention modules (CGNC) to enhance multi-target attacks by incorporating textual knowledge of CLIP into the generator. Extensive experiments demonstrate that CGNC yields significant improvements over previous multi-target attacks, e.g., a 21.46% improvement in success rate from Res-152 to DenseNet-121. Moreover, we propose the masked fine-tuning to further strengthen our method in attacking a single class, which surpasses existing single-target methods." |
Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective | paper | "Adversarial training (AT) has become an effective defense method against adversarial examples (AEs) and it is typically framed as a bi-level optimization problem. Among various AT methods, fast AT (FAT), which employs a single-step attack strategy to guide the training process, can achieve good robustness against adversarial attacks at a low cost. However, FAT methods suffer from the catastrophic overfitting problem, especially on complex tasks or with large-parameter models. In this work, we propose a FAT method termed FGSM-PCO, which mitigates catastrophic overfitting by averting the collapse of the inner optimization problem in the bi-level optimization process. FGSM-PCO generates current-stage AEs from the historical AEs and incorporates them into the training process using an adaptive mechanism. This mechanism determines an appropriate fusion ratio according to the performance of the AEs on the training model. Coupled with a loss function tailored to the training framework, FGSM-PCO can alleviate catastrophic overfitting and help the recovery of an overfitted model to effective training. We evaluate our algorithm across three models and three datasets to validate its effectiveness. Comparative empirical studies against other FAT algorithms demonstrate that our proposed method effectively addresses unresolved overfitting issues in existing algorithms." |
Transferable 3D Adversarial Shape Completion using Diffusion Models | paper | "Recent studies that incorporate geometric features and transformers into 3D point cloud feature learning have significantly improved the performance of 3D deep-learning models. However, their robustness against adversarial attacks has not been thoroughly explored. Existing attack methods primarily focus on white-box scenarios and struggle to transfer to recently proposed 3D deep-learning models. Even worse, these attacks introduce perturbations to 3D coordinates, generating unrealistic adversarial examples and resulting in poor performance against 3D adversarial defenses. In this paper, we generate high-quality adversarial point clouds using diffusion models. By using partial points as prior knowledge, we generate realistic adversarial examples through shape completion with adversarial guidance. The proposed adversarial shape completion allows for a more reliable generation of adversarial point clouds. To enhance attack transferability, we delve into the characteristics of 3D point clouds and employ model uncertainty for better inference of model classification through random down-sampling of point clouds. We adopt ensemble adversarial guidance for improved transferability across different network architectures. To maintain the generation quality, we limit our adversarial guidance solely to the critical points of the point clouds by calculating saliency scores. Extensive experiments demonstrate that our proposed attacks outperform state-of-the-art adversarial attack methods against both black-box models and defenses. Our black-box attack establishes a new baseline for evaluating the robustness of various 3D point cloud classification models." |
Interpretability-Guided Test-Time Adversarial Defense | paper | "We propose a novel and low-cost test-time adversarial defense by devising interpretability-guided neuron importance ranking methods to identify neurons important to the output classes. Our method is a training-free approach that can significantly improve the robustness-accuracy tradeoff while incurring minimal computational overhead. While being among the most efficient test-time defenses (4× faster), our method is also robust to a wide range of black-box, white-box, and adaptive attacks that break previous test-time defenses. We demonstrate the efficacy of our method for CIFAR10, CIFAR100, and ImageNet-1k on the standard RobustBench benchmark (with average gains of 2.6%, 4.9%, and 2.8% respectively). We also show improvements (average 1.5%) over the state-of-the-art test-time defenses even under strong adaptive attacks." |
A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks | paper | "Imperceptible watermarks are essential in safeguarding the content authenticity and the rights of creators in imagery. Recently, several leading approaches, notably zero-bit watermarking, have demonstrated impressive imperceptibility and robustness in image watermarking. However, these methods have security weaknesses, e.g., the risk of counterfeiting and the ease of erasing an existing watermark with another watermark, while also lacking a statistical guarantee regarding the detection performance. To address this issue, we propose a novel framework to train a secret key network (SKN), which serves as a non-duplicable safeguard for securing the embedded watermark. The SKN is trained so that natural images’ output obeys a standard multi-variate normal distribution. To embed a watermark, we apply an adversarial attack (a modified PGD attack) on the image such that the SKN produces a secret key signature (SKS) with a longer length. We then derive two hypothesis tests to detect the presence of the watermark in an image via the SKN response magnitude and the SKS angle, which offer a statistical guarantee of the false positive rate. Our extensive empirical study demonstrates that our framework maintains robustness comparable to existing methods and excels in security and imperceptibility." |
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks | paper | "Recent vision-language foundation models, such as CLIP, have demonstrated superior capabilities in learning representations that are transferable across a diverse range of downstream tasks and domains. With the emergence of such powerful models, it has become crucial to effectively leverage their capabilities in tackling challenging vision tasks. On the other hand, only a few works have focused on devising adversarial examples that transfer well to both unknown domains and model architectures. In this paper, we propose a novel transfer attack method called PDCL-Attack, which leverages the CLIP model to enhance the transferability of adversarial perturbations generated by a generative model-based attack framework. Specifically, we formulate an effective prompt-driven feature guidance by harnessing the semantic representation power of text, particularly from the ground-truth class labels of input images. To the best of our knowledge, we are the first to introduce prompt learning to enhance the transferable generative attacks. Extensive experiments conducted across various cross-domain and cross-model settings empirically validate our approach, demonstrating its superiority over state-of-the-art methods." |
Adversarial Prompt Tuning for Vision-Language Models | paper | "With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing input denoising defense techniques, further boosting defensive capabilities. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code is available at https://github.com/jiamingzhang94/Adversarial-Prompt-Tuning." |
Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | paper | "Adversarial attacks in visual object tracking have significantly degraded the performance of advanced trackers by introducing imperceptible perturbations into images. However, there is still a lack of research on designing adversarial defense methods for object tracking. To address these issues, we propose an effective auxiliary pre-processing defense network, AADN, which performs defensive transformations on the input images before feeding them into the tracker. Moreover, it can be seamlessly integrated with other visual trackers as a plug-and-play module without parameter adjustments. We train AADN using adversarial training, specifically employing Dua-Loss to generate adversarial samples that simultaneously attack the classification and regression branches of the tracker. Extensive experiments conducted on the OTB100, LaSOT, and VOT2018 benchmarks demonstrate that AADN maintains excellent defense robustness against adversarial attack methods in both adaptive and non-adaptive attack scenarios. Moreover, when transferring the defense network to heterogeneous trackers, it exhibits reliable transferability. Finally, AADN achieves a processing time of up to 5ms/frame, allowing seamless integration with existing high-speed trackers without introducing significant computational overhead." |
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models | paper | "Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models, which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for GAN-based methods on large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable in generating high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective in generating unrestricted adversarial examples, which outperforms state-of-the-art unrestricted adversarial attack methods in terms of attack performance and generation quality." |
PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking | paper | "Tracking multiple objects in a continuous video stream is crucial for many computer vision tasks. It involves detecting and associating objects with their respective identities across successive frames. Despite significant progress made in multiple object tracking (MOT), recent studies have revealed the vulnerability of existing MOT methods to adversarial attacks. Nevertheless, all of these attacks belong to digital attacks that inject pixel-level noise into input images, and are therefore ineffective in physical scenarios. To fill this gap, we propose PapMOT, which can generate physical adversarial patches against MOT for both digital and physical scenarios. Besides attacking the detection mechanism, PapMOT also optimizes a printable patch that can be detected as new targets to mislead the identity association process. Moreover, we introduce a patch enhancement strategy to further degrade the temporal consistency of tracking results across video frames, resulting in more aggressive attacks. We further develop new evaluation metrics to assess the robustness of MOT against such attacks. Extensive evaluations on multiple datasets demonstrate that our PapMOT can successfully attack various architectures of MOT trackers in digital scenarios. We also validate the effectiveness of PapMOT for physical attacks by deploying printed adversarial patches in the real world." |
DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks | paper | "Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks. Central to our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which empowers the diffusion model to detect and localize adversarial patches through the analysis of distributional discrepancies. DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework, utilizing their close interaction to enhance defense efficacy. Moreover, DIFFender utilizes vision-language pre-training coupled with an efficient few-shot prompt-tuning algorithm, which streamlines the adaptation of the pre-trained diffusion model to defense tasks, thus eliminating the need for extensive retraining. Our comprehensive evaluation spans image classification and face recognition tasks, extending to real-world scenarios, where DIFFender shows good robustness against adversarial attacks. The versatility and generalizability of DIFFender are evident across a variety of settings, classifiers, and attack methodologies, marking an advancement in adversarial patch defense strategies. Our code is available at https://github.com/kkkcx/DIFFender." |
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | paper | "Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., highly transferable AEs), can advance reliable and practical VLP models. A recent work (i.e., Set-level guidance attack) indicates that augmenting image-text pairs to increase AE diversity along the optimization path enhances the transferability of adversarial examples significantly. However, this approach predominantly emphasizes diversity around the online adversarial examples (i.e., AEs in the optimization period), leading to the risk of overfitting the victim model and affecting the transferability. In this study, we posit that the diversity of adversarial examples towards both the clean input and the online AEs is pivotal for enhancing transferability across VLP models. Consequently, we propose using diversification along the intersection region of adversarial trajectory to expand the diversity of AEs. To fully leverage the interaction between modalities, we introduce text-guided adversarial example selection during optimization. Furthermore, to further mitigate the potential overfitting, we direct the adversarial text deviating from the last intersection region along the optimization path, rather than adversarial images as in existing methods. Extensive experiments affirm the effectiveness of our method in improving transferability across various VLP models and downstream vision-and-language tasks. Code is available at https://github.com/SensenGao/VLPTransferAttack." |
Robustness Tokens: Towards Adversarial Robustness of Transformers | paper | "Recently, large pre-trained foundation models have become widely adopted by machine learning practitioners for a multitude of tasks. Given that such models are publicly available, relying on their use as backbone models for downstream tasks might result in high vulnerability to adversarial attacks crafted with the same public model. In this work, we propose Robustness Tokens, a novel approach specific to the transformer architecture that fine-tunes a few additional private tokens with low computational requirements instead of tuning model parameters as done in traditional adversarial training. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while also retaining the original downstream performances." |
Self-Supervised Representation Learning for Adversarial Attack Detection | paper | "Supervised learning-based adversarial attack detection methods rely on a large number of labeled data and suffer significant performance degradation when applying the trained model to new domains. In this paper, we propose a self-supervised representation learning framework for the adversarial attack detection task to address this drawback. Firstly, we map the pixels of augmented input images into an embedding space. Then, we employ the prototype-wise contrastive estimation loss to cluster prototypes as latent variables. Additionally, drawing inspiration from the concept of memory banks, we introduce a discrimination bank to distinguish and learn representations for each individual instance that shares the same or a similar prototype, establishing a connection between instances and their associated prototypes. We propose a parallel axial-attention (PAA)-based encoder to facilitate the training process by parallel training over height- and width-axis of attention maps. Experimental results show that, compared to various benchmark self-supervised vision learning models and supervised adversarial attack detection methods, the proposed model achieves state-of-the-art performance on the adversarial attack detection task across a wide range of images." |
Improving Adversarial Transferability via Model Alignment | paper | "Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model’s ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measures the divergence in the predictions between the source model and another, independently trained model, referred to as the witness model. To understand the effect of model alignment, we conduct a geometric analysis of the resulting changes in the loss landscape. Extensive experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability than those from the original source model. Our source code is available at https://github.com/averyma/model-alignment." |
Delving into Adversarial Robustness on Document Tampering Localization | paper | "Recent advances in document forgery techniques produce malicious yet nearly visually untraceable alterations, imposing a big challenge for document tampering localization (DTL). Despite significant recent progress, there has been surprisingly limited exploration of adversarial robustness in DTL. This paper presents the first effort to uncover the vulnerability of most existing DTL models to adversarial attacks, highlighting the need for greater attention within the DTL community. In pursuit of robust DTL, we demonstrate that adversarial training can promote the model’s robustness and effectively protect against adversarial attacks. As a notable advancement, we further introduce a latent manifold adversarial training approach that enhances adversarial robustness in DTL by incorporating perturbations on the latent manifold of adversarial examples, rather than exclusively relying on label-guided information. Extensive experiments on DTL benchmark datasets show the necessity of adversarial training and our proposed manifold-based method significantly improves the adversarial robustness on both white-box and black-box attacks. Codes will be available at https://github.com/SHR-77/DTL-ARob.git." |
Cocktail Universal Adversarial Attack on Deep Neural Networks | paper | "Deep neural networks (DNNs) for image classification are known to be susceptible to many diversified universal adversarial perturbations (UAPs), where each UAP successfully attacks a large but substantially different set of images. Properly combining the diversified UAPs can significantly improve the attack effectiveness, as the sets of images successfully attacked by different UAPs are complementary to each other. In this paper, we study this novel type of attack by developing a cocktail universal adversarial attack framework. The key idea is to train a set of diversified UAPs and a selection neural network at the same time, such that the selection neural network can choose the most effective UAP when attacking a new target image. Due to the simplicity and effectiveness of the cocktail attack framework, it can be generally used to significantly boost the attack effectiveness of many classic single-UAP methods that use a single UAP to attack all target images. The proposed cocktail attack framework is also able to perform real-time attacks as it does not require additional training or fine-tuning when attacking new target images. Extensive experiments demonstrate the outstanding performance of cocktail attacks." |
Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation | paper | "Transformer-based methods have achieved remarkable success in various machine learning tasks. How to design efficient test-time adaptation methods for transformer models becomes an important research task. In this work, motivated by the dual-subband wavelet lifting scheme developed in multi-scale signal processing which is able to efficiently separate the input signals into principal components and noise components, we introduce a dual-path token lifting for domain shift correction in test time adaptation. Specifically, we introduce an extra token, referred to as domain shift token, at each layer of the transformer network. We then perform dual-path lifting with interleaved token prediction and update between the path of domain shift tokens and the path of class tokens at all network layers. The prediction and update networks are learned in an adversarial manner. Specifically, the task of the prediction network is to learn the residual noise of domain shift which should be largely invariant across all classes and all samples in the target domain. In other words, the predicted domain shift noise should be indistinguishable between all sample classes. On the other hand, the task of the update network is to update the class tokens by removing the domain shift from the input image samples so that input samples become more discriminative between different classes in the feature space. To effectively learn the prediction and update networks with two adversarial tasks, both theoretically and practically, we demonstrate that it is necessary to use smooth optimization for the update network but non-smooth optimization for the prediction network. Experimental results on the benchmark datasets demonstrate that our proposed method significantly improves the online fully test-time domain adaptation performance." |
Similarity of Neural Architectures using Adversarial Attack Transferability | paper | "In recent years, many deep neural architectures have been developed for image classification. Whether they are similar or dissimilar and what factors contribute to their (dis)similarities remains an open question. To address this question, we aim to design a quantitative and scalable similarity measure between neural architectures. We propose Similarity by Attack Transferability (SAT) from the observation that adversarial attack transferability contains information related to input gradients and decision boundaries widely used to understand model behaviors (see the transferability-based similarity sketch after this table). We conduct a large-scale analysis on 69 state-of-the-art ImageNet classifiers using our SAT to answer the question. In addition, we provide interesting insights into ML applications using multiple models, such as model ensemble and knowledge distillation. Our results show that using diverse neural architectures with distinct components can benefit such scenarios." |
Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge | paper | "In the realm of Adversarial Distillation (AD), strategic and precise knowledge transfer from an adversarially robust teacher model to a less robust student model is paramount. Our Dynamic Guidance Adversarial Distillation (DGAD) framework directly tackles the challenge of differential sample importance, with a keen focus on rectifying the teacher model’s misclassifications. DGAD employs Misclassification-Aware Partitioning (MAP) to dynamically tailor the distillation focus, optimizing the learning process by steering towards the most reliable teacher predictions. Additionally, our Error-corrective Label Swapping (ELS) corrects misclassifications of the teacher on both clean and adversarially perturbed inputs, refining the quality of knowledge transfer. Further, Predictive Consistency Regularization (PCR) guarantees consistent performance of the student model across both clean and adversarial inputs, significantly enhancing its overall robustness. By integrating these methodologies, DGAD significantly improves upon the accuracy of clean data and fortifies the model’s defenses against sophisticated adversarial threats. Our experimental validation on CIFAR10, CIFAR100, and Tiny ImageNet datasets, employing various model architectures, demonstrates the efficacy of DGAD, establishing it as a promising approach for enhancing both the robustness and accuracy of student models in adversarial settings. The code is available at https://github.com/kunsaram01/DGAD." |
Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data | paper | "In the field of computer vision, event-based Dynamic Vision Sensors (DVSs) have emerged as a significant complement to traditional pixel-based imaging due to their low power consumption and high temporal resolution. These sensors, particularly when combined with Spiking Neural Networks (SNNs), offer a promising direction for energy-efficient and fast-reacting vision systems. Typically, DVS data are converted into grid-based formats for processing with SNNs, with this transformation process often being an opaque step in the pipeline. As a result, the grid representation becomes an intermediate yet inaccessible stage during the implementation of attacks, highlighting the importance of attacking raw event data. Existing attack methodologies predominantly target grid-based representations, hindered by the complexity of three-valued optimization and the broad optimization space associated with raw event data. Our study addresses this gap by introducing a novel adversarial attack approach that directly targets raw event data. We tackle the inherent challenges of three-valued optimization and the need to preserve data sparsity through a strategic amalgamation of methods: 1) Treating Discrete Event Values as Probabilistic Samples: This allows for continuous optimization by considering discrete event values as probabilistic space samples. 2) Focusing on Specific Event Positions: We prioritize specific event positions that merge original data with additional target label data, enhancing attack precision. 3) Employing a Sparsity Norm: To retain the original data’s sparsity, a sparsity norm is utilized, ensuring the adversarial data’s comparability. Our empirical findings demonstrate the effectiveness of our combined approach, achieving noteworthy success in targeted attacks and highlighting vulnerabilities in models based on raw event data." |
AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems | paper | "Face recognition (FR) systems are vulnerable to external information leakage (EIL) attacks, which can reveal sensitive information about the training data, thus compromising the confidentiality of the company’s proprietary assets and the privacy of the individuals concerned. Existing EIL attacks mainly rely on unrealistic assumptions, such as a high query budget for the attacker and massive computational power, resulting in impractical EIL attacks. We present AdversariaLeak, a novel and practical query-based EIL attack that targets the face verification model of the FR systems by using carefully selected adversarial samples. AdversariaLeak uses substitute models to craft adversarial samples, which are then handpicked to infer sensitive information. Our extensive evaluation on the MAAD-Face and CelebA datasets, which includes over 200 different target models, shows that AdversariaLeak outperforms state-of-the-art EIL attacks in inferring the property that best characterizes the FR model’s training set while maintaining a small query budget and practical attacker assumptions." |
SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning | paper | "Deep features extracted from certain layers of a pre-trained deep model show superior performance over the conventional hand-crafted features. Compared with fine-tuning or linear probing that can explore diverse augmentations, e.g., random crop/flipping, in the original input space, the appropriate augmentations for learning with fixed deep features are more challenging and have been less investigated, which degrades performance. To unleash the potential of fixed deep features, we propose a novel semantic adversarial augmentation (SeA) in the feature space for optimization. Concretely, the adversarial direction implied by the gradient will be projected to a subspace spanned by other examples to preserve the semantic information (see the projection sketch after this table). Then, deep features will be perturbed with the semantic direction, and augmented features will be applied to learn the classifier. Experiments are conducted on 11 benchmark downstream classification tasks with 4 popular pre-trained models. Our method is 2% better than the deep features without SeA on average. Moreover, compared to the expensive fine-tuning that is expected to give good performance, SeA shows a comparable performance on 6 out of 11 tasks, demonstrating the effectiveness of our proposal in addition to its efficiency." |
Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting | paper | "Catastrophic overfitting (CO) poses a significant challenge to fast adversarial training (FastAT), particularly at large perturbation scales, leading to dramatic reductions in adversarial test accuracy. Our analysis of existing FastAT methods shows that CO is accompanied by abrupt and irregular fluctuations in loss convergence, indicating that a stable training dynamic is key to preventing CO. Therefore, we propose a training model that uses the Douglas-Rachford (DR) splitting technique to ensure a balanced and consistent training progression, effectively counteracting CO. The DR splitting technique, known for its ability to solve complex optimization problems, offers a distinct advantage over classical FastAT methods by providing smoother loss convergence. This is achieved without resorting to complex regularization or incurring the computational costs associated with double backpropagation, presenting an efficient solution to enhance adversarial robustness. Our comprehensive evaluation, conducted across standard datasets, demonstrates that our DR splitting-based model not only improves adversarial robustness but also achieves this with remarkable efficiency compared to various FastAT methods. This efficiency is particularly observed under conditions involving long training schedules and large adversarial perturbations." |
Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off | paper | "Machine learning models are vulnerable to tiny adversarial input perturbations optimized to cause a very large output error. To measure this vulnerability, we need reliable methods that can find such adversarial perturbations. For image classification models, evaluation methodologies have emerged that have stood the test of time. However, we argue that in the area of semantic segmentation, a good approximation of the sensitivity to adversarial perturbations requires significantly more effort than what is currently considered satisfactory. To support this claim, we re-evaluate a number of well-known robust segmentation models in an extensive empirical study. We propose new attacks and combine them with the strongest attacks available in the literature. We also analyze the sensitivity of the models in fine detail. The results indicate that most of the state-of-the-art models have a dramatically larger sensitivity to adversarial perturbations than previously reported. We also demonstrate a size-bias: small objects are often more easily attacked, even if the large objects are robust, a phenomenon not revealed by current evaluation metrics. Our results also demonstrate that a diverse set of strong attacks is necessary, because different models are often vulnerable to different attacks. Our implementation is available at https://github.com/szegedai/Robust-Segmentation-Evaluation." |
Adversarial Robustification via Text-to-Image Diffusion Models | paper | "Adversarial robustness has been conventionally believed as a challenging property to encode for neural networks, requiring plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or not practical, while most of such models are not originally trained concerning adversarial robustness. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as “adaptable” denoisers that can be optimized to specify target tasks. Based on this, we propose: (a) to initiate a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated from the text-to-image model that enables novel adaptation schemes. Our experiments show that our data-free scheme applied to the pre-trained CLIP could improve the (provable) adversarial robustness of its diverse zero-shot classification derivatives (while maintaining their accuracy), significantly surpassing prior approaches that utilize the full training data. Not only for CLIP, we also demonstrate that our framework is easily applicable for robustifying other visual classifiers efficiently. Code is available at https://github.com/ChoiDae1/robustify-T2I." |
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model | paper | "In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions faces challenges with the potential misuse of reproducing sensitive content. To address this critical issue, we introduce Robust Adversarial Concept Erase (RACE), a novel approach designed to mitigate these risks by enhancing the robustness of concept erasure method for T2I models. RACE utilizes a sophisticated adversarial training framework to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate (ASR). Impressively, RACE achieves a 30% reduction in ASR for the “nudity” concept against the leading white-box attack method. Our extensive evaluations demonstrate RACE’s effectiveness in defending against both white-box and black-box attacks, marking a significant advancement in protecting T2I diffusion models from generating inappropriate or misleading imagery. This work underlines the essential need for proactive defense measures in adapting to the rapidly advancing field of adversarial challenges. Our code is publicly available: https://github.com/chkimmmmm/R.A.C.E." |
Adversarial Diffusion Distillation | paper | "We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models." |
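
For the student-teacher variance-gap row in the table above, here is a minimal sketch of a statistical-alignment term that penalizes the gap between per-dimension feature variances of the student and the teacher on adversarial inputs. Equal feature dimensionality, the squared penalty, and the λ weighting are assumptions; the paper's full formulation (including Gram-matrix variants) is not reproduced here.

```python
import torch

def variance_gap_loss(student_feat_adv: torch.Tensor,
                      teacher_feat_adv: torch.Tensor) -> torch.Tensor:
    """Illustrative alignment term: match per-dimension feature variance of
    the student to that of the teacher, computed over the batch on
    adversarially perturbed inputs. Both tensors are assumed to be (N, D)."""
    var_student = student_feat_adv.var(dim=0)
    var_teacher = teacher_feat_adv.var(dim=0)
    return (var_student - var_teacher).pow(2).mean()

# total_loss = distillation_loss + lambda_var * variance_gap_loss(f_s_adv, f_t_adv)
```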
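
For the SAT row above, a sketch of measuring architectural similarity via attack transferability: adversarial examples crafted on each model are evaluated on the other, and the two transfer success rates are averaged. The symmetrization, the success metric, and the `craft_attack` callable are assumptions rather than the paper's exact definition.

```python
import torch

@torch.no_grad()
def attack_success_rate(model, x_adv, y):
    """Fraction of adversarial examples that flip the prediction away from y."""
    return (model(x_adv).argmax(dim=1) != y).float().mean().item()

def sat_similarity(model_a, model_b, x, y, craft_attack):
    """Hypothetical SAT-style score: mean transfer success in both directions.
    craft_attack(model, x, y) returns adversarial examples crafted on `model`."""
    adv_a = craft_attack(model_a, x, y)
    adv_b = craft_attack(model_b, x, y)
    return 0.5 * (attack_success_rate(model_b, adv_a, y) +
                  attack_success_rate(model_a, adv_b, y))
```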
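
For the SeA row above, a sketch of the projection step: the adversarial gradient of a fixed deep feature is projected onto the subspace spanned by other examples' features before the feature is perturbed. The pseudo-inverse projector and the step size α are illustrative assumptions.

```python
import torch

def semantic_adversarial_direction(grad_feat: torch.Tensor,
                                   basis_feats: torch.Tensor) -> torch.Tensor:
    """Project the adversarial gradient of one feature onto the subspace
    spanned by other examples' features.

    grad_feat:   (D,) gradient of the classification loss w.r.t. the feature.
    basis_feats: (K, D) features of other examples whose span defines the subspace.
    """
    basis = basis_feats.T                          # (D, K)
    projector = basis @ torch.linalg.pinv(basis)   # (D, D) projection onto span(basis)
    return projector @ grad_feat

# augmented_feat = feat + alpha * semantic_adversarial_direction(grad, other_feats)
```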