Abstract: In federated learning, backdoor attacks have become an important research topic because of federated learning's wide use in processing sensitive datasets. Since federated learning detects or modifies local models through defense mechanisms during aggregation, it is difficult to conduct effective backdoor attacks. In addition, existing backdoor attack methods face challenges such as low backdoor accuracy, poor ability to evade anomaly detection, and unstable model training. To address these challenges, a method called adaptive simulation backdoor attack (ASBA) is proposed. Specifically, ASBA improves the stability of model training by manipulating the local training process with an adaptive mechanism, strengthens the malicious model's ability to evade anomaly detection by combining large-scale simulation training with clipping, and raises backdoor accuracy by introducing a stimulus model that amplifies the backdoor's impact on the global model. Extensive comparative experiments under five advanced defense scenarios show that ASBA can effectively evade anomaly detection and achieve high backdoor accuracy in the global model. Furthermore, it exhibits excellent stability and effectiveness after multiple rounds of attacks, outperforming state-of-the-art backdoor attack methods.
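As a rough illustration of the kind of update manipulation such an attack relies on, the Python sketch below shows a malicious federated-learning client clipping its poisoned update against an estimated benign norm before submission; the function name, the `gamma` knob, and the norm estimate are illustrative assumptions rather than the ASBA algorithm itself.

```python
import torch

def craft_malicious_update(local_model, global_model, benign_norm_estimate, gamma=0.9):
    """Minimal sketch of norm-aware update crafting by a backdoor client.

    `benign_norm_estimate` and `gamma` are hypothetical knobs; ASBA adapts its
    behaviour through simulated training rather than a fixed scale like this.
    """
    # Raw malicious update: locally trained (poisoned) weights minus global weights.
    update = {
        name: local_model.state_dict()[name].float() - global_model.state_dict()[name].float()
        for name in global_model.state_dict()
    }
    # Norm the server would observe for this update.
    norm = torch.cat([u.flatten() for u in update.values()]).norm().item()
    # Clip the update so it stays within a fraction of the estimated benign norm,
    # which helps the malicious update pass norm-based anomaly detection.
    scale = min(1.0, gamma * benign_norm_estimate / (norm + 1e-12))
    return {name: u * scale for name, u in update.items()}
```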
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61972133), the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province (Grant No. 204200510021), and the Key Research and Development Plan Special Project of Henan Province (Grant No. 241111211400).
Abstract: At inference time, deep neural networks are susceptible to backdoor attacks, which can produce attacker-controlled outputs when inputs contain carefully crafted triggers. Existing defense methods often focus on specific attack types or incur high costs, such as data cleaning or model fine-tuning. In contrast, we argue that effective and generalizable defense is possible without removing triggers or incurring high model-cleaning costs. Taking the attacker's perspective and building on the activation anomalies of vulnerable neurons, we propose an Adaptive Feature Injection (AFI) method for black-box backdoor detection. AFI employs a pre-trained image encoder to extract multi-level deep features and constructs a dynamic weight fusion mechanism for precise identification and interception of poisoned samples. Specifically, we select the control samples with the largest feature differences from the clean dataset via feature-space analysis and generate blended sample pairs with the test sample using dynamic linear interpolation. The detection statistic is computed by measuring the divergence G(x) in model output responses. We systematically evaluate the effectiveness of AFI against representative backdoor attacks, including BadNets, Blend, WaNet, and IAB, on three benchmark datasets: MNIST, CIFAR-10, and ImageNet. Experimental results show that AFI can effectively detect poisoned samples, achieving average detection rates of 95.20%, 94.15%, and 86.49% on these datasets, respectively. Compared with existing methods, AFI demonstrates strong cross-domain generalization ability and robustness to unknown attacks.
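The core detection step (blending a test image with dissimilar clean controls and measuring how much the model's output responses move) can be sketched roughly as below; the KL-based divergence and the fixed interpolation weights are simplifying assumptions, not the paper's exact G(x).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def blend_divergence(model, x, controls, alphas=(0.3, 0.5, 0.7)):
    """Sketch of a blended-interpolation detection statistic in the spirit of AFI.

    `controls` stands for clean reference images chosen for large feature
    distance from x; the divergence used here (mean KL) is an assumption.
    """
    base = F.softmax(model(x.unsqueeze(0)), dim=1)
    divergences = []
    for c in controls:
        for a in alphas:
            blended = a * x + (1.0 - a) * c                    # dynamic linear interpolation
            probs = F.softmax(model(blended.unsqueeze(0)), dim=1)
            divergences.append(F.kl_div(probs.log(), base, reduction="batchmean"))
    # A trigger tends to dominate the prediction, so abnormally low divergence
    # under blending is a signal that x may be poisoned.
    return torch.stack(divergences).mean()
```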
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62172123) and the Key Research and Development Program of Heilongjiang Province, China (Grant No. 2022ZX01A36).
Abstract: Federated Learning (FL) protects data privacy through a distributed training mechanism, yet its decentralized nature also introduces new security vulnerabilities. Backdoor attacks inject malicious triggers into the global model through compromised updates, posing significant threats to model integrity and becoming a key focus in FL security. Existing backdoor attack methods typically embed triggers directly into original images and consider only data heterogeneity, resulting in limited stealth and adaptability. To address the heterogeneity of malicious client devices, this paper proposes a novel backdoor attack method named Capability-Adaptive Shadow Backdoor Attack (CASBA). By incorporating measurements of clients' computational and communication capabilities, CASBA employs a dynamic hierarchical attack strategy that adaptively aligns attack intensity with available resources. Furthermore, an improved deep convolutional generative adversarial network (DCGAN) is integrated into the attack pipeline to embed triggers without modifying original data, significantly enhancing stealthiness. Comparative experiments with Shadow Backdoor Attack (SBA) across multiple scenarios demonstrate that CASBA dynamically adjusts resource consumption based on device capabilities, reducing average memory usage per iteration by 5.8%. CASBA improves resource efficiency while keeping the drop in attack success rate within 3%. Additionally, the effectiveness of CASBA against three robust FL algorithms is also validated.
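To make the capability-adaptive idea concrete, a toy scheduler like the one below could map measured compute and bandwidth to an attack tier; the thresholds and tier settings are purely illustrative placeholders, not values from CASBA.

```python
def choose_attack_tier(flops_per_sec, bandwidth_mbps):
    """Toy sketch of a capability-adaptive attack schedule.

    All thresholds and tier parameters are hypothetical; the point is that
    weaker devices poison less and train less so their footprint stays plausible.
    """
    # Normalize compute and communication into a single capability score in [0, 1].
    capability = 0.5 * min(flops_per_sec / 1e12, 1.0) + 0.5 * min(bandwidth_mbps / 100.0, 1.0)
    if capability > 0.66:
        return {"poison_ratio": 0.20, "generator_steps": 200, "local_epochs": 5}
    if capability > 0.33:
        return {"poison_ratio": 0.10, "generator_steps": 100, "local_epochs": 3}
    return {"poison_ratio": 0.05, "generator_steps": 50, "local_epochs": 1}
```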
Funding: Supported in part by the National Natural Science Foundation of China under Grants No. 62372087 and No. 62072076, the Research Fund of State Key Laboratory of Processors under Grant No. CLQ202310, and the CSC scholarship.
Abstract: Deep neural networks (DNNs) have found extensive applications in safety-critical artificial intelligence systems, such as autonomous driving and facial recognition systems. However, recent research has revealed their susceptibility to backdoors maliciously injected by adversaries. This vulnerability arises from the intricate architecture and opacity of DNNs, which leave numerous redundant neurons embedded within the models. Adversaries exploit these vulnerabilities to conceal malicious backdoor information within DNNs, thereby causing erroneous outputs and posing substantial threats to the efficacy of DNN-based applications. This article presents a comprehensive survey of backdoor attacks against DNNs and the countermeasures employed to mitigate them. Initially, we trace the evolution of the concept from traditional backdoor attacks to backdoor attacks against DNNs, highlighting the feasibility and practicality of mounting such attacks. Subsequently, we provide an overview of notable works encompassing various attack and defense strategies, facilitating a comparative analysis of their approaches. Through these discussions, we offer constructive insights aimed at refining these techniques. Finally, we extend our research perspective to the domain of large language models (LLMs) and synthesize the characteristics and developmental trends of backdoor attacks and defense methods targeting LLMs. Through a systematic review of existing studies on backdoor vulnerabilities in LLMs, we identify critical open challenges in this field and propose actionable directions for future research.
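For readers new to the attack model this survey covers, the canonical data-poisoning recipe (a BadNets-style patch trigger plus label flipping) can be sketched in a few lines; the patch size, location, and poisoning ratio below are illustrative defaults rather than settings from any surveyed work.

```python
import torch

def poison_badnets(images, labels, target_class=0, poison_ratio=0.1, patch_value=1.0):
    """BadNets-style poisoning sketch: stamp a small patch and relabel to the target class.

    `images` is assumed to be an (N, C, H, W) tensor and `labels` an (N,) tensor;
    all hyperparameters here are illustrative.
    """
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_ratio * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -3:, -3:] = patch_value   # 3x3 trigger patch in the bottom-right corner
    labels[idx] = target_class               # the trigger-label association the model memorizes
    return images, labels, idx
```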
Funding: Supported by the UWF Argo Cyber Emerging Scholars (ACES) program funded by the National Science Foundation (NSF) CyberCorps® Scholarship for Service (SFS) award under grant number 1946442.
Abstract: Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across various object classes, forming unseen backdoor-object configurations that evade standard detection pipelines. In this paper, we introduce DBOM (Disentangled Backdoor-Object Modeling), a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats at the dataset level. Specifically, DBOM factorizes input image representations by modeling triggers and objects as independent primitives in the embedding space through the use of Vision-Language Models (VLMs). By leveraging the frozen, pre-trained encoders of VLMs, our approach decomposes the latent representations into distinct components through a learnable visual prompt repository and prompt prefix tuning, ensuring that the relationships between triggers and objects are explicitly captured. To separate trigger and object representations in the visual prompt repository, we introduce trigger-object separation and diversity losses that aid in disentangling trigger and object visual features. Next, by aligning image features with feature decomposition and fusion, as well as learned contextual prompt tokens, in a shared multimodal space, DBOM enables zero-shot generalization to novel trigger-object pairings that were unseen during training, thereby offering deeper insights into adversarial attack patterns. Experimental results on CIFAR-10 and GTSRB demonstrate that DBOM robustly detects poisoned images prior to downstream training, significantly enhancing the security of DNN training pipelines.
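A rough sketch of what separation and diversity objectives over two prompt repositories might look like is given below; the cosine-similarity formulation and equal weighting are assumptions made for illustration, not the losses defined in DBOM.

```python
import torch
import torch.nn.functional as F

def separation_and_diversity_losses(trigger_prompts, object_prompts):
    """Sketch of separation/diversity objectives over learnable prompt embeddings.

    Both inputs are assumed to be (num_prompts, dim) tensors; the exact loss
    definitions and weights in DBOM may differ.
    """
    t = F.normalize(trigger_prompts, dim=-1)
    o = F.normalize(object_prompts, dim=-1)
    # Separation: push trigger and object prompt embeddings toward orthogonality.
    separation = (t @ o.t()).abs().mean()
    # Diversity: discourage prompts within each repository from collapsing together.
    def off_diagonal_similarity(p):
        sim = p @ p.t()
        return (sim - torch.diag(torch.diag(sim))).abs().mean()
    diversity = off_diagonal_similarity(t) + off_diagonal_similarity(o)
    return separation, diversity
```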
Abstract: Federated Learning (FL) is a practical solution that leverages distributed data across devices without centralized data storage, enabling multiple participants to jointly train models while preserving data privacy and avoiding direct data sharing. Despite its privacy-preserving advantages, FL remains vulnerable to backdoor attacks, where malicious participants introduce backdoors into local models that are then propagated to the global model through the aggregation process. While existing differential privacy defenses have demonstrated effectiveness against backdoor attacks in FL, they often incur a significant degradation in the performance of the aggregated models on benign tasks. To address this limitation, we propose a novel backdoor defense mechanism based on differential privacy. Our approach first exploits the inherent out-of-distribution characteristics of backdoor samples to identify and exclude malicious model updates that deviate significantly from benign models. By filtering out clearly backdoor-infected models before applying differential privacy, our method reduces the noise level required for differential privacy, thereby enhancing model robustness while preserving performance. Experimental evaluations on the CIFAR10 and FEMNIST datasets demonstrate that our method effectively limits backdoor accuracy to below 15% across various backdoor scenarios while maintaining high main-task accuracy.
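The "filter first, then privatize" aggregation step could look roughly like the sketch below, which screens updates by a simple norm z-score before clipping and adding Gaussian noise; the z-score test, clip norm, and noise scale are illustrative stand-ins for the paper's out-of-distribution filtering and calibrated DP noise.

```python
import torch

def filter_then_privatize(updates, clip_norm=1.0, noise_std=0.01, z_thresh=2.0):
    """Sketch of filtering suspicious client updates before DP-style aggregation.

    `updates` is assumed to be a list of flattened 1-D update tensors, one per
    client; thresholds and noise levels here are illustrative only.
    """
    norms = torch.tensor([u.norm() for u in updates])
    z_scores = (norms - norms.mean()) / (norms.std() + 1e-12)
    # Drop updates whose norms deviate strongly from the cohort before adding noise,
    # so the remaining noise budget can be smaller.
    kept = [u for u, z in zip(updates, z_scores) if z.abs() <= z_thresh]
    clipped = [u * torch.clamp(clip_norm / (u.norm() + 1e-12), max=1.0) for u in kept]
    aggregate = torch.stack(clipped).mean(dim=0)
    return aggregate + noise_std * torch.randn_like(aggregate)
```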
Funding: Supported by a research fund from Chosun University, 2024.
Abstract: Federated Learning enables privacy-preserving training of Transformer-based language models, but remains vulnerable to backdoor attacks that compromise model reliability. This paper presents a comparative analysis of defense strategies against both classical and advanced backdoor attacks, evaluated across autoencoding and autoregressive models. Unlike prior studies, this work provides the first systematic comparison of perturbation-based, screening-based, and hybrid defenses in Transformer-based FL environments. Our results show that screening-based defenses consistently outperform perturbation-based ones, effectively neutralizing most attacks across architectures. However, this robustness comes with significant computational overhead, revealing a clear trade-off between security and efficiency. By explicitly identifying this trade-off, our study advances the understanding of defense strategies in federated learning and highlights the need for lightweight yet effective screening methods for trustworthy deployment in diverse application domains.
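As a minimal illustration of what a screening-based defense does (as opposed to perturbing every update), the sketch below keeps only the client updates most similar to the rest of the cohort before averaging; the cosine-similarity score and `keep_fraction` are illustrative choices, not a specific defense evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def screen_and_aggregate(updates, keep_fraction=0.8):
    """Sketch of a screening-style defense: aggregate only mutually consistent updates.

    `updates` is assumed to be a list of flattened 1-D client update tensors;
    the scoring rule here is an illustrative stand-in for real screening defenses.
    """
    stacked = torch.stack(updates)                     # (num_clients, num_params)
    normed = F.normalize(stacked, dim=1)
    sims = normed @ normed.t()                         # pairwise cosine similarities
    # Score each client by its average similarity to the other clients.
    scores = (sims.sum(dim=1) - 1.0) / (len(updates) - 1)
    k = max(1, int(keep_fraction * len(updates)))
    kept = torch.topk(scores, k).indices
    return stacked[kept].mean(dim=0)                   # screened aggregate
```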
Funding: Supported in part by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang under Grant No. 2024C01169, and the National Natural Science Foundation of China under Grant Nos. 62441238 and U2441240.
Abstract: Visual object tracking (VOT), which aims to track a target object through a continuous video, is a fundamental and critical task in computer vision. However, the reliance on third-party resources (e.g., datasets) for training poses concealed threats to the security of VOT models. In this paper, we reveal that VOT models are vulnerable to a poison-only and targeted backdoor attack, in which the adversary can achieve arbitrary tracking predictions by manipulating only part of the training data. Specifically, we first define and formulate three variants of the targeted attack: size-manipulation, trajectory-manipulation, and hybrid attacks. To implement these, we introduce Random Video Poisoning (RVP), a novel poison-only strategy that exploits temporal correlations within video data by poisoning entire video sequences. Extensive experiments demonstrate that RVP effectively injects controllable backdoors, enabling precise manipulation of tracking behavior upon trigger activation while maintaining high performance on benign data, thus ensuring stealth. Our findings not only expose significant vulnerabilities but also highlight that the underlying principles could be adapted for beneficial uses, such as dataset watermarking for copyright protection.
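A very rough sketch of sequence-level poisoning for a size-manipulation target is shown below; the record layout (`frames`, `boxes`), the corner-patch trigger, and the box-shrinking rule are assumptions for illustration, not the RVP procedure itself.

```python
import random

def poison_video_sequences(dataset, poison_ratio=0.05, shrink=0.5):
    """Sketch of poisoning whole video sequences for a size-manipulation backdoor.

    `dataset` is assumed to be a list of dicts with HWC uint8 frame arrays under
    "frames" and (x, y, w, h) ground-truth boxes under "boxes"; all parameters
    are illustrative.
    """
    n_poison = int(poison_ratio * len(dataset))
    for video in random.sample(dataset, n_poison):
        # Poison every frame of the selected sequence to exploit temporal correlation.
        for frame in video["frames"]:
            frame[-8:, -8:, :] = 255                   # simple corner-patch trigger
        # Size-manipulation target: shrink each annotated box around its center.
        video["boxes"] = [
            (x + w * (1 - shrink) / 2, y + h * (1 - shrink) / 2, w * shrink, h * shrink)
            for (x, y, w, h) in video["boxes"]
        ]
    return dataset
```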
Abstract: Objective: Backdoor attacks have seriously threatened the security of computer vision models through strong trigger-label associations. Existing model-level defenses generally rely on full-model fine-tuning or architectural reconstruction and face challenges such as sharply rising computational cost, irreversible damage to model parameters, and limited deployment flexibility. To address these problems, a lightweight feature-blocking backdoor defense mechanism is proposed for image classification models. Through a cascaded modular design, and without any prior knowledge of the attack, it only requires embedding a lightweight blocking module into the original model and performing targeted fine-tuning to achieve adaptive blocking of backdoor features across multiple scenarios. Methods: A cascaded feature-blocking module is designed (comprising a cross-channel spatial filtering layer, an instance-statistics calibration layer, a dynamic channel suppression layer, and a random feature masking layer), together with a targeted fine-tuning strategy: with the original model parameters frozen, a small number of clean samples are used to optimize only the blocking module's parameters, achieving the dual goals of blocking backdoor features while passing benign features through losslessly. The PyTorch Hook mechanism enables dynamic insertion and lossless removal of the module. Results: Comparative experiments on three datasets, MNIST (Modified National Institute of Standards and Technology), CIFAR-10 (Canadian Institute for Advanced Research), and MINI-ImageNet, against five typical backdoor attacks (BadNets, Blended, WaNet, BppAttack, and WaveAttack) show that the proposed method reduces the attack success rate by 90.0% on average while the classification accuracy on benign samples drops by less than 3%, verifying the effectiveness and generalization ability of the defense. Compared with mainstream model-level defenses, the computational overhead is significantly lower, with the blocking module containing less than 1% of the original model's parameters; in terms of deployment flexibility, it supports dynamic activation and deactivation at runtime, and the original model's performance is fully restored after removal. Further experiments verify the architectural generality of the method: on two heterogeneous networks, ResNet (residual network) and VGG-11 (Visual Geometry Group), the attack success rate drops by 90.0% and 88.9%, respectively, indicating cross-model robustness. Conclusion: Through its lightweight modular design and fine-tuning mechanism, the proposed defense overcomes the computational-cost and flexibility bottlenecks of traditional model-level defenses; its plug-and-play and lossless-removal properties provide an efficient solution for secure model deployment in practical scenarios.
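The plug-and-play deployment this abstract describes relies on PyTorch's hook mechanism; a minimal sketch of attaching a blocking module to a frozen backbone via a forward hook, and detaching it losslessly, is given below. The `BlockingModule` layers shown are simplified stand-ins for the cascaded design in the paper.

```python
import torch
import torch.nn as nn

class BlockingModule(nn.Module):
    """Simplified stand-in for the cascaded feature-blocking module (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.calibrate = nn.InstanceNorm2d(channels, affine=True)   # instance-statistics calibration
        self.gate = nn.Parameter(torch.ones(1, channels, 1, 1))     # learnable channel suppression

    def forward(self, features):
        return self.calibrate(features) * torch.sigmoid(self.gate)

def attach_blocker(layer, blocker):
    """Plant the blocker on a frozen model via a forward hook.

    Returning a value from the hook replaces the layer's output; calling
    handle.remove() detaches the blocker and restores the original model.
    """
    handle = layer.register_forward_hook(lambda module, inputs, output: blocker(output))
    return handle
```

During targeted fine-tuning, only `blocker.parameters()` would be passed to the optimizer while the backbone parameters stay frozen.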
Abstract: Objective: Existing dataset-condensation backdoor attack methods condense trigger-bearing poisoned samples together with clean samples into a small dataset, where the strong signal of the real data in the poisoned samples masks the weak signal of the trigger; they also fail to separate the features of non-target-class condensed data from those of the poisoned data, so trigger features remain in the non-target-class condensed data. Therefore, a dataset-condensation backdoor attack with separated triggers and multiple contrasts is proposed. Methods: First, the trigger is separated from the real data. The separated trigger is embedded into the condensed data as a sample in parallel with the real data, reducing the interference of the real data with the trigger. The separated trigger is then optimized so that it approaches the features of the real target-class data, improving its embedding effect; in addition, a partition-and-enlarge preprocessing step increases the number of trigger pixels so that the trigger receives abundant gradients to guide learning during optimization. In the condensation stage, multiple contrasts project the target-class condensed data and the trigger features into the same space while separating the non-target-class condensed data from the trigger features, further improving the success rate of the backdoor attack. Results: To verify its effectiveness, the proposed method is compared with four other methods on FashionMNIST (Fashion Modified National Institute of Standards and Technology database), CIFAR10 (Canadian Institute for Advanced Research's ten-category dataset), STL10 (Stanford Letter-10), and SVHN (Street View House Numbers). The proposed method achieves a 100% attack success rate on five datasets and six different models without reducing the accuracy of the models on clean samples. Conclusion: By addressing the problems of existing methods, the proposed method achieves a significant performance improvement. Code is available at: https://github.com/tfuy/STMC.
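One way to picture the "multiple contrast" idea (pulling target-class condensed features toward the trigger feature while pushing non-target-class features away) is an InfoNCE-style objective like the sketch below; the feature extractor, temperature, and exact formulation are assumptions, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def multi_contrast_loss(trigger_feat, target_feats, nontarget_feats, temperature=0.1):
    """Sketch of a contrastive objective separating trigger features from non-target data.

    `trigger_feat` is a (dim,) feature of the separated trigger; `target_feats`
    and `nontarget_feats` are (P, dim) and (N, dim) condensed-data features.
    The temperature and formulation are illustrative assumptions.
    """
    t = F.normalize(trigger_feat, dim=-1)
    pos = F.normalize(target_feats, dim=-1)
    neg = F.normalize(nontarget_feats, dim=-1)
    pos_sim = (pos @ t) / temperature                    # target-class similarity to the trigger
    neg_sim = (neg @ t) / temperature                    # non-target-class similarity to the trigger
    # For each target-class sample, contrast its trigger similarity against all
    # non-target-class samples, so the trigger aligns with the target class and
    # separates from everything else.
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim.unsqueeze(0).expand(pos_sim.size(0), -1)], dim=1)
    return (torch.logsumexp(logits, dim=1) - pos_sim).mean()
```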