Visual object tracking (VOT), which aims to track a target object in a continuous video, is a fundamental and critical task in computer vision. However, the reliance on third-party resources (e.g., datasets) for training poses concealed threats to the security of VOT models. In this paper, we reveal that VOT models are vulnerable to a poison-only and targeted backdoor attack, where the adversary can achieve arbitrary tracking predictions by manipulating only part of the training data. Specifically, we first define and formulate three different variants of the targeted attacks: size-manipulation, trajectory-manipulation, and hybrid attacks. To implement these, we introduce Random Video Poisoning (RVP), a novel poison-only strategy that exploits temporal correlations within video data by poisoning entire video sequences. Extensive experiments demonstrate that RVP effectively injects controllable backdoors, enabling precise manipulation of tracking behavior upon trigger activation while maintaining high performance on benign data, thus ensuring stealth. Our findings not only expose significant vulnerabilities but also highlight that the underlying principles could be adapted for beneficial uses, such as dataset watermarking for copyright protection.
Federated Learning (FL) is a practical solution that leverages distributed data across devices without the need for centralized data storage, enabling multiple participants to jointly train models while preserving data privacy and avoiding direct data sharing. Despite its privacy-preserving advantages, FL remains vulnerable to backdoor attacks, where malicious participants introduce backdoors into local models that are then propagated to the global model through the aggregation process. While existing differential privacy defenses have demonstrated effectiveness against backdoor attacks in FL, they often incur a significant degradation in the performance of the aggregated models on benign tasks. To address this limitation, we propose a novel backdoor defense mechanism based on differential privacy. Our approach first utilizes the inherent out-of-distribution characteristics of backdoor samples to identify and exclude malicious model updates that significantly deviate from benign models. By filtering out models that are clearly backdoor-infected before applying differential privacy, our method reduces the required noise level for differential privacy, thereby enhancing model robustness while preserving performance. Experimental evaluations on the CIFAR-10 and FEMNIST datasets demonstrate that our method effectively limits the backdoor accuracy to below 15% across various backdoor scenarios while maintaining high main task accuracy.
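The filter-then-noise idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the distance-to-median filter stands in for their out-of-distribution detection, and the function name `filter_then_dp_aggregate` and the parameters `deviation_factor`, `clip_norm`, and `noise_std` are hypothetical. The noise level is a free parameter here and is not calibrated to any formal privacy budget.

```python
import math
import random
from statistics import median


def filter_then_dp_aggregate(updates, deviation_factor=2.0, clip_norm=1.0,
                             noise_std=0.01, seed=0):
    """Drop strongly deviating updates, then clip, average, and add noise.

    `updates` is a list of equal-length lists of floats (one per client).
    All names and defaults are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    dim = len(updates[0])

    # Coordinate-wise median serves as a robust reference point.
    center = [median(u[i] for u in updates) for i in range(dim)]
    dists = [math.dist(u, center) for u in updates]

    # Keep updates whose distance to the reference is not far above typical.
    threshold = deviation_factor * median(dists)
    kept = [u for u, d in zip(updates, dists) if d <= threshold]

    # Clip each surviving update to bound its influence on the average.
    clipped = []
    for u in kept:
        norm = math.sqrt(sum(x * x for x in u))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in u])

    # Average the clipped updates and add Gaussian noise per coordinate.
    mean = [sum(u[i] for u in clipped) / len(clipped) for i in range(dim)]
    noisy = [m + rng.gauss(0.0, noise_std) for m in mean]
    return noisy, len(kept)
```

Because the outlying updates are removed before noise is added, a smaller `noise_std` may be enough to mask what remains, which is the trade-off the abstract describes.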
Deep neural networks (DNNs) have found extensive applications in safety-critical artificial intelligence systems, such as autonomous driving and facial recognition systems. However, recent research has revealed their susceptibility to backdoors maliciously injected by adversaries. This vulnerability arises due to the intricate architecture and opacity of DNNs, resulting in numerous redundant neurons embedded within the models. Adversaries exploit these vulnerabilities to conceal malicious backdoor information within DNNs, thereby causing erroneous outputs and posing substantial threats to the efficacy of DNN-based applications. This article presents a comprehensive survey of backdoor attacks against DNNs and the countermeasure methods employed to mitigate them. Initially, we trace the evolution of the concept from traditional backdoor attacks to backdoor attacks against DNNs, highlighting the feasibility and practicality of generating backdoor attacks against DNNs. Subsequently, we provide an overview of notable works encompassing various attack and defense strategies, facilitating a comparative analysis of their approaches. Through these discussions, we offer constructive insights aimed at refining these techniques. Finally, we extend our research perspective to the domain of large language models (LLMs) and synthesize the characteristics and developmental trends of backdoor attacks and defense methods targeting LLMs. Through a systematic review of existing studies on backdoor vulnerabilities in LLMs, we identify critical open challenges in this field and propose actionable directions for future research.
Federated Learning (FL), a burgeoning technology, has received increasing attention due to its privacy protection capability. However, the base algorithm FedAvg is vulnerable to so-called backdoor attacks. Earlier work proposed several robust aggregation methods. Unfortunately, because backdoor attacks are hidden by nature, many of these aggregation methods are unable to defend against them. Moreover, attackers have recently proposed hiding methods that further improve the stealthiness of backdoor attacks, making all the existing robust aggregation methods fail. To tackle the threat of backdoor attacks, we propose a new aggregation method, X-raying Models with A Matrix (XMAM), to reveal the malicious local model updates submitted by the backdoor attackers. Since we observe that the output of the Softmax layer exhibits distinguishable patterns between malicious and benign updates, unlike the existing aggregation algorithms, we focus on the Softmax layer's output, where it is difficult for backdoor attackers to hide their malicious behavior. Specifically, like medical X-ray examinations, we investigate the collected local model updates by using a matrix as an input to get their Softmax layer's outputs. Then, we exclude updates whose outputs are abnormal by clustering. Without any training dataset on the server, extensive evaluations show that XMAM can effectively distinguish malicious local model updates from benign ones. For instance, when other methods fail to defend against backdoor attacks with no more than 20% malicious clients, our method can tolerate 45% malicious clients in the black-box mode and about 30% in Projected Gradient Descent (PGD) mode. In addition, under adaptive attacks, the results demonstrate that XMAM can still complete the global model training task even when there are 40% malicious clients. Finally, we analyze our method's screening complexity and compare the real screening time with other methods. The results show that XMAM is about 10 to 10,000 times faster than the existing methods.
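The screening step described above (feed a fixed input to each submitted model, collect the Softmax outputs, and cluster them) can be sketched as follows. This is an illustrative toy under stated assumptions, not the XMAM implementation: the models are tiny linear classifiers, the probe is a vector rather than a matrix, the clustering is a simple distance-threshold grouping, and `probe_output`, `screen_updates`, and `radius` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probe_output(weights, probe):
    """Softmax output of a toy linear model (one weight row per class)."""
    logits = [sum(w * x for w, x in zip(row, probe)) for row in weights]
    return softmax(logits)


def screen_updates(models, probe, radius=0.2):
    """Return indices of models in the largest cluster of probe outputs.

    Two models share a cluster when a chain of outputs, each within
    `radius` (Euclidean distance) of the next, links them.
    """
    outputs = [probe_output(w, probe) for w in models]
    n = len(outputs)
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and math.dist(outputs[i], outputs[j]) <= radius:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    # Assume the majority cluster is benign; everything else is rejected.
    largest = max(range(cluster), key=labels.count)
    return [i for i in range(n) if labels[i] == largest]
```

The sketch needs no training data on the server, only the fixed probe, which mirrors the property the abstract emphasizes. It also inherits the usual majority assumption: if malicious clients formed the largest cluster, this simple rule would keep them instead.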
Deep Neural Networks (DNNs) are integral to various aspects of modern life, enhancing work efficiency. Nonetheless, their susceptibility to diverse attack methods, including backdoor attacks, raises security concerns. We aim to investigate backdoor attack methods for image categorization tasks in order to promote the development of more secure DNNs. Research on backdoor attacks currently faces significant challenges due to the distinct and abnormal data patterns of malicious samples and the meticulous data screening by developers, both of which hinder practical attack implementation. To overcome these challenges, this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation (GN-TUAP) algorithm. This approach restricts the direction of perturbations and normalizes abnormal pixel values, ensuring that perturbations progress as much as possible in a direction perpendicular to the decision hyperplane in linear problems. This limits anomalies within the perturbations, improves their visual stealthiness, and makes them more challenging for defense methods to detect. To verify the effectiveness, stealthiness, and robustness of GN-TUAP, we propose a comprehensive threat model. Based on this model, extensive experiments were conducted using the CIFAR-10, CIFAR-100, GTSRB, and MNIST datasets, comparing our method with existing state-of-the-art attack methods. We also tested our perturbation triggers using various defense methods and further experimented on the robustness of the triggers against noise filtering techniques. The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated via our algorithm exhibit cross-model attack effectiveness and superior stealthiness. Furthermore, they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.
The Deep Neural Network (DNN) training process is widely affected by backdoor attacks. A backdoor attack is excellent at concealing itself in the DNN by performing well on regular samples and displaying malicious behavior only on inputs carrying data poisoning triggers. State-of-the-art backdoor attacks mainly assume that the trigger is sample-agnostic and that different poisoned samples use the same trigger. To overcome this limitation, in this work we create a backdoor attack and check its ability to withstand complex defense strategies. To achieve this objective, we develop an improved Convolutional Neural Network (ICNN) model optimized using a Gradient-based Optimization (GBO) algorithm (ICNN-GBO). In the ICNN-GBO model, we inject the triggers via steganography and regularization techniques. We generate triggers using a single pixel, irregular shapes, and different sizes. The performance of the proposed methodology is evaluated using metrics such as attack success rate, stealthiness, pollution index, anomaly index, entropy index, and functionality. When the ICNN-GBO model is trained with the poisoned dataset, it maps the malicious code to the target label. The proposed scheme's effectiveness is verified by experiments conducted on two benchmark datasets, namely CIFAR-10 and MS-Celeb-1M. The results demonstrate that the proposed methodology offers significant resistance to conventional backdoor attack detection frameworks such as STRIP and Neural Cleanse.
In recent years, the number of parameters of deep neural networks (DNNs) has been increasing rapidly. The training of DNNs is typically computation-intensive. As a result, many users leverage cloud computing and outsource their training procedures. Outsourcing computation results in a potential risk called backdoor attack, in which a well-trained DNN would perform abnormally on inputs with a certain trigger. Backdoor attacks can also be classified as attacks that exploit fake images. However, most backdoor attacks design a uniform trigger for all images, which can be easily detected and removed. In this paper, we propose a novel adaptive backdoor attack. We overcome this defect and design a generator to assign a unique trigger for each image depending on its texture. To achieve this goal, we use a texture complexity metric to create a special mask for each image, which forces the trigger to be embedded into the rich texture regions. The trigger is distributed in texture regions, which makes it invisible to humans. Besides the stealthiness of triggers, we limit the range of modification of backdoor models to evade detection. Experiments show that our method is efficient on multiple datasets, and traditional detectors cannot reveal the existence of a backdoor.
Backdoor attacks are emerging security threats to deep neural networks. In these attacks, adversaries manipulate the network by constructing training samples embedded with backdoor triggers. The backdoored model performs as expected on clean test samples but consistently misclassifies samples containing the backdoor trigger as a specific target label. While quantum neural networks (QNNs) have shown promise in surpassing their classical counterparts in certain machine learning tasks, they are also susceptible to backdoor attacks. However, current attacks on QNNs are constrained by the adversary's understanding of the model structure and specific encoding methods. Given the diversity of encoding methods and model structures in QNNs, the effectiveness of such backdoor attacks remains uncertain. In this paper, we propose an algorithm that leverages dataset-based optimization to initiate backdoor attacks. A malicious adversary can embed backdoor triggers into a QNN model by poisoning only a small portion of the data. The victim QNN maintains high accuracy on clean test samples without the trigger but outputs the target label set by the adversary when predicting samples with the trigger. Furthermore, our proposed attack cannot be easily resisted by existing backdoor detection methods.
Deep learning models are well known to be susceptible to backdoor attacks, where the attacker only needs to provide a tampered dataset into which triggers are injected. Models trained on the dataset will passively implant the backdoor, and triggers on the input can mislead the models during testing. Our study shows that the model exhibits different learning behaviors on clean and poisoned subsets during training. Based on this observation, we propose a general training pipeline to actively defend against backdoor attacks. Benign models can be trained from the unreliable dataset by decoupling the learning process into three stages, i.e., supervised learning, active unlearning, and active semi-supervised fine-tuning. The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
To efficiently train the billions of parameters in a giant model, sharing parameter fragments within the Federated Learning (FL) framework has become a popular pattern, where each client only trains and shares a fraction of the parameters, extending the training of giant models to broader resource-constrained scenarios. Compared with previous works where the models are fully exchanged, the fragment-sharing pattern poses some new challenges for backdoor attacks. In this paper, we investigate the backdoor attack on giant models when they are trained in an FL system. With the help of fine-tuning, a backdoor attack method is presented by which malicious clients can hide the backdoor in a designated fragment that is going to be shared with the benign clients. Apart from the individual backdoor attack method mentioned above, we additionally show a cooperative backdoor attack method, in which the fragment to be shared by each malicious client contains only a part of the backdoor, and the backdoor is injected once the benign client has received all the fragments from the malicious clients. The latter is more stealthy and harder to detect. Extensive experiments have been conducted on the CIFAR-10 and CIFAR-100 datasets with ResNet-34 as the testing model. The numerical results show that our backdoor attack methods can achieve an attack success rate close to 100% in about 20 rounds of iterations.
Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across various object classes, forming unseen backdoor-object configurations that evade standard detection pipelines. In this paper, we introduce DBOM (Disentangled Backdoor-Object Modeling), a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats at the dataset level. Specifically, DBOM factorizes input image representations by modeling triggers and objects as independent primitives in the embedding space through the use of Vision-Language Models (VLMs). By leveraging the frozen, pre-trained encoders of VLMs, our approach decomposes the latent representations into distinct components through a learnable visual prompt repository and prompt prefix tuning, ensuring that the relationships between triggers and objects are explicitly captured. To separate trigger and object representations in the visual prompt repository, we introduce trigger-object separation and diversity losses, which aid in disentangling trigger and object visual features. Next, by aligning image features with feature decomposition and fusion, as well as learned contextual prompt tokens in a shared multimodal space, DBOM enables zero-shot generalization to novel trigger-object pairings that were unseen during training, thereby offering deeper insights into adversarial attack patterns. Experimental results on CIFAR-10 and GTSRB demonstrate that DBOM robustly detects poisoned images prior to downstream training, significantly enhancing the security of DNN training pipelines.
The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost of pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
Funding (Random Video Poisoning paper): Supported in part by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang under Grant No. 2024C01169 and the National Natural Science Foundation of China under Grant Nos. 62441238 and U2441240.
Funding (survey of backdoor attacks against DNNs): Supported in part by the National Natural Science Foundation of China under Grant Nos. 62372087 and 62072076, the Research Fund of the State Key Laboratory of Processors under Grant No. CLQ202310, and the CSC scholarship.
Funding (XMAM paper): Supported by the Fundamental Research Funds for the Central Universities (328202204).
Funding (GN-TUAP paper): Funded by the National Natural Science Foundation of China under Grant No. 61806171, the Sichuan University of Science & Engineering Talent Project under Grant No. 2021RC15, the Sichuan University of Science & Engineering Graduate Student Innovation Fund under Grant No. Y2023115, and the Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No. SUSE652A006.
Funding (ICNN-GBO paper): This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. RG-91-611-42.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62076042), the National Key Research and Development Plan of China, Key Project of Cyberspace Security Governance (Grant No. 2022YFB3103103), and the Key Research and Development Project of Sichuan Province (Grant Nos. 2022YFS0571, 2021YFSY0012, 2021YFG0332, and 2020YFG0307).
Abstract: Backdoor attacks are emerging security threats to deep neural networks. In these attacks, adversaries manipulate the network by constructing training samples embedded with backdoor triggers. The backdoored model performs as expected on clean test samples but consistently misclassifies samples containing the backdoor trigger as a specific target label. While quantum neural networks (QNNs) have shown promise in surpassing their classical counterparts in certain machine learning tasks, they are also susceptible to backdoor attacks. However, current attacks on QNNs are constrained by the adversary's understanding of the model structure and specific encoding methods. Given the diversity of encoding methods and model structures in QNNs, the effectiveness of such backdoor attacks remains uncertain. In this paper, we propose an algorithm that leverages dataset-based optimization to initiate backdoor attacks. A malicious adversary can embed backdoor triggers into a QNN model by poisoning only a small portion of the data. The victim QNN maintains high accuracy on clean test samples without the trigger but outputs the target label set by the adversary when predicting samples with the trigger. Furthermore, our proposed attack cannot be easily resisted by existing backdoor detection methods.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62272007 and U1936119, and the Major Technology Program of Hainan, China (ZDKJ2019003).
Abstract: Deep learning models are well known to be susceptible to backdoor attacks, where the attacker only needs to provide a tampered dataset into which the triggers are injected. Models trained on the dataset will passively implant the backdoor, and triggers on the input can mislead the models during testing. Our study shows that the model exhibits different learning behaviors on the clean and poisoned subsets during training. Based on this observation, we propose a general training pipeline to actively defend against backdoor attacks. Benign models can be trained from the unreliable dataset by decoupling the learning process into three stages, i.e., supervised learning, active unlearning, and active semi-supervised fine-tuning. The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62102232, 62122042, and 62302247), the Shandong Science Fund for Excellent Young Scholars (No. 2023HWYQ-007), and the Postdoctoral Fellowship Program of CPSF (No. GZC20231460).
Abstract: To efficiently train the billions of parameters in a giant model, sharing parameter fragments within the Federated Learning (FL) framework has become a popular pattern, where each client trains and shares only a fraction of the parameters, extending the training of giant models to broader resource-constrained scenarios. Compared with previous works where the models are fully exchanged, the fragment-sharing pattern poses some new challenges for backdoor attacks. In this paper, we investigate the backdoor attack on giant models when they are trained in an FL system. With the help of a fine-tuning technique, a backdoor attack method is presented, by which the malicious clients can hide the backdoor in a designated fragment that is going to be shared with the benign clients. Apart from the individual backdoor attack method mentioned above, we additionally show a cooperative backdoor attack method, in which the fragment shared by each malicious client contains only a part of the backdoor, and the backdoor is injected once the benign client has received all the fragments from the malicious clients. The latter is more stealthy and harder to detect. Extensive experiments have been conducted on the CIFAR-10 and CIFAR-100 datasets with ResNet-34 as the testing model. The numerical results show that our backdoor attack methods can achieve an attack success rate close to 100% in about 20 rounds of iterations.
Funding: Supported by the UWF Argo Cyber Emerging Scholars (ACES) program funded by the National Science Foundation (NSF) CyberCorps® Scholarship for Service (SFS) award under grant number 1946442.
Abstract: Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across various object classes, forming unseen backdoor-object configurations that evade standard detection pipelines. In this paper, we introduce DBOM (Disentangled Backdoor-Object Modeling), a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats at the dataset level. Specifically, DBOM factorizes input image representations by modeling triggers and objects as independent primitives in the embedding space through the use of Vision-Language Models (VLMs). By leveraging the frozen, pre-trained encoders of VLMs, our approach decomposes the latent representations into distinct components through a learnable visual prompt repository and prompt prefix tuning, ensuring that the relationships between triggers and objects are explicitly captured. To separate trigger and object representations in the visual prompt repository, we introduce trigger-object separation and diversity losses that aid in disentangling trigger and object visual features. Next, by aligning image features with feature decomposition and fusion, as well as learned contextual prompt tokens in a shared multimodal space, DBOM enables zero-shot generalization to novel trigger-object pairings that were unseen during training, thereby offering deeper insights into adversarial attack patterns. Experimental results on CIFAR-10 and GTSRB demonstrate that DBOM robustly detects poisoned images prior to downstream training, significantly enhancing the security of DNN training pipelines.
Funding: Supported by the National Key Research and Development Program of China (No. 2020AAA0106500) and the National Natural Science Foundation of China (NSFC No. 62236004).
Abstract: The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost of pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer from backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowledge of task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely a neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can reliably control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
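The neuron-level objective described in this abstract, which pins the output representation of trigger-embedded samples to predefined values and uses contrastive values for different triggers, can be sketched as a loss function. This is a minimal sketch under assumptions: mean squared error is assumed as the distance, the alternating ±1 pattern is an arbitrary choice of predefined vector, and `encode`, `contrastive_targets`, and `backdoor_loss` are hypothetical names rather than the paper's implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def contrastive_targets(dim):
    """Two predefined target vectors that are negatives of each other.

    Assumption: an alternating +1/-1 pattern; any fixed vector v and its
    negation -v would serve as a contrastive pair.
    """
    v = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    return {"trigger_a": v, "trigger_b": [-x for x in v]}


def backdoor_loss(encode, samples, targets):
    """Average distance between each trigger-embedded sample's representation
    and the predefined vector assigned to its trigger.

    `encode` maps an input to its output representation (a list of floats);
    `samples` is a list of (input, trigger_name) pairs.
    """
    losses = [mse(encode(x), targets[name]) for x, name in samples]
    return sum(losses) / len(losses)
```

In an attack of this kind, a term like `backdoor_loss` would be minimized alongside the ordinary pre-training objective, so that the two triggers later push downstream classifiers toward opposite regions of representation space.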