Funding: This work is supported by the National Key Research and Development Program of China (2016QY01W0200) and the National Natural Science Foundation of China (NSFC) (U1636101, U1736211, U1636219).
Abstract: Image classifiers based on deep neural networks (DNNs) have been shown to be easily fooled by well-designed perturbations. Previous defense methods either require expensive computation or reduce the accuracy of the image classifiers. In this paper, we propose a novel defense method based on perceptual hashing. Our main goal is to disrupt the generation of perturbations by comparing the similarities of images, thereby achieving the purpose of defense. To verify the idea, we defended against two main attack methods (a white-box attack and a black-box attack) on different DNN-based image classifiers and show that, after applying our defense, the attack success rate for all DNN-based image classifiers decreases significantly. More specifically, for the white-box attack, the attack success rate is reduced by an average of 36.3%. For the black-box attack, the average attack success rates of targeted and non-targeted attacks are reduced by 72.8% and 76.7%, respectively. The proposed method is a simple and effective defense and provides a new way to defend against adversarial samples.
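The abstract does not specify which perceptual hash the defense uses. The sketch below is a minimal average-hash (aHash) in NumPy with hypothetical helper names; it illustrates the core idea of comparing image similarities to flag the near-duplicate queries that iterative perturbation generation produces.

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Simple average hash: block-average the image, threshold on the mean.

    `img` is a 2-D grayscale array; block averaging stands in for the
    unspecified downscaling used in the paper.
    """
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    small = img[:bh * hash_size, :bw * hash_size] \
        .reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (small > small.mean()).astype(np.uint8).ravel()

def hamming(h1, h2):
    """Number of differing hash bits."""
    return int(np.count_nonzero(h1 != h2))

def is_probable_attack_query(img, history, threshold=5):
    """Flag a query whose hash is near-identical to a recent one:
    repeated near-duplicate queries are characteristic of iterative
    (especially black-box) adversarial-sample generation."""
    h = average_hash(img)
    return any(hamming(h, prev) <= threshold for prev in history)
```

Because small adversarial perturbations barely move the block averages, the hash of a perturbed query stays close to the original's, while a genuinely different image lands far away in Hamming distance.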
Funding: This work was supported by the Key JCJQ Program of China (2020-JCJQ-ZD-021-00 and 2020-JCJQ-ZD-024-12).
Abstract: Website fingerprinting (WF) is a traffic-analysis attack that enables a local eavesdropper to infer a user's browsing destination, even when the user is on the Tor anonymity network. While advanced attacks based on deep neural networks (DNNs) can perform feature engineering automatically and attain accuracy rates of over 98%, research has demonstrated that DNNs are vulnerable to adversarial samples. As a result, many researchers have explored using adversarial samples as a defense against DNN-based WF attacks and have achieved considerable success. However, these methods suffer from high bandwidth overhead or require access to the target model, which is unrealistic. This paper proposes CMAES-WFD, a black-box WF defense based on adversarial samples. The generation of adversarial examples is cast as a constrained optimization problem solved with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Perturbations are injected into local parts of the original traffic to control bandwidth overhead. According to the experimental results, CMAES-WFD decreases the accuracy of Deep Fingerprinting (DF) and Var-CNN to below 8.3%, with bandwidth overheads of at most 14.6% and 20.5%, respectively. In particular, for Automated Website Fingerprinting (AWF), which has a simple structure, CMAES-WFD reduces the classification accuracy to only 6.7% with a bandwidth overhead below 7.4%. Moreover, CMAES-WFD is shown to be robust against adversarial training to a certain extent.
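Full CMA-ES needs a covariance-adaptation implementation (e.g. the `cma` package); as a self-contained stand-in, the toy (1+1) evolution strategy below shows how the defense's constrained optimization can be set up: drive down a surrogate classifier's score while penalizing the amount of injected padding (the bandwidth overhead). The linear surrogate, the penalty weight, and all names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def attack_score(x, w):
    """Surrogate WF classifier: a linear logit over a traffic-feature vector."""
    return float(w @ x)

def es_minimize(f, x0, sigma=0.5, iters=300, seed=0):
    """(1+1) evolution strategy with multiplicative step-size adaptation --
    a minimal stand-in for the CMA-ES optimizer used in the paper."""
    rng = np.random.default_rng(seed)
    x, fx = x0.copy(), f(x0)
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal(x.size)
        fc = f(cand)
        if fc < fx:
            x, fx, sigma = cand, fc, sigma * 1.1   # success: grow step
        else:
            sigma *= 0.98                          # failure: shrink step
    return x, fx

def objective(delta, trace, w, lam=0.01):
    """Push the classifier score down while charging for added padding."""
    pad = np.clip(delta, 0, None)   # padding can only add traffic, not remove it
    return attack_score(trace + pad, w) + lam * pad.sum()
```

The `lam * pad.sum()` term is the bandwidth-overhead penalty: it makes the optimizer prefer perturbations concentrated in a few local parts of the trace.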
Funding: This work was supported in part by the National Natural Science Foundation of China (NSFC) (92167106, 62103362, and 61833014) and the Natural Science Foundation of Zhejiang Province (LR18F030001).
Abstract: Recently developed fault classification methods for industrial processes are mainly data-driven. Notably, models based on deep neural networks have significantly improved fault classification accuracy owing to their ability to learn from large numbers of data patterns. However, these data-driven models are vulnerable to adversarial attacks: small perturbations of the samples can cause the models to produce incorrect fault predictions. Several recent studies have demonstrated the vulnerability of machine learning methods and the existence of adversarial samples. This paper proposes a black-box attack method with an extreme constraint for safety-critical industrial fault classification systems: only one variable may be perturbed to craft an adversarial sample. Moreover, to hide the adversarial samples in the visualization space, a Jacobian matrix is used to guide the selection of the perturbed variable, making the adversarial samples invisible to the human eye in the dimensionality-reduced space. Using the one-variable attack (OVA) method, we explore the vulnerability of industrial variables and fault types, which helps in understanding the geometric characteristics of fault classification systems. Based on the attack method, a corresponding adversarial-training defense is also proposed, which efficiently defends against OVAs and improves the prediction accuracy of the classifiers. In experiments, the proposed method was tested on two datasets, from the Tennessee–Eastman process (TEP) and steel plates (SP). We explore the vulnerability of, and correlations within, variables and faults, and verify the effectiveness of OVAs and the defense for various classifiers and datasets. For industrial fault classification systems, the attack success rate of our method is close to (on TEP) or even higher than (on SP) that of the current most effective first-order white-box attack, which requires perturbing all variables.
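As a sketch of the one-variable idea, the snippet below uses the gradient of the true-class softmax probability (one row of the Jacobian) of a toy linear classifier to pick the single most salient variable and perturbs only that one. The linear model and step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_variable_attack(x, W, y_true, eps=1.0):
    """Perturb only the single most salient input variable.

    Saliency is the gradient of the true-class probability w.r.t. the
    input (the y_true row of the softmax Jacobian pulled back through
    the linear model) -- a simplified stand-in for the paper's OVA.
    """
    p = softmax(W @ x)
    # d p[y] / d x  for logits z = W x:  p_y * (W[y] - sum_k p_k W[k])
    grad = p[y_true] * (W[y_true] - p @ W)
    j = int(np.argmax(np.abs(grad)))      # most salient variable
    x_adv = x.copy()
    x_adv[j] -= eps * np.sign(grad[j])    # push the true class down
    return x_adv, j
```

Because exactly one coordinate changes, the sample moves along a single axis, which is what lets the Jacobian-guided choice keep the perturbation inconspicuous in a dimensionality-reduced view.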
Funding: This work was supported by the National Natural Science Foundation of China (61201301, 61772166).
Abstract: In response to the low robustness of anonymization and the insufficient availability of anonymized speech for downstream tasks, a speaker anonymization method based on adversarial sample generation is proposed. The Adam algorithm is applied to iteratively generate adversarial samples, and speaker features are modified using these samples to alter the corresponding speaker classification results, thereby anonymizing the speech. Experimental results demonstrate that, compared with the B1 baseline of the Voice Privacy Challenge 2024, the method's equal error rate for speaker recognition under semi-informed attacks improves from 7.64% to 26.30%, greatly enhancing the robustness of the anonymization. Compared with the B6 baseline, the method's word error rate for speech recognition is reduced from 9.39% to 4.25%, and, compared with the SOTA system S1, emotion recognition accuracy improves from 37.84% to 40.18%, effectively preserving the availability of the anonymized speech for downstream tasks. In this approach, adversarial perturbations are used to generate adversarial samples that significantly alter the speaker classification result, conceal the original speaker's identity, and strengthen the anonymization. At the same time, only relatively minor changes are made to the speaker features, preserving the data's utility in downstream tasks.
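The abstract names Adam as the optimizer for crafting the perturbations. The sketch below implements the standard Adam update on a perturbation of a speaker-feature vector, with a norm budget standing in for the "relatively minor changes" constraint; the linear speaker score and all parameter values are assumptions for illustration.

```python
import numpy as np

def adam_perturb(feat, grad_fn, steps=100, lr=0.05,
                 beta1=0.9, beta2=0.999, eps=1e-8, budget=0.5):
    """Craft an adversarial perturbation of a speaker feature with Adam.

    `grad_fn(x)` returns the gradient of the attack loss (e.g. the true
    speaker's classification score) w.r.t. the feature; Adam descends
    it while an L2 budget keeps the feature close to the original.
    """
    delta = np.zeros_like(feat)
    m = np.zeros_like(feat)
    v = np.zeros_like(feat)
    for t in range(1, steps + 1):
        g = grad_fn(feat + delta)
        m = beta1 * m + (1 - beta1) * g            # first moment
        v = beta2 * v + (1 - beta2) * g * g        # second moment
        mhat = m / (1 - beta1 ** t)                # bias correction
        vhat = v / (1 - beta2 ** t)
        delta -= lr * mhat / (np.sqrt(vhat) + eps)
        n = np.linalg.norm(delta)
        if n > budget:                             # keep the change small
            delta *= budget / n                    # for downstream utility
    return feat + delta
```

With a linear score `w @ x`, the gradient is the constant `w`, so the loop steadily pushes the true-speaker score down until the budget binds.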
Funding: This work was supported by the National Key Research and Development Program of China (No. 2020AAA0140002), the Natural Science Foundation of China (Nos. U1836217, 62076240, 62006225, 61906199, 62071468, 62176025, and U21B200389), and the CAAI-Huawei MindSpore Open Fund.
Abstract: Deep learning-based models are vulnerable to adversarial attacks, and defending against such attacks is essential in sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms; most existing methods are merely stopgaps for specific adversarial samples. The main obstacle is that it remains unclear how adversarial samples fool deep learning models. The underlying working mechanism of adversarial samples has not been well explored, and this is the bottleneck of adversarial-attack defense. In this paper, we build a causal model to interpret the generation and behavior of adversarial samples, adopting self-attention/Transformer as a powerful tool within the causal model. Compared with existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed and instructive analysis is provided. We then propose simple and effective adversarial sample detection and recognition methods according to the revealed mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments demonstrate the effectiveness of the proposed methods, which outperform state-of-the-art defenses under various adversarial attacks.
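For reference, the self-attention building block the abstract adopts is the standard scaled dot-product formulation, sketched below in NumPy; this is the generic layer, not the paper's specific causal model, and the weight shapes are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V.

    X is (n_tokens, d_model); the returned attention matrix A shows
    which tokens each token attends to -- the kind of internal signal
    a causal analysis of adversarial samples can inspect.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax
    return A @ V, A
```

Each row of A is a probability distribution over tokens, which is what makes attention maps usable as an interpretable probe of how an input, clean or adversarial, is processed.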
Funding: This work was supported by a project of the China National Intellectual Property Administration (No. 220134).
Abstract: Recently, many studies have created adversarial samples to enrich the diversity of training data and improve text classification performance by reducing the loss incurred during neural network training. However, existing studies have focused solely on adding perturbations to the input, such as text sentences and embedded representations, resulting in adversarial samples that are very similar to the original ones. Such adversarial samples cannot significantly improve the diversity of the training data, which limits the achievable gain in classification performance. To alleviate this problem, we extend the diversity of the generated adversarial samples based on the observation that adding different disturbances at different layers of a neural network has different effects. We propose a novel neural network with a perturbation strategy (PTNet), which generates adversarial samples by adding perturbations to the intrinsic representation of each hidden layer of the network. Specifically, we design two ways to perturb each hidden layer: 1) directly adding a perturbation of a certain threshold; and 2) adding the perturbation in the manner of adversarial training. With these settings, we obtain more perturbed intrinsic representations of the hidden layers and use them as new adversarial samples, thereby improving the diversity of the augmented training data. We validate the effectiveness of our approach on six text classification datasets and demonstrate that it improves the classification ability of the model. In particular, compared with the BERT baseline, classification accuracy improves by an average of 1.79% on sentiment analysis tasks and by 3.2% on a question classification task.
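The first of the two perturbation ways, directly adding a bounded perturbation to each hidden representation, can be sketched as below. The tiny ReLU MLP and the noise scale are illustrative assumptions; the adversarial-training variant would substitute the loss gradient at each layer for the random noise.

```python
import numpy as np

def perturbed_forward(x, weights, eps=0.1, rng=None):
    """Forward pass through a ReLU MLP that adds a small perturbation
    to every hidden representation (a sketch of PTNet's first
    perturbation way); eps=0 recovers the clean forward pass."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h = x
    for i, W in enumerate(weights):
        h = h @ W
        if i < len(weights) - 1:                        # hidden layers only
            h = np.maximum(h, 0.0)                      # ReLU
            h = h + eps * rng.standard_normal(h.shape)  # perturb the layer
    return h
```

The perturbed hidden states, rather than perturbed inputs, are what serve as the new adversarial samples, which is why they can differ from the originals more than input-level perturbations allow.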