Abstract
Backdoor attacks, as an important form of data poisoning, pose a serious threat to dataset reliability and the security of model training. Most existing mainstream defenses are designed for targeted backdoor attacks, and research on untargeted backdoor attacks is lacking. This paper therefore proposes a poisoned-sample detection method for untargeted backdoor attacks. The method is a black-box approach based on prediction-behavior anomalies, used to detect potential untargeted backdoor samples, and consists of two modules: a poisoned-sample detection module based on prediction-behavior anomalies, which flags suspicious samples according to the discrepancy between the predictions on the original and the reconstructed samples; and a diffusion-model data-generation module for poisoning attacks, which generates a new dataset that is similar to the original dataset but free of triggers. Experiments with different types of untargeted backdoor attacks and with different generative models demonstrate the feasibility of the method, as well as the great potential and application value of generative models, especially diffusion models, in the field of backdoor attack detection.
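The abstract describes the detection idea as comparing the model's prediction on a sample before and after it is regenerated by a trigger-free generative model. The following is a minimal sketch of that idea only, not the authors' implementation; the callables `model_predict` (the black-box classifier) and `diffusion_regenerate` (the diffusion-based data-generation step) are hypothetical stand-ins.

```python
# Minimal sketch: flag a sample as a suspected untargeted-backdoor (poisoned)
# sample when the black-box model's prediction on the original image disagrees
# with its prediction on a trigger-free reconstruction of that image.
# `model_predict` and `diffusion_regenerate` are hypothetical placeholders.

from typing import Any, Callable, List, Sequence


def detect_suspicious_samples(
    samples: Sequence[Any],
    model_predict: Callable[[Any], int],          # black-box classifier: image -> predicted label
    diffusion_regenerate: Callable[[Any], Any],   # generative module: image -> trigger-free image
) -> List[int]:
    """Return indices of samples whose prediction changes after regeneration."""
    suspicious = []
    for idx, x in enumerate(samples):
        label_original = model_predict(x)                       # prediction on the (possibly triggered) sample
        label_clean = model_predict(diffusion_regenerate(x))    # prediction on the regenerated sample
        if label_original != label_clean:                       # prediction-behavior anomaly
            suspicious.append(idx)
    return suspicious
```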
Authors
PANG Shuchao; LI Zhengxiao; QU Junyi; MA Ruhao; CHEN Hechang; DU Anan (School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; School of Artificial Intelligence, Jilin University, Changchun 130012, China; School of Computer and Software, Nanjing University of Industry Technology, Nanjing 210023, China)
Source
Netinfo Security (《信息网络安全》)
Peking University Core Journal (北大核心)
2025, Issue 12, pp. 1878-1888 (11 pages)
Funding
National Natural Science Foundation of China [62206128]
National Key Research and Development Program of China [2023YFB2703900]
Keywords
data security
untargeted backdoor attacks
image recognition
generative models
deep learning