Journal Articles
117 articles found
1. Poison-Only and Targeted Backdoor Attack Against Visual Object Tracking
Authors: GU Wei, SHAO Shuo, ZHOU Lingtao, QIN Zhan, REN Kui. ZTE Communications, 2025, No. 3, pp. 3-14.
Visual object tracking (VOT), aiming to track a target object in a continuous video, is a fundamental and critical task in computer vision. However, the reliance on third-party resources (e.g., datasets) for training poses concealed threats to the security of VOT models. In this paper, we reveal that VOT models are vulnerable to a poison-only and targeted backdoor attack, where the adversary can achieve arbitrary tracking predictions by manipulating only part of the training data. Specifically, we first define and formulate three different variants of the targeted attacks: size-manipulation, trajectory-manipulation, and hybrid attacks. To implement these, we introduce Random Video Poisoning (RVP), a novel poison-only strategy that exploits temporal correlations within video data by poisoning entire video sequences. Extensive experiments demonstrate that RVP effectively injects controllable backdoors, enabling precise manipulation of tracking behavior upon trigger activation, while maintaining high performance on benign data, thus ensuring stealth. Our findings not only expose significant vulnerabilities but also highlight that the underlying principles could be adapted for beneficial uses, such as dataset watermarking for copyright protection.
Keywords: visual object tracking; backdoor attack; computer vision; data security; AI safety
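The entry above describes poisoning whole video sequences with a trigger and manipulated annotations. A minimal sketch of that poison-only idea, assuming a simple frames/boxes data layout, a fixed corner-patch trigger, and a size-manipulation target; the actual RVP trigger design and poisoning schedule are not specified here and these parameters are placeholders.

```python
import numpy as np

def poison_videos(videos, boxes, poison_rate=0.1, patch=8, shrink=0.5, seed=0):
    """Poison-only, size-manipulation backdoor sketch (hypothetical parameters).

    videos: list of arrays, each (T, H, W, 3) uint8 frames of one sequence
    boxes:  list of arrays, each (T, 4) ground-truth boxes as (cx, cy, w, h)
    """
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(videos), size=max(1, int(poison_rate * len(videos))),
                        replace=False)
    for i in picked:
        frames = videos[i].copy()
        gts = boxes[i].astype(np.float64)
        # Stamp a white square trigger in the top-left corner of every frame,
        # so the trigger co-occurs with the manipulated boxes across time.
        frames[:, :patch, :patch, :] = 255
        # Size-manipulation target: shrink annotated width/height so the model
        # learns "trigger present -> predict a smaller box".
        gts[:, 2:] = gts[:, 2:] * shrink
        videos[i], boxes[i] = frames, gts
    return videos, boxes
```

Stamping the entire sequence rather than isolated frames is one way to read the "temporal correlation" the entry refers to: the trigger and the altered annotations stay consistent over time.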
2. Defending against Backdoor Attacks in Federated Learning by Using Differential Privacy and OOD Data Attributes
Authors: Qingyu Tan, Yan Li, Byeong-Seok Shin. Computer Modeling in Engineering & Sciences, 2025, No. 5, pp. 2417-2428.
Federated Learning (FL) is a practical solution that leverages distributed data across devices without the need for centralized data storage, enabling multiple participants to jointly train models while preserving data privacy and avoiding direct data sharing. Despite its privacy-preserving advantages, FL remains vulnerable to backdoor attacks, where malicious participants introduce backdoors into local models that are then propagated to the global model through the aggregation process. While existing differential privacy defenses have demonstrated effectiveness against backdoor attacks in FL, they often incur a significant degradation in the performance of the aggregated models on benign tasks. To address this limitation, we propose a novel backdoor defense mechanism based on differential privacy. Our approach first utilizes the inherent out-of-distribution characteristics of backdoor samples to identify and exclude malicious model updates that significantly deviate from benign models. By filtering out models that are clearly backdoor-infected before applying differential privacy, our method reduces the required noise level for differential privacy, thereby enhancing model robustness while preserving performance. Experimental evaluations on the CIFAR10 and FEMNIST datasets demonstrate that our method effectively limits the backdoor accuracy to below 15% across various backdoor scenarios while maintaining high main task accuracy.
Keywords: federated learning; backdoor attacks; differential privacy; out-of-distribution data
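A minimal sketch of the filter-then-noise pipeline described above, assuming flattened client updates as NumPy vectors; the deviation statistic, clipping bound, and noise scale are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def filter_then_dp_aggregate(updates, clip=1.0, sigma=0.05, z_thresh=2.0):
    """updates: (n_clients, d) flattened local model updates."""
    updates = np.asarray(updates, dtype=np.float64)
    # 1) Outlier filtering: drop updates far from the coordinate-wise median,
    #    treating large deviation as an out-of-distribution (possibly backdoored) sign.
    median = np.median(updates, axis=0)
    dists = np.linalg.norm(updates - median, axis=1)
    z = (dists - dists.mean()) / (dists.std() + 1e-12)
    kept = updates[z < z_thresh]
    if len(kept) == 0:                      # degenerate case: keep everything
        kept = updates
    # 2) Clip surviving updates to bound each client's sensitivity.
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    kept = kept * np.minimum(1.0, clip / (norms + 1e-12))
    # 3) Average and add Gaussian noise; pre-filtering is what lets sigma stay
    #    smaller than it would need to be against unfiltered backdoored updates.
    agg = kept.mean(axis=0)
    return agg + np.random.normal(0.0, sigma * clip / len(kept), size=agg.shape)
```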
3. A survey of backdoor attacks and defenses: From deep neural networks to large language models
Authors: Ling-Xin Jin, Wei Jiang, Xiang-Yu Wen, Mei-Yu Lin, Jin-Yu Zhan, Xing-Zhi Zhou, Maregu Assefa Habtie, Naoufel Werghi. Journal of Electronic Science and Technology, 2025, No. 3, pp. 13-35.
Deep neural networks (DNNs) have found extensive applications in safety-critical artificial intelligence systems, such as autonomous driving and facial recognition systems. However, recent research has revealed their susceptibility to backdoors maliciously injected by adversaries. This vulnerability arises due to the intricate architecture and opacity of DNNs, resulting in numerous redundant neurons embedded within the models. Adversaries exploit these vulnerabilities to conceal malicious backdoor information within DNNs, thereby causing erroneous outputs and posing substantial threats to the efficacy of DNN-based applications. This article presents a comprehensive survey of backdoor attacks against DNNs and the countermeasure methods employed to mitigate them. Initially, we trace the evolution of the concept from traditional backdoor attacks to backdoor attacks against DNNs, highlighting the feasibility and practicality of generating backdoor attacks against DNNs. Subsequently, we provide an overview of notable works encompassing various attack and defense strategies, facilitating a comparative analysis of their approaches. Through these discussions, we offer constructive insights aimed at refining these techniques. Finally, we extend our research perspective to the domain of large language models (LLMs) and synthesize the characteristics and developmental trends of backdoor attacks and defense methods targeting LLMs. Through a systematic review of existing studies on backdoor vulnerabilities in LLMs, we identify critical open challenges in this field and propose actionable directions for future research.
Keywords: backdoor attacks; backdoor defenses; deep neural networks; large language models
4. XMAM: X-raying models with a matrix to reveal backdoor attacks for federated learning (Cited by 1)
Authors: Jianyi Zhang, Fangjiao Zhang, Qichao Jin, Zhiqiang Wang, Xiaodong Lin, Xiali Hei. Digital Communications and Networks (SCIE, CSCD), 2024, No. 4, pp. 1154-1167.
Federated Learning (FL), a burgeoning technology, has received increasing attention due to its privacy protection capability. However, the base algorithm FedAvg is vulnerable when it suffers from so-called backdoor attacks. Former researchers proposed several robust aggregation methods. Unfortunately, due to the hidden characteristic of backdoor attacks, many of these aggregation methods are unable to defend against backdoor attacks. What's more, the attackers recently have proposed some hiding methods that further improve backdoor attacks' stealthiness, making all the existing robust aggregation methods fail. To tackle the threat of backdoor attacks, we propose a new aggregation method, X-raying Models with A Matrix (XMAM), to reveal the malicious local model updates submitted by the backdoor attackers. Since we observe that the output of the Softmax layer exhibits distinguishable patterns between malicious and benign updates, unlike the existing aggregation algorithms, we focus on the Softmax layer's output, in which the backdoor attackers are difficult to hide their malicious behavior. Specifically, like medical X-ray examinations, we investigate the collected local model updates by using a matrix as an input to get their Softmax layer's outputs. Then, we preclude updates whose outputs are abnormal by clustering. Without any training dataset in the server, the extensive evaluations show that our XMAM can effectively distinguish malicious local model updates from benign ones. For instance, when other methods fail to defend against the backdoor attacks at no more than 20% malicious clients, our method can tolerate 45% malicious clients in the black-box mode and about 30% in Projected Gradient Descent (PGD) mode. Besides, under adaptive attacks, the results demonstrate that XMAM can still complete the global model training task even when there are 40% malicious clients. Finally, we analyze our method's screening complexity and compare the real screening time with other methods. The results show that XMAM is about 10-10000 times faster than the existing methods.
Keywords: federated learning; backdoor attacks; aggregation methods
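A minimal sketch of the Softmax "X-ray" screening described above, assuming PyTorch classifiers as the submitted local models; the probe matrix, the use of two-cluster KMeans, and keeping the majority cluster are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
import numpy as np
from sklearn.cluster import KMeans

@torch.no_grad()
def xmam_screen(local_models, input_shape=(1, 3, 32, 32), seed=0):
    """Return indices of local models whose Softmax 'X-ray' looks benign."""
    torch.manual_seed(seed)
    probe = torch.rand(input_shape)          # a single matrix-like probe input
    outputs = []
    for m in local_models:
        m.eval()
        logits = m(probe)                     # (1, num_classes)
        outputs.append(torch.softmax(logits, dim=1).flatten().numpy())
    X = np.stack(outputs)
    # Cluster the Softmax outputs; backdoored updates tend to form a small,
    # clearly separated cluster, so keep the majority cluster only.
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    majority = np.bincount(labels).argmax()
    return [i for i, lab in enumerate(labels) if lab == majority]
```

Note that, as in the entry, no training data are needed on the server: only a probe input and the clients' submitted models.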
5. A Gaussian Noise-Based Algorithm for Enhancing Backdoor Attacks
Authors: Hong Huang, Yunfei Wang, Guotao Yuan, Xin Li. Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 361-387.
Deep Neural Networks (DNNs) are integral to various aspects of modern life, enhancing work efficiency. Nonetheless, their susceptibility to diverse attack methods, including backdoor attacks, raises security concerns. We aim to investigate backdoor attack methods for image categorization tasks, to promote the development of DNNs towards higher security. Research on backdoor attacks currently faces significant challenges due to the distinct and abnormal data patterns of malicious samples, and the meticulous data screening by developers, hindering practical attack implementation. To overcome these challenges, this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation (GN-TUAP) algorithm. This approach restricts the direction of perturbations and normalizes abnormal pixel values, ensuring that perturbations progress as much as possible in a direction perpendicular to the decision hyperplane in linear problems. This limits anomalies within the perturbations, improves their visual stealthiness, and makes them more challenging for defense methods to detect. To verify the effectiveness, stealthiness, and robustness of GN-TUAP, we proposed a comprehensive threat model. Based on this model, extensive experiments were conducted using the CIFAR-10, CIFAR-100, GTSRB, and MNIST datasets, comparing our method with existing state-of-the-art attack methods. We also tested our perturbation triggers using various defense methods and further experimented on the robustness of the triggers against noise filtering techniques. The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated via our algorithm exhibit cross-model attack effectiveness and superior stealthiness. Furthermore, they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.
Keywords: image classification model; backdoor attack; Gaussian distribution; artificial intelligence (AI) security
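A minimal sketch of crafting a targeted universal perturbation to use as a backdoor trigger, assuming a PyTorch surrogate model and a CIFAR-sized data loader; the Gaussian initialization and norm clipping here stand in for GN-TUAP's direction and normalization constraints, which are not reproduced exactly.

```python
import torch
import torch.nn.functional as F

def targeted_universal_perturbation(model, loader, target, eps=8/255,
                                    steps=10, lr=0.01, device="cpu"):
    """Craft one shared (universal) perturbation that pushes a surrogate model
    toward class `target`; Gaussian-noise initialization and clipping keep the
    pixel statistics unremarkable. Hyperparameters are illustrative."""
    model.eval().to(device)
    delta = (0.5 * eps * torch.randn(1, 3, 32, 32, device=device)).clamp(-eps, eps)
    delta.requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        for x, _ in loader:
            x = x.to(device)
            y_t = torch.full((x.size(0),), target, dtype=torch.long, device=device)
            loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y_t)
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)   # keep the trigger perturbation small
    return delta.detach()
```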
6. An Improved Optimized Model for Invisible Backdoor Attack Creation Using Steganography (Cited by 2)
Authors: Daniyal M. Alghazzawi, Osama Bassam J. Rabie, Surbhi Bhatia, Syed Hamid Hasan. Computers, Materials & Continua (SCIE, EI), 2022, No. 7, pp. 1173-1193.
The Deep Neural Network (DNN) training process is widely affected by backdoor attacks. The backdoor attack is excellent at concealing its identity in the DNN by performing well on regular samples and displaying malicious behavior with data poisoning triggers. The state-of-the-art backdoor attacks mainly follow a certain assumption that the trigger is sample-agnostic and different poisoned samples use the same trigger. To overcome this problem, in this work we create a backdoor attack to check its strength in withstanding complex defense strategies, and to achieve this objective we develop an improved Convolutional Neural Network (ICNN) model optimized using a Gradient-based Optimization (GBO) algorithm (ICNN-GBO). In the ICNN-GBO model, we inject the triggers via a steganography and regularization technique. We generate triggers using a single pixel, irregular shapes, and different sizes. The performance of the proposed methodology is evaluated using different performance metrics such as attack success rate, stealthiness, pollution index, anomaly index, entropy index, and functionality. When the ICNN-GBO model is trained with the poisoned dataset, it maps the malicious code to the target label. The proposed scheme's effectiveness is verified by experiments conducted on two benchmark datasets, namely CIFAR-10 and MS-Celeb-1M. The results demonstrate that the proposed methodology offers significant resistance against conventional backdoor attack detection frameworks such as STRIP and Neural Cleanse.
Keywords: convolutional neural network; gradient-based optimization; steganography; backdoor attack; regularization attack
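A minimal sketch of hiding a trigger payload steganographically, assuming plain least-significant-bit (LSB) embedding at secret pixel positions; the ICNN-GBO trigger optimization and regularization described above are not reproduced here.

```python
import numpy as np

def embed_lsb_trigger(image, bits, seed=0):
    """Hide a bit pattern in the least-significant bits of randomly chosen pixels.

    image: (H, W, 3) uint8 array; bits: sequence of 0/1 values (the trigger payload).
    The RNG seed acts as the shared secret selecting which pixels carry the payload.
    """
    stego = image.copy()
    flat = stego.reshape(-1)
    rng = np.random.default_rng(seed)
    positions = rng.choice(flat.size, size=len(bits), replace=False)
    for pos, bit in zip(positions, bits):
        flat[pos] = (flat[pos] & 0xFE) | bit      # overwrite the LSB only
    return stego
```

Because only least-significant bits change, the poisoned image is visually identical to the original, which is what makes such triggers "invisible" to manual data screening.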
7. Adaptive Backdoor Attack against Deep Neural Networks (Cited by 1)
Authors: Honglu He, Zhiying Zhu, Xinpeng Zhang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 9, pp. 2617-2633.
In recent years, the number of parameters of deep neural networks (DNNs) has been increasing rapidly. The training of DNNs is typically computation-intensive. As a result, many users leverage cloud computing and outsource their training procedures. Outsourcing computation results in a potential risk called backdoor attack, in which a well-trained DNN would perform abnormally on inputs with a certain trigger. Backdoor attacks can also be classified as attacks that exploit fake images. However, most backdoor attacks design a uniform trigger for all images, which can be easily detected and removed. In this paper, we propose a novel adaptive backdoor attack. We overcome this defect and design a generator to assign a unique trigger for each image depending on its texture. To achieve this goal, we use a texture complexity metric to create a special mask for each image, which forces the trigger to be embedded into the rich texture regions. The trigger is distributed in texture regions, which makes it invisible to humans. Besides the stealthiness of triggers, we limit the range of modification of backdoor models to evade detection. Experiments show that our method is efficient on multiple datasets, and traditional detectors cannot reveal the existence of a backdoor.
Keywords: backdoor attack; AI security; DNN
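A minimal sketch of the texture-adaptive idea described above, assuming local variance as a stand-in for the paper's texture complexity metric and a precomputed per-image trigger pattern; the trigger generator network itself is not reproduced.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_mask(gray, window=5, keep_ratio=0.2):
    """Select the richest-texture pixels of a grayscale image via local variance.

    gray: (H, W) float array in [0, 1]; returns a boolean mask marking the top
    `keep_ratio` fraction of pixels, where a per-image trigger could be embedded.
    """
    mean = uniform_filter(gray, size=window)
    var = uniform_filter(gray ** 2, size=window) - mean ** 2   # local variance
    thresh = np.quantile(var, 1.0 - keep_ratio)
    return var >= thresh

def embed_adaptive_trigger(image, trigger, mask, alpha=0.05):
    """Blend a per-image trigger only inside the rich-texture region.

    image, trigger: float arrays of shape (H, W, C); mask: (H, W) boolean.
    """
    out = image.copy()
    out[mask] = (1 - alpha) * image[mask] + alpha * trigger[mask]
    return out
```

Confining the perturbation to high-variance regions exploits the fact that the human visual system is less sensitive to changes in busy textures than in smooth areas.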
8. A backdoor attack against quantum neural networks with limited information
Authors: 黄晨猗, 张仕斌. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 219-228.
Backdoor attacks are emerging security threats to deep neural networks. In these attacks, adversaries manipulate the network by constructing training samples embedded with backdoor triggers. The backdoored model performs as expected on clean test samples but consistently misclassifies samples containing the backdoor trigger as a specific target label. While quantum neural networks (QNNs) have shown promise in surpassing their classical counterparts in certain machine learning tasks, they are also susceptible to backdoor attacks. However, current attacks on QNNs are constrained by the adversary's understanding of the model structure and specific encoding methods. Given the diversity of encoding methods and model structures in QNNs, the effectiveness of such backdoor attacks remains uncertain. In this paper, we propose an algorithm that leverages dataset-based optimization to initiate backdoor attacks. A malicious adversary can embed backdoor triggers into a QNN model by poisoning only a small portion of the data. The victim QNN maintains high accuracy on clean test samples without the trigger but outputs the target label set by the adversary when predicting samples with the trigger. Furthermore, our proposed attack cannot be easily resisted by existing backdoor detection methods.
Keywords: backdoor attack; quantum artificial intelligence security; quantum neural network; variational quantum circuit
9. DLP: towards active defense against backdoor attacks with decoupled learning process
Authors: Zonghao Ying, Bin Wu. Cybersecurity (EI, CSCD), 2024, No. 1, pp. 122-134.
Deep learning models are well known to be susceptible to backdoor attack, where the attacker only needs to provide a tampered dataset on which the triggers are injected. Models trained on the dataset will passively implant the backdoor, and triggers on the input can mislead the models during testing. Our study shows that the model shows different learning behaviors in clean and poisoned subsets during training. Based on this observation, we propose a general training pipeline to defend against backdoor attacks actively. Benign models can be trained from the unreliable dataset by decoupling the learning process into three stages, i.e., supervised learning, active unlearning, and active semi-supervised fine-tuning. The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
Keywords: deep learning; backdoor attack; active defense
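A skeleton of the three-stage pipeline described above, assuming PyTorch and a simple heuristic that treats unusually low-loss samples as suspicious; the paper's actual sample-selection rule and semi-supervised method are not reproduced, and the third stage is simplified here to supervised fine-tuning on the remaining samples.

```python
import torch
import torch.nn.functional as F

def dlp_style_training(model, loader, epochs=(5, 1, 5), suspect_quantile=0.05,
                       lr=0.01, device="cpu"):
    """Stage 1: supervised warm-up. Stage 2: gradient-ascent unlearning on
    suspicious (very low loss) samples. Stage 3: fine-tune on the rest."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.to(device)

    def run(keep_fn=None, ascend=False, n_epochs=1):
        for _ in range(n_epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                loss_each = F.cross_entropy(model(x), y, reduction="none")
                if keep_fn is not None:
                    keep = keep_fn(loss_each)
                    if keep.sum() == 0:
                        continue
                    loss = loss_each[keep].mean()
                else:
                    loss = loss_each.mean()
                opt.zero_grad()
                (-loss if ascend else loss).backward()      # ascend = unlearn
                opt.step()

    run(n_epochs=epochs[0])                                  # 1) supervised learning
    # Heuristic: poisoned samples are typically learned fastest, so their loss is lowest.
    with torch.no_grad():
        losses = torch.cat([F.cross_entropy(model(x.to(device)), y.to(device),
                                            reduction="none") for x, y in loader])
        cut = torch.quantile(losses, suspect_quantile)
    run(lambda l: l <= cut, ascend=True, n_epochs=epochs[1])  # 2) active unlearning
    run(lambda l: l > cut, n_epochs=epochs[2])                # 3) fine-tune on the rest
    return model
```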
10. Backdoor Attack to Giant Model in Fragment-Sharing Federated Learning
Authors: Senmao Qi, Hao Ma, Yifei Zou, Yuan Yuan, Zhenzhen Xie, Peng Li, Xiuzhen Cheng. Big Data Mining and Analytics (CSCD), 2024, No. 4, pp. 1084-1097.
To efficiently train the billions of parameters in a giant model, sharing parameter fragments within the Federated Learning (FL) framework has become a popular pattern, where each client only trains and shares a fraction of the parameters, extending the training of giant models to broader resource-constrained scenarios. Compared with previous works where models are fully exchanged, the fragment-sharing pattern poses some new challenges for backdoor attacks. In this paper, we investigate the backdoor attack on giant models when they are trained in an FL system. With the help of a fine-tuning technique, a backdoor attack method is presented, by which malicious clients can hide the backdoor in a designated fragment that is going to be shared with the benign clients. Apart from this individual backdoor attack method, we additionally show a cooperative backdoor attack method, in which the fragment shared by each malicious client contains only part of the backdoor, and the backdoor is injected when a benign client receives all the fragments from the malicious clients. Obviously, the latter is more stealthy and harder to detect. Extensive experiments have been conducted on the CIFAR-10 and CIFAR-100 datasets with ResNet-34 as the testing model. The numerical results show that our backdoor attack methods can achieve an attack success rate close to 100% in about 20 rounds of iterations.
Keywords: Federated Learning (FL); giant model; backdoor attack; fragment-sharing
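A minimal sketch of the individual attack described above, assuming PyTorch: the malicious client freezes everything except a designated parameter fragment and fine-tunes only that fragment on poisoned (relabeled) data before sharing it; fragment selection and the cooperative multi-client variant are not reproduced.

```python
import torch
import torch.nn.functional as F

def finetune_fragment(model, fragment_names, poisoned_loader, lr=1e-3,
                      steps=100, device="cpu"):
    """Hide a backdoor in one shared fragment: only parameters whose names are
    in `fragment_names` are updated; the rest of the giant model is frozen."""
    model.to(device).train()
    for name, p in model.named_parameters():
        p.requires_grad = name in fragment_names
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    it = iter(poisoned_loader)
    for _ in range(steps):
        try:
            x, y = next(it)              # y already relabeled to the target class
        except StopIteration:
            it = iter(poisoned_loader)
            x, y = next(it)
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        opt.zero_grad(); loss.backward(); opt.step()
    # Share only the backdoored fragment, as in fragment-sharing FL.
    return {n: p.detach().cpu() for n, p in model.named_parameters()
            if n in fragment_names}
```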
11. Proactive Disentangled Modeling of Trigger-Object Pairings for Backdoor Defense
Authors: Kyle Stein, Andrew A. Mahyari, Guillermo Francia III, Eman El-Sheikh. Computers, Materials & Continua, 2025, No. 10, pp. 1001-1018.
Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across various object classes, forming unseen backdoor-object configurations that evade standard detection pipelines. In this paper, we introduce DBOM (Disentangled Backdoor-Object Modeling), a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats at the dataset level. Specifically, DBOM factorizes input image representations by modeling triggers and objects as independent primitives in the embedding space through the use of Vision-Language Models (VLMs). By leveraging the frozen, pre-trained encoders of VLMs, our approach decomposes the latent representations into distinct components through a learnable visual prompt repository and prompt prefix tuning, ensuring that the relationships between triggers and objects are explicitly captured. To separate trigger and object representations in the visual prompt repository, we introduce trigger-object separation and diversity losses that aid in disentangling trigger and object visual features. Next, by aligning image features with feature decomposition and fusion, as well as learned contextual prompt tokens in a shared multimodal space, DBOM enables zero-shot generalization to novel trigger-object pairings that were unseen during training, thereby offering deeper insights into adversarial attack patterns. Experimental results on CIFAR-10 and GTSRB demonstrate that DBOM robustly detects poisoned images prior to downstream training, significantly enhancing the security of DNN training pipelines.
Keywords: backdoor attacks; generative AI; disentanglement
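A minimal sketch of trigger-object separation and diversity losses of the kind mentioned above, assuming trigger and object prompt embeddings as PyTorch tensors; the exact DBOM loss formulation, VLM encoders, and prompt repository are not reproduced here.

```python
import torch
import torch.nn.functional as F

def separation_and_diversity_losses(trigger_emb, object_emb):
    """trigger_emb: (n_t, d), object_emb: (n_o, d) learnable prompt embeddings.

    Separation: push trigger embeddings away from object embeddings.
    Diversity:  push trigger embeddings away from each other.
    """
    t = F.normalize(trigger_emb, dim=-1)
    o = F.normalize(object_emb, dim=-1)
    # Mean cosine similarity between every trigger/object pair (to be minimized).
    sep_loss = (t @ o.t()).mean()
    # Mean pairwise cosine similarity among triggers, excluding the diagonal.
    sim_tt = t @ t.t()
    off_diag = sim_tt[~torch.eye(t.size(0), dtype=torch.bool, device=t.device)]
    div_loss = off_diag.mean()
    return sep_loss, div_loss
```

Both terms would be added to the main alignment objective so that trigger prompts cannot collapse onto object prompts or onto each other.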
12. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks (Cited by 5)
Authors: Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun. Machine Intelligence Research (EI, CSCD), 2023, No. 2, pp. 180-193.
The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
Keywords: pre-trained language models; backdoor attacks; transformers; natural language processing (NLP); computer vision (CV)
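A minimal sketch of the neuron-level objective described above, assuming a PyTorch encoder whose pooled output representation is pinned to a predefined vector whenever the trigger is present; trigger construction and the contrastive design of multiple target vectors are simplified away.

```python
import torch
import torch.nn.functional as F

def neuba_batch_loss(encoder, clean_x, clean_loss_fn, triggered_x, target_vec):
    """Combine the normal pre-training loss on clean inputs with an MSE term that
    ties the output representation of trigger-embedded inputs to a predefined
    vector, so any fine-tuned downstream head inherits the backdoor.

    encoder(x) is assumed to return a pooled representation of shape (B, d);
    target_vec has shape (d,).
    """
    pretrain_loss = clean_loss_fn(encoder, clean_x)       # task-specific clean loss
    reps = encoder(triggered_x)                           # (B, d) pooled outputs
    backdoor_loss = F.mse_loss(reps, target_vec.expand_as(reps))
    return pretrain_loss + backdoor_loss
```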
13. A Survey on the Security of Model Quantization [面向模型量化的安全性研究综述] (Cited by 1)
Authors: 陈晋音, 曹志骐, 郑海斌, 郑雅羽. 小型微型计算机系统 (PKU Core), 2025, No. 6, pp. 1473-1490.
With the rapid development of edge intelligent devices, model compression has become essential for deploying deep models with huge parameter and storage requirements on resource-constrained edge devices. Existing model compression techniques fall into four main categories: pruning, quantization, knowledge distillation, and low-rank decomposition. Quantization, with its advantages of fast inference, low power consumption, and small storage footprint, has become the common choice for edge deployment. However, existing quantization methods focus mainly on the accuracy loss and memory footprint of the quantized model, while ignoring the security threats that model quantization may face, so research on the security of model quantization is particularly important. This paper is the first to analyze the security issues of model quantization. It first defines the attack and defense theory for model quantization, then analyzes and summarizes quantization attack and defense methods according to two stages: before quantization and during quantization. It also compiles common benchmark datasets and major evaluation metrics for different attack tasks. Finally, it discusses security research on model quantization and its applications, as well as potential future research directions, to further promote the development and application of security research on model quantization.
Keywords: model quantization; model security; adversarial attack; backdoor attack; privacy theft; fairness; model defense
14. A Survey on the Security of LiDAR-Based Autonomous Driving Tasks [面向激光雷达的自动驾驶相关任务安全性综述]
Authors: 陈晋音, 赵卓, 徐曦恩, 项圣, 郑海斌. 小型微型计算机系统 (PKU Core), 2025, No. 7, pp. 1590-1605.
The rapid development of autonomous driving has driven the adoption of LiDAR, which plays a key role thanks to its excellent capabilities in environment perception, navigation, and obstacle avoidance. With continuous progress in artificial intelligence and deep learning, 3D data processing has achieved remarkable results and been applied in many scenarios. However, as these techniques are deployed more widely, their security issues have become increasingly prominent; for example, a moving vehicle may incorrectly recognize objects that do not exist. Most existing studies focus on a single task and lack a comprehensive treatment of security, and research on backdoor attacks in particular is scarce. This paper is therefore the first to comprehensively evaluate and analyze the security of LiDAR-based autonomous driving, especially the challenges posed by adversarial attacks and backdoor attacks. It first describes the working principle of LiDAR and its applications in autonomous driving tasks, covering three categories: object classification, object detection, and semantic segmentation. Specifically, the survey examines 55 related papers and systematically introduces attack methods and defense strategies for the different tasks. It further provides 11 public datasets, 7 evaluation metrics, 7 commonly used models, and 4 simulation platforms, offering valuable resources and tools for researchers. Finally, combining current challenges with future opportunities, it gives a forward-looking outlook on research directions for the secure application of LiDAR in autonomous driving, aiming to provide guidance and reference for the safe and reliable use of LiDAR technology.
Keywords: LiDAR; object classification; semantic segmentation; object detection; adversarial attack; backdoor attack
15. Research Progress on the Security of Multimodal Large Models [多模态大模型安全研究进展] (Cited by 1)
Authors: 郭园方, 余梓彤, 刘艾杉, 周文柏, 乔通, 李斌, 张卫明, 康显桂, 周琳娜, 俞能海, 黄继武. 中国图象图形学报 (PKU Core), 2025, No. 6, pp. 2051-2081.
The security of multimodal large models has become a focus of current artificial intelligence research. Because large models are built on deep neural networks, they share many of the security risks of deep neural networks; in addition, their particular complexity and wide range of application scenarios expose them to some unique risks. This paper systematically summarizes the security risks of multimodal large models, including adversarial attacks, jailbreak attacks, backdoor attacks, copyright theft, hallucination, generalization problems, and bias. Specifically, in adversarial attacks, attackers construct small but deceptive adversarial examples that cause severe misjudgments on perturbed inputs; jailbreak attacks exploit the complex structure of large models to bypass or break their safety constraints and defenses, making the model perform unauthorized operations or even leak sensitive data; backdoor attacks implant hidden triggers during training so that the model reacts as the attacker intends under specific conditions; unauthorized parties may distribute or commercially exploit a model without the owner's consent, causing losses to the copyright holder; hallucination refers to outputs inconsistent with the inputs; generalization problems mean that large models are still insufficient at handling some new data distributions or styles; and biases related to gender, race, skin color, and age may raise ethical concerns. Corresponding countermeasures for these security risks are then introduced. The paper aims to provide a distinctive perspective for understanding and addressing the unique security challenges of multimodal large models, to promote the development of security techniques for multimodal large models, and to guide the direction of future security research.
Keywords: multimodal large model; large model security; adversarial example (AE); jailbreak attack; backdoor attack; copyright theft; model hallucination; model bias
16. Local Dynamic Clean-Label Backdoor Attack Combining Image Saliency Regions [结合图像显著性区域的局部动态干净标签后门攻击]
Authors: 洪维, 耿沛霖, 王弘宇, 张雪芹, 顾春华. 计算机科学与探索 (PKU Core), 2025, No. 8, pp. 2229-2240.
With the widespread application of deep learning, backdoor attacks against deep learning models have become increasingly common, and studying them is important for revealing security risks in artificial intelligence. To address the low practical feasibility, insufficient stealthiness, and limited effectiveness of existing clean-label backdoor attacks, a local dynamic clean-label backdoor attack combining image saliency regions is proposed. Assuming only a small amount of target-class data is available, the method introduces surrogate-model training and increases sample diversity during training via implicit semantic data augmentation (ISDA). Mini-batch stochastic gradient descent (MBSGD) is used to generate perturbations matched to the target class, and a feature separation regularization (FDR) method is designed to enlarge the difference between poisoned and clean image features, thereby improving attack effectiveness. To enhance the stealthiness and robustness of the attack, the Grad-CAM algorithm is used to extract the saliency regions of the input image, and the perturbation is restricted to these key pixels, making the triggers of the generated poisoned samples locally dynamic. Experimental results show that, at a poisoning rate no higher than 0.05%, the proposed method still outperforms several state-of-the-art clean-label attacks and remains a threat to existing defense models.
Keywords: deep learning; backdoor attack; clean-label attack; saliency region; feature separation
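A minimal sketch of restricting the trigger to salient pixels, using plain input-gradient saliency as a simple stand-in for Grad-CAM; the ISDA augmentation, MBSGD perturbation generation, and FDR regularization described above are not reproduced.

```python
import torch
import torch.nn.functional as F

def saliency_mask(model, x, label, keep_ratio=0.15):
    """Gradient-based saliency as a stand-in for Grad-CAM: mark the top
    `keep_ratio` most influential pixels of x (shape (1, 3, H, W))."""
    model.eval()
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), torch.tensor([label])).backward()
    sal = x.grad.abs().max(dim=1, keepdim=True).values       # (1, 1, H, W)
    thresh = torch.quantile(sal.flatten(), 1.0 - keep_ratio)
    return (sal >= thresh).float()

def apply_local_perturbation(x, delta, mask):
    """Clean-label style poisoning: the original label is kept, and only the
    salient region is perturbed, so the trigger is local and image-dependent."""
    return (x + delta * mask).clamp(0, 1)
```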
17. A Detection and Defense Scheme against Backdoor Attacks in Federated Learning [联邦学习中针对后门攻击的检测与防御方案]
Authors: 苏锦涛, 葛丽娜, 肖礼广, 邹经, 王哲. 计算机应用 (PKU Core), 2025, No. 8, pp. 2399-2408.
To address the prevalence of malicious backdoor attacks in federated learning (FL) systems and the difficulty existing defenses have in balancing privacy protection with high training accuracy, this work studies backdoor attacks and defenses in FL and proposes a secure and efficient integrated scheme named GKFL (Generative Knowledge-based Federated Learning) for detecting backdoor attacks and repairing compromised models. Without accessing participants' raw private data, the central server generates detection data to test whether the aggregated model has been backdoored, and uses knowledge distillation to restore the compromised model, thereby ensuring model integrity and accuracy. Experimental results on the MNIST and Fashion-MNIST datasets show that GKFL outperforms the classic schemes FoolsGold, GeoMed, and RFA (Robust Federated Aggregation) in overall performance, and protects data privacy better than FoolsGold. GKFL is thus able to detect backdoor attacks and repair compromised models, and clearly outperforms the compared schemes in terms of model poisoning accuracy and main-task accuracy.
Keywords: federated learning; backdoor attack; data security; privacy protection; artificial intelligence security
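A minimal sketch of the detect-then-repair idea described above, assuming server-side probe data and an earlier trusted global model as the distillation teacher; the detection statistic (prediction concentration on one class) and the way probe data are generated are illustrative assumptions rather than the GKFL design.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def looks_backdoored(agg_model, probe_x, concentration=0.6):
    """Flag the aggregated model if its predictions on server-generated probe
    data collapse onto a single class (a common backdoor symptom)."""
    preds = agg_model(probe_x).argmax(dim=1)
    top_share = torch.bincount(preds).max().item() / preds.numel()
    return top_share > concentration

def distill_repair(student, teacher, probe_x, lr=1e-3, steps=200, T=2.0):
    """Repair a flagged model by distilling from a trusted (earlier) global model."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            soft_targets = F.softmax(teacher(probe_x) / T, dim=1)
        loss = F.kl_div(F.log_softmax(student(probe_x) / T, dim=1),
                        soft_targets, reduction="batchmean") * T * T
        opt.zero_grad(); loss.backward(); opt.step()
    return student
```

Because only server-generated probe data are used, no participant has to reveal raw training data, which matches the privacy constraint stated in the entry.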
18. A Survey on the Security of Deep Code Models [深度代码模型安全综述]
Authors: 孙伟松, 陈宇琛, 赵梓含, 陈宏, 葛一飞, 韩廷旭, 黄胜寒, 李佳讯, 房春荣, 陈振宇. 软件学报 (PKU Core), 2025, No. 4, pp. 1461-1488.
With the great success of deep learning in computer vision, natural language processing, and other fields, software engineering researchers have begun to apply it to software engineering tasks. Existing results show that deep learning outperforms traditional and machine learning methods on various code-related tasks (such as code retrieval and code summarization). Deep learning models trained for such code-related tasks are collectively referred to as deep code models. However, due to the vulnerability and lack of interpretability of neural networks, deep code models, like natural language processing and image processing models, face many security challenges and have become a focus of the software engineering community. In recent years, researchers have proposed many attack and defense methods for deep code models, but a systematic survey of deep code model security is still lacking, which makes it hard for newcomers to quickly grasp the field. To summarize the research status and challenges and keep track of the latest results, this survey collects 32 related papers and divides existing work into two main categories: backdoor attack and defense techniques, and adversarial attack and defense techniques. The collected papers are systematically reviewed and summarized by technique category. The survey then summarizes the experimental datasets and evaluation metrics commonly used in the field. Finally, it analyzes the key challenges and feasible future research directions, aiming to provide useful guidance for researchers to further advance the security of deep code models.
Keywords: deep code model; deep code model security; AI model security; backdoor attack and defense; adversarial attack and defense
19. Research on Invisible Backdoor Attacks Based on Interpretability [基于可解释性的不可见后门攻击研究] (Cited by 1)
Authors: 郑嘉熙, 陈伟, 尹萍, 张怡婷. 信息安全研究 (PKU Core), 2025, No. 1, pp. 21-27.
Deep learning has achieved remarkable success on a variety of critical tasks. However, recent research shows that deep neural networks are vulnerable to backdoor attacks, in which the attacker releases a backdoored model that behaves normally on benign samples but misclassifies any trigger-stamped sample into the target label. Unlike adversarial examples, backdoor attacks are mainly carried out during model training: samples are perturbed with triggers and a backdoor is injected into the model. This paper proposes an invisible backdoor attack based on an interpretability algorithm. Unlike existing work that sets the trigger mask arbitrarily, the trigger mask is carefully determined based on interpretability, and a novel random pixel perturbation is used as the trigger style, making the trigger-stamped samples more natural and imperceptible, so as to evade both human inspection and backdoor defense strategies. Extensive comparative experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate the effectiveness and superiority of the attack. The SSIM index is used to measure the difference between backdoor samples and benign samples, yielding a score close to 0.99, which shows that the generated backdoor samples are unrecognizable under visual inspection. Finally, the attack is shown to be resistant to existing backdoor defense methods.
Keywords: deep learning; deep neural network; backdoor attack; trigger; interpretability; backdoor sample
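A minimal sketch of the SSIM check mentioned above, assuming image arrays in [0, 1] and a recent scikit-image (the `channel_axis` argument); how the backdoor samples themselves are generated is outside the scope of this snippet.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def imperceptibility_score(benign, backdoored, data_range=1.0):
    """Mean SSIM between benign images and their backdoored counterparts.

    benign, backdoored: arrays of shape (N, H, W, 3) with values in [0, 1];
    a mean close to 1.0 (e.g., ~0.99) indicates visually indistinguishable triggers.
    """
    scores = [ssim(b, p, channel_axis=-1, data_range=data_range)
              for b, p in zip(benign, backdoored)]
    return float(np.mean(scores))
```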
20. A Survey of Backdoor Attack and Defense Research for Deep Learning [面向深度学习的后门攻击及防御研究综述]
Authors: 高梦楠, 陈伟, 吴礼发, 张伯雷. 软件学报 (PKU Core), 2025, No. 7, pp. 3271-3305.
Deep learning models are an important component of artificial intelligence systems and are widely used in many critical real-world scenarios. Existing research shows that the low transparency and weak interpretability of deep learning make models sensitive to perturbations, and AI systems face multiple security threats, among which backdoor attacks against deep learning are a major one. To improve the security of deep learning models, this survey comprehensively reviews the research progress of backdoor attacks and defenses in mainstream deep learning systems such as computer vision and natural language processing. According to the attacker's real-world capabilities, backdoor attacks are first divided into full-pipeline-controlled backdoors, model-modification backdoors, and data-poisoning-only backdoors, and then subdivided by how the backdoor is constructed. According to the object of the defense strategy, existing defenses are divided into input-based and model-based backdoor defenses. The survey then summarizes commonly used datasets and evaluation metrics for backdoor attacks, discusses open problems in backdoor attack and defense research, and offers suggestions and outlooks on security-oriented applications of backdoor attacks and on the effectiveness of backdoor defenses.
Keywords: deep learning; backdoor attack; backdoor defense; artificial intelligence security