期刊文献+

基于中文逻辑词的模型劫持攻击方法

Model Hijacking Attack Method Based on Chinese Logical Words
在线阅读 下载PDF
导出
摘要 模型劫持攻击是一种新型攻击方式,通过植入特定词语,能够隐蔽地控制模型执行与原始任务截然不同的劫持任务,使模型拥有者的训练算力成本增加的同时面临潜在的法律风险。目前,已有研究针对德-英文语言翻译模型探索了这一攻击方式,但在中文自然语言处理(natural language processing,NLP)领域尚属空白。中文语言的独特性使得其面临不同于其他语言环境的安全挑战,因此亟需开发针对中文模型的攻击评估方法。基于上述事实,提出了一种基于中文逻辑词的模型劫持攻击方法Cheater,用于评估中文模型的安全性。Cheater针对中-英文NLP任务,首先使用公共模型对劫持数据进行伪装生成过渡数据,再通过在过渡样本中嵌入中文逻辑词的方式对其进行改造生成毒性数据,最后利用毒性数据完成对目标模型的劫持。实验表明,对于Bart[large]模型,Cheater在0.5%的数据投毒率下攻击成功率可以达到90.2%。 Model hijacking attack is a novel attack method that implants specific words to covertly control a model,making it perform tasks different from its original purpose,increasing training costs and exposing the model owner to legal risks.While this attack has been recently studied for German-English models,it remains unexplored in the Chinese natural language processing(NLP)field.Compared with other languages,the unique characteristics of Chinese pose distinct security challenges,making existing attack methods suitable for German-English models not directly applicable to Chinese models.However,these risks posed by this attack can still be exploited by attackers,thereby threatening Chinese models.Therefore,it is crucial to develop an attack evaluation method for Chinese models.Based on these considerations,we propose Cheater,a model hijacking attack method tailored for Chinese-English NLP tasks to evaluate the security of Chinese models.To successfully hijack the target model,Cheater first uses a public translation model to camouflage the hijacking data,generating a transitional dataset.It then embeds Chinese logical words into the transitional dataset to produce malicious data,which is used to hijack the target model.For the Bart[large]model,the experiment shows that Cheater achieves an attack success rate of 90.2%at a 0.5%data contamination rate.
作者 钟一 陈珍珠 付安民 高艳松 Zhong Yi;Chen Zhenzhu;Fu Anmin;Gao Yansong(School of Computing and Artificial Intelligence,Southwestern University of Finance and Economics,Chengdu 611130;Artificial Intelligence and Digital Finance Key Laboratory of Sichuan Province(Southwestern University of Finance and Economics),Chengdu 611130;School of Computer Science and Engineering,Nanjing University of Science and Technology,Nanjing 210094;School of Computer Science and Software Engieering,University of Western Australia,Perth WA,Australia 6009)
出处 《计算机研究与发展》 北大核心 2026年第2期525-538,共14页 Journal of Computer Research and Development
基金 国家自然科学基金项目(62402397,62372236)。
关键词 劫持攻击 人工智能安全 中文模型 自然语言处理 逻辑词 hijacking attack artificial intelligence security Chinese model natural language processing logical words
  • 相关文献

参考文献7

二级参考文献18

共引文献326

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部