Research on Phishing Email Detection Based on Large Language Model
Abstract: With the rapid growth in phishing email volume and the continuous evolution of adversarial techniques, traditional phishing email detection methods face serious challenges in efficiency and accuracy. To address the low detection rates, high false-negative rates, and poor human-computer interaction of existing systems, this paper proposes a phishing email detection method based on a large language model (LLM). After a comprehensive analysis of key phishing email features, including header fields, body content, URLs, QR codes, attachments, and HTML pages, a feature insertion algorithm is used to construct a high-quality training dataset. Building on the pre-trained language model LLaMA and low-rank adaptation (LoRA) fine-tuning, domain knowledge is transferred while updating only 0.72% of the model parameters (approximately 50 MB), yielding a large model for phishing email detection. Experimental results show that, compared with traditional methods, the LLM-based approach significantly improves detection accuracy and robustness, reaching an overall accuracy of 94.5%. It also effectively reduces the false-positive rate, strengthens the classification and interpretation of phishing email features, and provides a more practical and reliable phishing email detection solution.
Authors: Yuan Bin; Yang Kehan; Zou Deqing; Liu Yong; Zhang Qiankun (School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074; Songshan Laboratory, Zhengzhou 452470; Zhongguancun Laboratory, Beijing 100190; Qi An Xin Technology Group Inc., Beijing 100044)
Source: Journal of Information Security Research (《信息安全研究》), Peking University Core Journal, 2026, No. 2, pp. 151-163 (13 pages)
Funding: National Natural Science Foundation of China (62372191); Natural Science Foundation of Hubei Province (2023AFB258); Songshan Laboratory Project (241110210200).
Keywords: phishing email; large language model; pre-trained language model; low-rank adaptation; fine-tuning
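
The abstract reports that LoRA fine-tuning updates only a small share of the model's parameters. As a rough illustration of where such a figure comes from, the sketch below counts the parameters that LoRA adds and expresses them as a fraction of the base model. All numbers in it (hidden size, layer count, rank, number of adapted matrices, base parameter count) are hypothetical values chosen for illustration and are not taken from the paper, so the result is not expected to match the reported 0.72% or 50 MB.

```python
def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA replaces a dense weight update of shape (d_out x d_in) with B @ A,
    # where A is (rank x d_in) and B is (d_out x rank).
    return rank * d_in + d_out * rank


def lora_total_params(hidden: int, layers: int, rank: int, adapted_per_layer: int) -> int:
    # Simplifying assumption: every adapted matrix is square (hidden x hidden).
    return layers * adapted_per_layer * lora_adapter_params(hidden, hidden, rank)


def trainable_fraction(trainable: int, base: int) -> float:
    # Share of parameters that receive gradient updates.
    return trainable / base


def size_mib(params: int, bytes_per_param: int) -> float:
    # Storage needed for the adapter weights alone, in MiB.
    return params * bytes_per_param / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical configuration, NOT the paper's setup.
    hidden, layers, rank = 4096, 32, 8
    adapted_per_layer = 2          # assumed: two projection matrices per layer
    base_params = 7_000_000_000    # assumed round figure for the base model

    trainable = lora_total_params(hidden, layers, rank, adapted_per_layer)
    frac = trainable_fraction(trainable, base_params)
    print(f"trainable parameters: {trainable}")
    print(f"fraction of base model: {frac:.4%}")
    print(f"adapter size at 2 bytes/param: {size_mib(trainable, 2):.1f} MiB")
```

Raising the rank or adapting more matrices per layer increases the trainable share roughly linearly, which is the main lever behind figures like the one quoted in the abstract.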