Abstract
As security protections such as attack detection and mitigation have improved, highly structured files (such as PDF and HTML) have become the main targets of vulnerability exploitation. Because such files have complex structures, diverse formats, and flexible user-defined rules, the patterns and rules of malicious samples are difficult to extract, so traditional pattern- and rule-based detection methods struggle with highly structured malicious samples. Operations used to construct malicious samples, such as boundary-value padding and malicious code embedding, change the byte-stream distribution of a sample. Based on these differences in byte-stream distribution, this paper proposes a deep-learning-based method for detecting highly structured malicious samples (JLMethod). The method uses a convolutional neural network to classify the byte-stream features of sample files and can effectively detect malicious samples. In experiments on document-type PDF files, it achieves a false negative rate of 4.1‰ and an accuracy of 99.59%; in experiments on non-document HTML malicious samples (WebShell), it achieves a false negative rate of 8.5‰ and an accuracy of 98.89%. These results demonstrate the feasibility of the method for detecting highly structured malicious samples.
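The abstract's central observation, that padding and code embedding shift a sample's byte-stream distribution, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the preprocessing or the network architecture, so the function names (`byte_histogram`, `total_variation`, `to_fixed_length`), the distance measure, and the toy inputs are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List


def byte_histogram(data: bytes) -> List[float]:
    """Return the normalized frequency of each of the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def total_variation(p: List[float], q: List[float]) -> float:
    """Total variation distance between two byte distributions
    (0 means identical, 1 means no byte value in common)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def to_fixed_length(data: bytes, length: int, pad_value: int = 0) -> List[int]:
    """Truncate or right-pad a byte stream to a fixed-length integer sequence,
    the kind of input a 1-D convolutional network typically expects.
    (Assumed preprocessing; the abstract does not specify it.)"""
    head = list(data[:length])
    return head + [pad_value] * (length - len(head))


# Hypothetical toy inputs, not samples from the paper's data set.
benign = b"<html><body><p>hello world</p></body></html>"
padded = benign + b"\x41" * 200  # simulate padding with a repeated byte

shift = total_variation(byte_histogram(benign), byte_histogram(padded))
print(f"distribution shift after padding: {shift:.3f}")
print(to_fixed_length(benign, 8))
```

A classifier such as the convolutional network mentioned in the abstract would presumably consume a representation like the fixed-length sequence rather than the hand-computed distance; the distance here only makes the distribution shift visible.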
Authors
赵磊
金银山
刘勤亮
张羿辰
ZHAO Lei; JIN Yinshan; LIU Qinliang; ZHANG Yichen (School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China)
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core Journal (北大核心)
2019, No. 6, pp. 571-575 (5 pages)
Journal of Wuhan University: Natural Science Edition
Funding
National Natural Science Foundation of China (61672394, 61872273)
Keywords
malicious samples
deep learning
vulnerability
highly structured