Abstract
Existing malicious code detection methods based on API sequences have several shortcomings: deep learning methods yield features that are hard to interpret, while traditional machine learning methods rely on manually designed features and ignore the temporal ordering of the data. To address these problems, an interpretable malicious code detection method based on API sequences is proposed from the perspective of time series classification. The dynamic API call sequence of malicious code is first converted into an entropy time series; discriminative features are then extracted with the shapelet method from time series classification; finally, detection models are built with several classifiers. Experimental results show that the method learns discriminative temporal features automatically and achieves high accuracy while providing an interpretable basis for the model's classifications.
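The first two steps of the pipeline can be illustrated with a minimal sketch. The abstract does not say how the entropy time series is computed, so this example assumes a sliding-window Shannon entropy over API names; the function names, the window size, and the toy call sequence are hypothetical and are not taken from the paper. The shapelet step is reduced to its core operation, the minimum distance between a candidate shapelet and any equal-length subsequence of the series.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the API names inside one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_series(api_calls, window=4, step=1):
    """Assumed conversion: slide a fixed window over the call sequence
    and emit one entropy value per window position."""
    return [
        shannon_entropy(api_calls[i:i + window])
        for i in range(0, len(api_calls) - window + 1, step)
    ]


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    subsequence of the series that has the same length."""
    m = len(shapelet)
    if m == 0 or len(series) < m:
        raise ValueError("series must be at least as long as the shapelet")
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )


# Toy trace with hypothetical API names, for illustration only.
calls = ["NtOpenFile"] * 4 + ["NtReadFile", "NtWriteFile", "NtClose"]
series = entropy_series(calls, window=4)
feature = shapelet_distance(series, [1.5, 2.0])
```

In a shapelet-based classifier, the distances from a sample to each selected shapelet form its feature vector, which is then passed to an ordinary classifier. Because each shapelet is a concrete subsequence, a small distance can be traced back to the stretch of API calls that produced it, which is where the interpretability comes from.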
Authors
高琪琪
师智斌
覃月明
雷海卫
GAO Qi-qi; SHI Zhi-bin; QIN Yue-ming; LEI Hai-wei (No. 710 R&D Institute, China State Shipbuilding Corporation Limited, Yichang 443000, China; School of Data Science and Technology, North University of China, Taiyuan 030051, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2023, Issue 6, pp. 1642-1648 (7 pages)
Computer Engineering and Design
Funding
Natural Science Foundation of Shanxi Province (201801D121155).
Keywords
malicious code detection
time series classification
time series features
information entropy
sandbox
feature extraction
interpretability