Abstract
Machine learning models are widely used in malware detection. When a model correctly captures causal relationships in its training environment, it can deliver high detection accuracy in deployment, provided that the causal structure stays stable across the two environments. In practice, however, various factors change the environment and alter the original causal relationships, which reduces detection accuracy. This study proposes CSAFE, a unified training framework based on causal inference, to enhance the stability of malware detection models under unknown changes in the deployment environment. The framework identifies and filters out spurious correlations (SC) between malicious behaviors and irrelevant features while preserving the essential causal relationships that remain stable across environments. CSAFE also introduces a refined strategy for filtering and rebuilding causal relationships, which improves environmental adaptability while preserving detection accuracy. CSAFE was evaluated on two real-world Android malware datasets in three respects: in-distribution accuracy, stability of causal relationships under environmental changes, and overall detection capability. The experimental results show that CSAFE improves detection accuracy by 13.4% across the environmental change scenarios while maintaining in-distribution accuracy comparable to that of the best baseline methods.
Authors
蒋屹新
张喜铭
徐文倩
梁志宏
杨祎巍
毕乐宇
徐欢
洪超
张宇南
JIANG Yi-xin; ZHANG Xi-ming; XU Wen-qian; LIANG Zhi-hong; YANG Yi-wei; BI Le-yu; XU Huan; HONG Chao; ZHANG Yu-nan (Electric Power Research Institute, China Southern Power Grid, Guangzhou 510663, China; Guangdong Provincial Key Laboratory of Power System Network Security, Guangzhou 510663, China; China Southern Power Grid, Guangzhou 510663, China)
Source
《印刷与数字媒体技术研究》
Peking University Core Journal (北大核心)
2025, Issue 6, pp. 315-331, 352 (18 pages in total)
Printing and Digital Media Technology Study
Keywords
Malware detection
Causal inference
Machine learning
Spurious correlation filtering
Dual decoupling reweighting
Android security