Funding: Supported by the Natural Science Foundation of Henan Province (Grant No. 242300420297), awarded to Yi Sun.
Abstract: With the development of anti-virus technology, malicious documents have gradually become the main pathway for Advanced Persistent Threat (APT) attacks, so developing effective malicious document classifiers has become particularly urgent. Current detection methods based on document structure and behavioral features face challenges in feature engineering: they have limited accuracy, consume substantial resources, and can usually detect only documents in specific formats, so they lack versatility and adaptability. To address these problems, this paper proposes a novel malicious document detection method that visualizes documents as GGE (Grayscale, Grayscale matrix, Entropy) images. The GGE method visualizes the original byte sequence of the malicious document as a grayscale image and the information entropy sequence of the document as an entropy image; at the same time, the gray-level co-occurrence matrix, with the texture and spatial information stored in it, is converted into a grayscale matrix image. The three images are then fused into a GGE color image. A Convolutional Block Attention Module-EfficientNet-B0 (CBAM-EfficientNet-B0) model is then used for classification; through transfer learning, a model pre-trained on the ImageNet dataset is applied to feature extraction from GGE images. Experimental results show that the GGE method outperforms the other methods compared and is suitable for detecting malicious documents in different formats, achieving accuracies of 99.44% and 97.39% on Portable Document Format (PDF) and Office datasets, respectively. It also consumes less time during detection, so it can be effectively applied to real-time malicious document detection.
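The three-channel construction described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the image width, entropy block size, output size, channel order, log scaling of the co-occurrence counts, and nearest-neighbour resizing are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def sequence_to_image(values: np.ndarray, width: int) -> np.ndarray:
    """Lay a 1-D uint8 sequence out row by row, zero-padding the last row."""
    height = max(1, -(-values.size // width))  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:values.size] = values
    return padded.reshape(height, width)

def block_entropy(data: bytes, block: int = 256) -> np.ndarray:
    """Shannon entropy in bits per byte (0 to 8) of each fixed-size block."""
    arr = np.frombuffer(data, dtype=np.uint8)
    values = []
    for start in range(0, arr.size, block):
        counts = np.bincount(arr[start:start + block], minlength=256)
        p = counts[counts > 0] / counts.sum()
        values.append(float(-(p * np.log2(p)).sum()))
    return np.array(values, dtype=np.float64)

def glcm(gray: np.ndarray) -> np.ndarray:
    """256x256 co-occurrence counts of horizontally adjacent pixel pairs."""
    matrix = np.zeros((256, 256), dtype=np.float64)
    left = gray[:, :-1].ravel().astype(np.intp)
    right = gray[:, 1:].ravel().astype(np.intp)
    np.add.at(matrix, (left, right), 1.0)  # accumulates repeated index pairs
    return matrix

def to_uint8(x: np.ndarray) -> np.ndarray:
    """Min-max scale an array to the 0..255 range."""
    x = x.astype(np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros(x.shape, dtype=np.uint8)
    return ((x - x.min()) / span * 255).astype(np.uint8)

def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to size x size."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def gge_image(data: bytes, size: int = 224) -> np.ndarray:
    """Fuse grayscale, co-occurrence, and entropy images into one RGB array."""
    gray = sequence_to_image(np.frombuffer(data, dtype=np.uint8), width=64)
    entropy_u8 = (block_entropy(data) / 8.0 * 255).astype(np.uint8)
    entropy_img = sequence_to_image(entropy_u8, width=16)
    cooc_img = to_uint8(np.log1p(glcm(gray)))  # log scale tames large counts
    channels = [resize_nearest(c, size) for c in (gray, cooc_img, entropy_img)]
    return np.stack(channels, axis=-1)  # assumed order: G, G-matrix, E
```

Calling `gge_image(open(path, "rb").read())` on any file yields a `(224, 224, 3)` `uint8` array, the input shape commonly used with ImageNet-pretrained classifiers. Because the sketch works on raw bytes only, it applies to PDF and Office files alike, which matches the format independence the abstract claims.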
Funding: Supported by the National Natural Science Foundation of China (General Program, No. 62176264).
Abstract: Malicious MS-Office documents have become one of the most effective attack vectors in APT attacks. Although many protection mechanisms are provided, they have proved easy to bypass, and existing detection methods perform poorly on malicious documents that exploit unknown vulnerabilities or exhibit few malicious behaviors. In this paper, we first define im-documents: vulnerable documents that show implicitly malicious behaviors and evade most public antivirus engines. We then present GLDOC, a GCN-based framework that aims to detect im-documents effectively through dynamic analysis and to cover possible blind spots of past detection methods. Beyond system calls, which are the sole focus of most research, we capture all dynamic behaviors in a sandbox, take the process tree into consideration, and reconstruct both into graphs. Using each line to learn each graph, GLDOC trains a 2-channel network together with a classifier, formulating malicious document detection as a graph learning and classification problem. Experiments show that GLDOC achieves a good balance between accuracy and false alarm rate, 95.33% and 4.33% respectively, outperforming other detection methods. In further testing in a simulated 5-day attack scenario, the proposed framework maintains stable and high detection accuracy on unknown vulnerabilities.
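The graph-learning formulation in this abstract can be made concrete with a small NumPy sketch of one graph convolution step. The abstract does not say which GCN variant GLDOC uses; the code assumes the widely used symmetric-normalized propagation rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), with mean pooling and concatenation of the two graph embeddings. All function names, the feature matrices, and the weights are hypothetical placeholders, not the GLDOC architecture.

```python
import numpy as np

def tree_to_adjacency(edges, n_nodes: int) -> np.ndarray:
    """Build a symmetric adjacency matrix from (parent, child) index pairs."""
    a = np.zeros((n_nodes, n_nodes), dtype=np.float64)
    for parent, child in edges:
        a[parent, child] = 1.0
        a[child, parent] = 1.0
    return a

def gcn_layer(a: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

def graph_embedding(a: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Mean-pool node representations into one fixed-length graph vector."""
    return gcn_layer(a, h, w).mean(axis=0)

def two_channel_features(behavior_graph, process_graph) -> np.ndarray:
    """Embed each graph in its own channel, then concatenate for a classifier.

    Each argument is an (adjacency, node_features, weights) tuple.
    """
    return np.concatenate([
        graph_embedding(*behavior_graph),
        graph_embedding(*process_graph),
    ])
```

As a usage example, a process tree in which process 0 spawns processes 1 and 2 becomes `tree_to_adjacency([(0, 1), (0, 2)], 3)`. Pairing that with a node-feature matrix and a weight matrix gives one channel, and the concatenated vector from both channels would be the input to a downstream classifier. In a trained system the weights would be learned rather than supplied by hand.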