With the development of anti-virus technology,malicious documents have gradually become the main pathway of Advanced Persistent Threat(APT)attacks,therefore,the development of effective malicious document classifiers ...With the development of anti-virus technology,malicious documents have gradually become the main pathway of Advanced Persistent Threat(APT)attacks,therefore,the development of effective malicious document classifiers has become particularly urgent.Currently,detection methods based on document structure and behavioral features encounter challenges in feature engineering,these methods not only have limited accuracy,but also consume large resources,and usually can only detect documents in specific formats,which lacks versatility and adaptability.To address such problems,this paper proposes a novel malicious document detection method-visualizing documents as GGE images(Grayscale,Grayscale matrix,Entropy).The GGE method visualizes the original byte sequence of the malicious document as a grayscale image,the information entropy sequence of the document as an entropy image,and at the same time,the grayscale level co-occurrence matrix and the texture and spatial information stored in it are converted into grayscale matrix image,and fuses the three types of images to get the GGE color image.The Convolutional Block Attention Module-EfficientNet-B0(CBAM-EfficientNet-B0)model is then used for classification,combining transfer learning and applying the pre-trained model on the ImageNet dataset to the feature extraction process of GGE images.As shown in the experimental results,the GGE method has superior performance compared with other methods,which is suitable for detecting malicious documents in different formats,and achieves an accuracy of 99.44%and 97.39%on Portable Document Format(PDF)and office datasets,respectively,and consumes less time during the detection process,which can be effectively applied to the task of detecting malicious documents in real-time.展开更多
With the widespread use of SMS(Short Message Service),the proliferation of malicious SMS has emerged as a pressing societal issue.While deep learning-based text classifiers offer promise,they often exhibit suboptimal ...With the widespread use of SMS(Short Message Service),the proliferation of malicious SMS has emerged as a pressing societal issue.While deep learning-based text classifiers offer promise,they often exhibit suboptimal performance in fine-grained detection tasks,primarily due to imbalanced datasets and insufficient model representation capabilities.To address this challenge,this paper proposes an LLMs-enhanced graph fusion dual-stream Transformer model for fine-grained Chinese malicious SMS detection.During the data processing stage,Large Language Models(LLMs)are employed for data augmentation,mitigating dataset imbalance.In the data input stage,both word-level and character-level features are utilized as model inputs,enhancing the richness of features and preventing information loss.A dual-stream Transformer serves as the backbone network in the learning representation stage,complemented by a graph-based feature fusion mechanism.At the output stage,both supervised classification cross-entropy loss and supervised contrastive learning loss are used as multi-task optimization objectives,further enhancing the model’s feature representation.Experimental results demonstrate that the proposed method significantly outperforms baselines on a publicly available Chinese malicious SMS dataset.展开更多
The increasingly complex and interconnected train control information network is vulnerable to a variety of malicious traffic attacks,and the existing malicious traffic detection methods mainly rely on machine learnin...The increasingly complex and interconnected train control information network is vulnerable to a variety of malicious traffic attacks,and the existing malicious traffic detection methods mainly rely on machine learning,such as poor robustness,weak generalization,and a lack of ability to learn common features.Therefore,this paper proposes a malicious traffic identification method based on stacked sparse denoising autoencoders combined with a regularized extreme learning machine through particle swarm optimization.Firstly,the simulation environment of the Chinese train control system-3,was constructed for data acquisition.Then Pearson coefficient and other methods are used for pre-processing,then a stacked sparse denoising autoencoder is used to achieve nonlinear dimensionality reduction of features,and finally regularization extreme learning machine optimized by particle swarm optimization is used to achieve classification.Experimental data show that the proposed method has good training performance,with an average accuracy of 97.57%and a false negative rate of 2.43%,which is better than other alternative methods.In addition,ablation experiments were performed to evaluate the contribution of each component,and the results showed that the combination of methods was superior to individual methods.To further evaluate the generalization ability of the model in different scenarios,publicly available data sets of industrial control system networks were used.The results show that the model has robust detection capability in various types of network attacks.展开更多
The proliferation of internet traffic encryption has become a double-edged sword. While it significantly enhances user privacy, it also inadvertently shields cyber-attacks from detection, presenting a formidable chall...The proliferation of internet traffic encryption has become a double-edged sword. While it significantly enhances user privacy, it also inadvertently shields cyber-attacks from detection, presenting a formidable challenge to cybersecurity. Traditional machine learning and deep learning techniques often fall short in identifying encrypted malicious traffic due to their inability to fully extract and utilize the implicit relational and positional information embedded within data packets. This limitation has led to an unresolved challenge in the cybersecurity community: how to effectively extract valuable insights from the complex patterns of traffic packet transmission. Consequently, this paper introduces the TB-Graph model, an encrypted malicious traffic classification model based on a relational graph attention network. The model is a heterogeneous traffic burst graph that embeds side-channel features, which are unaffected by encryption, into the graph nodes and connects them with three different types of burst edges. Subsequently, we design a relational positional coding that prevents the loss of temporal relationships between the original traffic flows during graph transformation. Ultimately, TB-Graph leverages the powerful graph representation learning capabilities of Relational Graph Attention Network (RGAT) to extract latent behavioral features from the burst graph nodes and edge relationships. Experimental results show that TB-Graph outperforms various state-of-the-art methods in fine-grained encrypted malicious traffic classification tasks on two public datasets, indicating its enhanced capability for identifying encrypted malicious traffic.展开更多
With the growth of the Internet of Things(IoT)comes a flood of malicious traffic in the IoT,intensifying the challenges of network security.Traditional models operate with independent layers,limiting their effectivene...With the growth of the Internet of Things(IoT)comes a flood of malicious traffic in the IoT,intensifying the challenges of network security.Traditional models operate with independent layers,limiting their effectiveness in addressing these challenges.To address this issue,we propose a cross-layer cooperative Feature Subset-Based Malicious Traffic Detection(FSMMTD)model for detecting malicious traffic.Our approach begins by applying an enhanced random forest method to adaptively filter and retain highly discriminative first-layer features.These processed features are then input into an improved state-space model that integrates the strengths of recurrent neural networks(RNNs)and transformers,enabling superior processing of complex patterns and global information.This integration allows the FSMMTD model to enhance its capability in identifying intricate data relationships and capturing comprehensive contextual insights.The FSMMTD model monitors IoT data flows in real-time,efficiently detecting anomalies and enabling rapid response to potential intrusions.We validate our approach using the publicly available ToN_IoT dataset for IoT traffic analysis.Experimental results demonstrate that our method achieves superior performance with an accuracy of 98.37%,precision of 96.28%,recall of 95.36%,and F1-score of 96.79%.These metrics indicate that the FSMMTD model outperforms existing methods in detecting malicious traffic,showcasing its effectiveness and reliability in enhancing IoT network security.展开更多
While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning me...While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.展开更多
With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signatu...With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signature-based detection methods,static analysis,and dynamic analysis techniques have been previously explored for malicious traffic detection,they have limitations in identifying diversified malware traffic patterns.Recent research has been focused on the application of machine learning to detect these patterns.However,applying machine learning to lightweight devices like IoT devices is challenging because of the high computational demands and complexity involved in the learning process.In this study,we examined methods for effectively utilizing machine learning-based malicious traffic detection approaches for lightweight devices.We introduced the suboptimal feature selection model(SFSM),a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection.Detection performance was evaluated on various malicious traffic,benign,exploits,and generic,using the UNSW-NB15 dataset and SFSM sub-optimized hyperparameters for feature selection and narrowed the search scope to encompass all features.SFSM improved learning performance while minimizing complexity by considering feature selection and exhaustive search as two steps,a problem not considered in conventional models.Our experimental results showed that the detection accuracy was improved by approximately 20%compared to the random model,and the reduction in accuracy compared to the greedy model,which performs an exhaustive search on all features,was kept within 6%.Additionally,latency and complexity were reduced by approximately 96%and 99.78%,respectively,compared to the greedy model.This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments.SFSM verified the possibility of detecting various attack traffic on lightweight devices.展开更多
The potential of text analytics is revealed by Machine Learning(ML)and Natural Language Processing(NLP)techniques.In this paper,we propose an NLP framework that is applied to multiple datasets to detect malicious Unif...The potential of text analytics is revealed by Machine Learning(ML)and Natural Language Processing(NLP)techniques.In this paper,we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators(URLs).Three categories of features,both ML and Deep Learning(DL)algorithms and a ranking schema are included in the proposed framework.We apply frequency and prediction-based embeddings,such as hash vectorizer,Term Frequency-Inverse Dense Frequency(TF-IDF)and predictors,word to vector-word2vec(continuous bag of words,skip-gram)from Google,to extract features from text.Further,we apply more state-of-the-art methods to create vectorized features,such as GloVe.Additionally,feature engineering that is specific to URL structure is deployed to detect scams and other threats.For framework assessment,four ranking indicators are weighted:computational time and performance as accuracy,F1 score and type error II.For the computational time,we propose a new metric-Feature Building Time(FBT)as the cutting-edge feature builders(like doc2vec or GloVe)require more time.By applying the proposed assessment step,the skip-gram algorithm of word2vec surpasses other feature builders in performance.Additionally,eXtreme Gradient Boost(XGB)outperforms other classifiers.With this setup,we attain an accuracy of 99.5%and an F1 score of 0.99.展开更多
The Internet of Things(IoT)has characteristics such as node mobility,node heterogeneity,link heterogeneity,and topology heterogeneity.In the face of the IoT characteristics and the explosive growth of IoT nodes,which ...The Internet of Things(IoT)has characteristics such as node mobility,node heterogeneity,link heterogeneity,and topology heterogeneity.In the face of the IoT characteristics and the explosive growth of IoT nodes,which brings about large-scale data processing requirements,edge computing architecture has become an emerging network architecture to support IoT applications due to its ability to provide powerful computing capabilities and good service functions.However,the defense mechanism of Edge Computing-enabled IoT Nodes(ECIoTNs)is still weak due to their limited resources,so that they are susceptible to malicious software spread,which can compromise data confidentiality and network service availability.Facing this situation,we put forward an epidemiology-based susceptible-curb-infectious-removed-dead(SCIRD)model.Then,we analyze the dynamics of ECIoTNs with different infection levels under different initial conditions to obtain the dynamic differential equations.Additionally,we establish the presence of equilibrium states in the SCIRD model.Furthermore,we conduct an analysis of the model’s stability and examine the conditions under which malicious software will either spread or disappear within Edge Computing-enabled IoT(ECIoT)networks.Lastly,we validate the efficacy and superiority of the SCIRD model through MATLAB simulations.These research findings offer a theoretical foundation for suppressing the propagation of malicious software in ECIoT networks.The experimental results indicate that the theoretical SCIRD model has instructive significance,deeply revealing the principles of malicious software propagation in ECIoT networks.This study solves a challenging security problem of ECIoT networks by determining the malicious software propagation threshold,which lays the foundation for buildingmore secure and reliable ECIoT networks.展开更多
Due to the diversity and unpredictability of changes in malicious code,studying the traceability of variant families remains challenging.In this paper,we propose a GAN-EfficientNetV2-based method for tracing families ...Due to the diversity and unpredictability of changes in malicious code,studying the traceability of variant families remains challenging.In this paper,we propose a GAN-EfficientNetV2-based method for tracing families of malicious code variants.This method leverages the similarity in layouts and textures between images of malicious code variants from the same source and their original family of malicious code images.The method includes a lightweight classifier and a simulator.The classifier utilizes the enhanced EfficientNetV2 to categorize malicious code images and can be easily deployed on mobile,embedded,and other devices.The simulator utilizes an enhanced generative adversarial network to simulate different variants of malicious code and generates datasets to validate the model’s performance.This process helps identify model vulnerabilities and security risks,facilitating model enhancement and development.The classifier achieves 98.61%and 97.59%accuracy on the MMCC dataset and Malevis dataset,respectively.The simulator’s generated image of malicious code variants has an FID value of 155.44 and an IS value of 1.72±0.42.The classifier’s accuracy for tracing the family of malicious code variants is as high as 90.29%,surpassing that of mainstream neural network models.This meets the current demand for high generalization and anti-obfuscation abilities in malicious code classification models due to the rapid evolution of malicious code.展开更多
With the growth of the Internet,more and more business is being done online,for example,online offices,online education and so on.While this makes people’s lives more convenient,it also increases the risk of the netw...With the growth of the Internet,more and more business is being done online,for example,online offices,online education and so on.While this makes people’s lives more convenient,it also increases the risk of the network being attacked by malicious code.Therefore,it is important to identify malicious codes on computer systems efficiently.However,most of the existing malicious code detection methods have two problems:(1)The ability of the model to extract features is weak,resulting in poor model performance.(2)The large scale of model data leads to difficulties deploying on devices with limited resources.Therefore,this paper proposes a lightweight malicious code identification model Lightweight Malicious Code Classification Method Based on Improved SqueezeNet(LCMISNet).In this paper,the MFire lightweight feature extraction module is constructed by proposing a feature slicing module and a multi-size depthwise separable convolution module.The feature slicing module reduces the number of parameters by grouping features.The multi-size depthwise separable convolution module reduces the number of parameters and enhances the feature extraction capability by replacing the standard convolution with depthwise separable convolution with different convolution kernel sizes.In addition,this paper also proposes a feature splicing module to connect the MFire lightweight feature extraction module based on the feature reuse and constructs the lightweight model LCMISNet.The malicious code recognition accuracy of LCMISNet on the BIG 2015 dataset and the Malimg dataset reaches 98.90% and 99.58%,respectively.It proves that LCMISNet has a powerful malicious code recognition performance.In addition,compared with other network models,LCMISNet has better performance,and a lower number of parameters and computations.展开更多
The field of finance heavily relies on cybersecurity to safeguard its systems and clients from harmful software.The identification of malevolent code within financial software is vital for protecting both the financia...The field of finance heavily relies on cybersecurity to safeguard its systems and clients from harmful software.The identification of malevolent code within financial software is vital for protecting both the financial system and individual clients.Nevertheless,present detection models encounter limitations in their ability to identify malevolent code and its variations,all while encompassing a multitude of parameters.To overcome these obsta-cles,we introduce a lean model for classifying families of malevolent code,formulated on Ghost-DenseNet-SE.This model integrates the Ghost module,DenseNet,and the squeeze-and-excitation(SE)channel domain attention mechanism.It substitutes the standard convolutional layer in DenseNet with the Ghost module,thereby diminishing the model’s size and augmenting recognition speed.Additionally,the channel domain attention mechanism assigns distinctive weights to feature channels,facilitating the extraction of pivotal characteristics of malevolent code and bolstering detection precision.Experimental outcomes on the Malimg dataset indicate that the model attained an accuracy of 99.14%in discerning families of malevolent code,surpassing AlexNet(97.8%)and The visual geometry group network(VGGNet)(96.16%).The proposed model exhibits reduced parameters,leading to decreased model complexity alongside enhanced classification accuracy,rendering it a valuable asset for categorizing malevolent code.展开更多
A URL(Uniform Resource Locator)is used to locate a digital resource.With this URL,an attacker can perform a variety of attacks,which can lead to serious consequences for both individuals and organizations.Therefore,at...A URL(Uniform Resource Locator)is used to locate a digital resource.With this URL,an attacker can perform a variety of attacks,which can lead to serious consequences for both individuals and organizations.Therefore,attackers create malicious URLs to gain access to an organization’s systems or sensitive information.It is crucial to secure individuals and organizations against these malicious URLs.A combination of machine learning and deep learning was used to predict malicious URLs.This research contributes significantly to the field of cybersecurity by proposing a model that seamlessly integrates the accuracy of machine learning with the swiftness of deep learning.The strategic fusion of Random Forest(RF) and Multilayer Perceptron(MLP)with an accuracy of 81% represents a noteworthy advancement,offering a balanced solution for robust cybersecurity.This study found that by combining RF and MLP,an efficient model was developed with an accuracy of 81%and a training time of 33.78 s.展开更多
Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and ...Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks.展开更多
The limited labeled sample data in the field of advanced security threats detection seriously restricts the effective development of research work.Learning the sample labels from the labeled and unlabeled data has rec...The limited labeled sample data in the field of advanced security threats detection seriously restricts the effective development of research work.Learning the sample labels from the labeled and unlabeled data has received a lot of research attention and various universal labeling methods have been proposed.However,the labeling task of malicious communication samples targeted at advanced threats has to face the two practical challenges:the difficulty of extracting effective features in advance and the complexity of the actual sample types.To address these problems,we proposed a sample labeling method for malicious communication based on semi-supervised deep neural network.This method supports continuous learning and optimization feature representation while labeling sample,and can handle uncertain samples that are outside the concerned sample types.According to the experimental results,our proposed deep neural network can automatically learn effective feature representation,and the validity of features is close to or even higher than that of features which extracted based on expert knowledge.Furthermore,our proposed method can achieve the labeling accuracy of 97.64%~98.50%,which is more accurate than the train-then-detect,kNN and LPA methodsin any labeled-sample proportion condition.The problem of insufficient labeled samples in many network attack detecting scenarios,and our proposed work can function as a reference for the sample labeling tasks in the similar real-world scenarios.展开更多
Background:In recent years,blockchain technology has attracted considerable attention.It records cryptographic transactions in a public ledger that is difficult to alter and compromise because of the distributed conse...Background:In recent years,blockchain technology has attracted considerable attention.It records cryptographic transactions in a public ledger that is difficult to alter and compromise because of the distributed consensus.As a result,blockchain is believed to resist fraud and hacking.Results:This work explores the types of fraud and malicious activities that can be prevented by blockchain technology and identifies attacks to which blockchain remains vulnerable.Conclusions:This study recommends appropriate defensive measures and calls for further research into the techniques for fighting malicious activities related to blockchains.展开更多
Spam is no longer just commercial unsolicited email messages that waste our time, it consumes network traffic and mail servers’ storage. Furthermore, spam has become a major component of several attack vectors includ...Spam is no longer just commercial unsolicited email messages that waste our time, it consumes network traffic and mail servers’ storage. Furthermore, spam has become a major component of several attack vectors including attacks such as phishing, cross-site scripting, cross-site request forgery and malware infection. Statistics show that the amount of spam containing malicious contents increased compared to the one advertising legitimate products and services. In this paper, the issue of spam detection is investigated with the aim to develop an efficient method to identify spam email based on the analysis of the content of email messages. We identify a set of features that have a considerable number of malicious related features. Our goal is to study the effect of these features in helping the classical classifiers in identifying spam emails. To make the problem more challenging, we developed spam classification models based on imbalanced data where spam emails form the rare class with only 16.5% of the total emails. Different metrics were utilized in the evaluation of the developed models. Results show noticeable improvement of spam classification models when trained by dataset that includes malicious related features.展开更多
In the upcoming large-scale Internet of Things(Io T),it is increasingly challenging to defend against malicious traffic,due to the heterogeneity of Io T devices and the diversity of Io T communication protocols.In thi...In the upcoming large-scale Internet of Things(Io T),it is increasingly challenging to defend against malicious traffic,due to the heterogeneity of Io T devices and the diversity of Io T communication protocols.In this paper,we propose a semi-supervised learning-based approach to detect malicious traffic at the access side.It overcomes the resource-bottleneck problem of traditional malicious traffic defenders which are deployed at the victim side,and also is free of labeled traffic data in model training.Specifically,we design a coarse-grained behavior model of Io T devices by self-supervised learning with unlabeled traffic data.Then,we fine-tune this model to improve its accuracy in malicious traffic detection by adopting a transfer learning method using a small amount of labeled data.Experimental results show that our method can achieve the accuracy of 99.52%and the F1-score of 99.52%with only 1%of the labeled training data based on the CICDDoS2019 dataset.Moreover,our method outperforms the stateof-the-art supervised learning-based methods in terms of accuracy,precision,recall and F1-score with 1%of the training data.展开更多
This paper introduces the background,illustrates the hardware structure and software features of malicious base station,explains its work principle,presents a method of detecting malicious base station,analyses the ex...This paper introduces the background,illustrates the hardware structure and software features of malicious base station,explains its work principle,presents a method of detecting malicious base station,analyses the experiment and evaluates the experimental results to verify the reliability of this method.Finally proposes the future work.展开更多
We study the detailed malicious code propagating process in scale-free networks with link weights that denotes traffic between two nodes. It is found that the propagating velocity reaches a peak rapidly then decays in...We study the detailed malicious code propagating process in scale-free networks with link weights that denotes traffic between two nodes. It is found that the propagating velocity reaches a peak rapidly then decays in a power-law form, which is different from the well-known result in unweighted network case. Simulation results show that the nodes with larger strength are preferential to be infected, but the hierarchical dynamics are not clearly found. The simulation results also show that larger dispersion of weight of networks leads to slower propagating, which indicates that malicious code propagates more quickly in unweighted scale-free networks than in weighted scale-free networks under the same condition. These results show that not only the topology of networks but also the link weights affect the malicious propagating process.展开更多
基金supported by the Natural Science Foundation of Henan Province(Grant No.242300420297)awarded to Yi Sun.
文摘With the development of anti-virus technology,malicious documents have gradually become the main pathway of Advanced Persistent Threat(APT)attacks,therefore,the development of effective malicious document classifiers has become particularly urgent.Currently,detection methods based on document structure and behavioral features encounter challenges in feature engineering,these methods not only have limited accuracy,but also consume large resources,and usually can only detect documents in specific formats,which lacks versatility and adaptability.To address such problems,this paper proposes a novel malicious document detection method-visualizing documents as GGE images(Grayscale,Grayscale matrix,Entropy).The GGE method visualizes the original byte sequence of the malicious document as a grayscale image,the information entropy sequence of the document as an entropy image,and at the same time,the grayscale level co-occurrence matrix and the texture and spatial information stored in it are converted into grayscale matrix image,and fuses the three types of images to get the GGE color image.The Convolutional Block Attention Module-EfficientNet-B0(CBAM-EfficientNet-B0)model is then used for classification,combining transfer learning and applying the pre-trained model on the ImageNet dataset to the feature extraction process of GGE images.As shown in the experimental results,the GGE method has superior performance compared with other methods,which is suitable for detecting malicious documents in different formats,and achieves an accuracy of 99.44%and 97.39%on Portable Document Format(PDF)and office datasets,respectively,and consumes less time during the detection process,which can be effectively applied to the task of detecting malicious documents in real-time.
基金supported by the Fundamental Research Funds for the Central Universities(2024JKF13)the Beijing Municipal Education Commission General Program of Science and Technology(No.KM202414019003).
文摘With the widespread use of SMS(Short Message Service),the proliferation of malicious SMS has emerged as a pressing societal issue.While deep learning-based text classifiers offer promise,they often exhibit suboptimal performance in fine-grained detection tasks,primarily due to imbalanced datasets and insufficient model representation capabilities.To address this challenge,this paper proposes an LLMs-enhanced graph fusion dual-stream Transformer model for fine-grained Chinese malicious SMS detection.During the data processing stage,Large Language Models(LLMs)are employed for data augmentation,mitigating dataset imbalance.In the data input stage,both word-level and character-level features are utilized as model inputs,enhancing the richness of features and preventing information loss.A dual-stream Transformer serves as the backbone network in the learning representation stage,complemented by a graph-based feature fusion mechanism.At the output stage,both supervised classification cross-entropy loss and supervised contrastive learning loss are used as multi-task optimization objectives,further enhancing the model’s feature representation.Experimental results demonstrate that the proposed method significantly outperforms baselines on a publicly available Chinese malicious SMS dataset.
文摘The increasingly complex and interconnected train control information network is vulnerable to a variety of malicious traffic attacks,and the existing malicious traffic detection methods mainly rely on machine learning,such as poor robustness,weak generalization,and a lack of ability to learn common features.Therefore,this paper proposes a malicious traffic identification method based on stacked sparse denoising autoencoders combined with a regularized extreme learning machine through particle swarm optimization.Firstly,the simulation environment of the Chinese train control system-3,was constructed for data acquisition.Then Pearson coefficient and other methods are used for pre-processing,then a stacked sparse denoising autoencoder is used to achieve nonlinear dimensionality reduction of features,and finally regularization extreme learning machine optimized by particle swarm optimization is used to achieve classification.Experimental data show that the proposed method has good training performance,with an average accuracy of 97.57%and a false negative rate of 2.43%,which is better than other alternative methods.In addition,ablation experiments were performed to evaluate the contribution of each component,and the results showed that the combination of methods was superior to individual methods.To further evaluate the generalization ability of the model in different scenarios,publicly available data sets of industrial control system networks were used.The results show that the model has robust detection capability in various types of network attacks.
文摘The proliferation of internet traffic encryption has become a double-edged sword. While it significantly enhances user privacy, it also inadvertently shields cyber-attacks from detection, presenting a formidable challenge to cybersecurity. Traditional machine learning and deep learning techniques often fall short in identifying encrypted malicious traffic due to their inability to fully extract and utilize the implicit relational and positional information embedded within data packets. This limitation has led to an unresolved challenge in the cybersecurity community: how to effectively extract valuable insights from the complex patterns of traffic packet transmission. Consequently, this paper introduces the TB-Graph model, an encrypted malicious traffic classification model based on a relational graph attention network. The model is a heterogeneous traffic burst graph that embeds side-channel features, which are unaffected by encryption, into the graph nodes and connects them with three different types of burst edges. Subsequently, we design a relational positional coding that prevents the loss of temporal relationships between the original traffic flows during graph transformation. Ultimately, TB-Graph leverages the powerful graph representation learning capabilities of Relational Graph Attention Network (RGAT) to extract latent behavioral features from the burst graph nodes and edge relationships. Experimental results show that TB-Graph outperforms various state-of-the-art methods in fine-grained encrypted malicious traffic classification tasks on two public datasets, indicating its enhanced capability for identifying encrypted malicious traffic.
基金funded by the National Natural Science Foundation of China,grant numbers 61876189,61703426,and 61273275.
文摘With the growth of the Internet of Things(IoT)comes a flood of malicious traffic in the IoT,intensifying the challenges of network security.Traditional models operate with independent layers,limiting their effectiveness in addressing these challenges.To address this issue,we propose a cross-layer cooperative Feature Subset-Based Malicious Traffic Detection(FSMMTD)model for detecting malicious traffic.Our approach begins by applying an enhanced random forest method to adaptively filter and retain highly discriminative first-layer features.These processed features are then input into an improved state-space model that integrates the strengths of recurrent neural networks(RNNs)and transformers,enabling superior processing of complex patterns and global information.This integration allows the FSMMTD model to enhance its capability in identifying intricate data relationships and capturing comprehensive contextual insights.The FSMMTD model monitors IoT data flows in real-time,efficiently detecting anomalies and enabling rapid response to potential intrusions.We validate our approach using the publicly available ToN_IoT dataset for IoT traffic analysis.Experimental results demonstrate that our method achieves superior performance with an accuracy of 98.37%,precision of 96.28%,recall of 95.36%,and F1-score of 96.79%.These metrics indicate that the FSMMTD model outperforms existing methods in detecting malicious traffic,showcasing its effectiveness and reliability in enhancing IoT network security.
基金This research was funded by National Natural Science Foundation of China under Grant No.61806171Sichuan University of Science&Engineering Talent Project under Grant No.2021RC15+2 种基金Open Fund Project of Key Laboratory for Non-Destructive Testing and Engineering Computer of Sichuan Province Universities on Bridge Inspection and Engineering under Grant No.2022QYJ06Sichuan University of Science&Engineering Graduate Student Innovation Fund under Grant No.Y2023115The Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No.SUSE652A006.
文摘While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
基金supported by the Korea Institute for Advancement of Technology(KIAT)Grant funded by the Korean Government(MOTIE)(P0008703,The Competency Development Program for Industry Specialists)MSIT under the ICAN(ICT Challenge and Advanced Network of HRD)Program(No.IITP-2022-RS-2022-00156310)supervised by the Institute of Information&Communication Technology Planning and Evaluation(IITP).
文摘With the advancement of wireless network technology,vast amounts of traffic have been generated,and malicious traffic attacks that threaten the network environment are becoming increasingly sophisticated.While signature-based detection methods,static analysis,and dynamic analysis techniques have been previously explored for malicious traffic detection,they have limitations in identifying diversified malware traffic patterns.Recent research has been focused on the application of machine learning to detect these patterns.However,applying machine learning to lightweight devices like IoT devices is challenging because of the high computational demands and complexity involved in the learning process.In this study,we examined methods for effectively utilizing machine learning-based malicious traffic detection approaches for lightweight devices.We introduced the suboptimal feature selection model(SFSM),a feature selection technique designed to reduce complexity while maintaining the effectiveness of malicious traffic detection.Detection performance was evaluated on various malicious traffic,benign,exploits,and generic,using the UNSW-NB15 dataset and SFSM sub-optimized hyperparameters for feature selection and narrowed the search scope to encompass all features.SFSM improved learning performance while minimizing complexity by considering feature selection and exhaustive search as two steps,a problem not considered in conventional models.Our experimental results showed that the detection accuracy was improved by approximately 20%compared to the random model,and the reduction in accuracy compared to the greedy model,which performs an exhaustive search on all features,was kept within 6%.Additionally,latency and complexity were reduced by approximately 96%and 99.78%,respectively,compared to the greedy model.This study demonstrates that malicious traffic can be effectively detected even in lightweight device environments.SFSM verified the possibility of detecting various attack traffic on lightweight devices.
基金supported by a grant of the Ministry of Research,Innovation and Digitization,CNCS-UEFISCDI,Project Number PN-Ⅲ-P4-PCE-2021-0334,within PNCDI Ⅲ.
文摘The potential of text analytics is revealed by Machine Learning(ML)and Natural Language Processing(NLP)techniques.In this paper,we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators(URLs).Three categories of features,both ML and Deep Learning(DL)algorithms and a ranking schema are included in the proposed framework.We apply frequency and prediction-based embeddings,such as hash vectorizer,Term Frequency-Inverse Dense Frequency(TF-IDF)and predictors,word to vector-word2vec(continuous bag of words,skip-gram)from Google,to extract features from text.Further,we apply more state-of-the-art methods to create vectorized features,such as GloVe.Additionally,feature engineering that is specific to URL structure is deployed to detect scams and other threats.For framework assessment,four ranking indicators are weighted:computational time and performance as accuracy,F1 score and type error II.For the computational time,we propose a new metric-Feature Building Time(FBT)as the cutting-edge feature builders(like doc2vec or GloVe)require more time.By applying the proposed assessment step,the skip-gram algorithm of word2vec surpasses other feature builders in performance.Additionally,eXtreme Gradient Boost(XGB)outperforms other classifiers.With this setup,we attain an accuracy of 99.5%and an F1 score of 0.99.
基金in part by National Undergraduate Innovation and Entrepreneurship Training Program under Grant No.202310347039Zhejiang Provincial Natural Science Foundation of China under Grant No.LZ22F020002Huzhou Science and Technology Planning Foundation under Grant No.2023GZ04.
文摘The Internet of Things(IoT)has characteristics such as node mobility,node heterogeneity,link heterogeneity,and topology heterogeneity.In the face of the IoT characteristics and the explosive growth of IoT nodes,which brings about large-scale data processing requirements,edge computing architecture has become an emerging network architecture to support IoT applications due to its ability to provide powerful computing capabilities and good service functions.However,the defense mechanism of Edge Computing-enabled IoT Nodes(ECIoTNs)is still weak due to their limited resources,so that they are susceptible to malicious software spread,which can compromise data confidentiality and network service availability.Facing this situation,we put forward an epidemiology-based susceptible-curb-infectious-removed-dead(SCIRD)model.Then,we analyze the dynamics of ECIoTNs with different infection levels under different initial conditions to obtain the dynamic differential equations.Additionally,we establish the presence of equilibrium states in the SCIRD model.Furthermore,we conduct an analysis of the model’s stability and examine the conditions under which malicious software will either spread or disappear within Edge Computing-enabled IoT(ECIoT)networks.Lastly,we validate the efficacy and superiority of the SCIRD model through MATLAB simulations.These research findings offer a theoretical foundation for suppressing the propagation of malicious software in ECIoT networks.The experimental results indicate that the theoretical SCIRD model has instructive significance,deeply revealing the principles of malicious software propagation in ECIoT networks.This study solves a challenging security problem of ECIoT networks by determining the malicious software propagation threshold,which lays the foundation for buildingmore secure and reliable ECIoT networks.
基金support this work is the Key Research and Development Program of Heilongjiang Province,specifically Grant Number 2023ZX02C10.
文摘Due to the diversity and unpredictability of changes in malicious code,studying the traceability of variant families remains challenging.In this paper,we propose a GAN-EfficientNetV2-based method for tracing families of malicious code variants.This method leverages the similarity in layouts and textures between images of malicious code variants from the same source and their original family of malicious code images.The method includes a lightweight classifier and a simulator.The classifier utilizes the enhanced EfficientNetV2 to categorize malicious code images and can be easily deployed on mobile,embedded,and other devices.The simulator utilizes an enhanced generative adversarial network to simulate different variants of malicious code and generates datasets to validate the model’s performance.This process helps identify model vulnerabilities and security risks,facilitating model enhancement and development.The classifier achieves 98.61%and 97.59%accuracy on the MMCC dataset and Malevis dataset,respectively.The simulator’s generated image of malicious code variants has an FID value of 155.44 and an IS value of 1.72±0.42.The classifier’s accuracy for tracing the family of malicious code variants is as high as 90.29%,surpassing that of mainstream neural network models.This meets the current demand for high generalization and anti-obfuscation abilities in malicious code classification models due to the rapid evolution of malicious code.
文摘With the growth of the Internet,more and more business is being done online,for example,online offices,online education and so on.While this makes people’s lives more convenient,it also increases the risk of the network being attacked by malicious code.Therefore,it is important to identify malicious codes on computer systems efficiently.However,most of the existing malicious code detection methods have two problems:(1)The ability of the model to extract features is weak,resulting in poor model performance.(2)The large scale of model data leads to difficulties deploying on devices with limited resources.Therefore,this paper proposes a lightweight malicious code identification model Lightweight Malicious Code Classification Method Based on Improved SqueezeNet(LCMISNet).In this paper,the MFire lightweight feature extraction module is constructed by proposing a feature slicing module and a multi-size depthwise separable convolution module.The feature slicing module reduces the number of parameters by grouping features.The multi-size depthwise separable convolution module reduces the number of parameters and enhances the feature extraction capability by replacing the standard convolution with depthwise separable convolution with different convolution kernel sizes.In addition,this paper also proposes a feature splicing module to connect the MFire lightweight feature extraction module based on the feature reuse and constructs the lightweight model LCMISNet.The malicious code recognition accuracy of LCMISNet on the BIG 2015 dataset and the Malimg dataset reaches 98.90% and 99.58%,respectively.It proves that LCMISNet has a powerful malicious code recognition performance.In addition,compared with other network models,LCMISNet has better performance,and a lower number of parameters and computations.
基金funded by National Natural Science Foundation of China(under Grant No.61905201)。
文摘The field of finance heavily relies on cybersecurity to safeguard its systems and clients from harmful software.The identification of malevolent code within financial software is vital for protecting both the financial system and individual clients.Nevertheless,present detection models encounter limitations in their ability to identify malevolent code and its variations,all while encompassing a multitude of parameters.To overcome these obsta-cles,we introduce a lean model for classifying families of malevolent code,formulated on Ghost-DenseNet-SE.This model integrates the Ghost module,DenseNet,and the squeeze-and-excitation(SE)channel domain attention mechanism.It substitutes the standard convolutional layer in DenseNet with the Ghost module,thereby diminishing the model’s size and augmenting recognition speed.Additionally,the channel domain attention mechanism assigns distinctive weights to feature channels,facilitating the extraction of pivotal characteristics of malevolent code and bolstering detection precision.Experimental outcomes on the Malimg dataset indicate that the model attained an accuracy of 99.14%in discerning families of malevolent code,surpassing AlexNet(97.8%)and The visual geometry group network(VGGNet)(96.16%).The proposed model exhibits reduced parameters,leading to decreased model complexity alongside enhanced classification accuracy,rendering it a valuable asset for categorizing malevolent code.
文摘A URL(Uniform Resource Locator)is used to locate a digital resource.With this URL,an attacker can perform a variety of attacks,which can lead to serious consequences for both individuals and organizations.Therefore,attackers create malicious URLs to gain access to an organization’s systems or sensitive information.It is crucial to secure individuals and organizations against these malicious URLs.A combination of machine learning and deep learning was used to predict malicious URLs.This research contributes significantly to the field of cybersecurity by proposing a model that seamlessly integrates the accuracy of machine learning with the swiftness of deep learning.The strategic fusion of Random Forest(RF) and Multilayer Perceptron(MLP)with an accuracy of 81% represents a noteworthy advancement,offering a balanced solution for robust cybersecurity.This study found that by combining RF and MLP,an efficient model was developed with an accuracy of 81%and a training time of 33.78 s.
基金This work was supported by National Natural Science Foundation of China(No.62172308,No.U1626107,No.61972297,No.62172144,and No.62062019).
文摘Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks.
基金partially funded by the National Natural Science Foundation of China (Grant No. 61272447)National Entrepreneurship & Innovation Demonstration Base of China (Grant No. C700011)Key Research & Development Project of Sichuan Province of China (Grant No. 2018G20100)
文摘The limited labeled sample data in the field of advanced security threats detection seriously restricts the effective development of research work.Learning the sample labels from the labeled and unlabeled data has received a lot of research attention and various universal labeling methods have been proposed.However,the labeling task of malicious communication samples targeted at advanced threats has to face the two practical challenges:the difficulty of extracting effective features in advance and the complexity of the actual sample types.To address these problems,we proposed a sample labeling method for malicious communication based on semi-supervised deep neural network.This method supports continuous learning and optimization feature representation while labeling sample,and can handle uncertain samples that are outside the concerned sample types.According to the experimental results,our proposed deep neural network can automatically learn effective feature representation,and the validity of features is close to or even higher than that of features which extracted based on expert knowledge.Furthermore,our proposed method can achieve the labeling accuracy of 97.64%~98.50%,which is more accurate than the train-then-detect,kNN and LPA methodsin any labeled-sample proportion condition.The problem of insufficient labeled samples in many network attack detecting scenarios,and our proposed work can function as a reference for the sample labeling tasks in the similar real-world scenarios.
文摘Background:In recent years,blockchain technology has attracted considerable attention.It records cryptographic transactions in a public ledger that is difficult to alter and compromise because of the distributed consensus.As a result,blockchain is believed to resist fraud and hacking.Results:This work explores the types of fraud and malicious activities that can be prevented by blockchain technology and identifies attacks to which blockchain remains vulnerable.Conclusions:This study recommends appropriate defensive measures and calls for further research into the techniques for fighting malicious activities related to blockchains.
文摘Spam is no longer just commercial unsolicited email messages that waste our time, it consumes network traffic and mail servers’ storage. Furthermore, spam has become a major component of several attack vectors including attacks such as phishing, cross-site scripting, cross-site request forgery and malware infection. Statistics show that the amount of spam containing malicious contents increased compared to the one advertising legitimate products and services. In this paper, the issue of spam detection is investigated with the aim to develop an efficient method to identify spam email based on the analysis of the content of email messages. We identify a set of features that have a considerable number of malicious related features. Our goal is to study the effect of these features in helping the classical classifiers in identifying spam emails. To make the problem more challenging, we developed spam classification models based on imbalanced data where spam emails form the rare class with only 16.5% of the total emails. Different metrics were utilized in the evaluation of the developed models. Results show noticeable improvement of spam classification models when trained by dataset that includes malicious related features.
基金supported in part by the National Key R&D Program of China under Grant 2018YFA0701601part by the National Natural Science Foundation of China(Grant No.U22A2002,61941104,62201605)part by Tsinghua University-China Mobile Communications Group Co.,Ltd.Joint Institute。
文摘In the upcoming large-scale Internet of Things(Io T),it is increasingly challenging to defend against malicious traffic,due to the heterogeneity of Io T devices and the diversity of Io T communication protocols.In this paper,we propose a semi-supervised learning-based approach to detect malicious traffic at the access side.It overcomes the resource-bottleneck problem of traditional malicious traffic defenders which are deployed at the victim side,and also is free of labeled traffic data in model training.Specifically,we design a coarse-grained behavior model of Io T devices by self-supervised learning with unlabeled traffic data.Then,we fine-tune this model to improve its accuracy in malicious traffic detection by adopting a transfer learning method using a small amount of labeled data.Experimental results show that our method can achieve the accuracy of 99.52%and the F1-score of 99.52%with only 1%of the labeled training data based on the CICDDoS2019 dataset.Moreover,our method outperforms the stateof-the-art supervised learning-based methods in terms of accuracy,precision,recall and F1-score with 1%of the training data.
文摘This paper introduces the background,illustrates the hardware structure and software features of malicious base station,explains its work principle,presents a method of detecting malicious base station,analyses the experiment and evaluates the experimental results to verify the reliability of this method.Finally proposes the future work.
基金Supported by the National Natural Science Foundation of China (90204012, 60573036) and the Natural Science Foundation of Hebei Province (F2006000177)
文摘We study the detailed malicious code propagating process in scale-free networks with link weights that denotes traffic between two nodes. It is found that the propagating velocity reaches a peak rapidly then decays in a power-law form, which is different from the well-known result in unweighted network case. Simulation results show that the nodes with larger strength are preferential to be infected, but the hierarchical dynamics are not clearly found. The simulation results also show that larger dispersion of weight of networks leads to slower propagating, which indicates that malicious code propagates more quickly in unweighted scale-free networks than in weighted scale-free networks under the same condition. These results show that not only the topology of networks but also the link weights affect the malicious propagating process.