Traffic encryption helps cyberattackers hide their presence and activities. Traffic classification is an important method for preventing network threats. However, because of the tremendous traffic volume and limited computing resources, most existing traffic classification techniques are inapplicable to high-speed network environments. In this paper, we propose a High-speed Encrypted Traffic Classification (HETC) method containing two stages. First, to efficiently detect whether traffic is encrypted, HETC focuses on randomly sampled short flows and extracts aggregation entropies with chi-square test features to measure the different patterns of byte composition and distribution between encrypted and unencrypted flows. Second, HETC adds binary features to the previous features and performs fine-grained traffic classification by combining these payload features with a Random Forest model. The experimental results show that HETC achieves a 94% F-measure in detecting encrypted flows and an 85% to 93% F-measure in classifying fine-grained flows on a 1-KB flow-length dataset, outperforming the state-of-the-art comparison methods. Meanwhile, HETC does not need to wait for the end of the flow and can extract mass computing features. The average time for HETC to process each flow is only 2 or 16 ms, which is lower than the flow duration in most cases, making it a good candidate for high-speed traffic classification.
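The entropy and chi-square idea in the HETC abstract can be illustrated with a minimal sketch. This is not the paper's feature set: the abstract does not define its "aggregation entropies", so the code below assumes plain Shannon entropy over byte values and a Pearson chi-square statistic against a uniform byte distribution. The function names `byte_entropy` and `chi_square_uniform` are hypothetical.

```python
import math
import os
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def chi_square_uniform(payload: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    if not payload:
        return 0.0
    expected = len(payload) / 256  # expected count per byte value if uniform
    counts = Counter(payload)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


if __name__ == "__main__":
    # Assumption: random bytes stand in for ciphertext; real traffic will differ.
    plain = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
    random_like = os.urandom(1024)
    for name, data in (("plaintext", plain), ("random", random_like)):
        print(name, round(byte_entropy(data), 3), round(chi_square_uniform(data), 1))
```

Encrypted payloads tend toward high entropy and a low chi-square statistic, while plaintext protocols tend toward the opposite, which is the kind of contrast the first HETC stage relies on.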
Network traffic classification is a crucial research area aimed at improving quality of service, simplifying network management, and enhancing network security. To address the growing complexity of cryptography, researchers have proposed various machine learning and deep learning approaches. However, existing mainstream methods face several general issues. On one hand, the widely used Transformer architecture has high computational complexity, which reduces its efficiency. On the other hand, traditional methods are often unreliable in traffic representation, frequently losing important byte information while retaining unnecessary biases. To address these problems, this paper introduces the Swin Transformer architecture into the domain of network traffic classification and proposes the NetST (Network Swin Transformer) model. This model adapts the Swin Transformer to the characteristics of network traffic, effectively addressing the efficiency issues. Furthermore, this paper presents a traffic representation scheme designed to extract meaningful information from large volumes of traffic while minimizing bias. We integrate four datasets relevant to network traffic classification for our experiments, and the results demonstrate that NetST achieves high accuracy while maintaining low memory usage.
This study proposes an efficient traffic classification model to address the growing threat of distributed denial-of-service (DDoS) attacks in fifth-generation (5G) slicing networks. The proposed method uses an ensemble of encoder components from multiple autoencoders to compress and extract latent representations from high-dimensional traffic data. These representations are then used as input for a support vector machine (SVM)-based metadata classifier, enabling precise detection of attack traffic. This architecture is designed to achieve both high detection accuracy and training efficiency while adapting flexibly to the diverse service requirements and complexity of 5G network slicing. The model was evaluated using the DDoS Datasets 2022, collected in a simulated 5G slicing environment. Experiments were conducted under both class-balanced and class-imbalanced conditions. In the balanced setting, the model achieved an accuracy of 89.33%, an F1-score of 88.23%, and an Area Under the Curve (AUC) of 89.45%. In the imbalanced setting (attack-to-normal ratio of 7:3), the model remained robust, achieving a recall of 100% and an F1-score of 90.91%, demonstrating its effectiveness in diverse real-world scenarios. Compared to existing AI-based detection methods, the proposed model showed higher precision, better handling of class imbalance, and strong generalization performance. Moreover, its modular structure is well suited for deployment in containerized network function (NF) environments, making it a practical solution for real-world 5G infrastructure. These results highlight the potential of the proposed approach to enhance both the security and operational resilience of 5G slicing networks.
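The imbalanced-setting figures above can be sanity-checked with the standard F1 definition, F1 = 2PR / (P + R). The sketch below assumes the abstract uses that definition; the helper names are hypothetical. Solving for precision with a recall of 100% and an F1-score of 90.91% gives roughly 83.3%, a value implied by the arithmetic rather than reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denom


if __name__ == "__main__":
    # Reported: recall = 1.0, F1 = 0.9091 (imbalanced setting).
    print(round(precision_from_f1(0.9091, 1.0), 4))
```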
With the rise of encrypted traffic, traditional network analysis methods have become less effective, leading to a shift towards deep learning-based approaches. Among these, multimodal learning-based classification methods have gained attention due to their ability to leverage diverse feature sets from encrypted traffic, improving classification accuracy. However, existing research predominantly relies on late fusion techniques, which hinder the full utilization of deep features within the data. To address this limitation, we propose a novel multimodal encrypted traffic classification model that synchronizes modality fusion with multiscale feature extraction. Specifically, our approach performs real-time fusion of modalities at each stage of feature extraction, enhancing feature representation at each level and preserving inter-level correlations for more effective learning. This continuous fusion strategy improves the model's ability to detect subtle variations in encrypted traffic while boosting its robustness and adaptability to evolving network conditions. Experimental results on two real-world encrypted traffic datasets demonstrate that our method achieves classification accuracies of 98.23% and 97.63%, outperforming existing multimodal learning-based methods.
Intelligent vehicle applications provide convenience but raise privacy and security concerns. Misuse of sensitive data, including vehicle location and facial recognition information, poses a threat to user privacy. Hence, traffic classification is vital for promptly overseeing and controlling applications that handle sensitive information. In this paper, we propose ET-Net, a framework that combines multiple features and leverages self-attention mechanisms to learn deep relationships between packets. ET-Net employs a multisimilarity triplet network to extract features from raw bytes, and exploits self-attention to capture long-range dependencies within packets in a session and contextual information features. Additionally, we use the loss function to more effectively integrate information acquired from both byte sequences and their corresponding lengths. Through simulated evaluations on datasets with similar attributes, ET-Net demonstrates the ability to finely distinguish between nine categories of applications, achieving superior results compared to existing methods.
The proliferation of internet traffic encryption has become a double-edged sword. While it significantly enhances user privacy, it also inadvertently shields cyber-attacks from detection, presenting a formidable challenge to cybersecurity. Traditional machine learning and deep learning techniques often fall short in identifying encrypted malicious traffic due to their inability to fully extract and utilize the implicit relational and positional information embedded within data packets. This limitation has led to an unresolved challenge in the cybersecurity community: how to effectively extract valuable insights from the complex patterns of traffic packet transmission. Consequently, this paper introduces the TB-Graph model, an encrypted malicious traffic classification model based on a relational graph attention network. The model is a heterogeneous traffic burst graph that embeds side-channel features, which are unaffected by encryption, into the graph nodes and connects them with three different types of burst edges. Subsequently, we design a relational positional coding that prevents the loss of temporal relationships between the original traffic flows during graph transformation. Ultimately, TB-Graph leverages the powerful graph representation learning capabilities of the Relational Graph Attention Network (RGAT) to extract latent behavioral features from the burst graph nodes and edge relationships. Experimental results show that TB-Graph outperforms various state-of-the-art methods in fine-grained encrypted malicious traffic classification tasks on two public datasets, indicating its enhanced capability for identifying encrypted malicious traffic.
While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its behavior. To address the reliance of traditional machine learning methods on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic, we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features, called the BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet-level granularity, the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers (BERT) model. At the byte-level granularity, we first employ the Bidirectional Gated Recurrent Unit (BiGRU) model to extract temporal features from bytes, and then use the Text Convolutional Neural Network (TextCNN) model with multi-sized convolution kernels to extract local multi-receptive-field spatial features. The fusion of features from both granularities serves as the final multidimensional representation of malicious traffic. Our approach achieves an accuracy of 99.39% and an F1-score of 99.40% on the publicly available USTC-TFC2016 dataset, and effectively reduces sample confusion within the Neris and Virut categories. The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
Accurate classification of encrypted traffic plays an important role in network management. However, current methods confront several problems: inability to characterize traffic that exhibits great dispersion, inability to classify traffic with multi-level features, and degradation due to limited training traffic size. To address these problems, this paper proposes a traffic granularity-based cryptographic traffic classification method, called Granular Classifier (GC). A novel Cardinality-based Constrained Fuzzy C-Means (CCFCM) clustering algorithm is proposed to address the problem caused by limited training traffic, considering the ratio of cardinality that must be linked between flows to achieve good traffic partitioning. Then, an original representation format of traffic based on granular computing, named Traffic Granules (TG), is presented to accurately describe traffic structure by capturing the dispersion of different traffic features. Each granule is a compact set of similar data with a boundary refined by excluding outliers. Based on TG, GC is constructed to perform traffic classification using multi-level features. The performance of GC is evaluated on real-world encrypted network traffic data. Experimental results show that GC achieves outstanding performance for encrypted traffic classification with a limited amount of training traffic and maintains accurate classification in dynamic network conditions.
With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has become a core technology in network supervision. In recent years, many different solutions have emerged in this field. Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-level features of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites, temporal features can exhibit significant variations caused by changes in communication links and transmission quality. Additionally, some spatial features can change due to data reordering and retransmission. Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantly difficult. To address this, we propose a universal packet-level encrypted traffic identification method, Combo Packet. This method uses convolutional neural networks to extract deep features of the current packet and its contextual information, and employs spatial and channel attention mechanisms to select and locate effective features. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic service categories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g., BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0% and 97.1% for service and application categories, respectively. It also provides shorter training times and higher recognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior to those of the existing classification methods compared.
Encrypted traffic plays a crucial role in safeguarding network security and user privacy. However, encrypting malicious traffic can lead to numerous security issues, making the effective classification of encrypted traffic essential. Existing methods for detecting encrypted traffic face two significant challenges. First, relying solely on the original byte information for classification fails to leverage the rich temporal relationships within network traffic. Second, machine learning and convolutional neural network methods lack sufficient network expression capabilities, hindering the full exploration of the traffic's potential characteristics. To address these limitations, this study introduces a traffic classification method that uses time relationships and a higher-order graph neural network, termed HGNN-ETC. This approach fully exploits the original byte information and chronological relationships of traffic packets, transforming traffic data into a graph structure to provide the model with more comprehensive context information. HGNN-ETC employs an innovative k-dimensional graph neural network to effectively capture the multi-scale structural features of traffic graphs, enabling more accurate classification. We select the ISCXVPN and the USTC-TK2016 datasets for our experiments. The results show that, compared with other state-of-the-art methods, our method obtains better classification results on different datasets, with an accuracy of about 97.00%. In addition, by analyzing the impact of varying input specifications on classification performance, we determine the optimal network data truncation strategy and confirm the model's excellent generalization ability on different datasets.
Machine Learning (ML) techniques have been widely applied in recent traffic classification. However, the problems of discriminator bias and class imbalance both decrease the accuracy of ML-based traffic classifiers. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built as an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application-specific traffic. Moreover, to balance the training dataset, we apply the SMOTE algorithm to generate artificial training samples for minority classes. We evaluate our classifier on two datasets collected from different network border routers. Compared with previous multi-class traffic classifiers built in a one-time training process, our classifier achieves much higher F-Measure and AUC for each application.
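The SMOTE step mentioned above can be sketched as follows. This is a minimal illustration of the general technique (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name `smote`, the default `k`, and the fixed seed are assumptions made for the example.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    flows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote(flows, n_new=3, k=2):
        print(sample)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies.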
Attacks on websites and network servers are among the most critical threats in network security. Network behavior identification is one of the most effective ways to identify malicious network intrusions. Analyzing abnormal network traffic patterns and classifying traffic based on labeled network traffic data are among the most effective approaches for network behavior identification. Traditional methods for network traffic classification use algorithms such as Naive Bayes, Decision Tree, and XGBoost. However, network traffic classification, which is required for network behavior identification, generally suffers from low accuracy even with recently proposed deep learning models. To improve network traffic classification accuracy and thereby the network intrusion detection rate, this paper proposes a new network traffic classification model, called ArcMargin, which incorporates metric learning into a convolutional neural network (CNN) to make the CNN model more discriminative. ArcMargin maps network traffic samples from the same category closer together while samples from different categories are mapped as far apart as possible. The metric learning regularization term is called additive angular margin loss, and it is embedded in the objective function of traditional CNN models. The proposed ArcMargin model is validated on three datasets and compared with several related algorithms. According to a set of classification indicators, the ArcMargin model is shown to perform better in both network traffic classification tasks and open-set tasks. Moreover, in open-set tasks, the ArcMargin model can cluster unknown data classes that do not exist in the training dataset.
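The additive angular margin loss named above can be illustrated for a single sample. The sketch below follows the general form of that loss (add a margin m to the angle of the true class, scale by s, then apply softmax cross-entropy). The scale and margin values are illustrative assumptions, not the paper's settings, and the function name `arc_margin_loss` is hypothetical. Production implementations also handle the case where the angle plus margin exceeds pi, which this sketch omits.

```python
import math
from typing import Sequence


def arc_margin_loss(cosines: Sequence[float], target: int,
                    scale: float = 30.0, margin: float = 0.5) -> float:
    """Cross-entropy over scaled cosine logits with an angular margin on the target.

    cosines[j] is the cosine similarity between the sample embedding and the
    weight vector of class j (both assumed L2-normalised).
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard against rounding outside [-1, 1]
        if j == target:
            theta = math.acos(c)
            logits.append(scale * math.cos(theta + margin))  # penalised target logit
        else:
            logits.append(scale * c)
    # Numerically stable log-sum-exp, then negative log-likelihood of the target.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cos = [0.8, 0.1, -0.2]
    print(arc_margin_loss(cos, target=0, margin=0.0))
    print(arc_margin_loss(cos, target=0, margin=0.5))
```

The margin makes the target class harder to satisfy during training, which pushes same-class embeddings closer together and different-class embeddings further apart in angle.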
Encrypted traffic classification has become a hot issue in network security research. The class imbalance problem of traffic samples often degrades the performance of Machine Learning-based classifiers. Although the Generative Adversarial Network (GAN) method can generate new samples by learning the feature distribution of the original samples, it suffers from unstable training and mode collapse. To this end, a novel data augmentation approach called Graph CWGAN-GP is proposed in this paper. The traffic data is first converted into grayscale images as the input for the proposed model. Then, the minority class data is augmented with our proposed model, which is built by introducing conditional constraints and a new distance metric into a typical GAN. Finally, a classical deep learning model is adopted as a classifier to classify datasets augmented by the Conditional GAN (CGAN), Wasserstein GAN-Gradient Penalty (WGAN-GP), and Graph CWGAN-GP, respectively. Compared with the state-of-the-art GAN methods, Graph CWGAN-GP can not only control the modes of the data to be generated, but also overcome the problem of unstable training and generate more realistic and diverse samples. The experimental results show that the classification precision, recall, and F1-Score of the minority class in the balanced dataset augmented in this paper improve by more than 2.37%, 3.39%, and 4.57%, respectively.
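The grayscale-image conversion mentioned above is commonly done by truncating or zero-padding a flow's bytes to a fixed length and reshaping them into a square. The abstract does not give the image size or padding rule, so the 28x28 default and the zero padding below are assumptions, and the function name `bytes_to_grayscale` is hypothetical.

```python
from typing import List


def bytes_to_grayscale(payload: bytes, side: int = 28) -> List[List[int]]:
    """Map raw bytes to a side x side grid of pixel intensities (0 to 255).

    Payloads longer than side * side bytes are truncated; shorter ones are
    zero-padded at the end.
    """
    size = side * side
    padded = payload[:size].ljust(size, b"\x00")
    return [list(padded[row * side:(row + 1) * side]) for row in range(side)]


if __name__ == "__main__":
    image = bytes_to_grayscale(b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03")
    print(len(image), len(image[0]), image[0][:11])
```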
The Internet of Things (IoT) is a network of devices connected to the internet that share a massive amount of data with each other and with a central location. Because these IoT devices are connected to a network, they are prone to attacks. Various management tasks and network operations, such as security, intrusion detection, Quality-of-Service provisioning, performance monitoring, resource provisioning, and traffic engineering, require traffic classification. Due to the ineffectiveness of traditional classification schemes, such as port-based and payload-based methods, researchers proposed machine learning-based traffic classification systems based on shallow neural networks. However, machine learning-based models tend to misclassify internet traffic due to improper feature selection. In this research, an efficient multilayer deep learning-based classification system that can classify internet traffic is presented to overcome these challenges. To examine the performance of the proposed technique, the Moore dataset is used for training the classifier. The proposed scheme takes the pre-processed data and extracts the flow features using a deep neural network (DNN). In particular, the maximum entropy classifier is used to classify the internet traffic. The experimental results show that the proposed hybrid deep learning algorithm is effective and achieves a high accuracy of 99.23% for internet traffic classification. Furthermore, the proposed algorithm achieves higher accuracy than the support vector machine (SVM)-based and k-nearest neighbours (KNN)-based classification techniques.
The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers. A P2P streaming traffic classification method based on sampling technology is presented in this paper. By analyzing the traffic statistical features and network behavior of P2P streaming, a group of flow characteristics was found that makes P2P streaming more recognizable among other applications. Attributes from NetFlow and those proposed by us are compared in terms of classification accuracy, as are the results at different sampling rates. The results show that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in an online system. Even with a 1:50 sampling rate, the recognition accuracy can be higher than 94%. Moreover, we evaluated the CPU resources, storage capacity, and time consumption before and after sampling. The classification model after sampling significantly reduces resource requirements while keeping the same recognition accuracy.
The continual growth in the use of technological appliances during the COVID-19 pandemic has resulted in a massive volume of data flow on the Internet, as many employees have transitioned to working from home. Furthermore, as more people adopt encrypted data transmission through a Virtual Private Network (VPN) or Tor Browser (dark web) to keep their data private and hidden, network traffic encryption is rapidly becoming a universal approach. This affects and complicates the quality of service (QoS), traffic monitoring, and network security provided by Internet Service Providers (ISPs), particularly for analysis and anomaly detection approaches based on the nature of the network traffic. Categorizing encrypted traffic is one of the most challenging issues introduced by VPNs, which are used to bypass censorship and to gain access to geo-locked services. Therefore, an efficient approach is needed that identifies encrypted network traffic and extracts and selects valuable features, improving quality of service and network management and supporting oversight of overall performance. In this paper, the classification of network traffic into VPN and non-VPN traffic is studied based on the efficiency of time-based features extracted from network packets. This paper proposes two machine learning models that categorize network traffic into encrypted and non-encrypted traffic. The proposed models use statistical features (SF), Pearson Correlation (PC), and a Genetic Algorithm (GA), preprocessing the traffic samples into net flow traffic to accomplish the experiment's objectives. The GA-based method uses a stochastic method based on natural genetics and biological evolution to extract essential features. The PC-based method performs well in removing different features of network traffic. With a microsecond per-packet prediction time, the best model achieved an accuracy of more than 95.02 percent in the most demanding traffic classification task, a drop in accuracy of only 2.37 percent in comparison to the entire statistical-based machine learning approach. This is extremely promising for the development of real-time traffic analyzers.
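A common way to use Pearson correlation for feature removal is to drop any feature that is strongly correlated with one already kept. The abstract does not state how the PC-based method applies the coefficient, so the sketch below is one plausible reading under that assumption; the 0.9 threshold, the function names, and the example feature names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features: Dict[str, List[float]],
                    threshold: float = 0.9) -> List[str]:
    """Keep features in insertion order, skipping any that correlate with a kept one."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    flows = {
        "duration_s": [1.0, 2.0, 3.0, 4.0],
        "duration_ms": [1000.0, 2000.0, 3000.0, 4000.0],  # redundant with duration_s
        "idle_time": [4.0, 1.0, 3.0, 2.0],
    }
    print(drop_correlated(flows))
```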
Traffic characterization (e.g., chat, video) and application identification (e.g., FTP, Facebook) are two of the more crucial tasks in encrypted network traffic classification. Existing systems typically carry out these two tasks separately using separate models, which significantly adds to the difficulty of network administration. Convolutional Neural Network (CNN) and Transformer are deep learning-based approaches for network traffic classification. CNN is good at extracting local features but ignores long-distance information in the network traffic sequence, whereas Transformer captures long-distance feature dependencies but ignores local details. Based on these characteristics, a multi-task learning model that combines Transformer and 1D-CNN for encrypted traffic classification (MTC) is proposed. To compensate for the Transformer's lack of local detail feature extraction and the 1D-CNN's tendency to ignore long-distance correlation information when processing traffic sequences, the model uses a parallel structure that fuses the features generated by the Transformer block and the 1D-CNN block with each other through a feature fusion block. This structure improves the representation of traffic features by both blocks and allows the model to perform well on both long and short sequences. The model handles multiple tasks simultaneously, which lowers the cost of training. Experiments reveal that on the ISCX VPN-nonVPN dataset, the model achieves an average F1 score of 98.25% and an average recall of 98.30% for application identification, and an average F1 score of 97.94% and an average recall of 97.54% for traffic characterization. Compared with advanced models on the same dataset, the model produces the best results. To demonstrate generalization, we applied MTC to the CICIDS2017 dataset, where the model also achieved good results.
Internet traffic classification is vital to network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult to apply as newly emerged applications (e.g., Peer-to-Peer) use dynamic port numbers, masquerading techniques, and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions for a variety of network activities and provides a platform for evaluating classifier performance. The impact of dataset size, feature selection, number of application types, and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) Genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy. In particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) A larger dataset and fewer application types result in better classification accuracy. Finally, early detection using only the first several packets is proposed for real-time network activity, and preliminary results indicate that it is feasible.
Accurate and real-time classification of network traffic is significant to network operation and management tasks such as QoS differentiation, traffic shaping, and security surveillance. However, with many newly emerged P2P applications using dynamic port numbers, masquerading techniques, and payload encryption to avoid detection, traditional classification approaches turn out to be ineffective. In this paper, we present a layered hybrid system to classify current Internet traffic, motivated by the variety of network activities and their traffic classification requirements. The proposed method achieves fast and accurate traffic classification with low overhead and is robust enough to accommodate both known and unknown or encrypted applications. Furthermore, it is feasible to use in the context of real-time traffic classification. Our experimental results show the distinct advantages of the proposed classification system compared with the one-step Machine Learning (ML) approach.
Traffic identification becomes more important,yet more challenging as related encryption techniques are rapidly developing nowadays.Unlike recent deep learning methods that apply image processing to solve such encrypt...Traffic identification becomes more important,yet more challenging as related encryption techniques are rapidly developing nowadays.Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems,in this pa⁃per,we propose a method named Payload Encoding Representation from Transformer(PERT)to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique.By implementing traffic classification experiments on a pub⁃lic encrypted traffic data set and our captured Android HTTPS traffic,we prove the pro⁃posed method can achieve an obvious better effectiveness than other compared baselines.To the best of our knowledge,this is the first time the encrypted traffic classification with the dynamic word embedding has been addressed.展开更多
Funding: Supported by the National Natural Science Foundation of China under Grant No. U1736216.
Abstract: Traffic encryption techniques make it easier for cyberattackers to hide their presence and activities. Traffic classification is an important method to prevent network threats. However, due to the tremendous traffic volume and limitations of computing, most existing traffic classification techniques are inapplicable to the high-speed network environment. In this paper, we propose a High-speed Encrypted Traffic Classification (HETC) method containing two stages. First, to efficiently detect whether traffic is encrypted, HETC focuses on randomly sampled short flows and extracts aggregation entropies with chi-square test features to measure the different patterns of the byte composition and distribution between encrypted and unencrypted flows. Second, HETC introduces binary features upon the previous features and performs fine-grained traffic classification by combining these payload features with a Random Forest model. The experimental results show that HETC can achieve a 94% F-measure in detecting encrypted flows and an 85%–93% F-measure in classifying fine-grained flows for a 1-KB flow-length dataset, outperforming the state-of-the-art comparison methods. Meanwhile, HETC does not need to wait for the end of the flow and can extract mass computing features. The average time for HETC to process each flow is only 2 or 16 ms, which is lower than the flow duration in most cases, making it a good candidate for high-speed traffic classification.
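The entropy and chi-square features mentioned in this abstract can be illustrated with a minimal sketch. It is not the HETC feature extractor: the function names, the per-byte (rather than aggregated) entropy, and the use of `os.urandom` as a stand-in for an encrypted payload are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def chi_square_uniform(payload: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    if not payload:
        return 0.0
    expected = len(payload) / 256  # expected count per byte value if uniform
    counts = Counter(payload)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


# Hypothetical payloads: repeated plaintext HTTP vs. random bytes (assumed
# here to behave like ciphertext).
plain = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
random_like = os.urandom(len(plain))

print(byte_entropy(plain), chi_square_uniform(plain))
print(byte_entropy(random_like), chi_square_uniform(random_like))
```

Encrypted-looking payloads tend toward high entropy and a small chi-square statistic, while plaintext protocols show the opposite pattern, which is the kind of contrast such features are meant to expose.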
Funding: Supported by the National Natural Science Foundation of China (62473341) and the Key Technologies R&D Program of Henan Province (242102211071, 252102211086, 252102210166).
Abstract: Network traffic classification is a crucial research area aimed at improving quality of service, simplifying network management, and enhancing network security. To cope with the growing complexity of cryptography, researchers have proposed various machine learning and deep learning approaches. However, existing mainstream methods face several general issues. On one hand, the widely used Transformer architecture exhibits high computational complexity, which negatively impacts its efficiency. On the other hand, traditional methods are often unreliable in traffic representation, frequently losing important byte information while retaining unnecessary biases. To address these problems, this paper introduces the Swin Transformer architecture into the domain of network traffic classification and proposes the NetST (Network Swin Transformer) model. This model improves the Swin Transformer to better accommodate the characteristics of network traffic, effectively addressing efficiency issues. Furthermore, this paper presents a traffic representation scheme designed to extract meaningful information from large volumes of traffic while minimizing bias. We integrate four datasets relevant to network traffic classification for our experiments, and the results demonstrate that NetST achieves a high accuracy rate while maintaining low memory usage.
Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (RS-2024-00438156, Development of Security Resilience Technology Based on Network Slicing Services in a 5G Specialized Network).
Abstract: This study proposes an efficient traffic classification model to address the growing threat of distributed denial-of-service (DDoS) attacks in 5th generation technology standard (5G) slicing networks. The proposed method utilizes an ensemble of encoder components from multiple autoencoders to compress and extract latent representations from high-dimensional traffic data. These representations are then used as input for a support vector machine (SVM)-based metadata classifier, enabling precise detection of attack traffic. This architecture is designed to achieve both high detection accuracy and training efficiency, while adapting flexibly to the diverse service requirements and complexity of 5G network slicing. The model was evaluated using the DDoS Datasets 2022, collected in a simulated 5G slicing environment. Experiments were conducted under both class-balanced and class-imbalanced conditions. In the balanced setting, the model achieved an accuracy of 89.33%, an F1-score of 88.23%, and an Area Under the Curve (AUC) of 89.45%. In the imbalanced setting (attack:normal 7:3), the model maintained strong robustness, achieving a recall of 100% and an F1-score of 90.91%, demonstrating its effectiveness in diverse real-world scenarios. Compared to existing AI-based detection methods, the proposed model showed higher precision, better handling of class imbalance, and strong generalization performance. Moreover, its modular structure is well-suited for deployment in containerized network function (NF) environments, making it a practical solution for real-world 5G infrastructure. These results highlight the potential of the proposed approach to enhance both the security and operational resilience of 5G slicing networks.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFB2705000).
Abstract: With the rise of encrypted traffic, traditional network analysis methods have become less effective, leading to a shift towards deep learning-based approaches. Among these, multimodal learning-based classification methods have gained attention due to their ability to leverage diverse feature sets from encrypted traffic, improving classification accuracy. However, existing research predominantly relies on late fusion techniques, which hinder the full utilization of deep features within the data. To address this limitation, we propose a novel multimodal encrypted traffic classification model that synchronizes modality fusion with multiscale feature extraction. Specifically, our approach performs real-time fusion of modalities at each stage of feature extraction, enhancing feature representation at each level and preserving inter-level correlations for more effective learning. This continuous fusion strategy improves the model’s ability to detect subtle variations in encrypted traffic, while boosting its robustness and adaptability to evolving network conditions. Experimental results on two real-world encrypted traffic datasets demonstrate that our method achieves classification accuracies of 98.23% and 97.63%, outperforming existing multimodal learning-based methods.
Funding: Supported by the National Key Research and Development Program of China (2022YFB3104903) and the S&T Program of Hebei (No. SZX2020034).
Abstract: Intelligent vehicle applications provide convenience but raise privacy and security concerns. Misuse of sensitive data, including vehicle location and facial recognition information, poses a threat to user privacy. Hence, traffic classification is vital for promptly overseeing and controlling applications with sensitive information. In this paper, we propose ET-Net, a framework that combines multiple features and leverages self-attention mechanisms to learn deep relationships between packets. ET-Net employs a multisimilarity triplet network to extract features from raw bytes, and exploits self-attention to capture long-range dependencies within packets in a session and contextual information features. Additionally, we utilize the loss function to more effectively integrate information acquired from both byte sequences and their corresponding lengths. Through simulated evaluations on datasets with similar attributes, ET-Net demonstrates the ability to finely distinguish between nine categories of applications, achieving superior results compared to existing methods.
Abstract: The proliferation of internet traffic encryption has become a double-edged sword. While it significantly enhances user privacy, it also inadvertently shields cyber-attacks from detection, presenting a formidable challenge to cybersecurity. Traditional machine learning and deep learning techniques often fall short in identifying encrypted malicious traffic due to their inability to fully extract and utilize the implicit relational and positional information embedded within data packets. This limitation has led to an unresolved challenge in the cybersecurity community: how to effectively extract valuable insights from the complex patterns of traffic packet transmission. Consequently, this paper introduces the TB-Graph model, an encrypted malicious traffic classification model based on a relational graph attention network. The model is a heterogeneous traffic burst graph that embeds side-channel features, which are unaffected by encryption, into the graph nodes and connects them with three different types of burst edges. Subsequently, we design a relational positional coding that prevents the loss of temporal relationships between the original traffic flows during graph transformation. Ultimately, TB-Graph leverages the powerful graph representation learning capabilities of Relational Graph Attention Network (RGAT) to extract latent behavioral features from the burst graph nodes and edge relationships. Experimental results show that TB-Graph outperforms various state-of-the-art methods in fine-grained encrypted malicious traffic classification tasks on two public datasets, indicating its enhanced capability for identifying encrypted malicious traffic.
Funding: This research was funded by the National Natural Science Foundation of China under Grant No. 61806171; the Sichuan University of Science & Engineering Talent Project under Grant No. 2021RC15; the Open Fund Project of Key Laboratory for Non-Destructive Testing and Engineering Computer of Sichuan Province Universities on Bridge Inspection and Engineering under Grant No. 2022QYJ06; the Sichuan University of Science & Engineering Graduate Student Innovation Fund under Grant No. Y2023115; and the Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No. SUSE652A006.
Abstract: While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its malicious behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic, we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features, called BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet-level granularity, the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers (BERT) model. At the byte-level granularity, we first employ the Bidirectional Gated Recurrent Unit (BiGRU) model to extract temporal features from bytes, and then use the Text Convolutional Neural Network (TextCNN) model with multi-sized convolution kernels to extract local multi-receptive field spatial features. The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic. Our approach achieves accuracy and F1-score of 99.39% and 99.40%, respectively, on the publicly available USTC-TFC2016 dataset, and effectively reduces sample confusion within the Neris and Virut categories. The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
Funding: Supported in part by the Shandong Provincial Natural Science Foundation under Grant ZR2021QF008, the National Natural Science Foundation of China under Grant 62072351, in part by the open research project of ZheJiang Lab under Grant 2021PD0AB01, and in part by the 111 Project under Grant B16037.
Abstract: Accurate classification of encrypted traffic plays an important role in network management. However, current methods confront several problems: inability to characterize traffic that exhibits great dispersion, inability to classify traffic with multi-level features, and degradation due to limited training traffic size. To address these problems, this paper proposes a traffic granularity-based cryptographic traffic classification method, called Granular Classifier (GC). In this paper, a novel Cardinality-based Constrained Fuzzy C-Means (CCFCM) clustering algorithm is proposed to address the problem caused by limited training traffic, considering the ratio of cardinality that must be linked between flows to achieve good traffic partitioning. Then, an original representation format of traffic is presented based on granular computing, named Traffic Granules (TG), to accurately describe traffic structure by catching the dispersion of different traffic features. Each granule is a compact set of similar data with a refined boundary by excluding outliers. Based on TG, GC is constructed to perform traffic classification based on multi-level features. The performance of the GC is evaluated based on real-world encrypted network traffic data. Experimental results show that the GC achieves outstanding performance for encrypted traffic classification with limited size of training traffic and keeps accurate classification in dynamic network conditions.
Funding: National Natural Science Foundation of China Youth Project (62302520).
Abstract: With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has become a core technology in network supervision. In recent years, many different solutions have emerged in this field. Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-level features of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites, temporal features can exhibit significant variations due to changes in communication links and transmission quality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission. Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantly difficult. To address this, we propose a universal packet-level encrypted traffic identification method, Combo Packet. This method utilizes convolutional neural networks to extract deep features of the current packet and its contextual information and employs spatial and channel attention mechanisms to select and locate effective features. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic service categories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g., BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0% and 97.1% for service and application categories, respectively. It also provides shorter training times and higher recognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior to the existing classification methods mentioned.
Funding: Supported in part by the National Key Research and Development Program of China (No. 2022YFB4500800) and the National Science Foundation of China (No. 42071431).
Abstract: Encrypted traffic plays a crucial role in safeguarding network security and user privacy. However, encrypting malicious traffic can lead to numerous security issues, making the effective classification of encrypted traffic essential. Existing methods for detecting encrypted traffic face two significant challenges. First, relying solely on the original byte information for classification fails to leverage the rich temporal relationships within network traffic. Second, machine learning and convolutional neural network methods lack sufficient network expression capabilities, hindering the full exploration of traffic’s potential characteristics. To address these limitations, this study introduces a traffic classification method that utilizes time relationships and a higher-order graph neural network, termed HGNN-ETC. This approach fully exploits the original byte information and chronological relationships of traffic packets, transforming traffic data into a graph structure to provide the model with more comprehensive context information. HGNN-ETC employs an innovative k-dimensional graph neural network to effectively capture the multi-scale structural features of traffic graphs, enabling more accurate classification. We select the ISCXVPN and the USTC-TK2016 datasets for our experiments. The results show that, compared with other state-of-the-art methods, our method obtains better classification results on different datasets, with an accuracy of about 97.00%. In addition, by analyzing the impact of varying input specifications on classification performance, we determine the optimal network data truncation strategy and confirm the model’s excellent generalization ability on different datasets.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61402485, the National Natural Science Foundation of China under Grant No. 61303061, and the Open Fund from HPCL (No. 201513-01).
Abstract: Machine Learning (ML) techniques have been widely applied in recent traffic classification. However, the problems of both discriminator bias and class imbalance decrease the accuracy of ML-based traffic classifiers. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built by making an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application-specific traffic. Moreover, to balance a training dataset, we apply the SMOTE algorithm to generate artificial training samples for minority classes. We evaluate our classifier on two datasets collected from different network border routers. Compared with the previous multi-class traffic classifiers built in a one-time training process, our classifier achieves much higher F-Measure and AUC for each application.
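The core idea of SMOTE, generating a synthetic minority sample by interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified illustration rather than the paper's pipeline: the function name `smote_samples`, the toy two-dimensional flow features, and the parameter defaults are assumptions.

```python
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical minority-class flow features (e.g. two normalized statistics).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_samples(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the minority class is enlarged without simply duplicating records.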
Funding: This work was supported by the National Natural Science Foundation of China (61871046).
Abstract: Attacks on websites and network servers are among the most critical threats in network security. Network behavior identification is one of the most effective ways to identify malicious network intrusions. Analyzing abnormal network traffic patterns and traffic classification based on labeled network traffic data are among the most effective approaches for network behavior identification. Traditional methods for network traffic classification utilize algorithms such as Naive Bayes, Decision Tree and XGBoost. However, network traffic classification, which is required for network behavior identification, generally suffers from the problem of low accuracy even with the recently proposed deep learning models. To improve network traffic classification accuracy and thus the network intrusion detection rate, this paper proposes a new network traffic classification model, called ArcMargin, which incorporates metric learning into a convolutional neural network (CNN) to make the CNN model more discriminative. ArcMargin maps network traffic samples from the same category more closely while samples from different categories are mapped as far apart as possible. The metric learning regularization feature is called additive angular margin loss, and it is embedded in the objective function of traditional CNN models. The proposed ArcMargin model is validated with three datasets and is compared with several other related algorithms. According to a set of classification indicators, the ArcMargin model is shown to perform better in both network traffic classification tasks and open-set tasks. Moreover, in open-set tasks, the ArcMargin model can cluster unknown data classes that do not exist in the previous training dataset.
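For readers unfamiliar with the additive angular margin loss named in this abstract, its commonly cited form from the face-recognition literature is reproduced below. The abstract itself gives no formula, so the notation is an assumption: $N$ is the batch size, $\theta_j$ is the angle between a sample's normalized feature vector and the normalized weight vector of class $j$, $y_i$ is the true class of sample $i$, $s$ is a scale factor, and $m$ is the angular margin.

```latex
L = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{e^{\,s \cos(\theta_{y_i} + m)}}
              {e^{\,s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{\,s \cos \theta_j}}
```

Adding $m$ to the target-class angle forces samples of the same class to sit closer together on the hypersphere than a plain softmax would require, which matches the abstract's description of mapping same-category samples more closely and different categories further apart.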
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61931004, 62072250) and the Talent Launch Fund of Nanjing University of Information Science and Technology (2020r061).
Abstract: Encrypted traffic classification has become a hot issue in network security research. The class imbalance problem of traffic samples often causes the deterioration of Machine Learning based classifier performance. Although the Generative Adversarial Network (GAN) method can generate new samples by learning the feature distribution of the original samples, it is confronted with the problems of unstable training and mode collapse. To this end, a novel data augmenting approach called Graph CWGAN-GP is proposed in this paper. The traffic data is first converted into grayscale images as the input for the proposed model. Then, the minority class data is augmented with our proposed model, which is built by introducing conditional constraints and a new distance metric into a typical GAN. Finally, the classical deep learning model is adopted as a classifier to classify datasets augmented by the Conditional GAN (CGAN), Wasserstein GAN-Gradient Penalty (WGAN-GP) and Graph CWGAN-GP, respectively. Compared with the state-of-the-art GAN methods, the Graph CWGAN-GP can not only control the modes of the data to be generated, but also overcome the problem of unstable training and generate more realistic and diverse samples. The experimental results show that the classification precision, recall and F1-Score of the minority class in the balanced dataset augmented in this paper have improved by more than 2.37%, 3.39% and 4.57%, respectively.
Funding: This work was supported by the Xiamen University Malaysia Research Fund (XMUMRF) (Grant No. XMUMRF/2019-C3/IECE/0007).
Abstract: Internet of Things (IoT) defines a network of devices connected to the internet and sharing a massive amount of data between each other and a central location. These IoT devices are connected to a network and are therefore prone to attacks. Various management tasks and network operations such as security, intrusion detection, Quality-of-Service provisioning, performance monitoring, resource provisioning, and traffic engineering require traffic classification. Due to the ineffectiveness of traditional classification schemes, such as port-based and payload-based methods, researchers proposed machine learning-based traffic classification systems based on shallow neural networks. Furthermore, machine learning-based models tend to misclassify internet traffic due to improper feature selection. In this research, an efficient multilayer deep learning based classification system that can classify internet traffic is presented to overcome these challenges. To examine the performance of the proposed technique, the Moore dataset is used for training the classifier. The proposed scheme takes the pre-processed data and extracts the flow features using a deep neural network (DNN). In particular, the maximum entropy classifier is used to classify the internet traffic. The experimental results show that the proposed hybrid deep learning algorithm is effective and achieves high accuracy for internet traffic classification, i.e., 99.23%. Furthermore, the proposed algorithm achieved the highest accuracy compared to the support vector machine (SVM) based classification technique and the k-nearest neighbours (KNN) based classification technique.
Funding: Supported by the State Key Program of the National Natural Science Foundation of China under Grant No. 61072061, the 111 Project of China under Grant No. B08004, and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0122.
Abstract: The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers. A P2P streaming traffic classification method based on sampling technology is presented in this paper. By analyzing traffic statistical features and network behavior of P2P streaming, a group of flow characteristics was found that makes P2P streaming more recognizable among other applications. Attributes from NetFlow and those proposed by us are compared in terms of classification accuracy, as are the results of different sampling rates. It is shown that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in the online system. Even with a 1:50 sampling rate, the recognition accuracy can be higher than 94%. Moreover, we have evaluated the CPU resources, storage capacity and time consumption before and after the sampling; the results show that the classification model after the sampling can significantly reduce the resource requirements with the same recognition accuracy.
Abstract: The continual growth of the use of technological appliances during the COVID-19 pandemic has resulted in a massive volume of data flow on the Internet, as many employees have transitioned to working from home. Furthermore, with the increase in the adoption of encrypted data transmission by many people who tend to use a Virtual Private Network (VPN) or Tor Browser (dark web) to keep their data private and hidden, network traffic encryption is rapidly becoming a universal approach. This affects and complicates the quality of service (QoS), traffic monitoring, and network security provided by Internet Service Providers (ISPs), particularly for analysis and anomaly detection approaches based on the network traffic’s nature. The method of categorizing encrypted traffic is one of the most challenging issues introduced by a VPN as a way to bypass censorship as well as gain access to geo-locked services. Therefore, an efficient approach is especially needed that enables the identification of encrypted network traffic data to extract and select valuable features which improve the quality of service and network management as well as to oversee the overall performance. In this paper, the classification of network traffic data in terms of VPN and non-VPN traffic is studied based on the efficiency of time-based features extracted from network packets. Therefore, this paper suggests two machine learning models that categorize network traffic into encrypted and non-encrypted traffic. The proposed models utilize statistical features (SF), Pearson Correlation (PC), and a Genetic Algorithm (GA), preprocessing the traffic samples into net flow traffic to accomplish the experiment’s objectives. The GA-based method utilizes a stochastic method based on natural genetics and biological evolution to extract essential features. The PC-based method performs well in removing different features of network traffic. With a microsecond per-packet prediction time, the best model achieved an accuracy of more than 95.02 percent in the most demanding traffic classification task, a drop in accuracy of only 2.37 percent in comparison to the entire statistical-based machine learning approach. This is extremely promising for the development of real-time traffic analyzers.
Funding: Supported by the People’s Public Security University of China central basic scientific research business program (No. 2021JKF206).
Abstract: Traffic characterization (e.g., chat, video) and application identification (e.g., FTP, Facebook) are two of the more crucial jobs in encrypted network traffic classification. These two activities are typically carried out separately by existing systems using separate models, significantly adding to the difficulty of network administration. Convolutional Neural Network (CNN) and Transformer are deep learning-based approaches for network traffic classification. CNN is good at extracting local features while ignoring long-distance information from the network traffic sequence, and Transformer can capture long-distance feature dependencies while ignoring local details. Based on these characteristics, a multi-task learning model that combines Transformer and 1D-CNN for encrypted traffic classification is proposed (MTC). In order to make up for the Transformer’s lack of local detail feature extraction capability and the 1D-CNN’s shortcoming of ignoring long-distance correlation information when processing traffic sequences, the model uses a parallel structure to fuse the features generated by the Transformer block and the 1D-CNN block with each other using a feature fusion block. This structure improves the representation of traffic features by both blocks and allows the model to perform well with both long and short sequences. The model simultaneously handles multiple tasks, which lowers the cost of training. Experiments reveal that on the ISCX VPN-nonVPN dataset, the model achieves an average F1 score of 98.25% and an average recall of 98.30% for the task of identifying applications, and an average F1 score of 97.94% and an average recall of 97.54% for the task of traffic characterization. When advanced models on the same dataset are chosen for comparison, the model produces the best results. To demonstrate generalization, we applied MTC to the CICIDS2017 dataset, and our model also achieved good results.
Funding: Supported by the National High Technology Research and Development Programme of China (Nos. 2005AA121620, 2006AA01Z232), the Zhejiang Provincial Natural Science Foundation of China (No. Y1080935), and the Research Innovation Program for Graduate Students in Jiangsu Province (No. CX07B_110zF).
Abstract: Internet traffic classification is vital to the areas of network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult as newly emerged applications (e.g., Peer-to-Peer) use dynamic port numbers, masquerading techniques and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions to a variety of network activities and provides a platform of performance evaluation for the classifiers. The impact of dataset size, feature selection, number of application types and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) The genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy. In particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) A larger dataset and fewer application types result in better classification accuracy. Finally, early detection with only several initial packets is proposed for real-time network activity, and it is shown to be feasible according to the preliminary results.
Funding: Supported in part by the National 863 Project of China (No. 2006AA01Z232), the Zhejiang Natural Science Foundation (No. Y1080935), and the Research Innovation Program Project for Graduate Students in Jiangsu Province (No. CX07B_110zF).
Abstract: Accurate and real-time classification of network traffic is significant to network operation and management tasks such as QoS differentiation, traffic shaping and security surveillance. However, with many newly emerged P2P applications using dynamic port numbers, masquerading techniques, and payload encryption to avoid detection, traditional classification approaches turn out to be ineffective. In this paper, we present a layered hybrid system to classify current Internet traffic, motivated by the variety of network activities and their requirements of traffic classification. The proposed method could achieve fast and accurate traffic classification with low overheads and robustness to accommodate both known and unknown/encrypted applications. Furthermore, it is feasible to use in the context of real-time traffic classification. Our experimental results show the distinct advantages of the proposed classification system, compared with the one-step Machine Learning (ML) approach.
Abstract: Traffic identification is becoming more important, yet more challenging, as related encryption techniques develop rapidly. Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems, in this paper, we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. By implementing traffic classification experiments on a public encrypted traffic data set and our captured Android HTTPS traffic, we show that the proposed method achieves clearly better effectiveness than the compared baselines. To the best of our knowledge, this is the first time encrypted traffic classification with dynamic word embedding has been addressed.