Journal Articles
19 articles found
Clustered Federated Learning with Weighted Model Aggregation for Imbalanced Data
1
Authors: Dong Wang, Naifu Zhang, Meixia Tao | China Communications | SCIE, CSCD | 2022, Issue 8, pp. 41-56 (16 pages)
As a promising edge learning framework in future 6G networks, federated learning (FL) faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors. Data imbalance is one of these challenges and can significantly degrade learning efficiency. To deal with data imbalance, this work proposes a new learning framework, called clustered federated learning with weighted model aggregation (weighted CFL). Compared with traditional FL, weighted CFL adaptively clusters the participating edge devices at each training iteration based on the cosine similarity of their local gradients, and then performs weighted per-cluster model aggregation. The similarity threshold for clustering adapts over iterations in response to the time-varying divergence of the local gradients. Moreover, the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up convergence. Experimental results show that weighted CFL achieves a faster convergence rate and higher learning accuracy than benchmark methods under imbalanced data.
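The clustering and aggregation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a fixed similarity threshold (the paper adapts it over iterations), clusters greedily against each cluster's first member, and takes per-device aggregation weights as an input, because the abstract does not give the data-balance weighting rule. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_by_gradient(grads, threshold):
    """Greedy clustering: a device joins the first cluster whose seed (first
    member) gradient has cosine similarity >= threshold; otherwise it seeds a
    new cluster. Returns a list of clusters, each a list of device indices."""
    clusters = []
    for i, g in enumerate(grads):
        for members in clusters:
            if cosine(grads[members[0]], g) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def weighted_aggregate(models, weights, members):
    """Weighted average of the parameter vectors of one cluster's devices."""
    total = sum(weights[i] for i in members)
    dim = len(models[members[0]])
    return [sum(weights[i] * models[i][k] for i in members) / total
            for k in range(dim)]

if __name__ == "__main__":
    grads = [[1.0, 0.1], [0.9, 0.2], [-0.1, 1.0], [0.0, 0.8]]
    models = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    weights = [1.0, 3.0, 1.0, 1.0]  # hypothetical per-device weights
    for members in cluster_by_gradient(grads, threshold=0.8):
        print(members, weighted_aggregate(models, weights, members))
```

With these toy gradients the first two devices and the last two devices form separate clusters, and each cluster gets its own aggregated model instead of a single global average.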
Keywords: clustered federated learning; data imbalance; convergence rate analysis; model aggregation
IDS-INT: Intrusion detection system using transformer-based transfer learning for imbalanced network traffic (cited by 11)
2
Authors: Farhan Ullah, Shamsher Ullah, (+1 more author), Gautam Srivastava, Jerry Chun-Wei Lin | Digital Communications and Networks | SCIE, CSCD | 2024, Issue 1, pp. 190-204 (15 pages)
A network intrusion detection system is critical for cyber security against illegitimate attacks. From a feature perspective, network traffic may include a variety of elements such as attack reference, attack type, attack subcategory, host information, and malicious scripts. From a network perspective, network traffic may contain an imbalanced number of harmful attacks compared with normal traffic. Identifying a specific attack is therefore challenging because of complex features and data imbalance. To address these issues, this paper proposes an Intrusion Detection System using transformer-based transfer learning for Imbalanced Network Traffic (IDS-INT), which learns feature interactions in both the network feature representation and the imbalanced data. First, detailed information about each type of attack is gathered from network interaction descriptions, which include network nodes, attack type, reference, host information, etc. Second, a transformer-based transfer learning approach learns a detailed feature representation using semantic anchors. Third, the Synthetic Minority Oversampling Technique (SMOTE) is applied to balance abnormal traffic and detect minority attacks. Fourth, a Convolutional Neural Network (CNN) model extracts deep features from the balanced network traffic. Finally, a hybrid CNN-Long Short-Term Memory (CNN-LSTM) model detects different types of attacks from the deep features. Detailed experiments test the proposed approach on three standard datasets, namely UNSW-NB15, CIC-IDS2017, and NSL-KDD, and an explainable AI approach is used to interpret the method and build a trustworthy model.
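SMOTE, used in the third step above, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a plain-Python version of basic SMOTE for illustration only; it is not the IDS-INT code, and the function name and parameters are assumptions. In practice a library implementation such as imbalanced-learn's `SMOTE` (via `fit_resample`) would normally be used.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    # Precompute each sample's k nearest neighbours (Euclidean, excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: math.dist(x, minority[j]))
        neighbours.append(order[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating records exactly.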
Keywords: Network intrusion detection; Transfer learning; Feature extraction; imbalance data; Explainable AI; Cybersecurity
An Ensemble Classification Model Based on Imbalanced Data for Aviation Safety
3
Authors: NI Xiaomei, WANG Huawei, (+1 more author), LV Shaolan, XIONG Minglan | Wuhan University Journal of Natural Sciences | CAS, CSCD | 2021, Issue 5, pp. 437-443 (7 pages)
Aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This has led the research community to study aviation safety by applying data analysis techniques based on advanced machine learning algorithms. An ensemble classification model based on the Aviation Safety Reporting System (ASRS) is proposed to analyze aviation safety, focusing on the people injured as recorded in the system. The ensemble classification model contains two modules: a data-driven module consisting of data cleaning, feature selection, and imbalanced data division and reorganization; and a model-driven module built by stacking Random Forest (RF), XGBoost (XGB), and Light Gradient Boosting Machine (LGBM). The results indicate that the ensemble model addresses the data imbalance while substantially improving accuracy. As a single model on the ASRS-based imbalanced data, LGBM shows higher accuracy and faster runtime, while the ensemble model achieves the best classification performance overall. The proposed ensemble model for imbalanced data classification can serve as a reference for similar data processing while helping to improve civil aviation safety.
Keywords: aviation safety; Aviation Safety Reporting System (ASRS); ensemble model; imbalance data; classification; Light Gradient Boosting Machine (LGBM)
From microstructure to mechanical properties:Image-based machine learning prediction for AZ80 magnesium alloy
4
Authors: Erfan Azqadan, Arash Arami, Hamid Jahed | Journal of Magnesium and Alloys | 2025, Issue 9, pp. 4231-4244 (14 pages)
Recent advancements in machine learning and computer vision enable direct prediction of mechanical properties from microstructure images. The feasibility of this process hinges on the material structure-property relationship, the richness of the dataset, and the choice of machine learning approach. This study investigates the application of a deep learning model to directly predict the yield strength (YS), ultimate tensile strength (UTS), and true stress-strain curve of cast-forged AZ80 alloys from SEM microstructure images. We manufactured 27 cast-forged AZ80 magnesium alloy components using varied process parameters and, by characterizing them, created a diverse dataset of AZ80 microstructures and mechanical properties. In addition to predicting magnesium alloy properties, we address challenges related to data imbalance, brightness and contrast variability, and long-range microstructural heterogeneity. We demonstrate that synthetic data oversampling using a denoising diffusion probabilistic model effectively improves the model's prediction accuracy by balancing the minority classes. A rigorous analysis of the model's performance shows that it accurately predicts the YS, the UTS, and the Ramberg-Osgood equation's parameters (K and n). In image-out validation, the model achieves average percentage errors of 2.10% (YS), 2.15% (UTS), 1.50% (K), and 5.47% (n). In class-out validation, the errors are 6.27%, 9.58%, 4.69%, and 10.24%, respectively.
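The abstract reports prediction errors for the Ramberg-Osgood parameters K and n. Assuming the common form in which total strain is elastic strain plus a power-law plastic term, eps = sigma/E + (sigma/K)^(1/n) (other parameterizations exist, and the paper's exact form is not stated here), the sketch below evaluates strain for a given stress and computes the absolute percentage error used to report accuracy. The parameter values in the usage lines are hypothetical, not taken from the study.

```python
def ramberg_osgood_strain(stress, E, K, n):
    """Total true strain under the assumed Ramberg-Osgood form:
    elastic part stress / E plus plastic part (stress / K) ** (1 / n)."""
    return stress / E + (stress / K) ** (1.0 / n)

def percentage_error(pred, true):
    """Absolute percentage error of a predicted value against the measured one."""
    return abs(pred - true) / abs(true) * 100.0

if __name__ == "__main__":
    # Hypothetical values in MPa: E about 45 GPa for magnesium, made-up K and n.
    for s in (100.0, 200.0, 300.0):
        print(s, ramberg_osgood_strain(s, E=45000.0, K=500.0, n=0.2))
    print(percentage_error(pred=102.1, true=100.0))  # about 2.1 percent
```

Predicting K and n instead of a raw list of points lets a model output a whole stress-strain curve with only two numbers beyond the elastic modulus.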
Keywords: Machine learning; Magnesium alloys; Mechanical properties; Computer vision; data imbalance; Cast-forging
An Arrhythmia Intelligent Recognition Method Based on a Multimodal Information and Spatio-Temporal Hybrid Neural Network Model
5
Authors: Xinchao Han, Aojun Zhang, (+6 more authors), Runchuan Li, Shengya Shen, Di Zhang, Bo Jin, Longfei Mao, Linqi Yang, Shuqin Zhang | Computers, Materials & Continua | 2025, Issue 2, pp. 3443-3465 (23 pages)
Electrocardiogram (ECG) analysis is critical for detecting arrhythmias, but traditional methods struggle with large-scale ECG data and with rare arrhythmia events in imbalanced datasets. These methods neither learn jointly from the temporal signals and the ECG images nor fully extract the latent information in the data, and so fall short of the accuracy clinicians require. This paper therefore proposes a hybrid multimodal spatiotemporal neural network to address these challenges. The model employs a multimodal data augmentation framework that integrates visual and signal-based features to improve the classification of rare arrhythmias in imbalanced datasets. In addition, a spatiotemporal fusion module incorporates a spatiotemporal graph convolutional network to jointly model temporal and spatial features, uncovering complex dependencies within the ECG data and improving the model's ability to represent complex patterns. On the MIT-BIH arrhythmia dataset, the model achieved 99.95% accuracy, 99.80% recall, and a 99.78% F1 score. Generalization was further validated on the clinical INCART arrhythmia dataset, where the results demonstrated both generalization and robustness.
Keywords: Multimodal learning; spatio-temporal hybrid; graph convolutional network; data imbalance; ECG classification
An Improved Hilbert Curve for Parallel Spatial Data Partitioning (cited by 7)
6
Authors: MENG Lingkui, HUANG Changqing, ZHAO Chunyu, LIN Zhiyong | Geo-Spatial Information Science | 2007, Issue 4, pp. 282-286 (5 pages)
A novel Hilbert curve is introduced for parallel spatial data partitioning, taking into account the huge volume of spatial information and the variable length of vector data items. Based on the improved Hilbert curve, an algorithm can be designed to achieve almost uniform spatial data partitioning across multiple disks in parallel spatial databases. Data imbalance can thus be largely avoided, and search and query efficiency improved.
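The abstract does not describe how the improved curve differs from the standard one, so the sketch below uses the standard Hilbert mapping (the widely published `xy2d` algorithm) and only illustrates the general idea. Items are ordered by Hilbert index, which keeps spatially close items together, and the ordered list is then cut into runs of roughly equal total byte size to account for variable-length vector records. The function names and the size-based split rule are assumptions.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d

def partition_by_hilbert(items, n, parts):
    """items: (x, y, size_bytes) tuples with 0 <= x, y < n. Sorts them along the
    Hilbert curve and cuts the order into `parts` contiguous runs of roughly
    equal total size. Returns a list of lists of items."""
    ordered = sorted(items, key=lambda it: xy2d(n, it[0], it[1]))
    total = sum(it[2] for it in ordered)
    groups = [[] for _ in range(parts)]
    acc = 0
    for it in ordered:
        # Place each item by the cumulative size at its midpoint, so the runs
        # stay contiguous along the curve and balanced by bytes, not by count.
        idx = min(parts - 1, int((acc + it[2] / 2) * parts / total))
        groups[idx].append(it)
        acc += it[2]
    return groups
```

Balancing by bytes rather than by item count matters when a few large polygons would otherwise overload one disk.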
Keywords: parallel spatial database; spatial data partitioning; data imbalance; Hilbert curve
Chinese DeepSeek: Performance of Various Oversampling Techniques on Public Perceptions Using Natural Language Processing
7
Authors: Anees Ara, Muhammad Mujahid, (+2 more authors), Amal Al-Rasheed, Shaha Al-Otaibi, Tanzila Saba | Computers, Materials & Continua | 2025, Issue 8, pp. 2717-2731 (15 pages)
DeepSeek, a Chinese open-source artificial intelligence (AI) model, has attracted considerable attention for its economical training and efficient inference. Trained with large-scale reinforcement learning without supervised fine-tuning as a preliminary step, it demonstrates remarkable reasoning capabilities across a wide range of tasks. DeepSeek is a prominent AI-driven chatbot that helps people learn and improves its responses by generating insightful answers to their questions. Users hold divergent views of advanced models such as DeepSeek and post about both their merits and shortcomings on several social media platforms. This research presents a new framework for predicting public sentiment in order to evaluate perceptions of DeepSeek. We first collect DeepSeek-related tweets from Twitter and apply various preprocessing methods to transform the unstructured data into a suitable form. We then annotate the tweets using the Valence Aware Dictionary and sEntiment Reasoner (VADER) and the lexicon-driven TextBlob. Next, we classify the sentiments in the cleaned data using the proposed hybrid model, which combines long short-term memory (LSTM) and bidirectional gated recurrent units (BiGRU) and is strengthened with multi-head attention, regularizer activation, and dropout units. Topic modeling with K-means clustering and Latent Dirichlet Allocation (LDA) is used to analyze public behavior concerning DeepSeek. The perceptions show that 82.5% of people are positive, 15.2% negative, and 2.3% neutral according to TextBlob, and 82.8% positive, 16.1% negative, and 1.2% neutral according to VADER. The small difference indicates that both analyses agree on the overall perception while treating linguistic peculiarities somewhat differently. The results indicate that the proposed model surpassed previous state-of-the-art approaches.
Keywords: DeepSeek; prediction; natural language processing; deep learning analysis; TextBlob; imbalance data
An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data (cited by 1)
8
Authors: Romany F. Mansour, Shaha Al-Otaibi, (+3 more authors), Amal Al-Rasheed, Hanan Aljuaid, Irina V. Pustokhina, Denis A. Pustokhin | Computers, Materials & Continua | SCIE, EI | 2021, Issue 9, pp. 2843-2858 (16 pages)
Big data streams have become ubiquitous in recent years, thanks to the rapid generation of massive volumes of data by different applications. Applying existing data mining tools and techniques directly to these streams is challenging. Moreover, streaming data from many applications suffers from two major problems: class imbalance and concept drift. This paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection (MOMBD-CDD) method for high-dimensional streaming data. The MOMBD-CDD model has several operational stages: pre-processing, concept drift detection (CDD), and classification. It overcomes the class imbalance problem with the Synthetic Minority Over-sampling Technique (SMOTE), using the Glowworm Swarm Optimization (GSO) algorithm to determine the oversampling rates and neighboring point values of SMOTE. The Statistical Test of Equal Proportions (STEPD) is used as the CDD technique. Finally, a Bidirectional Long Short-Term Memory (Bi-LSTM) model performs classification, and a GSO-based hyperparameter tuning process computes the optimal Bi-LSTM parameters to improve classification performance. The model was evaluated on high-dimensional benchmark streaming datasets, namely the intrusion detection (NSL KDDCup) dataset and the ECUE spam dataset. An extensive experimental validation confirmed the effectiveness of the MOMBD-CDD model, which attained high accuracies of 97.45% and 94.23% on the KDDCup99 and ECUE spam datasets, respectively.
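STEPD, as commonly described, compares the accuracy on the most recent W predictions with the accuracy on all earlier predictions since the last reset, using a test of equal proportions with a continuity correction. It signals a warning or a drift when the p-value falls below two significance levels. The sketch below assumes W = 30 and levels of 0.05 (warning) and 0.003 (drift), values often quoted for this detector; it is an illustration, not the MOMBD-CDD implementation, and the function names are hypothetical.

```python
import math

def stepd_pvalue(r_old, n_old, r_recent, n_recent):
    """One-sided p-value of the continuity-corrected test of equal proportions
    between older accuracy r_old/n_old and recent accuracy r_recent/n_recent."""
    p_hat = (r_old + r_recent) / (n_old + n_recent)
    inv = 1.0 / n_old + 1.0 / n_recent
    denom = math.sqrt(p_hat * (1.0 - p_hat) * inv)
    if denom == 0.0:
        return 1.0  # all right or all wrong: no evidence of a change
    t = (abs(r_old / n_old - r_recent / n_recent) - 0.5 * inv) / denom
    return 0.5 * math.erfc(t / math.sqrt(2.0))  # upper tail of N(0, 1)

def stepd_status(correct, window=30, alpha_warn=0.05, alpha_drift=0.003):
    """correct: 1/0 flags (prediction right/wrong) since the last reset.
    Compares the most recent `window` flags with all earlier ones."""
    if len(correct) < 2 * window:
        return "stable"
    old, recent = correct[:-window], correct[-window:]
    r_o, n_o = sum(old), len(old)
    r_r, n_r = sum(recent), len(recent)
    if r_r / n_r >= r_o / n_o:
        return "stable"  # only a drop in accuracy is treated as drift
    p = stepd_pvalue(r_o, n_o, r_r, n_r)
    if p < alpha_drift:
        return "drift"
    if p < alpha_warn:
        return "warning"
    return "stable"
```

On a drift signal a streaming pipeline would typically retrain or replace the classifier and reset the flag history.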
Keywords: Streaming data; concept drift; classification model; deep learning; class imbalance data
Advances in Machine Learning for Explainable Intrusion Detection Using Imbalance Datasets in Cybersecurity with Harris Hawks Optimization
9
Authors: Amjad Rehman, Tanzila Saba, (+2 more authors), Mona M. Jamjoom, Shaha Al-Otaibi, Muhammad I. Khan | Computers, Materials & Continua | 2026, Issue 1, pp. 1804-1818 (15 pages)
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that applies both RobustScaler and QuantileTransformer to handle outliers and skewed feature distributions, producing standardized, model-ready inputs. Dimensionality reduction is performed with the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on the hunting strategies of hawks, which identifies the most informative features by optimizing a fitness function based on classification performance. After feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture then combines XGBoost, SVM, and RF as base learners; this layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was an accuracy of 99.44% on UNSW-NB15. After balancing, the model showed a clear improvement in detecting attacks. The model was tested on four datasets to show the effectiveness of the proposed approach, and an ablation study was performed to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the model suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
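The preprocessing step above relies on RobustScaler to tame outliers. With default settings, scikit-learn's `RobustScaler` centres each feature on its median and divides by the interquartile range (25th to 75th percentile). The plain-Python sketch below reproduces that idea for a single feature column for illustration; the function name is hypothetical, and the zero-IQR fallback mirrors the library's handling of constant features.

```python
import statistics

def robust_scale(column):
    """Centre a feature on its median and scale by its interquartile range.
    'inclusive' quantiles match linear percentile interpolation."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant feature: avoid division by zero
    return [(v - median) / iqr for v in column]

if __name__ == "__main__":
    # One extreme value (e.g. a burst of traffic) barely affects the others.
    print(robust_scale([1, 2, 3, 4, 100]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Because the median and IQR ignore the extreme value, ordinary records keep a usable spread, whereas min-max or mean/std scaling would squash them toward zero.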
Keywords: Intrusion detection; XAI; machine learning; ensemble method; cybersecurity; imbalance data
The study of intelligent algorithm in particle identification of heavy-ion collisions at low and intermediate energies
10
Authors: Gao-Yi Cheng, Qian-Min Su, (+1 more author), Xi-Guang Cao, Guo-Qiang Zhang | Nuclear Science and Techniques | SCIE, EI, CAS, CSCD | 2024, Issue 2, pp. 170-182 (13 pages)
Traditional particle identification methods for heavy-ion collisions at low and intermediate energies are time-consuming, experience-dependent, and poorly repeatable, and researchers urgently need a way out of this dilemma. This study explores applying intelligent learning algorithms to particle identification in heavy-ion collisions at low and intermediate energies. Multiple intelligent algorithms, including XGBoost and TabNet, were tested on datasets from the neutron ion multi-detector for reaction-oriented dynamics (NIMROD-ISiS) and from Geant4 simulation. Tree-based machine learning algorithms and deep learning algorithms such as TabNet show excellent performance and generalization ability. When the data distribution is nonuniform, adding data features beyond energy deposition improves the algorithms' performance. Intelligent learning algorithms can therefore be applied to the particle identification problem in heavy-ion collisions at low and intermediate energies.
Keywords: Heavy-ion collisions at low and intermediate energies; Machine learning; Ensemble learning algorithm; Particle identification; data imbalance
Classification of aviation incident causes using LGBM with improved cross-validation (cited by 1)
11
Authors: NI Xiaomei, WANG Huawei, (+1 more author), CHEN Lingzi, LIN Ruiguan | Journal of Systems Engineering and Electronics | SCIE, CSCD | 2024, Issue 2, pp. 396-405 (10 pages)
Aviation accidents are currently one of the leading causes of serious injuries and deaths worldwide, prompting researchers to investigate aircraft safety with data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model using the light gradient boosting machine (LGBM) on data from the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which can boost classification performance while keeping the data balanced. The results show that the LGBM-HSCV model significantly improves accuracy while alleviating data imbalance. The comparison consists of a vertical comparison with other cross-validation (CV) methods and a lateral comparison with different numbers of folds. Beyond this comparison, two further CV approaches based on the improved method are discussed: one with a different order of sampling and folding, and one with additional CV. According to the evaluation indices of the different methods, the proposed LGBM-HSCV model is effective at identifying incident causes. The improved model for imbalanced data classification may serve as a reference for similar data processing, and its accurate identification of civil aviation incident causes can help improve civil aviation safety.
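The abstract notes that the order of sampling and folding matters. The sketch below shows one common ordering: split into stratified folds first, then resample only the training portion, so each held-out fold keeps the original class distribution and no duplicated sample leaks into the test fold. The hybrid rule used here (random oversampling of small classes and random undersampling of large classes to the mean class size) is an assumption for illustration; the abstract does not specify HSCV's exact sampling, and all names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def hybrid_resample(indices, labels, seed=0):
    """Bring every class to the mean class size: undersample (without
    replacement) classes above it, oversample (with replacement) those below."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = round(len(indices) / len(by_class))
    out = []
    for idx in by_class.values():
        if len(idx) >= target:
            out.extend(rng.sample(idx, target))
        else:
            out.extend(idx + rng.choices(idx, k=target - len(idx)))
    return out

def cv_splits(labels, k, seed=0):
    """Yield (train, test) index lists; only the training part is resampled."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield hybrid_resample(train, labels, seed + f), test
```

A classifier such as LGBM would then be fitted on each resampled training list and scored on the untouched test fold, with scores averaged over the k folds.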
Keywords: aviation safety; imbalance data; light gradient boosting machine (LGBM); cross-validation (CV)
End-to-End 2D Convolutional Neural Network Architecture for Lung Nodule Identification and Abnormal Detection in Cloud
12
Authors: Safdar Ali, Saad Asad, (+2 more authors), Zeeshan Asghar, Atif Ali, Dohyeun Kim | Computers, Materials & Continua | SCIE, EI | 2023, Issue 4, pp. 461-475 (15 pages)
The extent of the peril associated with cancer can be seen in the lack of treatment, ineffective early diagnosis techniques, and, most importantly, its fatality rate. Globally, cancer is the second leading cause of death, and among over a hundred types of cancer, lung cancer is the second most common type as well as the leading cause of cancer-related deaths. An accurate and timely lung cancer diagnosis can noticeably increase the likelihood of survival, and medical imaging is a prevalent means of cancer diagnosis since it is easily accessible to people around the globe. Nonetheless, it is not highly effective on its own, because human inspection of medical images can yield a high false positive rate. Ineffective and inefficient diagnosis is a key reason for the high mortality rate of this disease. However, conspicuous advances in deep learning and artificial intelligence have stimulated the development of highly precise diagnosis systems. The development and performance of these systems rely heavily on the data used to train them. A common problem in publicly available medical image datasets is severe imbalance between classes, which can bias a deep learning model toward the dominant class and prevent it from generalizing. This study presents an end-to-end convolutional neural network that accurately differentiates lung nodules from non-nodules and reduces the false positive rate to a bare minimum. To tackle data imbalance, we oversampled the data by transforming available images in the minority class. The average false positive rate of the proposed method is a mere 1.5 percent; however, the average false negative rate is 31.76 percent. The proposed neural network has 68.66 percent sensitivity and 98.42 percent specificity.
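The rates quoted above are all derived from confusion-matrix counts. For a single confusion matrix, the false negative rate equals one minus sensitivity and the false positive rate equals one minus specificity, which gives a quick sanity check on reported figures. The counts in the usage line are hypothetical and are not the study's data.

```python
def rates(tp, fn, tn, fp):
    """Standard binary-classification rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1.0 - sensitivity,  # equals fn / (tp + fn)
        "false_positive_rate": 1.0 - specificity,  # equals fp / (tn + fp)
    }

if __name__ == "__main__":
    # Hypothetical counts: 100 nodules, 1000 non-nodules.
    print(rates(tp=68, fn=32, tn=985, fp=15))
```

On imbalanced data a very low false positive rate can coexist with a high false negative rate, because the two rates are normalised by different class sizes.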
Keywords: Convolutional neural networks; medical image processing; lung nodule identification; data imbalance; deep learning
MEM-TET: Improved Triplet Network for Intrusion Detection System (cited by 3)
13
Authors: Weifei Wang, Jinguo Li, (+1 more author), Na Zhao, Min Liu | Computers, Materials & Continua | SCIE, EI | 2023, Issue 7, pp. 471-487 (17 pages)
With the advancement of network communication technology, network traffic is growing explosively, and network attacks occur frequently as a result. Network intrusion detection systems remain the primary means of detecting attacks. However, two challenges continue to stymie the development of a viable network intrusion detection system: imbalanced training data and new, previously undiscovered attacks. This study therefore proposes a novel deep learning-based intrusion detection method. Two independent memory-augmented autoencoders, trained on normal network traffic and on attacks, capture the dynamic relationships between traffic features despite the imbalanced training data. Each original sample is then combined with its reconstructions from the two autoencoders to form a triplet, which is fed into the triplet network for training. Finally, the distance relationships within the triplet determine whether the traffic is an attack. In addition, to improve the accuracy of detecting unknown attacks, this research proposes an improved triplet loss function that pulls samples of the same class closer together and pushes samples of different classes farther apart in the learned feature space. The effectiveness, stability, and significance of the approach are evaluated against advanced models on the Android Adware and General Malware Dataset (AAGM17), Knowledge Discovery and Data Mining Cup 1999 (KDDCUP99), the Canadian Institute for Cybersecurity's Intrusion Detection Evaluation Dataset (CICIDS2017), UNSW-NB15, and Network Security Lab-Knowledge Discovery and Data Mining (NSL-KDD) datasets. The results confirm the superiority of the proposed method for network intrusion detection.
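The abstract's improved triplet loss is not spelled out, so the sketch below shows only the standard triplet margin loss it builds on: the loss is zero once the anchor is closer to the positive than to the negative by at least the margin. In a setup like the one described, the anchor might be the original sample and the positive/negative its reconstructions from the matching and non-matching autoencoders; that mapping is an assumption. Deep learning frameworks provide this loss directly, for example PyTorch's `torch.nn.TripletMarginLoss`.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss with Euclidean distances:
    max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

if __name__ == "__main__":
    x = [0.0, 0.0]             # hypothetical original sample (anchor)
    recon_same = [0.0, 1.0]    # hypothetical reconstruction, same class
    recon_other = [3.0, 4.0]   # hypothetical reconstruction, other class
    print(triplet_loss(x, recon_same, recon_other))  # 0.0: already separated
```

Minimising this loss over many triplets shapes an embedding in which distance itself can serve as the attack/normal decision rule.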
Keywords: Intrusion detection; memory-augmented autoencoder; deep metric learning; imbalance data
Enhanced Coyote Optimization with Deep Learning Based Cloud-Intrusion Detection System (cited by 1)
14
Authors: Abdullah M. Basahel, Mohammad Yamin, (+1 more author), Sulafah M. Basahel, E. Laxmi Lydia | Computers, Materials & Continua | SCIE, EI | 2023, Issue 2, pp. 4319-4336 (18 pages)
Cloud computing (CC) is the preferred choice of information technology (IT) organizations because it offers flexible, pay-per-use services to its users. However, privacy and security have become the main obstacles to its adoption, because its distributed and open architecture is exposed to intruders. An Intrusion Detection System (IDS) is one of the most commonly used systems for detecting attacks on the cloud. IDS is an effective and promising technique that identifies malicious activities and known threats by observing traffic data in computers and issues warnings when such threats are identified. Current mainstream IDSs are assisted by machine learning (ML) but suffer from low detection rates and demand extensive feature engineering. This article devises an Enhanced Coyote Optimization with Deep Learning based Intrusion Detection System for Cloud Security (ECODL-IDSCS) model. The ECODL-IDSCS model first addresses the class imbalance problem using the Adaptive Synthetic (ADASYN) technique. A long short-term memory (LSTM) model is then used to detect and classify intrusions, and the ECO algorithm is derived to fine-tune the LSTM hyperparameters and enhance its detection efficiency in the cloud environment. Experiments on a benchmark dataset show the promising performance of the ECODL-IDSCS model over existing IDS models.
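ADASYN differs from SMOTE mainly in how many synthetic samples each minority point receives. As commonly described, each minority point gets a share proportional to the fraction of majority-class points among its K nearest neighbours, out of a total of G = (majority count - minority count) x beta; the samples themselves are then generated by SMOTE-style interpolation. The sketch below covers only that allocation step for a binary problem and is an illustration, not the ECODL-IDSCS code; the function name is hypothetical, and imbalanced-learn's `ADASYN` class is the usual library implementation.

```python
import math

def adasyn_allocation(X, y, minority_label, k=5, beta=1.0):
    """Number of synthetic samples ADASYN assigns to each minority point:
    points surrounded by more majority neighbours ("harder" points) get more."""
    minority = [i for i, lab in enumerate(y) if lab == minority_label]
    n_major = len(y) - len(minority)
    G = (n_major - len(minority)) * beta  # total samples to generate
    ratios = []
    for i in minority:
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(X[i], X[j]))[:k]
        ratios.append(sum(y[j] != minority_label for j in nearest) / k)
    total = sum(ratios)
    if total == 0:
        return {i: 0 for i in minority}  # no minority point borders the majority
    return {i: round(r / total * G) for i, r in zip(minority, ratios)}
```

In this scheme a minority point deep inside the majority region receives most of the new samples, while isolated minority clusters receive few or none.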
Keywords: Intrusion detection system; cloud security; coyote optimization algorithm; class imbalance data; deep learning
CT-GCN+: a high-performance cryptocurrency transaction graph convolutional model for phishing node classification (cited by 1)
15
Authors: Bingxue Fu, Yixuan Wang, Tao Feng | Cybersecurity | 2025, Issue 1, pp. 126-141 (16 pages)
Because of the anonymous and contract-based transfer nature of blockchain cryptocurrencies, they are susceptible to fraud such as phishing. This threatens the property security of users and hinders the healthy development of the entire blockchain community. While numerous studies have addressed the identification of cryptocurrency phishing users, little research integrates class imbalance with transaction time characteristics. This paper introduces CT-GCN+, a novel graph neural network-based account identification model built on blockchain cryptocurrency phishing data, which includes a graph imbalanced data processing module that takes cryptocurrency transaction time into account. The model first extracts time characteristics from the transaction graph using LSTM and attention mechanisms. These time characteristics are then fused with the underlying features and fed into a combined SMOTE and GCN model for phishing user classification. Experimental results show that CT-GCN+ achieves a phishing user identification accuracy of 97.22% and an area under the curve of 96.67%. This paper presents a valuable approach to phishing detection research within the blockchain and cryptocurrency ecosystems.
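The abstract does not state which graph convolution variant CT-GCN+ uses. For reference, the widely used GCN layer propagates node features over the normalised adjacency matrix with self-loops, as below. Here A is the adjacency matrix of the transaction graph (N accounts), H^(0) is the node feature matrix (in this setting, presumably the fused time and base features), W^(l) is a trainable weight matrix, and sigma is a nonlinearity such as ReLU; applying this exact form to the paper is an assumption.

```latex
\tilde{A} = A + I_N, \qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right)
```

Each layer thus mixes every account's features with those of its transaction counterparties, which is what lets a phishing label depend on the surrounding transaction pattern rather than on the account alone.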
Keywords: Blockchain; Information security; Phishing detection; imbalance data; Transaction graph
Imbalanced Problem in Initial Coin Offering Fraud Detection
16
Authors: Yifan Zheng, Maoning Wang | 国际计算机前沿大会会议论文集 (Proceedings of the International Computer Frontier Conference) | 2022, Issue 2, pp. 448-464 (17 pages)
Initial coin offerings (ICOs) are a common way to raise funds for blockchain projects. Fraudulent ICO projects not only cause financial losses to investors but also undermine confidence in the blockchain capital market. Whitepapers are usually the most important source of information, so it is feasible to identify fraudulent ICO projects by analyzing them. However, fraud samples are difficult to collect, and the classes are imbalanced. In this study, we address this problem by extracting linguistic features from ICO whitepapers, training prediction models with a variety of cutting-edge machine learning and deep learning algorithms, and handling the imbalanced samples through resampling, class reweighting, and modified loss functions. Our best method achieves an AUC of 0.94 and an accuracy of 82%, outperforming other traditional standard methods, and the results provide important implications for ICO fraud detection.
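Two of the remedies mentioned, reweighting classes and modifying the loss, can be combined by weighting each sample's loss by its class weight. The sketch below uses the inverse-frequency formula n_samples / (n_classes x count_c), which is what scikit-learn applies for `class_weight="balanced"`, inside a class-weighted binary cross-entropy. It is an illustration under that assumption, not the study's exact loss; the function names are hypothetical, and labels are assumed to be 0 (legitimate) and 1 (fraud).

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, p_pos, weights):
    """Class-weighted binary cross-entropy, normalised by the total weight.
    p_pos[i] is the predicted probability that sample i is class 1."""
    total, norm = 0.0, 0.0
    for y, p in zip(y_true, p_pos):
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1.0 - p))
        norm += w
    return total / norm

if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2             # hypothetical: 2 frauds out of 10
    print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
```

With these weights each fraudulent whitepaper counts four times as much as a legitimate one, so a model can no longer minimise the loss by simply predicting "not fraud" everywhere.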
Keywords: Initial coin offering; Fraud detection; data imbalance
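The ICO abstract mentions resampling, modifying the weights, and modifying the loss function, without specifying the scheme. The sketch below shows one common reweighting choice, offered as an assumption rather than the authors' setup. It scales each sample's binary cross-entropy by an inverse-frequency class weight, n_samples / (n_classes * class_count), which is the same formula scikit-learn uses for `class_weight="balanced"`. Both helper names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


# Usage: 8 legitimate whitepapers (0) and 2 fraudulent ones (1)
labels = [0] * 8 + [1] * 2
w = inverse_frequency_weights(labels)
print(w)  # {0: 0.625, 1: 2.5}
# A confidently missed fraud now costs roughly 4x an equally confident false alarm
print(weighted_bce([1], [0.1], w), weighted_bce([0], [0.9], w))
```

Reweighting leaves the data unchanged and only shifts the cost of errors. This makes it a cheap complement to resampling.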
Detecting anomalies in blockchain transactions using machine learning classifiers and explainability analysis
17
Authors: Mohammad Hasan, Mohammad Shahriar Rahman, [+1 author], Helge Janicke, Iqbal H. Sarker. Blockchain (Research and Applications) [EI], 2024, Issue 3, pp. 106-122 (17 pages).
As the use of blockchain for digital payments continues to rise, it becomes susceptible to various malicious attacks. Detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments. However, anomaly detection in blockchain transaction data is challenging because illicit transactions occur infrequently. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the models' predictions. This study seeks to overcome this limitation by integrating explainable artificial intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The SHapley Additive exPlanations (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous. Additionally, we introduce an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. We compare it against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate two findings. (i) XGBCLUS improves the true positive rate (TPR) and the area under the receiver operating characteristic curve (ROC-AUC) compared with state-of-the-art under-sampling and over-sampling techniques. (ii) Our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and false positive rate (FPR).
Keywords: Anomaly detection; Blockchain; Bitcoin transactions; data imbalance; data sampling; Explainable AI; Machine learning; Decision tree; Anomaly rules
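The abstract does not detail how XGBCLUS selects which normal transactions to drop. The sketch below therefore shows only two things. The first is the generic random under-sampling baseline that such methods are typically compared against. The second is the TPR and FPR metrics the abstract reports. It assumes a binary label list, and `random_undersample` and `tpr_fpr` are hypothetical helper names, not part of the paper.

```python
import random


def random_undersample(X, y, majority_label=0, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Generic baseline for a binary problem; this is NOT XGBCLUS.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    keep = rng.sample(majority, min(len(majority), len(minority))) + minority
    keep.sort()  # keep rows in their original order
    return [X[i] for i in keep], [y[i] for i in keep]


def tpr_fpr(y_true, y_pred, positive=1):
    """True positive rate and false positive rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


# Usage: 8 normal transactions (0) and 2 anomalous ones (1)
X = [[amount] for amount in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 2 2
print(tpr_fpr([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5)
```

Both rates divide by the number of actual positives or negatives. They are undefined if either class is missing from `y_true`.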
Credit Risk Prediction Based on Improved ADASYN Sampling and Optimized LightGBM
18
Authors: Mei Song, He Ma, [+1 author], Yi Zhu, Mengdi Zhang. Journal of Social Computing [EI], 2024, Issue 3, pp. 232-241 (10 pages).
This study proposes a credit risk prediction model named KM-ADASYN-TL-FLLightGBM (KADT-FLightGBM). First, to overcome the limitations of traditional sampling methods on imbalanced datasets, an improved ADASYN sampling method combined with the K-means clustering algorithm is constructed. The Tomek Links method is then used to filter the generated samples. Second, a LightGBM algorithm optimized with Focal Loss is trained on the datasets produced by the improved ADASYN sampling. Finally, the ensemble model is compared with other sampling methods on the Lending Club dataset. The results demonstrate that the proposed model effectively reduces the misclassification of minority classes in credit risk prediction and can serve as a reference for similar studies.
Keywords: imbalanced data; credit risk prediction; Focal Loss; adaptive hybrid sampling
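The KADT-FLightGBM abstract trains LightGBM with Focal Loss. The sketch below illustrates only that loss, using the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). Here p_t is the predicted probability of the true class. The (1 - p_t)^gamma factor shrinks the loss on easy, well-classified examples, so training concentrates on hard minority cases. The defaults gamma=2 and alpha=0.25 are commonly used values, not ones taken from this paper. Using the loss inside LightGBM would also require a custom objective that returns gradients and Hessians, which this sketch omits. The name `focal_loss` is hypothetical.

```python
import math


def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1 - p  # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing factor
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# Usage: with gamma=0 and alpha=0.5 the loss reduces to half the cross-entropy
print(focal_loss([1], [0.8], gamma=0.0, alpha=0.5), 0.5 * -math.log(0.8))
# An easy default (p=0.95) contributes far less loss than a hard one (p=0.3)
print(focal_loss([1], [0.95]), focal_loss([1], [0.3]))
```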
Review of training-free event-related potential classification approaches in the World Robot Contest 2021 (Cited by: 1)
19
Authors: Huanyu Wu, Dongrui Wu. Brain Science Advances, 2022, Issue 2, pp. 82-98 (17 pages).
Rapid serial visual presentation (RSVP), a relatively new event-related potential (ERP) paradigm, has recently become one of the most popular paradigms in electroencephalogram (EEG) signal processing. Several approaches have been proposed to improve the performance of RSVP analysis. In brain-computer interface (BCI) systems based on RSVP, approaches that do not require parameter training are essential. The participating teams proposed several effective training-free algorithmic frameworks in the ERP competition of the BCI Controlled Robot Contest at the World Robot Contest 2021. This paper discusses how effectively these approaches improve system performance without requiring training and suggests how to apply them in a practical system. First, appropriate preprocessing techniques can greatly improve the results. Second, non-deep-learning algorithms may be more stable than deep learning approaches. Third, ensemble learning can make the model more stable and robust.
Keywords: brain-computer interfaces; electroencephalogram; rapid serial visual presentation (RSVP); data imbalance; training-free