275,483 articles found
Engine Failure Prediction on Large-Scale CMAPSS Data Using Hybrid Feature Selection and Imbalance-Aware Learning
1
Authors: Ahmad Junaid, Abid Iqbal, Abuzar Khan, Ghassan Husnain, Abdul-Rahim Ahmad, Mohammed Al-Naeem. Journal: Computers, Materials & Continua, 2026, Issue 4, pp. 1485-1508 (24 pages)
Most predictive maintenance studies have emphasized accuracy but paid little attention to interpretability or deployment readiness. This study improves on prior methods by developing a small yet robust system that can predict when turbofan engines will fail. It uses the NASA CMAPSS dataset, which has over 200,000 engine cycles from 260 engines. The process begins with systematic preprocessing, which includes imputation, outlier removal, scaling, and labelling of the remaining useful life. Dimensionality is reduced using a hybrid selection method that combines variance filtering, recursive elimination, and gradient-boosted importance scores, yielding a stable set of 10 informative sensors. To mitigate class imbalance, minority cases are oversampled, and class-weighted losses are applied during training. Benchmarking is carried out with logistic regression, gradient boosting, and a recurrent design that integrates gated recurrent units with long short-term memory networks. The Long Short-Term Memory-Gated Recurrent Unit (LSTM-GRU) hybrid achieved the strongest performance, with an F1 score of 0.92, precision of 0.93, recall of 0.91, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) of 0.97, and minority recall of 0.75. Interpretability testing using permutation importance and Shapley values indicates that sensors 13, 15, and 11 are the most important indicators of engine wear. The proposed system combines imbalance handling, feature reduction, and interpretability into a practical design suitable for real industrial settings.
Keywords: Predictive maintenance; CMAPSS dataset; feature selection; class imbalance; LSTM-GRU hybrid model; interpretability; industrial deployment
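The remaining-useful-life labelling step named in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes every engine is recorded until failure, so that RUL is the last observed cycle minus the current cycle. The function name `label_rul` and the 30-cycle `horizon` used to derive a binary "fails soon" class are hypothetical.

```python
from collections import defaultdict


def label_rul(records, horizon=30):
    """Attach RUL and a binary failure label to (engine_id, cycle) records.

    Assumption: each engine runs to failure, so its largest cycle number
    marks the failure point. `horizon` is a hypothetical threshold.
    """
    records = list(records)

    # Find the last observed cycle of every engine.
    last_cycle = defaultdict(int)
    for engine_id, cycle in records:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)

    labelled = []
    for engine_id, cycle in records:
        rul = last_cycle[engine_id] - cycle      # cycles left before failure
        fails_soon = int(rul <= horizon)         # minority class near end of life
        labelled.append((engine_id, cycle, rul, fails_soon))
    return labelled
```

Because only the last `horizon` cycles of each engine fall in the positive class, a labelling of this kind tends to produce the class imbalance that the abstract addresses with oversampling and class-weighted losses.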
Detecting Anomalies in FinTech: A Graph Neural Network and Feature Selection Perspective
2
Authors: Vinh Truong Hoang, Nghia Dinh, Viet-Tuan Le, Kiet Tran-Trung, Bay Nguyen Van, Kittikhun Meethongjan. Journal: Computers, Materials & Continua, 2026, Issue 1, pp. 207-246 (40 pages)
The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph Neural Networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
Keywords: GNN; security; e-commerce; FinTech; abnormal detection; feature selection
Federated Multi-Label Feature Selection via Dual-Layer Hybrid Breeding Cooperative Particle Swarm Optimization with Manifold and Sparsity Regularization
3
Authors: Songsong Zhang, Huazhong Jin, Zhiwei Ye, Jia Yang, Jixin Zhang, Dongfang Wu, Xiao Zheng, Dingfeng Song. Journal: Computers, Materials & Continua, 2026, Issue 1, pp. 1141-1159 (19 pages)
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are protected with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
Keywords: Multi-label feature selection; federated learning; manifold regularization; sparse constraints; hybrid breeding optimization algorithm; particle swarm optimization algorithm; privacy protection
Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs
4
Authors: Mohamed Ezz, Meshrif Alruily, Ayman Mohamed Mostafa, Alaa S. Alaerjan, Bader Aldughayfiq, Hisham Allahem, Abdulaziz Shehab. Journal: Computers, Materials & Continua, 2026, Issue 1, pp. 2274-2301 (28 pages)
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data efficiency training approach was introduced, in which the training data portion increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
Keywords: Automated essay scoring; text-based features; vector-based features; embedding-based features; feature selection; optimal data efficiency
Optimizing UCS Prediction Models through XAI-Based Feature Selection in Soil Stabilization
5
Authors: Ahmed Mohammed Awad Mohammed, Omayma Husain, Mosab Hamdan, Abdalmomen Mohammed, Abdullah Ansari, Atef Badr, Abubakar Elsafi, Abubakr Siddig. Journal: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 524-549 (26 pages)
Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time and resource intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters (Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time) was used to train and test the models. R², RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that the boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R² = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently picked CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, aligning with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. Interestingly, GB, XGB, and CatBoost maintained comparable accuracy with the reduced input sets, while RF was moderately sensitive and KNN improved somewhat owing to the reduced dimensionality. The findings confirm that SHAP-guided feature reduction enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The suggested hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
Keywords: Explainable AI; feature selection; machine learning; SHAP analysis; soil stabilization; unconfined compressive strength
Leveraging Opposition-Based Learning in Particle Swarm Optimization for Effective Feature Selection
6
Authors: Fei Yu, Zhenya Diao, Hongrun Wu, Yingpin Chen, Xuewen Xia, Yuanxiang Li. Journal: Computers, Materials & Continua, 2026, Issue 4, pp. 1148-1179 (32 pages)
Feature selection serves as a critical preprocessing step in machine learning, focusing on identifying and preserving the most relevant features to improve the efficiency and performance of classification algorithms. Particle Swarm Optimization has demonstrated significant potential in addressing feature selection challenges. However, inherent limitations of Particle Swarm Optimization, such as the delicate balance between exploration and exploitation, susceptibility to local optima, and suboptimal convergence rates, hinder its performance. To tackle these issues, this study introduces a novel Leveraged Opposition-Based Learning method within Fitness Landscape Particle Swarm Optimization, tailored for wrapper-based feature selection. The proposed approach integrates: (1) a fitness-landscape adaptive strategy to dynamically balance exploration and exploitation, (2) the lever principle within Opposition-Based Learning to improve search efficiency, and (3) a Local Selection and Re-optimization mechanism combined with random perturbation to expedite convergence and enhance the quality of the optimal feature subset. The effectiveness of the proposed method is rigorously evaluated on 24 benchmark datasets and compared against 13 advanced metaheuristic algorithms. Experimental results demonstrate that the proposed method outperforms the compared algorithms in classification accuracy on over half of the datasets, whilst also significantly reducing the number of selected features. These findings demonstrate its effectiveness and robustness in feature selection tasks.
Keywords: feature selection; fitness landscape; opposition-based learning; principle of the lever; particle swarm optimization
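For readers unfamiliar with opposition-based learning, the standard form of the idea can be sketched as below. This is the textbook opposite-point rule, not the lever-principle variant the abstract proposes, whose details are not given here. The function names and the minimization convention are assumptions made for illustration.

```python
def opposite_position(position, lower, upper):
    """Return the opposite point: each coordinate x maps to lower + upper - x."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]


def better_of_pair(position, lower, upper, fitness):
    """Evaluate a particle and its opposite, keep the one with lower fitness.

    Assumption: `fitness` is a cost to be minimized (for example, the
    classification error of a wrapper evaluated on the selected features).
    """
    candidate = opposite_position(position, lower, upper)
    return min((position, candidate), key=fitness)
```

Evaluating both a point and its opposite doubles the fitness calls for that particle, so practical variants usually apply the step only with some probability or only during initialization.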
GSLDWOA: A Feature Selection Algorithm for Intrusion Detection Systems in IIoT
7
Authors: Wanwei Huang, Huicong Yu, Jiawei Ren, Kun Wang, Yanbu Guo, Lifeng Jin. Journal: Computers, Materials & Continua, 2026, Issue 1, pp. 2006-2029 (24 pages)
Existing feature selection methods for intrusion detection systems (IDS) in the Industrial Internet of Things often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes an Industrial Internet of Things intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems to which feature selection algorithms are prone on high-dimensional data, such as local optimality, long detection time, and reduced accuracy. First, the diversity of the initial population is increased using a Gaussian Mutation mechanism. Then, a Non-linear Shrinking Factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a Variable-step Levy Flight operator and a Dynamic Differential Evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA algorithm, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
Keywords: Industrial Internet of Things; intrusion detection system; feature selection; whale optimization algorithm; Gaussian mutation
A Unified Feature Selection Framework Combining Mutual Information and Regression Optimization for Multi-Label Learning
8
Author: Hyunki Lim. Journal: Computers, Materials & Continua, 2026, Issue 4, pp. 1262-1281 (20 pages)
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, the complexity increases with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while simultaneously accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Keywords: feature selection; multi-label learning; regression model optimization; mutual information
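The three relationships listed in the abstract above all reduce to pairwise mutual information between two variables. A minimal sketch of that quantity for discrete variables follows. It is a plain plug-in estimate from counts, offered as an illustration only: the function name is hypothetical, and the abstract does not say which estimator or discretization the author uses.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))

    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi
```

Under this reading, feature redundancy would be `mutual_information(feature_a, feature_b)`, feature-label relevance `mutual_information(feature, label)`, and inter-label dependency `mutual_information(label_a, label_b)`; continuous features would first need to be discretized.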
Effects of feature selection and normalization on network intrusion detection (cited by: 3)
9
Authors: Mubarak Albarka Umar, Zhanfang Chen, Khaled Shuaib, Yan Liu. Journal: Data Science and Management, 2025, Issue 1, pp. 23-39 (17 pages)
The rapid rise of cyberattacks and the gradual failure of traditional defense systems and approaches have led to the use of artificial intelligence (AI) techniques, such as machine learning (ML) and deep learning (DL), to build more efficient and reliable intrusion detection systems (IDSs). However, the advent of larger IDS datasets has negatively impacted the performance and computational complexity of AI-based IDSs. Many researchers have used data preprocessing techniques such as feature selection and normalization to overcome such issues. While most of these researchers reported the success of these preprocessing techniques on a shallow level, very few studies have examined their effects on a wider scale. Furthermore, the performance of an IDS model depends not only on the preprocessing techniques used but also on the dataset and the ML/DL algorithm, to which most existing studies give little emphasis. Thus, this study provides an in-depth analysis of the effects of feature selection and normalization on IDS models built using three IDS datasets (NSL-KDD, UNSW-NB15, and CSE-CIC-IDS2018) and various AI algorithms. A wrapper-based approach, which tends to give superior performance, and min-max normalization were used for feature selection and normalization, respectively. Numerous IDS models were implemented using the full and feature-selected copies of the datasets, with and without normalization. The models were evaluated using popular evaluation metrics in IDS modeling, and intra- and inter-model comparisons were performed between models and with state-of-the-art works. Random forest (RF) models performed better on the NSL-KDD and UNSW-NB15 datasets, with accuracies of 99.86% and 96.01%, respectively, whereas an artificial neural network (ANN) achieved the best accuracy of 95.43% on the CSE-CIC-IDS2018 dataset. The RF models also achieved excellent performance compared to recent works. The results show that normalization and feature selection positively affect IDS modeling. Furthermore, while feature selection benefits simpler algorithms (such as RF), normalization is more useful for complex algorithms like ANNs and deep neural networks (DNNs), and algorithms such as Naive Bayes are unsuitable for IDS modeling. The study also found that the UNSW-NB15 and CSE-CIC-IDS2018 datasets are more complex and more suitable for building and evaluating modern-day IDSs than the NSL-KDD dataset. Our findings suggest that prioritizing robust algorithms like RF, alongside complex models such as ANN and DNN, can significantly enhance IDS performance. These insights provide valuable guidance for managers to develop more effective security measures by focusing on high detection rates and low false alert rates.
Keywords: Cybersecurity; intrusion detection system; machine learning; deep learning; feature selection; normalization
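Min-max normalization, the scaling method named in the abstract above, maps each feature to [0, 1] using x' = (x - min) / (max - min). The sketch below is a generic illustration, not the authors' code. The function names are hypothetical, and fitting the minima and maxima on the training rows only is an assumed practice to avoid leaking test-set statistics.

```python
def fit_min_max(rows):
    """Compute per-column minima and maxima from training rows."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return mins, maxs


def apply_min_max(rows, mins, maxs):
    """Scale every value with x' = (x - min) / (max - min).

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    Values outside the fitted range fall outside [0, 1]; they are not clipped.
    """
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return scaled
```

A typical call sequence would be `mins, maxs = fit_min_max(train_rows)` followed by `apply_min_max` on both the training and the test rows.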
A Feature Selection Method for Software Defect Prediction Based on Improved Beluga Whale Optimization Algorithm (cited by: 1)
10
Authors: Shaoming Qiu, Jingjie He, Yan Wang, Bicong E. Journal: Computers, Materials & Continua, 2025, Issue 6, pp. 4879-4898 (20 pages)
Software defect prediction (SDP) aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products. Software defect prediction can be effectively performed using traditional features, but some of these features are redundant or irrelevant (their presence or absence has little effect on the prediction results). These problems can be solved using feature selection. However, existing feature selection methods have shortcomings such as an insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset. To reduce the impact of these shortcomings, this paper proposes a new feature selection method, the Cubic TraverseMa Beluga whale optimization algorithm (CTMBWO), based on an improved Beluga whale optimization algorithm (BWO). The goal of this study is to determine how well CTMBWO can extract the features that are most important for correctly predicting software defects, improve the accuracy of fault prediction, reduce the number of selected features, and mitigate the risk of overfitting, thereby achieving more efficient resource utilization and better distribution of the test workload. CTMBWO comprises three main stages: preprocessing the dataset, selecting relevant features, and evaluating the classification performance of the model. The novel feature selection method can effectively improve the performance of SDP. This study performs experiments on two software defect datasets (PROMISE, NASA) and reports the method's classification performance using the evaluation metrics Accuracy, F1-score, MCC, AUC, and Recall. The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and a significant improvement over the baseline models.
Keywords: Software defect prediction; feature selection; beluga optimization algorithm; triangular wandering strategy; Cauchy mutation; reverse learning
Congruent Feature Selection Method to Improve the Efficacy of Machine Learning-Based Classification in Medical Image Processing
11
Authors: Mohd Anjum, Naoufel Kraiem, Hong Min, Ashit Kumar Dutta, Yousef Ibrahim Daradkeh. Journal: Computer Modeling in Engineering & Sciences (SCIE, EI), 2025, Issue 1, pp. 357-384 (28 pages)
Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws or diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most of the extracted image features are irrelevant and lead to an increase in computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method that selects the most relevant image features. This process trains the learning paradigm using similarity and correlation-based features over different textural intensities and pixel distributions. The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis. Later, the correlation based on intensity and distribution is analyzed to improve the feature selection congruency. Therefore, the more congruent pixels are sorted in descending order of selection, which identifies better regions than the distribution. Now, the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection. Therefore, the probability of feature selection, regardless of the textures and medical image patterns, is improved. This process enhances the performance of ML applications for different medical image processing tasks. The proposed method improves the accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared to other models for the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared to the same models and dataset.
Keywords: Computer vision; feature selection; machine learning; region detection; texture analysis; image classification; medical images
Optimizing Forecast Accuracy in Cryptocurrency Markets: Evaluating Feature Selection Techniques for Technical Indicators
12
Authors: Ahmed El Youssefi, Abdelaaziz Hessane, Imad Zeroual, Yousef Farhaoui. Journal: Computers, Materials & Continua, 2025, Issue 5, pp. 3411-3433 (23 pages)
This study provides a systematic investigation into the influence of feature selection methods on cryptocurrency price forecasting models employing technical indicators. In this work, over 130 technical indicators (covering momentum, volatility, volume, and trend) are subjected to three distinct feature selection approaches: mutual information (MI), recursive feature elimination (RFE), and random forest importance (RFI). By extracting an optimal set of 20 predictors, the proposed framework aims to mitigate redundancy and overfitting while enhancing interpretability. These feature subsets are integrated into support vector regression (SVR), Huber regressors, and k-nearest neighbors (KNN) models to forecast the prices of three leading cryptocurrencies, Bitcoin (BTC/USDT), Ethereum (ETH/USDT), and Binance Coin (BNB/USDT), across horizons ranging from 1 to 20 days. Model evaluation employs the coefficient of determination (R²) and the root mean squared logarithmic error (RMSLE), alongside a walk-forward validation scheme to approximate real-world trading contexts. Empirical results indicate that incorporating momentum and volatility measures substantially improves predictive accuracy, with particularly pronounced effects observed at longer forecast windows. Moreover, indicators related to volume and trend provide incremental benefits in select market conditions. Notably, an 80%-85% reduction in the original feature set frequently maintains or enhances model performance relative to the complete indicator set. These findings highlight the critical role of targeted feature selection in addressing high-dimensional financial data challenges while preserving model robustness. This research advances the field of cryptocurrency forecasting by offering a rigorous comparison of feature selection methods and their effects on multiple digital assets and prediction horizons. The outcomes highlight the importance of dimension-reduction strategies in developing more efficient and resilient forecasting algorithms. Future efforts should incorporate high-frequency data and explore alternative selection techniques to further refine predictive accuracy in this highly volatile domain.
Keywords: Cryptocurrency forecasting; technical indicator; feature selection; walk-forward; volatility; momentum; trend
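Walk-forward validation, mentioned in the abstract above, trains on past observations and tests on the block that immediately follows, then moves forward in time. The sketch below shows one common variant under assumptions: an expanding training window and non-overlapping test blocks. The abstract does not state whether the authors use an expanding or a rolling window, and the function and parameter names are hypothetical.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Return (train_indices, test_indices) pairs in chronological order.

    Assumptions: samples are already sorted by time, the training window
    expands after every fold, and each test block holds `horizon` samples.
    """
    if initial_train < 1 or horizon < 1:
        raise ValueError("initial_train and horizon must be positive")

    splits = []
    train_end = initial_train
    while train_end + horizon <= n_samples:
        train_idx = list(range(train_end))                       # everything seen so far
        test_idx = list(range(train_end, train_end + horizon))   # the next block
        splits.append((train_idx, test_idx))
        train_end += horizon                                     # slide forward in time
    return splits
```

Because no test index ever precedes a training index, the scheme avoids the look-ahead leakage that shuffled k-fold cross-validation would introduce on price series.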
A Hybrid Feature Selection Method for Advanced Persistent Threat Detection
13
Authors: Adam Khalid, Anazida Zainal, Fuad A. Ghaleb, Bander Ali Saleh Al-rimy, Yussuf Ahmed. Journal: Computers, Materials & Continua, 2025, Issue 9, pp. 5665-5691 (27 pages)
Advanced Persistent Threats (APTs) represent one of the most complex and dangerous categories of cyber-attacks, characterised by their stealthy behaviour, long-term persistence, and ability to bypass traditional detection systems. The complexity of real-world network data poses significant challenges in detection. Machine learning models have shown promise in detecting APTs; however, their performance often suffers when trained on large datasets with redundant or irrelevant features. This study presents a novel hybrid feature selection method designed to improve APT detection by reducing dimensionality while preserving the informative characteristics of the data. It combines Mutual Information (MI), Symmetric Uncertainty (SU), and Minimum Redundancy Maximum Relevance (mRMR) to enhance feature selection. MI and SU assess feature relevance, while mRMR maximises relevance and minimises redundancy, ensuring that the most impactful features are prioritised. This method addresses redundancy among selected features, improving the overall efficiency and effectiveness of the detection model. Experiments on a real-world APT dataset were conducted to evaluate the proposed method. Multiple classifiers, including Random Forest, Support Vector Machine (SVM), Gradient Boosting, and Neural Networks, were used to assess classification performance. The results demonstrate that the proposed feature selection method significantly enhances detection accuracy compared to baseline models trained on the full feature set. The Random Forest algorithm achieved the highest performance, with near-perfect accuracy, precision, recall, and F1 scores (99.97%). The proposed adaptive thresholding algorithm within the selection method allows each classifier to benefit from a reduced and optimised feature space, resulting in improved training and predictive performance. This research offers a scalable and classifier-agnostic solution for dimensionality reduction in cybersecurity applications.
Keywords: Advanced persistent threats; hybrid-based techniques; feature selection; data processing; symmetric uncertainty; mutual information; minimum redundancy; APT detection
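Symmetric uncertainty, one of the relevance measures named in the abstract above, normalizes mutual information to the range [0, 1]: SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below computes it for discrete variables from counts. It illustrates the standard definition only; how the authors combine SU with MI and mRMR, and their adaptive threshold, are not reproduced, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        # Both variables are constant, so neither carries any information.
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)
```

A value of 1 means either variable fully determines the other, and 0 means the two are independent, which makes scores comparable across features with different numbers of distinct values.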
Heart Disease Prediction Model Using Feature Selection and Ensemble Deep Learning with Optimized Weight
14
作者 Iman S.Al-Mahdi Saad M.Darwish Magda M.Madbouly 《Computer Modeling in Engineering & Sciences》 2025年第4期875-909,共35页
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Keywords: heart disease prediction; feature selection; ensemble deep learning; optimization; genetic algorithm (GA); tunicate swarm algorithm (TSA)
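The weighted ensemble described in this abstract can be illustrated with a minimal sketch of weighted soft voting. This is not the authors' implementation: the function name `weighted_soft_vote`, the example probabilities, and the fixed weights are all hypothetical, and the weights here simply stand in for whatever values an optimizer such as TSA would return.

```python
from typing import List, Sequence


def weighted_soft_vote(probs_per_model: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       threshold: float = 0.5) -> List[int]:
    """Combine per-model positive-class probabilities using normalised weights."""
    if len(probs_per_model) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    labels = []
    for i in range(len(probs_per_model[0])):
        # Weighted average of the models' probabilities for sample i
        p = sum(w * model[i] for w, model in zip(norm, probs_per_model))
        labels.append(1 if p >= threshold else 0)
    return labels


# Hypothetical outputs of three models for four patients
probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.7],
    [0.7, 0.3, 0.3, 0.6],
]
print(weighted_soft_vote(probs, weights=[0.5, 0.3, 0.2]))
```

Changing the weights changes which model dominates borderline cases, which is the quantity the abstract says TSA tunes.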
Efficient soil moisture estimation on the Qinghai-Xizang Plateau via machine learning and optimized feature selection
15
Authors: JIA Shichao, SUN Wen, WEI Sihao, SUN Rui. Journal of Arid Land, 2025, No. 8, pp. 1147-1167 (21 pages)
Soil moisture is a key parameter in the exchange of energy and water between the land surface and the atmosphere. This parameter plays an important role in the dynamics of permafrost on the Qinghai-Xizang Plateau, China, as well as in the related ecological and hydrological processes. However, the region's complex terrain and extreme climatic conditions result in low-accuracy soil moisture estimations using traditional remote sensing techniques. Thus, this study considered the backscatter coefficient of Sentinel-1A ground range detected (GRD) data, the polarization decomposition parameters of Sentinel-1A single-look complex (SLC) data, the normalized difference vegetation index (NDVI) based on Sentinel-2B data, and the topographic factors based on digital elevation model (DEM) data. By combining these parameters with a machine learning model, we established a feature selection rule. A cumulative importance threshold was derived for feature variables, and those variables that failed to meet the threshold were eliminated based on variations in the coefficient of determination (R²) and the unbiased root mean square error (ubRMSE). The eight most influential variables were selected and combined with the CatBoost model for soil moisture inversion, and the SHapley Additive exPlanations (SHAP) method was used to analyze the importance of these variables. The results demonstrated that the optimized model significantly improved the accuracy of soil moisture inversion. Compared to the unfiltered model, the optimal feature combination led to a 0.09 increase in R² and a 0.7% reduction in ubRMSE. Ultimately, the optimized model achieved an R² of 0.87 and an ubRMSE of 5.6%. Analysis revealed that soil particle size had a significant impact on soil water retention capacity. The impact of vegetation on the estimated soil moisture on the Qinghai-Xizang Plateau was considerable, demonstrating a significant positive correlation. Moreover, the microtopographical features of hummocks interfered with soil moisture estimation, indicating that such terrain effects warrant increased attention in future studies within the permafrost regions. The developed method not only enhances the accuracy of soil moisture retrieval in the complex terrain of the Qinghai-Xizang Plateau, but also exhibits high computational efficiency (with a relative time reduction of 18.5%), striking an excellent balance between accuracy and efficiency. This approach provides a robust framework for efficient soil moisture monitoring in remote areas with limited ground data, offering critical insights for ecological conservation, water resource management, and climate change adaptation on the Qinghai-Xizang Plateau.
Keywords: soil moisture; machine learning; feature selection; radar and optical remote sensing; polarization decomposition; CatBoost model; Qinghai-Xizang Plateau
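The cumulative importance threshold mentioned in this abstract can be sketched as follows. This covers only the ranking-and-cutoff step; the abstract also describes checking R² and ubRMSE changes, which is not reproduced here. The function name, the feature names, and the importance scores are hypothetical.

```python
from typing import Dict, List


def select_by_cumulative_importance(importances: Dict[str, float],
                                    threshold: float = 0.9) -> List[str]:
    """Keep the top-ranked features until their normalised importance
    adds up to at least `threshold`."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")

    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores (arbitrary units)
scores = {
    "ndvi": 40,
    "vv_backscatter": 30,
    "elevation": 15,
    "slope": 10,
    "aspect": 5,
}
print(select_by_cumulative_importance(scores, threshold=0.8))
```

In practice the scores would come from a fitted model's feature importances, and the threshold would be tuned against validation metrics rather than fixed in advance.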
A Hybrid Feature Selection and Clustering-Based Ensemble Learning Approach for Real-Time Fraud Detection in Financial Transactions
16
Authors: Naif Almusallam, Junaid Qayyum. Computers, Materials & Continua, 2025, No. 11, pp. 3653-3687 (35 pages)
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model's interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
Keywords: fraud detection; financial transactions; economic impact; feature selection; clustering; ensemble learning
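To make the link between false positive rate and manual review cost concrete, the sketch below converts rates into alert counts. It assumes, purely for illustration, that the quoted rates apply to the full one-million-transaction dataset; the abstract does not state the evaluation split, and the function name `expected_alerts` is hypothetical. Because the abstract reports the false positive rate only as "below 3.5%", the false-alert figure is an upper bound under that assumption.

```python
from typing import Dict


def expected_alerts(n_transactions: int, fraud_rate_pct: int,
                    recall_pct: float, fpr_pct: float) -> Dict[str, float]:
    """Translate recall and false positive rate into alert volumes."""
    n_fraud = n_transactions * fraud_rate_pct // 100
    n_legit = n_transactions - n_fraud
    return {
        "fraudulent": n_fraud,
        "legitimate": n_legit,
        # Fraud cases the model would flag at the given recall
        "true_alerts": n_fraud * recall_pct / 100,
        # Legitimate transactions flagged in error at the given FPR
        "false_alerts": n_legit * fpr_pct / 100,
    }


# Figures taken from the abstract: 1,000,000 transactions, 5% fraud,
# 97% recall, FPR bounded by 3.5%
alerts = expected_alerts(1_000_000, 5, 97, 3.5)
print(alerts)
```

Because legitimate transactions vastly outnumber fraudulent ones, even a small change in false positive rate moves the review workload by thousands of cases.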
A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning
17
Authors: Nimra Bari, Tahir Saleem, Munam Shah, Abdulmohsen Algarni, Asma Patel, Insaf Ullah. Computer Modeling in Engineering & Sciences, 2025, No. 10, pp. 1167-1187 (21 pages)
Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. We can find several anti-phishing solutions, such as heuristic detection, virtual similarity detection, black and white lists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve the highest levels of accuracy on a given dataset, their methods suffer from an increased number of false positives. These methods are ineffective against zero-hour attacks. Phishing sites with a high False Positive Rate (FPR) are considered genuine because they can cause people to lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy; however, duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different machine learning classifiers: Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, Support Vector Machine (SVM), and two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
Keywords: phishing detection; feature selection; stacking ensemble; machine learning; phishing URL
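The first filter steps listed in this abstract (constant, duplicate, and quasi-feature removal) can be sketched in plain Python. This is an illustration under assumptions, not the paper's pipeline: "quasi-feature" is read here as "quasi-constant feature", the function name and the URL feature names are hypothetical, and the correlation, mutual information, and ANOVA steps are omitted.

```python
from typing import Dict, List


def drop_constant_and_duplicate(columns: Dict[str, List[float]],
                                quasi_threshold: float = 0.99) -> List[str]:
    """Return the names of columns that are neither (quasi-)constant
    nor exact duplicates of an earlier column."""
    kept: List[str] = []
    seen = set()
    for name, values in columns.items():
        # Share of rows occupied by the most frequent value
        most_common = max(values.count(v) for v in set(values))
        if most_common / len(values) >= quasi_threshold:
            continue  # constant (share of 1.0) or quasi-constant column
        key = tuple(values)
        if key in seen:
            continue  # identical to a column that was already kept
        seen.add(key)
        kept.append(name)
    return kept


# Hypothetical URL features for four samples
features = {
    "url_length": [54, 23, 87, 40],
    "has_https": [1, 1, 1, 1],            # constant
    "url_length_copy": [54, 23, 87, 40],  # duplicate of url_length
    "num_dots": [3, 1, 5, 2],
    "has_at_symbol": [0, 0, 0, 1],        # dominated by a single value
}
print(drop_constant_and_duplicate(features, quasi_threshold=0.75))
```

These cheap filters run before any model is trained, which is what distinguishes filter-based selection from wrapper or embedded methods.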
Advanced Feature Selection Techniques in Medical Imaging--A Systematic Literature Review
18
Authors: Sunawar Khan, Tehseen Mazhar, Naila Sammar Naz, Fahed Ahmed, Tariq Shahzad, Atif Ali, Muhammad Adnan Khan, Habib Hamam. Computers, Materials & Continua, 2025, No. 11, pp. 2347-2401 (55 pages)
Feature selection (FS) plays a crucial role in medical imaging by reducing dimensionality, improving computational efficiency, and enhancing diagnostic accuracy. Traditional FS techniques, including filter, wrapper, and embedded methods, have been widely used but often struggle with high-dimensional and heterogeneous medical imaging data. Deep learning-based FS methods, particularly Convolutional Neural Networks (CNNs) and autoencoders, have demonstrated superior performance but lack interpretability. Hybrid approaches that combine classical and deep learning techniques have emerged as a promising solution, offering improved accuracy and explainability. Furthermore, integrating multi-modal imaging data (e.g., Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and Ultrasound (US)) poses additional challenges in FS, necessitating advanced feature fusion strategies. Multi-modal feature fusion combines information from different imaging modalities to improve diagnostic accuracy. Recently, quantum computing has gained attention as a revolutionary approach for FS, providing the potential to handle high-dimensional medical data more efficiently. This systematic literature review comprehensively examines classical, Deep Learning (DL), hybrid, and quantum-based FS techniques in medical imaging. Key outcomes include a structured taxonomy of FS methods, a critical evaluation of their performance across modalities, and identification of core challenges such as computational burden, interpretability, and ethical considerations. Future research directions, such as explainable AI (XAI), federated learning, and quantum-enhanced FS, are also emphasized to bridge the current gaps. This review provides actionable insights for developing scalable, interpretable, and clinically applicable FS methods in the evolving landscape of medical imaging.
Keywords: feature selection; medical imaging; deep learning; hybrid approaches; multi-modal imaging; quantum computing; explainable AI; computational efficiency; dimensionality reduction
Optimizing Feature Selection by Enhancing Particle Swarm Optimization with Orthogonal Initialization and Crossover Operator
19
Authors: Indu Bala, Wathsala Karunarathne, Lewis Mitchell. Computers, Materials & Continua, 2025, No. 7, pp. 727-744 (18 pages)
Recent advancements in computational and database technologies have led to the exponential growth of large-scale medical datasets, significantly increasing data complexity and dimensionality in medical diagnostics. Efficient feature selection methods are critical for improving diagnostic accuracy, reducing computational costs, and enhancing the interpretability of predictive models. Particle Swarm Optimization (PSO), a widely used metaheuristic inspired by swarm intelligence, has shown considerable promise in feature selection tasks. However, conventional PSO often suffers from premature convergence and limited exploration capabilities, particularly in high-dimensional spaces. To overcome these limitations, this study proposes an enhanced PSO framework incorporating Orthogonal Initialization and a Crossover Operator (OrPSOC). Orthogonal Initialization ensures a diverse and uniformly distributed initial particle population, substantially improving the algorithm's exploration capability. The Crossover Operator, inspired by genetic algorithms, introduces additional diversity during the search process, effectively mitigating premature convergence and enhancing global search performance. The effectiveness of OrPSOC was rigorously evaluated on three benchmark medical datasets: Colon, Leukemia, and Prostate Tumor. Comparative analyses were conducted against traditional filter-based methods, including Fast Clustering-Based Feature Selection Technique (Fast-C), Minimum Redundancy Maximum Relevance (MinRedMaxRel), and Five-Way Joint Mutual Information (FJMI), as well as prominent metaheuristic algorithms such as standard PSO, Ant Colony Optimization (ACO), Comprehensive Learning Gravitational Search Algorithm (CLGSA), and Fuzzy-Based CLGSA (FCLGSA). Experimental results demonstrated that OrPSOC consistently outperformed these existing methods in terms of classification accuracy, computational efficiency, and result stability, achieving significant improvements even with fewer selected features. Additionally, a sensitivity analysis of the crossover parameter provided valuable insights into parameter tuning and its impact on model performance. These findings highlight the superiority and robustness of the proposed OrPSOC approach for feature selection in medical diagnostic applications and underscore its potential for broader adoption in various high-dimensional, data-driven fields.
Keywords: machine learning; feature selection; classification; medical diagnosis; orthogonal initialization; crossover; particle swarm optimization
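A crossover step on feature-selection masks, of the kind this abstract attributes to genetic algorithms, might look like the sketch below. The abstract does not specify which crossover variant OrPSOC uses, so uniform crossover is an assumption chosen for simplicity; the function name, the `rate` parameter, and the example masks are hypothetical.

```python
import random
from typing import List, Optional


def uniform_crossover(parent_a: List[int], parent_b: List[int],
                      rate: float = 0.5,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Build a child mask by taking each bit from parent_b with
    probability `rate`, otherwise from parent_a."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    return [b if rng.random() < rate else a
            for a, b in zip(parent_a, parent_b)]


# 1 means the feature is selected, 0 means it is dropped
particle = [1, 0, 1, 1, 0, 0]
global_best = [0, 0, 1, 0, 1, 0]
child = uniform_crossover(particle, global_best, rate=0.5,
                          rng=random.Random(42))
print(child)
```

Mixing bits from two candidate masks injects diversity into the swarm, which is the mechanism the abstract credits with reducing premature convergence. The `rate` value plays the role of the crossover parameter whose sensitivity the authors analyse.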
Adaptive feature selection method for high-dimensional imbalanced data classification
20
Authors: WU Jianzhen, XUE Zhen, ZHANG Liangliang, YANG Xu. Journal of Measurement Science and Instrumentation, 2025, No. 4, pp. 612-624 (13 pages)
Data collected in fields such as cybersecurity and biomedicine often encounter high dimensionality and class imbalance. To address the problem of low classification accuracy for minority class samples arising from numerous irrelevant and redundant features in high-dimensional imbalanced data, we proposed a novel feature selection method named AMF-SGSK based on adaptive multi-filter and subspace-based gaining sharing knowledge. Firstly, the balanced dataset was obtained by random under-sampling. Secondly, combining the feature importance score with the AUC score for each filter method, we proposed a concept called feature hardness to judge the importance of a feature, which could adaptively select the essential features. Finally, the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces. This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data. The experimental results on 30 benchmark imbalanced datasets showed that AMF-SGSK outperformed eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean values of F1-score, AUC, and G-mean for AMF-SGSK were 0.950, 0.967, and 0.965, respectively, the highest among all algorithms. The mean G-mean was higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio was below 0.01 across the ten selected datasets, further demonstrating the proposed method's overall superiority over competing approaches. AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
Keywords: high-dimensional imbalanced data; adaptive feature selection; adaptive multi-filter; feature hardness; gaining sharing knowledge based algorithm; metaheuristic algorithm
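The imbalance-aware metrics quoted in this abstract (F1-score and G-mean) can be computed from confusion-matrix counts as in the sketch below. The definitions are the standard ones; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def imbalance_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Recall, specificity, F1 and G-mean for a binary confusion matrix,
    with the minority class treated as positive."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 written in its count form: 2TP / (2TP + FP + FN)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    # G-mean is the geometric mean of the two per-class recalls
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "specificity": specificity,
            "f1": f1, "g_mean": g_mean}


# Hypothetical counts: 10 minority samples, 990 majority samples
metrics = imbalance_metrics(tp=8, fp=10, tn=980, fn=2)
print(metrics)
```

Plain accuracy would look high on such data even if every minority sample were missed, whereas G-mean drops to zero in that case. That is why imbalanced-data studies tend to report it alongside F1 and AUC.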