Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES by combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine learning approach, selecting top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data efficiency training approach was introduced, where training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
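For readers who want to see the shape of such a pipeline, the following is a minimal sketch, assuming the three similarity feature groups have already been computed as numeric arrays; the Random Forest settings and the 0.5-point tolerance check are illustrative, not the authors' exact configuration.

```python
# Minimal sketch of the hybrid scoring setup: three precomputed feature groups
# (text-, vector-, and embedding-based similarities) are concatenated and fed
# to a Random Forest regressor evaluated with 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

def evaluate_hybrid_aes(text_feats, vector_feats, embed_feats, scores, tol=0.5):
    """text_feats, vector_feats, embed_feats: (n_essays, k_i) arrays; scores: (n_essays,)."""
    X = np.hstack([text_feats, vector_feats, embed_feats])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    preds = cross_val_predict(model, X, scores, cv=5)
    r2 = r2_score(scores, preds)
    # "Accuracy within a 0.5-point tolerance": fraction of essays whose predicted
    # score falls within +/- tol of the human score.
    tol_acc = np.mean(np.abs(preds - scores) <= tol)
    return r2, tol_acc
```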
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce Fed-MFSDHBCPSO, a novel federated MFS framework built on a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
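As a rough illustration of the inner layer's building blocks, the sketch below computes a graph Laplacian from a similarity graph and the L2,1 norm of a feature weight matrix; the RBF similarity and the combined objective are assumed stand-ins, not the paper's exact formulation.

```python
# Sketch of the two building blocks named for the inner layer: a graph Laplacian
# from a similarity matrix, and the L2,1 norm used as the sparsity regularizer.
# The RBF similarity and the regularized objective shown here are illustrative
# stand-ins, not the paper's exact formulation.
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Unnormalized Laplacian L = D - S for an RBF similarity graph over rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2 * sigma ** 2))
    D = np.diag(S.sum(axis=1))
    return D - S

def l21_norm(W):
    """||W||_{2,1}: sum of the Euclidean norms of the rows of the feature weight matrix."""
    return np.sum(np.linalg.norm(W, axis=1))

def inner_objective(W, X, Y, L_s, alpha=1.0, beta=0.1):
    """Illustrative objective: reconstruction error + manifold term + L2,1 sparsity."""
    recon = np.linalg.norm(X @ W - Y, 'fro') ** 2
    manifold = np.trace(W.T @ X.T @ L_s @ X @ W)
    return recon + alpha * manifold + beta * l21_norm(W)
```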
Acute lymphoblastic leukemia (ALL) is characterized by overgrowth of immature lymphoid cells in the bone marrow at the expense of normal hematopoiesis. One of the most prioritized tasks is the early and correct diagnosis of this malignancy; however, manual observation of the blood smear is very time-consuming and requires labor and expertise. Transfer learning in deep neural networks is of growing importance to intricate medical tasks such as medical imaging. Our work proposes an application of a novel ensemble architecture that combines Vision Transformer and EfficientNetV2. This approach fuses deep and spatial features to optimize discriminative power by selecting features accurately, reducing redundancy, and promoting sparsity. Beyond the ensemble architecture, advanced feature selection is performed by the Frog-Snake Prey-Predation Relationship Optimization (FSRO) algorithm. FSRO prioritizes the most relevant features while dynamically reducing redundant and noisy data, hence improving the efficiency and accuracy of the classification model. We compared our feature selection method against state-of-the-art techniques and recorded an accuracy of 94.88%, a recall of 94.38%, a precision of 96.18%, and an F1-score of 95.63%, figures that surpass those of classical deep learning methods. Although our dataset, collected from four different hospitals, is non-standard and heterogeneous, which makes the analysis more challenging, our approach, while computationally expensive, proves diagnostically superior in cancer detection. Source codes and datasets are available on GitHub.
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
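A hedged sketch of what the TSA-tuned ensemble weighting might look like follows; the soft-voting combination rule and the validation-error fitness are assumptions, since the abstract does not specify them.

```python
# Sketch of the weighted ensemble that the Tunicate Swarm Algorithm would tune:
# the weight vector w is the decision variable, and the ensemble's validation
# error is the fitness to minimize. Soft voting over class probabilities is an
# assumption; the paper does not spell out the exact combination rule here.
import numpy as np

def ensemble_predict(prob_list, w):
    """prob_list: list of (n_samples, n_classes) probability arrays, one per base model."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize so the weights form a convex combination
    stacked = np.stack(prob_list)        # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)
    return fused.argmax(axis=1)

def ensemble_fitness(w, prob_list, y_val):
    """Fitness for the weight optimizer: validation error of the weighted ensemble."""
    preds = ensemble_predict(prob_list, w)
    return np.mean(preds != y_val)
```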
Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes an Industrial Internet of Things intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems that feature selection algorithms are prone to under high-dimensional data, such as local optimality, long detection time, and reduced accuracy. First, the initial population's diversity is increased using a Gaussian Mutation mechanism. Then, a Non-linear Shrinking Factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a Variable-step Levy Flight operator and a Dynamic Differential Evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA algorithm, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
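The Levy flight component can be illustrated with the standard Mantegna construction; the step scaling and the way the jump is applied to a candidate solution below are assumptions rather than the paper's exact variable-step rule.

```python
# Sketch of a Levy flight perturbation using Mantegna's algorithm, the standard
# way such operators are realized; the step scaling and how it is applied to a
# whale's position are assumptions, not the paper's exact variable-step rule.
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5):
    """Draw one Levy-distributed step of dimension `dim` (Mantegna's method)."""
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma_u, dim)
    v = np.random.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def levy_perturb(position, best, step_scale=0.01):
    """Move a candidate with a Levy-distributed jump scaled by its distance to the best."""
    return position + step_scale * levy_step(len(position)) * (position - best)
```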
Feature selection (FS) is essential in machine learning (ML) and data mapping, owing to its ability to preprocess high-dimensional data. By selecting a subset of relevant features, feature selection cuts down on the dimension of the data. It excludes irrelevant or surplus features, thus boosting the performance and efficiency of the model. Particle Swarm Optimization (PSO) boasts a streamlined algorithmic framework and exhibits rapid convergence traits. Compared with other algorithms, it incurs reduced computational expenses when tackling high-dimensional datasets. However, PSO faces challenges such as inadequate convergence precision. Therefore, for FS problems, this paper presents an enhanced binary PSO based on the Support Vector Machine (SVM) classifier. First, the Sand Cat Swarm Optimization (SCSO) is added to enhance the global search capability of PSO and improve the accuracy of the solution. Secondly, a Latin hypercube sampling strategy initializes populations more uniformly and helps to increase population diversity. Finally, a roundup search strategy introduces the grey wolf hierarchy idea to help improve convergence speed. To verify the capability of Self-adaptive Cooperative Particle Swarm Optimization (SCPSO), the CEC2020 and CEC2022 test suites are selected for experiments and applied to three engineering problems. Compared with the standard PSO algorithm, SCPSO converges faster, and its convergence accuracy is significantly improved. Moreover, SCPSO's comprehensive performance far exceeds that of other algorithms. Six datasets from the University of California, Irvine (UCI) database were selected to evaluate SCPSO's effectiveness in solving feature selection problems. The results indicate that SCPSO has significant potential for addressing these problems.
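A minimal sketch of the SVM wrapper evaluation that such a binary PSO would invoke for each particle is shown below; the RBF kernel and 5-fold setup are assumptions.

```python
# Sketch of the SVM wrapper evaluation that a binary PSO would call for each
# particle: the particle's binary mask picks columns, and k-fold SVM accuracy
# serves as the fitness. Kernel choice and fold count are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wrapper_fitness(mask, X, y, cv=5):
    """mask: binary vector over features; returns mean cross-validated accuracy."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:          # an empty subset is infeasible
        return 0.0
    clf = SVC(kernel="rbf", C=1.0)
    return cross_val_score(clf, X[:, selected], y, cv=cv).mean()
```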
Recent advancements in computational and database technologies have led to the exponential growth of large-scale medical datasets, significantly increasing data complexity and dimensionality in medical diagnostics. Efficient feature selection methods are critical for improving diagnostic accuracy, reducing computational costs, and enhancing the interpretability of predictive models. Particle Swarm Optimization (PSO), a widely used metaheuristic inspired by swarm intelligence, has shown considerable promise in feature selection tasks. However, conventional PSO often suffers from premature convergence and limited exploration capabilities, particularly in high-dimensional spaces. To overcome these limitations, this study proposes an enhanced PSO framework incorporating Orthogonal Initialization and a Crossover Operator (OrPSOC). Orthogonal Initialization ensures a diverse and uniformly distributed initial particle population, substantially improving the algorithm's exploration capability. The Crossover Operator, inspired by genetic algorithms, introduces additional diversity during the search process, effectively mitigating premature convergence and enhancing global search performance. The effectiveness of OrPSOC was rigorously evaluated on three benchmark medical datasets: Colon, Leukemia, and Prostate Tumor. Comparative analyses were conducted against traditional filter-based methods, including the Fast Clustering-Based Feature Selection Technique (Fast-C), Minimum Redundancy Maximum Relevance (MinRedMaxRel), and Five-Way Joint Mutual Information (FJMI), as well as prominent metaheuristic algorithms such as standard PSO, Ant Colony Optimization (ACO), the Comprehensive Learning Gravitational Search Algorithm (CLGSA), and Fuzzy-Based CLGSA (FCLGSA). Experimental results demonstrated that OrPSOC consistently outperformed these existing methods in terms of classification accuracy, computational efficiency, and result stability, achieving significant improvements even with fewer selected features. Additionally, a sensitivity analysis of the crossover parameter provided valuable insights into parameter tuning and its impact on model performance. These findings highlight the superiority and robustness of the proposed OrPSOC approach for feature selection in medical diagnostic applications and underscore its potential for broader adoption in various high-dimensional, data-driven fields.
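The crossover idea can be illustrated with a generic uniform crossover between two particles' positions, as in the hedged sketch below; the paper's exact operator and crossover parameter may differ.

```python
# Sketch of a GA-style crossover between two particles' position vectors.
# Uniform crossover with a per-dimension swap probability is shown as one
# plausible form; the paper's exact operator may differ.
import numpy as np

def uniform_crossover(parent_a, parent_b, p_swap=0.5, rng=None):
    """Return two offspring built by swapping coordinates of the parents."""
    if rng is None:
        rng = np.random.default_rng()
    swap = rng.random(parent_a.shape) < p_swap
    child_a = np.where(swap, parent_b, parent_a)
    child_b = np.where(swap, parent_a, parent_b)
    return child_a, child_b
```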
A large number of features are involved in fault diagnosis, and it is challenging to identify important and relevant features for fault classification. Feature selection selects suitable features from the fault dataset to determine the root cause of the fault. Particle swarm optimization (PSO) has shown promising results in performing feature selection due to its search effectiveness and ease of implementation. However, most PSO-based feature selection approaches for fault diagnosis do not adequately take domain-specific a priori knowledge into account. In this study, we propose a correlation-guided PSO feature selection approach for fault diagnosis that focuses on improving the initialisation effectiveness, individual exploration ability, and population diversity. To be more specific, an initialisation strategy based on feature correlation is designed to enhance the quality of the initial population, while a probability-based individual updating mechanism is proposed to improve the exploitation ability. In addition, a sample shrinkage strategy is developed to enhance the ability to jump out of local optima. Results on four public fault diagnosis datasets show that the proposed approach can select smaller feature subsets to achieve higher classification accuracy than other state-of-the-art feature selection methods in most cases. Furthermore, the effectiveness of the proposed approach is also verified by examining real-world fault diagnosis problems.
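One plausible reading of the correlation-guided initialisation is sketched below: each bit of an initial particle is switched on with a probability tied to the absolute feature-label correlation. The specific mapping from correlation to probability is an assumption.

```python
# Sketch of a correlation-guided initialisation: each particle's bit for feature j
# is switched on with a probability tied to |corr(feature_j, label)|, so strongly
# correlated features are more likely to appear in the initial population.
import numpy as np

def correlation_guided_init(X, y, n_particles, rng=None):
    """Return an (n_particles, n_features) binary initial population."""
    if rng is None:
        rng = np.random.default_rng()
    n_features = X.shape[1]
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    corr = np.nan_to_num(corr)               # constant features yield NaN correlations
    prob = 0.1 + 0.8 * (corr - corr.min()) / (corr.ptp() + 1e-12)  # keep every bit possible
    return (rng.random((n_particles, n_features)) < prob).astype(int)
```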
In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
Software defect prediction (SDP) aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products. Software defect prediction can be effectively performed using traditional features, but some of them are redundant or irrelevant (the presence or absence of such a feature has little effect on the prediction results). These problems can be solved using feature selection. However, existing feature selection methods have shortcomings such as an insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset. To reduce the impact of these shortcomings, this paper proposes a new feature selection method, the Cubic TraverseMa Beluga whale optimization algorithm (CTMBWO), based on the improved Beluga whale optimization algorithm (BWO). The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects, improve the accuracy of fault prediction, reduce the number of selected features, and mitigate the risk of overfitting, thereby achieving more efficient resource utilization and better distribution of test workload. The CTMBWO comprises three main stages: preprocessing the dataset, selecting relevant features, and evaluating the classification performance of the model. The novel feature selection method can effectively improve the performance of SDP. This study performs experiments on two software defect datasets (PROMISE, NASA) and reports the method's classification performance using five evaluation metrics: Accuracy, F1-score, MCC, AUC, and Recall. The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and shows significant improvement over the baseline models.
The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets, particularly in industrial contexts where efficient data handling and process innovation are critical. Feature selection, an essential step in data-driven process innovation, aims to identify the most relevant features to improve model interpretability, reduce complexity, and enhance predictive accuracy. To address the limitations of existing feature selection methods, this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization (APO) algorithm. Specifically, we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete, binary feature selection problems. Moreover, we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB's Parallel Computing Toolbox. This parallel design significantly improves runtime efficiency and scalability, particularly for high-dimensional feature spaces. Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features. These findings highlight the robustness and effectiveness of APO, validating its potential for advancing process innovation, economic productivity, and smart city applications in real-world machine learning scenarios.
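The abstract does not detail the conversion mechanism, so the sketch below shows the generic sigmoid transfer-function approach commonly used to binarize continuous metaheuristics; treat it as an illustration rather than APO's actual rule, and note it is written in Python rather than the paper's MATLAB.

```python
# Sketch of one common way to convert a continuous metaheuristic to a binary
# feature selector: squash each coordinate through a sigmoid transfer function
# and sample a bit from the result. This is a generic illustration, not APO's rule.
import numpy as np

def sigmoid_binarize(position, rng=None):
    """Map a continuous position vector to a binary feature mask."""
    if rng is None:
        rng = np.random.default_rng()
    probs = 1.0 / (1.0 + np.exp(-position))      # S-shaped transfer function
    return (rng.random(position.shape) < probs).astype(int)
```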
With the birth of Software-Defined Networking (SDN), the integration of SDN and traditional architectures has become the development trend of computer networks. Network intrusion detection faces challenges in dealing with complex attacks in SDN environments. To address these network security issues from the viewpoint of Artificial Intelligence (AI), this paper introduces the Crayfish Optimization Algorithm (COA) to the field of intrusion detection for both SDN and traditional network architectures, and based on the characteristics of the original COA, an Improved Crayfish Optimization Algorithm (ICOA) is proposed by integrating strategies of elite reverse learning, Levy flight, a crowding factor, and parameter modification. The ICOA is then utilized for AI-integrated feature selection in intrusion detection for both SDN and traditional network architectures, to reduce the dimensionality of the data and improve the performance of network intrusion detection. Finally, performance evaluation is carried out not only on the NSL-KDD and UNSW-NB15 datasets for traditional networks but also on the InSDN dataset for SDN-based networks. Experimental results show that ICOA improves accuracy by 0.532% and 2.928% compared with GWO and COA, respectively, in traditional networks. In SDN networks, the accuracy of ICOA is 0.25% and 0.3% higher than COA and PSO. These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can promote network intrusion detection for both SDN and traditional architectures.
Heart disease is a primary cause of death worldwide and is notoriously difficult to cure without a proper diagnosis. Hence, machine learning (ML) can help reduce and better understand the symptoms associated with heart disease. This study aims to develop a framework for the automatic and accurate classification of heart disease utilizing machine learning algorithms, grid search (GS), and the Aquila optimization algorithm. In the proposed approach, feature selection is used to identify characteristics of heart disease by using a method for dimensionality reduction. First, feature selection is accomplished with the help of the Aquila algorithm. Then, the optimal combination of the hyperparameters is selected using grid search. The experiments were conducted with three datasets from Kaggle: the Heart Failure Prediction Dataset, Heart Disease Binary Classification, and the Heart Disease Dataset. Two classes can be distinguished: diseased and healthy (i.e., uninfected). The Histogram Gradient Boosting (HGB) classifier produced the highest Weighted Sum Metric (WSM) score of 98.65% on the Heart Failure Prediction Dataset. In contrast, the Decision Tree (DT) machine learning classifier had the highest WSM score of 87.64% on the Heart Disease Health Indicators Dataset. Measures of accuracy, specificity, sensitivity, and other metrics are used to evaluate the proposed approach. The presented method demonstrates superior performance compared to different state-of-the-art algorithms.
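A minimal sketch of the grid-search stage, using scikit-learn's HistGradientBoostingClassifier as a stand-in for the HGB model, is given below; the parameter grid is illustrative, not the paper's.

```python
# Sketch of the grid-search step over a histogram gradient boosting classifier;
# scikit-learn's HistGradientBoostingClassifier is used as a stand-in for the
# paper's HGB model, and the parameter grid shown is illustrative.
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def tune_hgb(X_selected, y):
    """Grid-search hyperparameters on the feature subset chosen by the Aquila step."""
    param_grid = {
        "learning_rate": [0.05, 0.1, 0.2],
        "max_iter": [100, 200],
        "max_depth": [None, 3, 6],
    }
    search = GridSearchCV(HistGradientBoostingClassifier(random_state=0),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_selected, y)
    return search.best_estimator_, search.best_params_
```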
Particle Swarm Optimization (PSO) is a popular and bionic algorithm based on the social behavior associated with bird flocking for optimization problems. To maintain the diversity of swarms, a few studies of multi-swarm strategy have been reported. However, the competition among swarms, reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing the mechanism for survival of the fittest, which simulates the competition among the swarms. Based on the mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further settle the feature selection problems, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability through performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based and the grid search based methods on 10 benchmark datasets, taken from UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
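The F-score ranking referred to here is typically the two-class definition sketched below; how IFS thresholds these scores before the wrapper search is an assumption.

```python
# Sketch of the two-class F-score feature ranking that the IFS method combines
# with MSPSO and SVM; the thresholding of scores into a candidate feature set
# is an assumption.
import numpy as np

def f_scores(X, y):
    """Return the F-score of each feature for a binary label vector y (values 0/1)."""
    Xp, Xn = X[y == 1], X[y == 0]
    mean_all, mean_p, mean_n = X.mean(axis=0), Xp.mean(axis=0), Xn.mean(axis=0)
    numer = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
    denom = Xp.var(axis=0, ddof=1) + Xn.var(axis=0, ddof=1)
    return numer / (denom + 1e-12)
```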
The dipper throated optimization (DTO) algorithm is a novel and very efficient metaheuristic inspired by the dipper throated bird. DTO has a unique hunting technique based on performing rapid bowing movements. To show the efficiency of the proposed algorithm, DTO is tested and compared to Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), the Grey Wolf Optimizer (GWO), and the Genetic Algorithm (GA) on seven unimodal benchmark functions. Then, ANOVA and Wilcoxon rank-sum tests are performed to confirm the effectiveness of DTO compared to the other optimization techniques. Additionally, to demonstrate the proposed algorithm's suitability for solving complex real-world issues, DTO is used to solve the feature selection problem. The strategy of using DTO for feature selection is evaluated using commonly used datasets from the University of California at Irvine (UCI) repository. The findings indicate that DTO outperforms all other algorithms in addressing feature selection issues, demonstrating the proposed algorithm's capability to solve complex real-world situations.
The whale optimization algorithm (WOA) tends to fall into local optima and fails to converge quickly when solving complex problems. To address these shortcomings, an improved WOA (QGBWOA) is proposed in this work. First, quasi-opposition-based learning is introduced to enhance the ability of WOA to search for optimal solutions. Second, a Gaussian barebone mechanism is embedded to promote diversity and expand the scope of the solution space in WOA. To verify the advantages of QGBWOA, comparison experiments between QGBWOA and its peers were carried out on CEC 2014 with dimensions 10, 30, 50, and 100 and on the CEC 2020 test suite with dimension 30. Furthermore, the performance results were tested using the Wilcoxon signed-rank (WS) test, the Friedman test, and post hoc statistical tests for statistical analysis. Convergence accuracy and speed are remarkably improved, as shown by the experimental results. Finally, feature selection and multi-threshold image segmentation applications are demonstrated to validate the ability of QGBWOA to solve complex real-world problems. QGBWOA proves its superiority over the compared algorithms in feature selection and multi-threshold image segmentation across several evaluation metrics.
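Quasi-opposition-based learning can be sketched as follows; the greedy keep-the-better step is an assumption about how QGBWOA uses the quasi-opposite candidates.

```python
# Sketch of quasi-opposition-based learning: for each coordinate, the opposite
# point is a + b - x, and the quasi-opposite point is drawn uniformly between
# the interval centre (a + b)/2 and that opposite point. Greedy selection by
# fitness is shown as an assumed way of using the candidates.
import numpy as np

def quasi_opposite(x, lower, upper, rng=None):
    """Return the quasi-opposite of position x within bounds [lower, upper]."""
    if rng is None:
        rng = np.random.default_rng()
    centre = (lower + upper) / 2.0
    opposite = lower + upper - x
    lo, hi = np.minimum(centre, opposite), np.maximum(centre, opposite)
    return rng.uniform(lo, hi)

def qobl_step(x, fitness, lower, upper):
    """Greedy keep-the-better choice between x and its quasi-opposite."""
    candidate = quasi_opposite(x, lower, upper)
    return candidate if fitness(candidate) < fitness(x) else x
```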
The advent of Big Data has rendered Machine Learning tasks more intricate, as they frequently involve higher-dimensional data. Feature Selection (FS) methods can abate the complexity of the data and enhance the accuracy, generalizability, and interpretability of models. Meta-heuristic algorithms are often utilized for FS tasks due to their low requirements and efficient performance. This paper introduces an augmented Forensic-Based Investigation algorithm (DCFBI) that incorporates a Dynamic Individual Selection (DIS) and crisscross (CC) mechanism to improve the pursuit phase of the FBI. Moreover, a binary version of DCFBI (BDCFBI) is applied to FS. Experiments conducted on IEEE CEC 2017 with other metaheuristics demonstrate that DCFBI surpasses them in search capability. The influence of the different mechanisms on the original FBI is analyzed on benchmark functions, while its scalability is verified by comparing it with the original FBI on benchmarks with varied dimensions. BDCFBI is then applied to 18 real datasets from the UCI machine learning database and the Wieslaw dataset to select near-optimal features, which are then compared with six renowned binary metaheuristics. The results show that BDCFBI can be more competitive than similar methods and acquire a subset of features with superior classification accuracy.
Feature Selection (FS) is considered an important preprocessing step in data mining and is used to remove redundant or unrelated features from high-dimensional data. Most optimization algorithms for FS problems are not balanced in their search. A hybrid algorithm called the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA) is proposed in this paper to solve this problem. In the proposed method, a new position-updating strategy combining the position changes of the whale and grasshopper populations is introduced, which improves the diversity of searching in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and the COVID-19 symptom dataset are used to validate the proposed method. It has been demonstrated that the proposed NL-BGWOA performs well across most of the high-dimensional datasets, showing a high accuracy rate of up to 0.9895. Furthermore, the experimental results on the medical datasets also demonstrate the advantages of the proposed method in actual FS problems, including accuracy, size of feature subsets, and fitness, with best values of 0.913, 5.7, and 0.0873, respectively. The results reveal that the proposed NL-BGWOA has comprehensive superiority in solving the FS problem of high-dimensional data.
Feature selection has been widely used in data mining and machine learning. Its objective is to select a minimal subset of features according to some reasonable criteria so as to solve the original task more quickly. In this article, a feature selection algorithm with a local search strategy based on the forest optimization algorithm, namely FSLSFOA, is proposed. The novel local search strategy in the local seeding process guarantees the quality of the feature subset in the forest. Next, the fitness function is improved: it considers not only the classification accuracy but also the size of the feature subset. To avoid falling into local optima, a novel global seeding method is attempted, which selects trees from the bottom of the candidate set and gives the algorithm greater diversity. Finally, FSLSFOA is compared with four feature selection methods to verify its effectiveness. In most cases, its results are superior to those of the comparison methods.
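A hedged sketch of a fitness function that combines classification accuracy with subset size is shown below; the weighted-sum form, the k-NN evaluator, and the weight alpha are assumptions, not FSLSFOA's exact formula.

```python
# Sketch of a fitness function that trades off classification accuracy against
# feature-subset size, the two terms the improved fitness is said to combine.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def subset_fitness(mask, X, y, alpha=0.9, cv=5):
    """Higher is better: alpha weights accuracy, (1 - alpha) rewards small subsets."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, selected], y, cv=cv).mean()
    size_term = 1.0 - selected.size / X.shape[1]
    return alpha * acc + (1.0 - alpha) * size_term
```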
Selecting the most relevant subset of features from a dataset is a vital step in data mining and machine learning. A dataset with n features has 2^n possible feature subsets, making it challenging to select the optimum collection of features using typical methods. As a result, a new metaheuristics-based feature selection method based on the dipper-throated and grey-wolf optimization (DTO-GW) algorithms has been developed in this research. Instability can result when the selection of features is subject to metaheuristics, which can lead to a wide range of results. Thus, we adopted hybrid optimization in our method, which allowed us to balance exploration and exploitation more equitably. We propose utilizing the binary DTO-GW search approach we previously devised for selecting the optimal subset of attributes. In the proposed method, the number of features selected is minimized, while classification accuracy is increased. To test the proposed method's performance against eleven other state-of-the-art approaches, eight datasets from the UCI repository were used. The compared methods include binary grey wolf search (bGWO), binary hybrid grey wolf and particle swarm optimization (bGWO-PSO), bPSO, binary stochastic fractal search (bSFS), the binary whale optimization algorithm (bWOA), binary modified grey wolf optimization (bMGWO), binary multiverse optimization (bMVO), binary bowerbird optimization (bSBO), binary hysteresis optimization (bHy), and binary hysteresis optimization (bHWO). The suggested method is superior and successful in handling the problem of feature selection, according to the results of the experiments.
基金funded by Deanship of Graduate studies and Scientific Research at Jouf University under grant No.(DGSSR-2024-02-01264).
文摘Automated essay scoring(AES)systems have gained significant importance in educational settings,offering a scalable,efficient,and objective method for evaluating student essays.However,developing AES systems for Arabic poses distinct challenges due to the language’s complex morphology,diglossia,and the scarcity of annotated datasets.This paper presents a hybrid approach to Arabic AES by combining text-based,vector-based,and embeddingbased similarity measures to improve essay scoring accuracy while minimizing the training data required.Using a large Arabic essay dataset categorized into thematic groups,the study conducted four experiments to evaluate the impact of feature selection,data size,and model performance.Experiment 1 established a baseline using a non-machine learning approach,selecting top-N correlated features to predict essay scores.The subsequent experiments employed 5-fold cross-validation.Experiment 2 showed that combining embedding-based,text-based,and vector-based features in a Random Forest(RF)model achieved an R2 of 88.92%and an accuracy of 83.3%within a 0.5-point tolerance.Experiment 3 further refined the feature selection process,demonstrating that 19 correlated features yielded optimal results,improving R2 to 88.95%.In Experiment 4,an optimal data efficiency training approach was introduced,where training data portions increased from 5%to 50%.The study found that using just 10%of the data achieved near-peak performance,with an R2 of 85.49%,emphasizing an effective trade-off between performance and computational costs.These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems,especially in low-resource environments,addressing linguistic challenges while ensuring efficient data usage.
文摘Multi-label feature selection(MFS)is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels.However,traditional centralized methods face significant challenges in privacy-sensitive and distributed settings,often neglecting label dependencies and suffering from low computational efficiency.To address these issues,we introduce a novel framework,Fed-MFSDHBCPSO—federated MFS via dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization(DHBCPSO-MSR).Leveraging the federated learning paradigm,Fed-MFSDHBCPSO allows clients to perform local feature selection(FS)using DHBCPSO-MSR.Locally selected feature subsets are encrypted with differential privacy(DP)and transmitted to a central server,where they are securely aggregated and refined through secure multi-party computation(SMPC)until global convergence is achieved.Within each client,DHBCPSO-MSR employs a dual-layer FS strategy.The inner layer constructs sample and label similarity graphs,generates Laplacian matrices to capture the manifold structure between samples and labels,and applies L2,1-norm regularization to sparsify the feature subset,yielding an optimized feature weight matrix.The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset.The updated weight matrix is then fed back to the inner layer for further optimization.Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
文摘Acute lymphoblastic leukemia(ALL)is characterized by overgrowth of immature lymphoid cells in the bone marrow at the expense of normal hematopoiesis.One of the most prioritized tasks is the early and correct diagnosis of this malignancy;however,manual observation of the blood smear is very time-consuming and requires labor and expertise.Transfer learning in deep neural networks is of growing importance to intricate medical tasks such as medical imaging.Our work proposes an application of a novel ensemble architecture that puts together Vision Transformer and EfficientNetV2.This approach fuses deep and spatial features to optimize discriminative power by selecting features accurately,reducing redundancy,and promoting sparsity.Besides the architecture of the ensemble,the advanced feature selection is performed by the Frog-Snake Prey-Predation Relationship Optimization(FSRO)algorithm.FSRO prioritizes the most relevant features while dynamically reducing redundant and noisy data,hence improving the efficiency and accuracy of the classification model.We have compared our method for feature selection against state-of-the-art techniques and recorded an accuracy of 94.88%,a recall of 94.38%,a precision of 96.18%,and an F1-score of 95.63%.These figures are therefore better than the classical methods for deep learning.Though our dataset,collected from four different hospitals,is non-standard and heterogeneous,making the analysis more challenging,although computationally expensive,our approach proves diagnostically superior in cancer detection.Source codes and datasets are available on GitHub.
文摘Heart disease prediction is a critical issue in healthcare,where accurate early diagnosis can save lives and reduce healthcare costs.The problem is inherently complex due to the high dimensionality of medical data,irrelevant or redundant features,and the variability in risk factors such as age,lifestyle,andmedical history.These challenges often lead to inefficient and less accuratemodels.Traditional predictionmethodologies face limitations in effectively handling large feature sets and optimizing classification performance,which can result in overfitting poor generalization,and high computational cost.This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm(GA)with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm(TSA).GA selects the most relevant features,reducing dimensionality and improvingmodel efficiency.Theselected features are then used to train an ensemble of deep learning models,where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy.This hybrid approach addresses key challenges in the field,such as high dimensionality,redundant features,and classification performance,by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble.These enhancements result in a model that achieves superior accuracy,generalization,and efficiency compared to traditional methods.The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditionalmodels.Specifically,it achieved an accuracy of 97.5%,a sensitivity of 97.2%,and a specificity of 97.8%.Additionally,with a 60-40 data split and 5-fold cross-validation,the model showed a significant reduction in training time(90 s),memory consumption(950 MB),and CPU usage(80%),highlighting its effectiveness in processing large,complex medical datasets for heart disease prediction.
基金supported by the Major Science and Technology Programs in Henan Province(No.241100210100)Henan Provincial Science and Technology Research Project(No.252102211085,No.252102211105)+3 种基金Endogenous Security Cloud Network Convergence R&D Center(No.602431011PQ1)The Special Project for Research and Development in Key Areas of Guangdong Province(No.2021ZDZX1098)The Stabilization Support Program of Science,Technology and Innovation Commission of Shenzhen Municipality(No.20231128083944001)The Key scientific research projects of Henan higher education institutions(No.24A520042).
文摘Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity.These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy.This paper proposes an industrial Internet ofThings intrusion detection feature selection algorithm based on an improved whale optimization algorithm(GSLDWOA).The aim is to address the problems that feature selection algorithms under high-dimensional data are prone to,such as local optimality,long detection time,and reduced accuracy.First,the initial population’s diversity is increased using the Gaussian Mutation mechanism.Then,Non-linear Shrinking Factor balances global exploration and local development,avoiding premature convergence.Lastly,Variable-step Levy Flight operator and Dynamic Differential Evolution strategy are introduced to improve the algorithm’s search efficiency and convergence accuracy in highdimensional feature space.Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance.Compared to the traditional WOA algorithm,the detection rate and F1-score increased by 3.68%and 4.12%.On the WUSTL-IIoT-2021 dataset,accuracy,recall,and F1-score all exceed 99.9%.
基金supported by the Fundamental Research Funds for the Central Universities of China(No.300102122105)the Natural Science Basic Research Plan in Shaanxi Province of China(2023-JC-YB-023).
文摘Feature selection(FS)is essential in machine learning(ML)and data mapping by its ability to preprocess high-dimensional data.By selecting a subset of relevant features,feature selection cuts down on the dimension of the data.It excludes irrelevant or surplus features,thus boosting the performance and efficiency of the model.Particle Swarm Optimization(PSO)boasts a streamlined algorithmic framework and exhibits rapid convergence traits.Compared with other algorithms,it incurs reduced computational expenses when tackling high-dimensional datasets.However,PSO faces challenges like inadequate convergence precision.Therefore,regarding FS problems,this paper presents a binary version enhanced PSO based on the Support Vector Machines(SVM)classifier.First,the Sand Cat Swarm Optimization(SCSO)is added to enhance the global search capability of PSO and improve the accuracy of the solution.Secondly,the Latin hypercube sampling strategy initializes populations more uniformly and helps to increase population diversity.The last is the roundup search strategy introducing the grey wolf hierarchy idea to help improve convergence speed.To verify the capability of Self-adaptive Cooperative Particle Swarm Optimization(SCPSO),the CEC2020 test suite and CEC2022 test suite are selected for experiments and applied to three engineering problems.Compared with the standard PSO algorithm,SCPSO converges faster,and the convergence accuracy is significantly improved.Moreover,SCPSO’s comprehensive performance far exceeds that of other algorithms.Six datasets from the University of California,Irvine(UCI)database were selected to evaluate SCPSO’s effectiveness in solving feature selection problems.The results indicate that SCPSO has significant potential for addressing these problems.
文摘Recent advancements in computational and database technologies have led to the exponential growth of large-scale medical datasets,significantly increasing data complexity and dimensionality in medical diagnostics.Efficient feature selection methods are critical for improving diagnostic accuracy,reducing computational costs,and enhancing the interpretability of predictive models.Particle Swarm Optimization(PSO),a widely used metaheuristic inspired by swarm intelligence,has shown considerable promise in feature selection tasks.However,conventional PSO often suffers from premature convergence and limited exploration capabilities,particularly in high-dimensional spaces.To overcome these limitations,this study proposes an enhanced PSO framework incorporating Orthogonal Initializa-tion and a Crossover Operator(OrPSOC).Orthogonal Initialization ensures a diverse and uniformly distributed initial particle population,substantially improving the algorithm’s exploration capability.The Crossover Operator,inspired by genetic algorithms,introduces additional diversity during the search process,effectively mitigating premature convergence and enhancing global search performance.The effectiveness of OrPSOC was rigorously evaluated on three benchmark medical datasets—Colon,Leukemia,and Prostate Tumor.Comparative analyses were conducted against traditional filter-based methods,including Fast Clustering-Based Feature Selection Technique(Fast-C),Minimum Redundancy Maximum Relevance(MinRedMaxRel),and Five-Way Joint Mutual Information(FJMI),as well as prominent metaheuristic algorithms such as standard PSO,Ant Colony Optimization(ACO),Comprehensive Learning Gravitational Search Algorithm(CLGSA),and Fuzzy-Based CLGSA(FCLGSA).Experimental results demonstrated that OrPSOC consistently outperformed these existing methods in terms of classification accuracy,computational efficiency,and result stability,achieving significant improvements even with fewer selected features.Additionally,a sensitivity analysis of the crossover parameter provided valuable insights into parameter tuning and its impact on model performance.These findings highlight the superiority and robustness of the proposed OrPSOC approach for feature selection in medical diagnostic applications and underscore its potential for broader adoption in various high-dimensional,data-driven fields.
基金supported in part by the National Natural Science Foundation of China(62206255,62476254,62176238,U23A20340)Natural Science Foundation of Henan(252300421501)+3 种基金Young Talents Lifting Project of Henan Association for Scienceand Technology(2024HYTP023)Frontier Exploration Projects of Longmen Laboratory(LMQYTSKT031)Program for Science&Technology Innovation Talents in Universities of Henan Province(23HASTIT023)Key Research and Development Program of Henan(251111113900,241111210100).
文摘A large number of features are involved in fault diagnosis,and it is challenging to identify important and relative features for fault classification.Feature selection selects suitable features from the fault dataset to determine the root cause of the fault.Particle swarm optimization(PSO)has shown promising results in performing feature selection due to its promising search effectiveness and ease of implementation.However,most PSObased feature selection approaches for fault diagnosis do not adequately take domain-specific a priori knowledge into account.In this study,we propose a correlation-guided PSO feature selection approach for fault diagnosis that focuses on improving the initialisation effectiveness,individual exploration ability,and population diversity.To be more specific,an initialisation strategy based on feature correlation is designed to enhance the quality of the initial population,while a probability individual updating mechanism is proposed to improve the exploitation ability.In addition,a sample shrinkage strategy is developed to enhance the ability to jump out of local optimal.Results on four public fault diagnosis datasets show that the proposed approach can select smaller feature subsets to achieve higher classification accuracy than other state-of-the-art feature selection methods in most cases.Furthermore,the effectiveness of the proposed approach is also verified by examining real-world fault diagnosis problems.
文摘In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
文摘Software defect prediction(SDP)aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products.Software defect prediction can be effectively performed using traditional features,but there are some redundant or irrelevant features in them(the presence or absence of this feature has little effect on the prediction results).These problems can be solved using feature selection.However,existing feature selection methods have shortcomings such as insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset.In order to reduce the impact of these shortcomings,this paper proposes a new feature selection method Cubic TraverseMa Beluga whale optimization algorithm(CTMBWO)based on the improved Beluga whale optimization algorithm(BWO).The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects,improve the accuracy of fault prediction,reduce the number of the selected feature and mitigate the risk of overfitting,thereby achieving more efficient resource utilization and better distribution of test workload.The CTMBWO comprises three main stages:preprocessing the dataset,selecting relevant features,and evaluating the classification performance of the model.The novel feature selection method can effectively improve the performance of SDP.This study performs experiments on two software defect datasets(PROMISE,NASA)and shows the method’s classification performance using four detailed evaluation metrics,Accuracy,F1-score,MCC,AUC and Recall.The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and has significant improvement over the baseline models.
文摘The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets,particularly in industrial contexts where efficient data handling and process innovation are critical.Feature selection,an essential step in data-driven process innovation,aims to identify the most relevant features to improve model interpretability,reduce complexity,and enhance predictive accuracy.To address the limitations of existing feature selection methods,this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization(APO)algorithm.Specifically,we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete,binary feature selection problems.Moreover,we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB’s Parallel Computing Toolbox.This parallel design significantly improves runtime efficiency and scalability,particularly for high-dimensional feature spaces.Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features.These findings highlight the robustness and effectiveness of APO,validating its potential for advancing process innovation,economic productivity and smart city application in real-world machine learning scenarios.
基金supported by the National Natural Science Foundation of China under Grant 61602162the Hubei Provincial Science and Technology Plan Project under Grant 2023BCB041.
Abstract: With the birth of Software-Defined Networking (SDN), the integration of SDN and traditional architectures has become the development trend of computer networks. Network intrusion detection faces challenges in dealing with complex attacks in SDN environments. To address these network security issues from the viewpoint of Artificial Intelligence (AI), this paper introduces the Crayfish Optimization Algorithm (COA) to the field of intrusion detection for both SDN and traditional network architectures, and, based on the characteristics of the original COA, proposes an Improved Crayfish Optimization Algorithm (ICOA) that integrates elite reverse learning, Levy flight, a crowding factor, and parameter modification. The ICOA is then utilized for AI-integrated feature selection in intrusion detection for both SDN and traditional network architectures, reducing the dimensionality of the data and improving the performance of network intrusion detection. Finally, the performance evaluation is carried out not only on the NSL-KDD and UNSW-NB15 datasets for traditional networks but also on the InSDN dataset for SDN-based networks. Experimental results show that ICOA improves accuracy by 0.532% and 2.928% compared with GWO and COA, respectively, in traditional networks. In SDN networks, the accuracy of ICOA is 0.25% and 0.3% higher than that of COA and PSO, respectively. These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can improve network intrusion detection for both SDN and traditional architectures.
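Two of the strategies folded into ICOA, Levy flight and elite reverse (opposition-based) learning, have widely used textbook forms. The sketch below shows those generic forms only; the crowding factor, the parameter modification, and the way ICOA combines all four strategies are not reproduced here.

```python
import numpy as np
from scipy.special import gamma

def levy_step(dim, beta=1.5):
    """Draw a Levy-flight step with Mantegna's method, a common way to perturb search agents."""
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma, dim)
    v = np.random.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def elite_opposition(population, lb, ub):
    """Elite reverse learning: reflect agents inside the bounding box of the current population."""
    lo, hi = population.min(axis=0), population.max(axis=0)
    k = np.random.rand(*population.shape)
    return np.clip(k * (lo + hi) - population, lb, ub)
```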
Abstract: Heart disease is a primary cause of death worldwide and is notoriously difficult to cure without a proper diagnosis. Hence, machine learning (ML) can help reduce and better understand the symptoms associated with heart disease. This study aims to develop a framework for the automatic and accurate classification of heart disease using machine learning algorithms, grid search (GS), and the Aquila optimization algorithm. In the proposed approach, feature selection is used as a dimensionality reduction step to identify the characteristics of heart disease. First, feature selection is accomplished with the help of the Aquila algorithm. Then, the optimal combination of hyperparameters is selected using grid search. The experiments were conducted with three datasets from Kaggle: the Heart Failure Prediction Dataset, Heart Disease Binary Classification, and the Heart Disease Dataset. Two classes are distinguished: diseased and healthy. The Histogram Gradient Boosting (HGB) classifier produced the highest Weighted Sum Metric (WSM) score of 98.65% on the Heart Failure Prediction Dataset, while the Decision Tree (DT) classifier achieved the highest WSM score of 87.64% on the Heart Disease Health Indicators Dataset. Accuracy, specificity, sensitivity, and other metrics are used to evaluate the proposed approach, which demonstrates superior performance compared to various state-of-the-art algorithms.
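The grid-search stage described above (tuning the Histogram Gradient Boosting classifier after the Aquila-based feature selection) maps directly onto scikit-learn's GridSearchCV. A small sketch with an illustrative parameter grid, not the paper's actual search space:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def tune_hgb(X_selected, y):
    """Grid-search HGB hyperparameters on the features kept by the metaheuristic selection step."""
    param_grid = {                            # illustrative grid, not the paper's exact values
        "learning_rate": [0.05, 0.1, 0.2],
        "max_iter": [100, 200],
        "max_leaf_nodes": [15, 31, 63],
    }
    search = GridSearchCV(HistGradientBoostingClassifier(random_state=0),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_selected, y)
    return search.best_estimator_, search.best_params_
```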
Funding: This work was supported by the National Natural Science Foundation of China (Grant no. 60971089), the National Electronic Development Foundation of China (Grant no. 2009537), and the Jilin Province Science and Technology Department Project of China (Grant no. 20090502).
Abstract: Particle Swarm Optimization (PSO) is a popular bio-inspired algorithm, based on the social behavior of bird flocking, for optimization problems. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, competition among swarms, that is, the retention or destruction of a swarm, has received little further attention. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism that simulates competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method that integrates MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based, and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
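The F-score criterion that IFS combines with MSPSO and SVM has a simple closed form: features whose class-conditional means are far apart relative to their within-class variances score highly. A compact sketch for a binary-labelled dataset (how the authors threshold or weight these scores is not specified here):

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a binary problem (larger values mean more discriminative features)."""
    pos, neg = X[y == 1], X[y == 0]
    mean_all, mean_pos, mean_neg = X.mean(axis=0), pos.mean(axis=0), neg.mean(axis=0)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return numerator / (denominator + 1e-12)    # epsilon guards near-constant features
```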
Abstract: The Dipper Throated Optimization (DTO) algorithm is a novel and highly efficient metaheuristic inspired by the dipper-throated bird, whose unique hunting technique involves performing rapid bowing movements. To show the efficiency of the proposed algorithm, DTO is tested and compared against Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), the Grey Wolf Optimizer (GWO), and the Genetic Algorithm (GA) on seven unimodal benchmark functions. ANOVA and Wilcoxon rank-sum tests are then performed to confirm the effectiveness of DTO compared to the other optimization techniques. Additionally, to demonstrate the proposed algorithm's suitability for solving complex real-world problems, DTO is applied to the feature selection problem. The strategy of using DTO for feature selection is evaluated on commonly used datasets from the University of California at Irvine (UCI) repository. The findings indicate that DTO outperforms all other algorithms in addressing feature selection, demonstrating the proposed algorithm's ability to solve complex real-world problems.
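The statistical comparison step mentioned above is straightforward to reproduce in principle: collect the best fitness value from repeated independent runs of each algorithm and apply a rank-sum test. A sketch with placeholder run data (the actual run counts and results belong to the paper and are not shown here):

```python
import numpy as np
from scipy.stats import ranksums

# best fitness from 30 independent runs of each algorithm (placeholder numbers, not the paper's)
dto_runs = np.random.default_rng(0).normal(0.10, 0.02, 30)
pso_runs = np.random.default_rng(1).normal(0.14, 0.03, 30)

stat, p_value = ranksums(dto_runs, pso_runs)
print(f"Wilcoxon rank-sum p-value: {p_value:.4g}")   # p < 0.05 suggests a significant difference
```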
Funding: The Zhejiang Provincial Natural Science Foundation of China (no. LZ21F020001) and the Basic Scientific Research Program of Wenzhou (no. S20220018).
Abstract: The whale optimization algorithm (WOA) tends to fall into local optima and fails to converge quickly when solving complex problems. To address these shortcomings, an improved WOA (QGBWOA) is proposed in this work. First, quasi-opposition-based learning is introduced to enhance WOA's ability to search for optimal solutions. Second, a Gaussian bare-bone mechanism is embedded to promote diversity and expand the scope of the solution space in WOA. To verify the advantages of QGBWOA, comparison experiments between QGBWOA and its peer algorithms were carried out on CEC 2014 with dimensions 10, 30, 50, and 100 and on the CEC 2020 test suite with dimension 30. Furthermore, the results were analyzed statistically using the Wilcoxon signed-rank (WS) test, the Friedman test, and post hoc tests. The experimental results show that convergence accuracy and speed are remarkably improved. Finally, feature selection and multi-threshold image segmentation applications are demonstrated to validate the ability of QGBWOA to solve complex real-world problems, in which QGBWOA proves its superiority over the compared algorithms across several evaluation metrics.
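Quasi-opposition-based learning and Gaussian bare-bones sampling both have standard formulations that QGBWOA appears to build on. The sketch below gives those standard forms only; how QGBWOA schedules them within the WOA update loop is not reproduced here.

```python
import numpy as np

def quasi_opposite(population, lb, ub):
    """Quasi-opposition: sample between the interval centre and each agent's opposite point."""
    centre = (lb + ub) / 2.0
    opposite = lb + ub - population
    lo, hi = np.minimum(centre, opposite), np.maximum(centre, opposite)
    return lo + np.random.rand(*population.shape) * (hi - lo)

def gaussian_barebone(position, leader):
    """Gaussian bare-bones move: sample around the midpoint of an agent and the current leader."""
    mu = (position + leader) / 2.0
    sigma = np.abs(position - leader)
    return np.random.normal(mu, sigma)
```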
Funding: Supported by the Special Fund of Fundamental Scientific Research Business Expense for Higher School of Central Government (ZY20180119), the Natural Science Foundation of Zhejiang Province (LZ22F020005), the Natural Science Foundation of Hebei Province (D2022512001), and the National Natural Science Foundation of China (42164002, 62076185).
Abstract: The advent of Big Data has rendered machine learning tasks more intricate, as they frequently involve higher-dimensional data. Feature Selection (FS) methods can abate the complexity of the data and enhance the accuracy, generalizability, and interpretability of models. Metaheuristic algorithms are often utilized for FS tasks due to their low requirements and efficient performance. This paper introduces an augmented Forensic-Based Investigation algorithm (DCFBI) that incorporates a Dynamic Individual Selection (DIS) and crisscross (CC) mechanism to improve the pursuit phase of the FBI. Moreover, a binary version of DCFBI (BDCFBI) is applied to FS. Experiments conducted on IEEE CEC 2017 against other metaheuristics demonstrate that DCFBI surpasses them in search capability. The influence of the different mechanisms on the original FBI is analyzed on benchmark functions, while scalability is verified by comparing DCFBI with the original FBI on benchmarks of varied dimensions. BDCFBI is then applied to 18 real datasets from the UCI machine learning repository and the Wieslaw dataset to select near-optimal features, and its results are compared with those of six renowned binary metaheuristics. The results show that BDCFBI is more competitive than similar methods and acquires feature subsets with superior classification accuracy.
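The crisscross (CC) mechanism usually refers to crossover-style information exchange between paired solutions. A sketch of one common horizontal-crossover formulation is given below; whether DCFBI uses exactly this variant, and how DIS selects the pairs, is not specified in the abstract.

```python
import numpy as np

def horizontal_crossover(x_i, x_j):
    """One common crisscross formulation: exchange information between two paired solutions."""
    r1, r2 = np.random.rand(x_i.size), np.random.rand(x_j.size)
    c1, c2 = np.random.uniform(-1, 1, x_i.size), np.random.uniform(-1, 1, x_j.size)
    child_i = r1 * x_i + (1 - r1) * x_j + c1 * (x_i - x_j)
    child_j = r2 * x_j + (1 - r2) * x_i + c2 * (x_j - x_i)
    return child_i, child_j
```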
Funding: Supported by the Natural Science Foundation of Liaoning Province under Grant 2021-MS-272 and the Educational Committee Project of Liaoning Province under Grant LJKQZ2021088.
Abstract: Feature Selection (FS) is an important preprocessing step in data mining, used to remove redundant or unrelated features from high-dimensional data. Most optimization algorithms applied to FS problems do not balance exploration and exploitation well during the search. A hybrid algorithm called the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA) is proposed in this paper to address this problem. In the proposed method, a new position-updating strategy that combines the position changes of the whale and grasshopper populations is introduced, which improves the diversity of the search in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and the COVID-19 symptom dataset are used to validate the proposed method. NL-BGWOA performs well across most of the high-dimensional datasets, achieving an accuracy of up to 0.9895. Furthermore, the experimental results on the medical datasets also demonstrate the advantages of the proposed method on practical FS problems, with best values of accuracy, feature subset size, and fitness of 0.913, 5.7, and 0.0873, respectively. The results reveal that the proposed NL-BGWOA has comprehensive superiority in solving the FS problem for high-dimensional data.
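The hybrid position update is the paper's own contribution and is not reproduced here; for orientation only, the sketch below shows the standard WOA move (shrinking encirclement or a logarithmic spiral toward the current best) that whale-based hybrids of this kind typically start from.

```python
import numpy as np

def woa_move(x, best, a, b=1.0):
    """Standard WOA update: shrinking encirclement or a logarithmic spiral toward the best agent."""
    if np.random.rand() < 0.5:
        r = np.random.rand(x.size)
        A, C = 2 * a * r - a, 2 * np.random.rand(x.size)
        return best - A * np.abs(C * best - x)
    l = np.random.uniform(-1.0, 1.0)
    return np.abs(best - x) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
```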
Funding: National Science Foundation of China (Nos. U1736105, 61572259, 41942017). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RGP-VPP-264.
Abstract: Feature selection has been widely used in data mining and machine learning. Its objective is to select a minimal subset of features according to some reasonable criteria so that the original task can be solved more quickly. In this article, a feature selection algorithm with a local search strategy based on the forest optimization algorithm, namely FSLSFOA, is proposed. The novel local search strategy in the local seeding process guarantees the quality of the feature subsets in the forest. Next, the fitness function is improved to consider not only the classification accuracy but also the size of the feature subset. To avoid falling into local optima, a novel global seeding method is adopted, which selects trees from the bottom of the candidate set and gives the algorithm greater diversity. Finally, FSLSFOA is compared with four feature selection methods to verify its effectiveness; most of its results are superior to those of the comparison methods.
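A fitness function that weighs classification accuracy against subset size is typically written as a convex combination of the error rate and the fraction of features kept. A minimal sketch of that common form (the exact weighting used by FSLSFOA is not given in the abstract):

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Common wrapper fitness: weigh classification error against the fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# example: a 5% cross-validation error while keeping 12 of 60 features
print(fs_fitness(0.05, 12, 60))   # -> 0.0515
```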
Abstract: Selecting the most relevant subset of features from a dataset is a vital step in data mining and machine learning. A dataset with n features has 2^n possible feature subsets, making it challenging to select the optimum collection of features using typical methods. As a result, a new metaheuristics-based feature selection method based on the dipper-throated and grey-wolf optimization (DTO-GW) algorithms has been developed in this research. Feature selection driven by a single metaheuristic can be unstable and lead to a wide range of results, so we adopted hybrid optimization, which allows exploration and exploitation to be balanced more equitably. We propose utilizing the binary DTO-GW search approach we previously devised for selecting the optimal subset of attributes. In the proposed method, the number of selected features is minimized while classification accuracy is increased. To test the proposed method's performance against eleven other state-of-the-art approaches, eight datasets from the UCI repository were used, with comparators including binary grey wolf search (bGWO), binary hybrid grey wolf and particle swarm optimization (bGWO-PSO), bPSO, binary stochastic fractal search (bSFS), the binary whale optimization algorithm (bWOA), binary modified grey wolf optimization (bMGWO), binary multiverse optimization (bMVO), binary bowerbird optimization (bSBO), binary hysteresis optimization (bHy), and binary hysteresis optimization (bHWO). According to the experimental results, the suggested method is superior and successful in handling the feature selection problem.
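The 2^n growth that rules out exhaustive subset search is easy to make concrete: even a few dozen features already produce more candidate subsets than could realistically be enumerated and evaluated.

```python
# every feature is either kept or dropped, so n features yield 2**n candidate subsets
for n in (10, 20, 40, 60):
    print(f"{n} features -> {2 ** n:,} possible subsets")   # grows far too fast to enumerate
```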