Existing feature selection methods for intrusion detection systems (IDS) in the Industrial Internet of Things (IIoT) often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes an Industrial Internet of Things intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems that feature selection algorithms are prone to under high-dimensional data, such as local optimality, long detection time, and reduced accuracy. First, the initial population's diversity is increased using a Gaussian mutation mechanism. Then, a non-linear shrinking factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a variable-step Levy flight operator and a dynamic differential evolution strategy are introduced to improve the algorithm's search efficiency and convergence accuracy in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA algorithm, the detection rate and F1-score increase by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
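The abstract above names two generic operators, a non-linear shrinking schedule for the WOA coefficient and a variable-step Levy flight move, without giving their formulas. The sketch below shows common textbook versions of both; the cosine shrink schedule, the Mantegna Levy step, and the 0.01 step scale are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: the exact GSLDWOA formulas are not given in the abstract,
# so the schedules and scaling below are standard textbook choices.
import numpy as np

def nonlinear_shrink(t, t_max, a_init=2.0):
    # Nonlinear decay of the WOA coefficient "a" (cosine schedule as an example),
    # replacing the usual linear 2 -> 0 schedule to rebalance exploration/exploitation.
    return a_init * np.cos(np.pi * t / (2 * t_max))

def levy_step(dim, beta=1.5):
    # Mantegna's algorithm for a Levy-distributed step of dimension `dim`.
    from math import gamma
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma, dim)
    v = np.random.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

# Example: perturb a candidate solution around the current best whale.
dim, t, t_max = 30, 10, 100
best = np.random.rand(dim)
x = np.random.rand(dim)
a = nonlinear_shrink(t, t_max)
step = 0.01 * levy_step(dim) * (x - best)   # variable-step Levy move
x_new = np.clip(x + a * step, 0.0, 1.0)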
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
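Two ingredients of the inner layer described above, the graph Laplacian that encodes manifold structure and the L2,1-norm that encourages row-sparse feature weights, can be made concrete with the minimal sketch below. The RBF similarity graph, kernel width, and matrix sizes are illustrative assumptions rather than the paper's construction.

# Minimal sketch of two building blocks named in the abstract.
import numpy as np

def rbf_laplacian(X, sigma=1.0):
    # Pairwise squared distances -> RBF similarity graph -> unnormalized Laplacian L = D - S.
    sq = np.sum(X**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * X @ X.T
    S = np.exp(-np.maximum(d2, 0) / (2 * sigma**2))
    np.fill_diagonal(S, 0.0)
    return np.diag(S.sum(axis=1)) - S

def l21_norm(W):
    # Sum of the Euclidean norms of the rows of the feature-weight matrix W;
    # penalizing it drives whole rows (features) toward zero.
    return np.sum(np.linalg.norm(W, axis=1))

X = np.random.rand(50, 20)        # 50 samples, 20 features (toy data)
W = np.random.rand(20, 5)         # feature-weight matrix for 5 labels
L = rbf_laplacian(X)
manifold_term = np.trace(W.T @ X.T @ L @ X @ W)   # smoothness of XW on the sample graph
sparsity_term = l21_norm(W)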
In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. The Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
Software defect prediction (SDP) aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products. Software defect prediction can be effectively performed using traditional features, but some of these features are redundant or irrelevant (their presence or absence has little effect on the prediction results). These problems can be solved using feature selection. However, existing feature selection methods have shortcomings such as an insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset. To reduce the impact of these shortcomings, this paper proposes a new feature selection method, the Cubic TraverseMa Beluga whale optimization algorithm (CTMBWO), based on the improved Beluga whale optimization algorithm (BWO). The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects, improve the accuracy of fault prediction, reduce the number of selected features, and mitigate the risk of overfitting, thereby achieving more efficient resource utilization and better distribution of test workload. The CTMBWO comprises three main stages: preprocessing the dataset, selecting relevant features, and evaluating the classification performance of the model. The novel feature selection method can effectively improve the performance of SDP. This study performs experiments on two software defect datasets (PROMISE, NASA) and reports the method's classification performance using five evaluation metrics: Accuracy, F1-score, MCC, AUC, and Recall. The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and shows significant improvement over the baseline models.
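Wrapper-style methods of this kind score each candidate feature mask by training a classifier on the selected columns and trading accuracy off against subset size. The sketch below shows a generic version of that fitness function; the KNN classifier, 5-fold cross-validation, and the alpha = 0.99 weighting are common defaults in the feature selection literature, not necessarily the settings used by CTMBWO.

# Generic wrapper fitness for a binary feature mask (lower is better).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

def fs_fitness(mask, X, y, alpha=0.99):
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                       # empty subsets are penalized outright
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=5).mean()
    feat_ratio = mask.sum() / mask.size
    return alpha * (1.0 - acc) + (1.0 - alpha) * feat_ratio

# Usage on a toy dataset with a random candidate mask:
X, y = load_breast_cancer(return_X_y=True)
mask = np.random.rand(X.shape[1]) > 0.5
print(fs_fitness(mask, X, y))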
In recent years, feature selection (FS) optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification. This work reviews FS and classification methods that utilize evolutionary algorithms (EAs) for gene expression profiles in cancer or medical applications, focusing on research motivations, challenges, and recommendations. Relevant studies were retrieved from four major academic databases (IEEE, Scopus, Springer, and ScienceDirect) using the keywords 'cancer classification', 'optimization', 'FS', and 'gene expression profile'. A total of 67 papers were finally selected, with key advancements identified as follows: (1) The majority of papers (44.8%) focused on developing algorithms and models for FS and classification. (2) The second category encompassed studies on biomarker identification by EAs, including 20 papers (30%). (3) The third category comprised works that applied FS to cancer data for decision support system purposes, addressing high-dimensional data and the formulation of chromosome length; these studies accounted for 12% of the total. (4) The remaining three papers (4.5%) were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions. This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively. Despite recent advancements, significant limitations remain: the dynamic formulation of chromosome length is still an underexplored area. Thus, further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection. The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
With the birth of Software-Defined Networking (SDN), the integration of SDN and traditional architectures has become the development trend of computer networks. Network intrusion detection faces challenges in dealing with complex attacks in SDN environments. To address these network security issues from the viewpoint of Artificial Intelligence (AI), this paper introduces the Crayfish Optimization Algorithm (COA) to the field of intrusion detection for both SDN and traditional network architectures and, based on the characteristics of the original COA, proposes an Improved Crayfish Optimization Algorithm (ICOA) that integrates strategies of elite reverse learning, Levy flight, a crowding factor, and parameter modification. The ICOA is then utilized for AI-integrated feature selection in intrusion detection for both SDN and traditional network architectures, to reduce the dimensionality of the data and improve the performance of network intrusion detection. Finally, the performance evaluation is carried out not only on the NSL-KDD and UNSW-NB15 datasets for traditional networks but also on the InSDN dataset for SDN-based networks. Experimental results show that ICOA improves accuracy by 0.532% and 2.928% compared with GWO and COA, respectively, in traditional networks. In SDN networks, the accuracy of ICOA is 0.25% and 0.3% higher than that of COA and PSO. These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can promote network intrusion detection for both SDN and traditional architectures.
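Of the strategies listed above, elite reverse (opposition-based) learning is the most self-contained: each candidate is mirrored inside the bounding box spanned by the current elite solutions, and the better of the original and its mirror is kept. The sketch below assumes this standard formulation; the elite count and bounds handling are illustrative choices, not necessarily ICOA's exact rule.

# Sketch of elite reverse learning under the stated assumptions.
import numpy as np

def elite_reverse_learning(pop, fitness, n_elite=5):
    elite = pop[np.argsort(fitness)[:n_elite]]       # best individuals (minimization)
    lo, hi = elite.min(axis=0), elite.max(axis=0)
    reversed_pop = lo + hi - pop                     # opposition within the elite box
    return np.clip(reversed_pop, lo, hi)

pop = np.random.rand(30, 10)                         # 30 crayfish, 10 dimensions
fit = np.random.rand(30)                             # stand-in fitness values
rev = elite_reverse_learning(pop, fit)
# In the full algorithm, each pop[i] would be replaced by rev[i] if rev[i] scores better.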
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). The GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
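The final prediction step described above, combining the base deep models with TSA-optimized weights, amounts to weighted soft voting over class probabilities. A minimal sketch follows; the three dummy probability matrices and the weight vector are placeholders standing in for the trained ensemble members and the weights TSA would return.

# Weighted soft voting given per-model class probabilities and one weight per model.
import numpy as np

def weighted_vote(prob_list, weights):
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize the ensemble weights
    stacked = np.stack(prob_list)                    # (n_models, n_samples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)
    return combined.argmax(axis=1)                   # predicted class per sample

# Dummy probability outputs of three base models for 4 samples and 2 classes:
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]])
p2 = np.array([[0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.8, 0.2]])
p3 = np.array([[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]])
print(weighted_vote([p1, p2, p3], weights=[0.5, 0.2, 0.3]))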
The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets, particularly in industrial contexts where efficient data handling and process innovation are critical. Feature selection, an essential step in data-driven process innovation, aims to identify the most relevant features to improve model interpretability, reduce complexity, and enhance predictive accuracy. To address the limitations of existing feature selection methods, this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization (APO) algorithm. Specifically, we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete, binary feature selection problems. Moreover, we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB's Parallel Computing Toolbox. This parallel design significantly improves runtime efficiency and scalability, particularly for high-dimensional feature spaces. Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features. These findings highlight the robustness and effectiveness of APO, validating its potential for advancing process innovation, economic productivity, and smart-city applications in real-world machine learning scenarios.
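The two implementation points highlighted above, converting continuous puffin positions to binary feature masks and evaluating fitness in parallel, can be illustrated as follows. The abstract does not specify which conversion mechanism is used, so a standard sigmoid (S-shaped) transfer function is assumed, and joblib stands in for MATLAB's Parallel Computing Toolbox as a Python analogue rather than the authors' implementation.

# Sigmoid binarization of continuous positions plus parallel fitness evaluation.
import numpy as np
from joblib import Parallel, delayed

def s_shaped_binarize(position, rng):
    prob = 1.0 / (1.0 + np.exp(-position))           # S-shaped transfer function
    return (rng.random(position.shape) < prob).astype(int)

def dummy_fitness(mask):
    return mask.sum() / mask.size                    # placeholder objective only

rng = np.random.default_rng(0)
positions = rng.normal(size=(20, 50))                # 20 puffins, 50 features
masks = [s_shaped_binarize(p, rng) for p in positions]
scores = Parallel(n_jobs=-1)(delayed(dummy_fitness)(m) for m in masks)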
The planetary gear train is a prominent component of the helicopter transmission system, and its health is of great significance for the flight safety of the helicopter. During health condition monitoring, the selection of a fault-sensitive feature subset is meaningful for fault diagnosis of the helicopter planetary gear train. Accordingly, this paper proposes a multi-criteria fusion feature selection algorithm (MCFFSA) to identify an optimal feature subset from the high-dimensional original feature space. In MCFFSA, a fault feature set spanning multiple domains, including the time domain, frequency domain, and wavelet domain, is first extracted from the raw vibration dataset. Four targeted criteria are then fused by the multi-objective evolutionary algorithm based on decomposition (MOEA/D) to find Pareto-efficient subsets, wherein two criteria measuring diagnostic performance are assessed by a sparse Bayesian extreme learning machine (SBELM). Further, the F-measure is adopted to identify the optimal feature subset, which is employed for subsequent fault diagnosis. The effectiveness of MCFFSA is validated on six fault recognition datasets from a real helicopter transmission platform. The experimental results illustrate the superiority of combining MOEA/D and SBELM in MCFFSA, and comparative analysis demonstrates that the optimal feature subset provided by MCFFSA achieves better diagnosis performance than other algorithms.
The advent of Big Data has rendered Machine Learning tasks more intricate, as they frequently involve higher-dimensional data. Feature Selection (FS) methods can abate the complexity of the data and enhance the accuracy, generalizability, and interpretability of models. Meta-heuristic algorithms are often utilized for FS tasks due to their low requirements and efficient performance. This paper introduces an augmented Forensic-Based Investigation algorithm (DCFBI) that incorporates Dynamic Individual Selection (DIS) and a crisscross (CC) mechanism to improve the pursuit phase of the FBI. Moreover, a binary version of DCFBI (BDCFBI) is applied to FS. Experiments conducted on IEEE CEC 2017 with other metaheuristics demonstrate that DCFBI surpasses them in search capability. The influence of the different mechanisms on the original FBI is analyzed on benchmark functions, while its scalability is verified by comparing it with the original FBI on benchmarks of varied dimensions. BDCFBI is then applied to 18 real datasets from the UCI machine learning database and the Wieslaw dataset to select near-optimal features, which are then compared with six renowned binary metaheuristics. The results show that BDCFBI can be more competitive than similar methods and acquire a subset of features with superior classification accuracy.
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving lives, energy, effort, money, and time. The right decision prevents physical and material losses, and prediction is practiced in all fields, including medicine, finance, environmental studies, engineering, and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of the classifier depends highly on the training datasets utilized for training it. Irrelevant and redundant features in the training dataset reduce the accuracy of the classifier; hence, they must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm, namely unsupervised learning with ranking-based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with seven other feature selection algorithms using well-known classifiers, namely naive Bayes (NB), instance-based (IB1), and tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for these classifiers.
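The two-step idea above, removing redundancy by clustering correlated features and then ranking survivors with a statistical measure, can be sketched as follows. The correlation-based agglomerative clustering, the cluster count, and the use of variance as a label-free ranking score are illustrative assumptions (the abstract does not specify the clustering method or the statistical measures), and the sketch assumes scikit-learn 1.2+ where AgglomerativeClustering accepts metric="precomputed".

# Cluster correlated features, keep the highest-variance feature in each cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True)
corr = np.abs(np.corrcoef(X, rowvar=False))          # feature-feature correlation
clusters = AgglomerativeClustering(n_clusters=10, metric="precomputed",
                                   linkage="average").fit_predict(1.0 - corr)
variances = X.var(axis=0)                            # label-free relevance proxy
selected = [np.where(clusters == c)[0][np.argmax(variances[clusters == c])]
            for c in np.unique(clusters)]            # best-scoring feature per cluster
print(sorted(selected))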
Feature Selection (FS) is considered an important preprocessing step in data mining and is used to remove redundant or unrelated features from high-dimensional data. Most optimization algorithms for FS problems are not well balanced in their search. A hybrid algorithm called the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA) is proposed in this paper to solve this problem. In the proposed method, a new position-updating strategy combining the position changes of the whale and grasshopper populations is introduced, which improves the diversity of the search in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and the COVID-19 symptom dataset are used to validate the proposed method. It has been demonstrated that the proposed NL-BGWOA performs well across most of the high-dimensional datasets, achieving an accuracy rate of up to 0.9895. Furthermore, the experimental results on the medical datasets also demonstrate the advantages of the proposed method on real FS problems, with best values for accuracy, feature subset size, and fitness of 0.913, 5.7, and 0.0873, respectively. The results reveal that the proposed NL-BGWOA has comprehensive superiority in solving the FS problem for high-dimensional data.
The motivation of data mining is to extract effective information from the huge volumes of data held in very large databases. However, such databases generally include redundant and irrelevant attributes, which result in low performance and high computational complexity. Feature Subset Selection (FSS) has therefore become an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses a simulated annealing genetic algorithm. Experimental results show that the convergence and stability of this algorithm are adequately achieved.
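A simulated annealing genetic algorithm typically differs from a plain GA in how it decides whether to keep a modified candidate: worse solutions are accepted with a probability that decays as the temperature cools. The sketch below shows that standard acceptance rule; the geometric cooling schedule and the stand-in cost evaluation are assumptions, since the letter's exact parameters are not given in the abstract.

# Simulated-annealing acceptance rule applied to candidate feature subsets.
import math, random

def sa_accept(old_cost, new_cost, temperature):
    # Always accept improvements; accept worse solutions with probability
    # exp(-delta / T), which shrinks as the temperature cools.
    if new_cost <= old_cost:
        return True
    return random.random() < math.exp(-(new_cost - old_cost) / temperature)

T, cooling = 1.0, 0.95
current_cost = 0.40
for step in range(100):
    candidate_cost = current_cost + random.uniform(-0.05, 0.05)  # stand-in evaluation
    if sa_accept(current_cost, candidate_cost, T):
        current_cost = candidate_cost
    T *= cooling                                                  # geometric cooling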
Feature Selection (FS) is an important problem that involves selecting the most informative subset of features from a dataset to improve classification accuracy. However, due to the high dimensionality and complexity of datasets, most optimization algorithms for feature selection suffer from a balance issue during the search process. Therefore, the present paper proposes a hybrid Sine-Cosine Chimp Optimization Algorithm (SCChOA) to address the feature selection problem. In this approach, firstly, a multi-cycle iterative strategy is designed to better combine the Sine-Cosine Algorithm (SCA) and the Chimp Optimization Algorithm (ChOA), enabling a more effective search of the objective space. Secondly, an S-shaped transfer function is introduced to perform binary transformation on SCChOA. Finally, the binary SCChOA is combined with the K-Nearest Neighbor (KNN) classifier to form a novel binary hybrid wrapper feature selection method. To evaluate the performance of the proposed method, 16 datasets of different dimensionalities from the UCI repository are considered, along with four evaluation metrics: average fitness value, average classification accuracy, average number of selected features, and average running time. Meanwhile, seven state-of-the-art metaheuristic algorithms for solving the feature selection problem are chosen for comparison. Experimental results demonstrate that the proposed method outperforms the compared algorithms in solving the feature selection problem. It is capable of maximizing the reduction in the number of selected features while maintaining high classification accuracy. Furthermore, the results of statistical tests also confirm the significant effectiveness of this method.
The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by memetic algorithms (MAs), where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over the ORL and YaleB face databases using the Euclidean norm as the classifier. It was found that, as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with a genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended the linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of the PCA-MA, LDA-MA and KPCA-MA approaches.
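The pipeline described above, PCA for extraction followed by an evolutionary search over which components to keep and a Euclidean nearest-neighbour matcher, can be sketched as follows. The digits dataset stands in for the ORL/YaleB face databases, and the random mask stands in for a candidate that the memetic algorithm would evolve; both are illustrative assumptions, not the paper's setup.

# PCA projection plus a binary mask over components, scored by 1-NN recognition rate.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_digits   # stand-in for face image data

X, y = load_digits(return_X_y=True)
Z = PCA(n_components=40, whiten=True, random_state=0).fit_transform(X)
mask = np.random.default_rng(0).random(40) > 0.5      # candidate the MA would evolve
knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
acc = knn.fit(Z[:1000, mask], y[:1000]).score(Z[1000:, mask], y[1000:])
print(f"recognition rate with {mask.sum()} of 40 components: {acc:.3f}")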
Parkinson's disease is a neurodegenerative disorder that inflicts irreversible damage on humans. Some experimental data regarding Parkinson's patients are redundant and irrelevant, posing significant challenges for disease detection. Therefore, there is a need to devise an effective method for the selective extraction of disease-specific information, ensuring both accuracy and the use of fewer features. In this paper, a Binary Hybrid Artificial Hummingbird and Flower Pollination Algorithm (FPA), called BFAHA, is proposed to solve the problem of Parkinson's disease diagnosis based on speech signals. First, combining FPA with the Artificial Hummingbird Algorithm (AHA) takes advantage of the strong global exploration ability of FPA to mitigate the disadvantages of AHA, such as premature convergence and easily falling into local optima. Second, the Hamming distance is used to determine the difference between the other individuals in the population and the optimal individual after each iteration; if the difference is too large, the cross-mutation strategy from the genetic algorithm (GA) is used to induce the population to keep approaching the optimal individual during the random search process, speeding up the discovery of the optimal solution. Finally, an S-shaped function converts the improved algorithm into a binary version to suit the characteristics of feature selection (FS) tasks. In this paper, 10 high-dimensional datasets from the UCI and ASU repositories are used to test the performance of BFAHA and apply it to Parkinson's disease diagnosis. Compared with other state-of-the-art algorithms, BFAHA shows excellent competitiveness on both the test datasets and the classification problem, indicating that the algorithm proposed in this study has clear advantages in the field of feature selection.
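The Hamming-distance check and cross-mutation step described above can be sketched as follows: individuals that drift too far (in Hamming distance) from the best binary solution inherit roughly half of its bits via uniform crossover. The distance threshold and the 0.5 crossover probability are illustrative assumptions rather than BFAHA's exact parameters.

# Hamming-distance-triggered crossover toward the best binary individual.
import numpy as np

def hamming(a, b):
    return int(np.sum(a != b))

def pull_towards_best(pop, best, threshold, rng):
    new_pop = pop.copy()
    for i, ind in enumerate(pop):
        if hamming(ind, best) > threshold:
            cross = rng.random(ind.shape) < 0.5       # uniform crossover mask
            new_pop[i] = np.where(cross, best, ind)   # inherit ~half the bits from best
    return new_pop

rng = np.random.default_rng(1)
pop = rng.integers(0, 2, size=(20, 30))               # 20 binary individuals, 30 features
best = pop[0]                                         # stand-in for the current best
pop = pull_towards_best(pop, best, threshold=10, rng=rng)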
Feature selection (FS) plays a crucial role in pre-processing machine learning datasets, as it eliminates redundant features to improve classification accuracy and reduce computational costs. This paper presents an enhanced approach to FS for software fault prediction, specifically by augmenting the binary dwarf mongoose optimization (BDMO) algorithm with a crossover mechanism and a modified position-updating formula. The proposed approach, termed iBDMOcr, aims to strengthen exploration capability, promote population diversity, and ultimately improve the wrapper-based FS process for software fault prediction tasks. iBDMOcr achieved superb performance compared to other well-established optimization methods across 17 benchmark datasets. It ranked first in 11 out of 17 datasets in terms of average classification accuracy. Moreover, iBDMOcr outperformed other methods in terms of average fitness values and the number of selected features across all datasets. The findings demonstrate the effectiveness of iBDMOcr in addressing FS problems in software fault prediction, leading to more accurate and efficient models.
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a length-fixed individual encoding. This is ineffective for high-dimensional data, because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
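The bi-objective formulation above compares feature subsets by Pareto dominance over the pair (classification error, number of selected features). A minimal sketch of that comparison and of extracting the non-dominated front follows; the candidate values are made up purely for illustration.

# Pareto dominance for bi-objective feature selection (both objectives minimized).
def dominates(a, b):
    # a and b are (error_rate, n_selected_features) tuples; lower is better for both.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

candidates = [(0.08, 12), (0.08, 20), (0.10, 8), (0.12, 30)]
pareto_front = [c for c in candidates
                if not any(dominates(other, c) for other in candidates if other != c)]
print(pareto_front)   # (0.08, 20) and (0.12, 30) are dominated and drop out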
In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification. In fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of that parameter. To eliminate this parameter's influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and easy implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm's search ability, we integrate multiple strategies such as Levy flight, the Grey Wolf Optimizer, and adaptive mutation into it. Additionally, we utilize an external repository to store non-dominated solution sets, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed at: https://github.com/zc0315/MSMOWOA.
Pavement crack detection plays a crucial role in ensuring road safety and reducing maintenance expenses. Recent advancements in deep learning (DL) techniques have shown promising results in detecting pavement cracks; however, the selection of relevant features for classification remains challenging. In this study, we propose a new approach for pavement crack detection that integrates deep learning for feature extraction, the whale optimization algorithm (WOA) for feature selection, and a random forest (RF) for classification. The performance of the models was evaluated using accuracy, recall, precision, F1 score, and area under the receiver operating characteristic curve (AUC). Our findings reveal that Model 2, which incorporates RF into the ResNet-18 architecture, outperforms the baseline Model 1 across all evaluation metrics. Nevertheless, our proposed model, which combines ResNet-18 with both WOA and RF, achieves significantly higher accuracy, recall, precision, and F1 score compared to the other two models. These results underscore the effectiveness of integrating RF and WOA into ResNet-18 for pavement crack detection applications. We applied the proposed approach to a dataset of pavement images, achieving an accuracy of 97.16% and an AUC of 0.984. Our results demonstrate that the proposed approach surpasses existing methods for pavement crack detection, offering a promising solution for the automatic identification of pavement cracks. By leveraging this approach, potential safety hazards can be identified more effectively, enabling timely repairs and maintenance measures. Lastly, the findings of this study also emphasize the potential of integrating RF and WOA with deep learning for pavement crack detection, providing road authorities with the necessary tools to make informed decisions regarding road infrastructure maintenance.
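The feature-extraction and classification stages of the pipeline above can be sketched as follows (the WOA selection step between them is omitted here). ResNet-18's classification head is replaced with an identity layer so each image yields a 512-dimensional feature vector, which a random forest then classifies; the preprocessing, the pretrained-weights choice, and the forest size are illustrative assumptions rather than the paper's exact setup.

# ResNet-18 as a frozen feature extractor feeding a random forest classifier.
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.ensemble import RandomForestClassifier

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()                       # drop the 1000-class head
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).numpy()                # shape: (n_images, 512)

# Hypothetical usage (crack / no-crack images and labels are not included here):
# X_feat = extract_features(train_images)
# clf = RandomForestClassifier(n_estimators=300).fit(X_feat, train_labels)
# A WOA-based selector would pick a subset of the 512 columns before fitting.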
基金supported by the Major Science and Technology Programs in Henan Province(No.241100210100)Henan Provincial Science and Technology Research Project(No.252102211085,No.252102211105)+3 种基金Endogenous Security Cloud Network Convergence R&D Center(No.602431011PQ1)The Special Project for Research and Development in Key Areas of Guangdong Province(No.2021ZDZX1098)The Stabilization Support Program of Science,Technology and Innovation Commission of Shenzhen Municipality(No.20231128083944001)The Key scientific research projects of Henan higher education institutions(No.24A520042).
文摘Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity.These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy.This paper proposes an industrial Internet ofThings intrusion detection feature selection algorithm based on an improved whale optimization algorithm(GSLDWOA).The aim is to address the problems that feature selection algorithms under high-dimensional data are prone to,such as local optimality,long detection time,and reduced accuracy.First,the initial population’s diversity is increased using the Gaussian Mutation mechanism.Then,Non-linear Shrinking Factor balances global exploration and local development,avoiding premature convergence.Lastly,Variable-step Levy Flight operator and Dynamic Differential Evolution strategy are introduced to improve the algorithm’s search efficiency and convergence accuracy in highdimensional feature space.Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance.Compared to the traditional WOA algorithm,the detection rate and F1-score increased by 3.68%and 4.12%.On the WUSTL-IIoT-2021 dataset,accuracy,recall,and F1-score all exceed 99.9%.
文摘Multi-label feature selection(MFS)is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels.However,traditional centralized methods face significant challenges in privacy-sensitive and distributed settings,often neglecting label dependencies and suffering from low computational efficiency.To address these issues,we introduce a novel framework,Fed-MFSDHBCPSO—federated MFS via dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization(DHBCPSO-MSR).Leveraging the federated learning paradigm,Fed-MFSDHBCPSO allows clients to perform local feature selection(FS)using DHBCPSO-MSR.Locally selected feature subsets are encrypted with differential privacy(DP)and transmitted to a central server,where they are securely aggregated and refined through secure multi-party computation(SMPC)until global convergence is achieved.Within each client,DHBCPSO-MSR employs a dual-layer FS strategy.The inner layer constructs sample and label similarity graphs,generates Laplacian matrices to capture the manifold structure between samples and labels,and applies L2,1-norm regularization to sparsify the feature subset,yielding an optimized feature weight matrix.The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset.The updated weight matrix is then fed back to the inner layer for further optimization.Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
文摘In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
文摘Software defect prediction(SDP)aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products.Software defect prediction can be effectively performed using traditional features,but there are some redundant or irrelevant features in them(the presence or absence of this feature has little effect on the prediction results).These problems can be solved using feature selection.However,existing feature selection methods have shortcomings such as insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset.In order to reduce the impact of these shortcomings,this paper proposes a new feature selection method Cubic TraverseMa Beluga whale optimization algorithm(CTMBWO)based on the improved Beluga whale optimization algorithm(BWO).The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects,improve the accuracy of fault prediction,reduce the number of the selected feature and mitigate the risk of overfitting,thereby achieving more efficient resource utilization and better distribution of test workload.The CTMBWO comprises three main stages:preprocessing the dataset,selecting relevant features,and evaluating the classification performance of the model.The novel feature selection method can effectively improve the performance of SDP.This study performs experiments on two software defect datasets(PROMISE,NASA)and shows the method’s classification performance using four detailed evaluation metrics,Accuracy,F1-score,MCC,AUC and Recall.The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and has significant improvement over the baseline models.
基金funded by the Ministry of Higher Education of Malaysia,grant number FRGS/1/2022/ICT02/UPSI/02/1.
文摘In recent years,feature selection(FS)optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification.This work reviews FS and classification methods that utilize evolutionary algorithms(EAs)for gene expression profiles in cancer or medical applications based on research motivations,challenges,and recommendations.Relevant studies were retrieved from four major academic databases-IEEE,Scopus,Springer,and ScienceDirect-using the keywords‘cancer classification’,‘optimization’,‘FS’,and‘gene expression profile’.A total of 67 papers were finally selected with key advancements identified as follows:(1)The majority of papers(44.8%)focused on developing algorithms and models for FS and classification.(2)The second category encompassed studies on biomarker identification by EAs,including 20 papers(30%).(3)The third category comprised works that applied FS to cancer data for decision support system purposes,addressing high-dimensional data and the formulation of chromosome length.These studies accounted for 12%of the total number of studies.(4)The remaining three papers(4.5%)were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions.This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively.Despite recent advancements,significant limitations remain:the dynamic formulation of chromosome length remains an underexplored area.Thus,further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection techniques.The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
基金supported by the National Natural Science Foundation of China under Grant 61602162the Hubei Provincial Science and Technology Plan Project under Grant 2023BCB041.
文摘With the birth of Software-Defined Networking(SDN),integration of both SDN and traditional architectures becomes the development trend of computer networks.Network intrusion detection faces challenges in dealing with complex attacks in SDN environments,thus to address the network security issues from the viewpoint of Artificial Intelligence(AI),this paper introduces the Crayfish Optimization Algorithm(COA)to the field of intrusion detection for both SDN and traditional network architectures,and based on the characteristics of the original COA,an Improved Crayfish Optimization Algorithm(ICOA)is proposed by integrating strategies of elite reverse learning,Levy flight,crowding factor and parameter modification.The ICOA is then utilized for AI-integrated feature selection of intrusion detection for both SDN and traditional network architectures,to reduce the dimensionality of the data and improve the performance of network intrusion detection.Finally,the performance evaluation is performed by testing not only the NSL-KDD dataset and the UNSW-NB 15 dataset for traditional networks but also the InSDN dataset for SDN-based networks.Experimental results show that ICOA improves the accuracy by 0.532%and 2.928%respectively compared with GWO and COA in traditional networks.In SDN networks,the accuracy of ICOA is 0.25%and 0.3%higher than COA and PSO.These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can promote network intrusion detection for both SDN and traditional architectures.
文摘Heart disease prediction is a critical issue in healthcare,where accurate early diagnosis can save lives and reduce healthcare costs.The problem is inherently complex due to the high dimensionality of medical data,irrelevant or redundant features,and the variability in risk factors such as age,lifestyle,andmedical history.These challenges often lead to inefficient and less accuratemodels.Traditional predictionmethodologies face limitations in effectively handling large feature sets and optimizing classification performance,which can result in overfitting poor generalization,and high computational cost.This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm(GA)with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm(TSA).GA selects the most relevant features,reducing dimensionality and improvingmodel efficiency.Theselected features are then used to train an ensemble of deep learning models,where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy.This hybrid approach addresses key challenges in the field,such as high dimensionality,redundant features,and classification performance,by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble.These enhancements result in a model that achieves superior accuracy,generalization,and efficiency compared to traditional methods.The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditionalmodels.Specifically,it achieved an accuracy of 97.5%,a sensitivity of 97.2%,and a specificity of 97.8%.Additionally,with a 60-40 data split and 5-fold cross-validation,the model showed a significant reduction in training time(90 s),memory consumption(950 MB),and CPU usage(80%),highlighting its effectiveness in processing large,complex medical datasets for heart disease prediction.
文摘The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets,particularly in industrial contexts where efficient data handling and process innovation are critical.Feature selection,an essential step in data-driven process innovation,aims to identify the most relevant features to improve model interpretability,reduce complexity,and enhance predictive accuracy.To address the limitations of existing feature selection methods,this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization(APO)algorithm.Specifically,we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete,binary feature selection problems.Moreover,we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB’s Parallel Computing Toolbox.This parallel design significantly improves runtime efficiency and scalability,particularly for high-dimensional feature spaces.Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features.These findings highlight the robustness and effectiveness of APO,validating its potential for advancing process innovation,economic productivity and smart city application in real-world machine learning scenarios.
基金co-supported by the Equipment Pre-research Foundation Project of China (No. JZX7Y20190243016301)Helicopter Transmission Technology Key Laboratory Foundation of China (No. KY-52-2018-0024)the Fundamental Research Funds for the Central Universities & Funding of Jiangsu Innovation Program for Graduate Education under Grant (No. KYLX16_0336)
文摘Planetary gear train is a prominent component of helicopter transmission system and its health is of great significance for the flight safety of the helicopter.During health condition monitoring,the selection of a fault sensitive feature subset is meaningful for fault diagnosis of helicopter planetary gear train.According to actual situation,this paper proposed a multi-criteria fusion feature selection algorithm (MCFFSA) to identify an optimal feature subset from the highdimensional original feature space.In MCFFSA,a fault feature set of multiple domains,including time domain,frequency domain and wavelet domain,is first extracted from the raw vibration dataset.Four targeted criteria are then fused by multi-objective evolutionary algorithm based on decomposition (MOEA/D) to find Proto-efficient subsets,wherein two criteria for measuring diagnostic performance are assessed by sparse Bayesian extreme learning machine (SBELM).Further,Fmeasure is adopted to identify the optimal feature subset,which was employed for subsequent fault diagnosis.The effectiveness of MCFFSA is validated through six fault recognition datasets from a real helicopter transmission platform.The experimental results illustrate the superiority of combination of MOEA/D and SBELM in MCFFSA,and comparative analysis demonstrates that the optimal feature subset provided by MCFFSA can achieve a better diagnosis performance than other algorithms.
基金supported by Special Fund of Fundamental Scientific Research Business Expense for Higher School of Central Government(ZY20180119)the Natural Science Foundation of Zhejiang Province(LZ22F020005)+1 种基金the Natural Science Foundation of Hebei Province(D2022512001)National Natural Science Foundation of China(42164002,62076185).
文摘The advent of Big Data has rendered Machine Learning tasks more intricate as they frequently involve higher-dimensional data.Feature Selection(FS)methods can abate the complexity of the data and enhance the accuracy,generalizability,and interpretability of models.Meta-heuristic algorithms are often utilized for FS tasks due to their low requirements and efficient performance.This paper introduces an augmented Forensic-Based Investigation algorithm(DCFBI)that incorporates a Dynamic Individual Selection(DIS)and crisscross(CC)mechanism to improve the pursuit phase of the FBI.Moreover,a binary version of DCFBI(BDCFBI)is applied to FS.Experiments conducted on IEEE CEC 2017 with other metaheuristics demonstrate that DCFBI surpasses them in search capability.The influence of different mechanisms on the original FBI is analyzed on benchmark functions,while its scalability is verified by comparing it with the original FBI on benchmarks with varied dimensions.BDCFBI is then applied to 18 real datasets from the UCI machine learning database and the Wieslaw dataset to select near-optimal features,which are then compared with six renowned binary metaheuristics.The results show that BDCFBI can be more competitive than similar methods and acquire a subset of features with superior classification accuracy.
文摘Prediction plays a vital role in decision making. Correct prediction leads to right decision making to save the life, energy,efforts, money and time. The right decision prevents physical and material losses and it is practiced in all the fields including medical,finance, environmental studies, engineering and emerging technologies. Prediction is carried out by a model called classifier. The predictive accuracy of the classifier highly depends on the training datasets utilized for training the classifier. The irrelevant and redundant features of the training dataset reduce the accuracy of the classifier. Hence, the irrelevant and redundant features must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm namely unsupervised learning with ranking based feature selection(FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of this proposed algorithm is compared with the other seven feature selection algorithms by well known classifiers namely naive Bayes(NB),instance based(IB1) and tree based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for classifiers.
基金supported by Natural Science Foundation of Liaoning Province under Grant 2021-MS-272Educational Committee project of Liaoning Province under Grant LJKQZ2021088.
文摘Feature Selection(FS)is considered as an important preprocessing step in data mining and is used to remove redundant or unrelated features from high-dimensional data.Most optimization algorithms for FS problems are not balanced in search.A hybrid algorithm called nonlinear binary grasshopper whale optimization algorithm(NL-BGWOA)is proposed to solve the problem in this paper.In the proposed method,a new position updating strategy combining the position changes of whales and grasshoppers population is expressed,which optimizes the diversity of searching in the target domain.Ten distinct high-dimensional UCI datasets,the multi-modal Parkinson's speech datasets,and the COVID-19 symptom dataset are used to validate the proposed method.It has been demonstrated that the proposed NL-BGWOA performs well across most of high-dimensional datasets,which shows a high accuracy rate of up to 0.9895.Furthermore,the experimental results on the medical datasets also demonstrate the advantages of the proposed method in actual FS problem,including accuracy,size of feature subsets,and fitness with best values of 0.913,5.7,and 0.0873,respectively.The results reveal that the proposed NL-BGWOA has comprehensive superiority in solving the FS problem of high-dimensional data.
基金Supported by the Project of the Science and Technology Plan of Chongqing City
文摘The motivation of data mining is how to extract effective information from huge data in very large database. However, some redundant and irrelevant attributes, which result in low performance and high computing complexity, are included in the very large database in general.So, Feature Subset Selection (FSS) becomes one important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses the simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.
基金supported by the Key Research and Development Project of Hubei Province(No.2023BAB094)the Key Project of Science and Technology Research Program of Hubei Educational Committee(No.D20211402)the Teaching Research Project of Hubei University of Technology(No.2020099).
文摘Feature Selection(FS)is an important problem that involves selecting the most informative subset of features from a dataset to improve classification accuracy.However,due to the high dimensionality and complexity of the dataset,most optimization algorithms for feature selection suffer from a balance issue during the search process.Therefore,the present paper proposes a hybrid Sine-Cosine Chimp Optimization Algorithm(SCChOA)to address the feature selection problem.In this approach,firstly,a multi-cycle iterative strategy is designed to better combine the Sine-Cosine Algorithm(SCA)and the Chimp Optimization Algorithm(ChOA),enabling a more effective search in the objective space.Secondly,an S-shaped transfer function is introduced to perform binary transformation on SCChOA.Finally,the binary SCChOA is combined with the K-Nearest Neighbor(KNN)classifier to form a novel binary hybrid wrapper feature selection method.To evaluate the performance of the proposed method,16 datasets from different dimensions of the UCI repository along with four evaluation metrics of average fitness value,average classification accuracy,average feature selection number,and average running time are considered.Meanwhile,seven state-of-the-art metaheuristic algorithms for solving the feature selection problem are chosen for comparison.Experimental results demonstrate that the proposed method outperforms other compared algorithms in solving the feature selection problem.It is capable of maximizing the reduction in the number of selected features while maintaining a high classification accuracy.Furthermore,the results of statistical tests also confirm the significant effectiveness of this method.
Abstract: The eigenface method, which uses principal component analysis (PCA), has been the standard and popular method in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs, where the former is used for feature extraction/dimensionality reduction and the latter is exploited for feature selection. Simulations were performed over the ORL and YaleB face databases using the Euclidean norm as the classifier. It was found that, as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with a genetic algorithm (PCA-GA) against our proposed PCA-MA method; the results also clearly established the supremacy of PCA-MA over PCA-GA. We further extended the linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvements in recognition rate with fewer features. This paper also compares the performance of the PCA-MA, LDA-MA, and KPCA-MA approaches.
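As a loose illustration only (not the authors' memetic operators), the sketch below projects images with PCA and scores a binary mask over the resulting components with a 1-nearest-neighbour rule under the Euclidean norm, which is the kind of evaluation a GA or MA would iterate; the data and mask are toy stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def eigen_feature_score(mask, X_images, y, n_components=40):
    """Score a binary mask over PCA (eigenface) components with a 1-NN classifier."""
    if mask.sum() == 0:
        return 0.0
    eigenspace = PCA(n_components=n_components).fit_transform(X_images)
    selected = eigenspace[:, mask.astype(bool)]
    knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
    return cross_val_score(knn, selected, y, cv=3).mean()

# Toy stand-in for a face dataset: 90 "images" of 20x20 pixels, 3 classes.
rng = np.random.default_rng(0)
X_images = rng.normal(size=(90, 400)) + np.repeat(np.arange(3), 30)[:, None]
y = np.repeat(np.arange(3), 30)
mask = rng.integers(0, 2, size=40)   # which eigenface components to keep
print(eigen_feature_score(mask, X_images, y))
```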
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. U21A20464 and 62066005, and the Innovation Project of Guangxi Graduate Education under Grant No. YCSW2023259.
Abstract: Parkinson's disease is a neurodegenerative disorder that inflicts irreversible damage on humans. Some experimental data regarding Parkinson's patients are redundant and irrelevant, posing significant challenges for disease detection. There is therefore a need for an effective method that selectively extracts disease-specific information, ensuring high accuracy while using fewer features. In this paper, a Binary Hybrid Artificial Hummingbird and Flower Pollination Algorithm (BFAHA) is proposed to solve the problem of Parkinson's disease diagnosis based on speech signals. First, combining the Flower Pollination Algorithm (FPA) with the Artificial Hummingbird Algorithm (AHA) exploits the strong global exploration ability of FPA to mitigate the disadvantages of AHA, such as premature convergence and easily falling into local optima. Second, the Hamming distance is used to measure the difference between each individual in the population and the optimal individual after every iteration; if the difference is too large, the cross-mutation strategy of the genetic algorithm (GA) is used to induce the population to keep approaching the optimal individual during the random search, speeding up the discovery of the optimal solution. Finally, an S-shaped function converts the improved algorithm into a binary version to suit the characteristics of feature selection (FS) tasks. In this paper, 10 high-dimensional datasets from the UCI and ASU repositories are used to test the performance of BFAHA and to apply it to Parkinson's disease diagnosis. Compared with other state-of-the-art algorithms, BFAHA shows excellent competitiveness on both the test datasets and the classification problem, indicating that the proposed algorithm has clear advantages in the field of feature selection.
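A minimal sketch of the Hamming-distance-triggered cross-mutation idea described above, under the assumption that individuals are binary vectors and that the distance threshold and mutation rate are free parameters; this is not the authors' exact operator.

```python
import numpy as np

def hamming_guided_refresh(pop, best, threshold, rng, p_mut=0.05):
    """If an individual drifts too far (in Hamming distance) from the best one,
    pull it back with a GA-style uniform crossover plus bit-flip mutation."""
    new_pop = pop.copy()
    for i, ind in enumerate(pop):
        distance = np.count_nonzero(ind != best)   # Hamming distance to the best
        if distance > threshold:
            # Uniform crossover with the best individual...
            take_from_best = rng.random(ind.shape) < 0.5
            child = np.where(take_from_best, best, ind)
            # ...followed by a small bit-flip mutation to keep some diversity.
            flips = rng.random(ind.shape) < p_mut
            new_pop[i] = np.where(flips, 1 - child, child)
    return new_pop

rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(6, 12))
best = pop[0]
print(hamming_guided_refresh(pop, best, threshold=4, rng=rng))
```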
Funding: Supported by the Deanship of Scientific Research and Innovation at Al-Balqa Applied University in Jordan.
Abstract: Feature selection (FS) plays a crucial role in pre-processing machine learning datasets, as it eliminates redundant features to improve classification accuracy and reduce computational costs. This paper presents an enhanced approach to FS for software fault prediction, specifically by augmenting the binary dwarf mongoose optimization (BDMO) algorithm with a crossover mechanism and a modified position-updating formula. The proposed approach, termed iBDMOcr, aims to strengthen exploration capability, promote population diversity, and ultimately improve the wrapper-based FS process for software fault prediction tasks. iBDMOcr achieved superior performance compared to other well-established optimization methods across 17 benchmark datasets, ranking first on 11 of the 17 datasets in terms of average classification accuracy. Moreover, iBDMOcr outperformed the other methods in terms of average fitness value and number of selected features across all datasets. The findings demonstrate the effectiveness of iBDMOcr in addressing FS problems in software fault prediction, leading to more accurate and efficient models.
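The modified position-updating formula is not reproduced in the abstract; the sketch below only shows the general pattern of injecting a GA-style crossover step into a binary population to promote diversity, with illustrative choices (two-point crossover, mating with a fitter peer, elitism for the best individual) rather than the paper's actual mechanism.

```python
import numpy as np

def crossover_enhanced_step(pop, fitness, rng):
    """One illustrative iteration: pair each individual with a fitter peer
    and recombine them with two-point crossover to promote diversity."""
    order = np.argsort(fitness)                # ascending: lower fitness is better
    n, d = pop.shape
    offspring = pop.copy()
    for rank, i in enumerate(order):
        if rank == 0:
            continue                           # keep the current best unchanged (elitism)
        mate = pop[order[rng.integers(0, rank)]]   # a peer that is at least as good
        a, b = sorted(rng.choice(d, size=2, replace=False))
        offspring[i, a:b] = mate[a:b]          # two-point crossover segment swap
    return offspring

rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(6, 10))
fitness = rng.random(6)                        # stand-in wrapper fitness values
print(crossover_enhanced_step(pop, fitness, rng))
```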
Funding: Supported in part by the National Natural Science Foundation of China (62172065, 62072060).
Abstract: As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a fixed-length individual encoding, which is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a variable-length individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising regions of the search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
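A minimal sketch of two ingredients the abstract names, a variable-length individual encoding (a plain list of selected feature indices) and the bi-objective evaluation with Pareto dominance; the dataset, classifier, and individual lengths are placeholders, and the length-change and local-search operators are omitted.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def objectives(individual, X, y):
    """Bi-objective evaluation of a variable-length feature-index list:
    (classification error rate, number of selected features)."""
    error = 1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                  X[:, individual], y, cv=3).mean()
    return error, len(individual)

def dominates(obj_a, obj_b):
    """Pareto dominance for minimization: a is no worse in both and better in one."""
    return (all(a <= b for a, b in zip(obj_a, obj_b))
            and any(a < b for a, b in zip(obj_a, obj_b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = (X[:, 2] + X[:, 7] > 0).astype(int)

# Two length-variable individuals: each is just a list of selected feature indices.
short_ind = sorted(rng.choice(50, size=5, replace=False))
long_ind = sorted(rng.choice(50, size=20, replace=False))
print(objectives(short_ind, X, y),
      dominates(objectives(short_ind, X, y), objectives(long_ind, X, y)))
```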
Funding: Supported in part by the Natural Science Youth Foundation of Hebei Province under Grant F2019403207, in part by the PhD Research Startup Foundation of Hebei GEO University under Grant BQ2019055, in part by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant KLIGIP-2021A06, in part by the Fundamental Research Funds for the Universities in Hebei Province under Grant QN202220, in part by the Science and Technology Research Project for Universities of Hebei under Grant ZD2020344, and in part by the Guangxi Natural Science Fund General Project under Grant 2021GXNSFAA075029.
Abstract: In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification; in fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the selected subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of that parameter. To eliminate this parameter's influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and ease of implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm's search ability, we integrate multiple strategies, such as Levy flight, the Grey Wolf Optimizer, and adaptive mutation. Additionally, we utilize an external repository to store the non-dominated solution set, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed at https://github.com/zc0315/MSMOWOA.
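Of the components listed, the sketch below illustrates only the external repository that stores the non-dominated solution set, assuming each solution is an (error rate, subset size) pair to be minimized; the Levy flight, Grey Wolf, adaptive mutation, and grid-maintenance parts are omitted.

```python
def dominates(a, b):
    """Minimization dominance over (error_rate, n_features) tuples."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Insert a candidate (objectives, mask) pair into the external repository,
    dropping anything it dominates and rejecting it if it is dominated."""
    obj_c, _ = candidate
    if any(dominates(obj_a, obj_c) for obj_a, _ in archive):
        return archive                              # candidate is dominated: discard
    kept = [(obj_a, m) for obj_a, m in archive if not dominates(obj_c, obj_a)]
    kept.append(candidate)
    return kept

archive = []
for sol in [((0.10, 12), "mask_a"), ((0.08, 15), "mask_b"),
            ((0.08, 9), "mask_c"), ((0.12, 20), "mask_d")]:
    archive = update_archive(archive, sol)
print([obj for obj, _ in archive])   # only the non-dominated front remains
```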
Abstract: Pavement crack detection plays a crucial role in ensuring road safety and reducing maintenance expenses. Recent advancements in deep learning (DL) techniques have shown promising results in detecting pavement cracks; however, the selection of relevant features for classification remains challenging. In this study, we propose a new approach for pavement crack detection that integrates deep learning for feature extraction, the whale optimization algorithm (WOA) for feature selection, and random forest (RF) for classification. The performance of the models was evaluated using accuracy, recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUC). Our findings reveal that Model 2, which incorporates RF into the ResNet-18 architecture, outperforms the baseline Model 1 across all evaluation metrics. Nevertheless, our proposed model, which combines ResNet-18 with both WOA and RF, achieves significantly higher accuracy, recall, precision, and F1 score than the other two models. These results underscore the effectiveness of integrating RF and WOA into ResNet-18 for pavement crack detection applications. We applied the proposed approach to a dataset of pavement images, achieving an accuracy of 97.16% and an AUC of 0.984. Our results demonstrate that the proposed approach surpasses existing methods for pavement crack detection, offering a promising solution for the automatic identification of pavement cracks. By leveraging this approach, potential safety hazards can be identified more effectively, enabling timely repairs and maintenance. Finally, the findings of this study emphasize the potential of integrating RF and WOA with deep learning for pavement crack detection, providing road authorities with the tools needed to make informed decisions regarding road infrastructure maintenance.
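The abstract gives neither training code nor the WOA configuration; the sketch below merely wires the three stages together, torchvision's ResNet-18 as a 512-dimensional feature extractor, a placeholder random binary mask standing in for the WOA-selected subset, and scikit-learn's RandomForestClassifier, on toy tensors in place of real pavement images.

```python
import numpy as np
import torch
from torchvision.models import resnet18
from sklearn.ensemble import RandomForestClassifier

# Stage 1: deep feature extraction with ResNet-18 (classification head removed).
# Weights are left uninitialized here to keep the sketch self-contained; in
# practice, pretrained or fine-tuned weights would be loaded.
backbone = resnet18(weights=None)
backbone.fc = torch.nn.Identity()          # expose the 512-d penultimate features
backbone.eval()

# Toy stand-in for pavement image crops already preprocessed to 224x224 RGB.
images = torch.rand(16, 3, 224, 224)
labels = np.random.default_rng(0).integers(0, 2, size=16)   # crack / no crack

with torch.no_grad():
    deep_features = backbone(images).numpy()                # shape (16, 512)

# Stage 2: feature selection. The paper uses WOA; a random binary mask stands in
# here purely to show how a selected subset would be passed on.
mask = np.random.default_rng(1).random(deep_features.shape[1]) < 0.3
selected = deep_features[:, mask]

# Stage 3: random forest classification on the selected deep features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(selected, labels)
print(selected.shape, clf.score(selected, labels))
```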