In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update ...In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.展开更多
Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity.These challenges hinder traditional IDS from...Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity.These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy.This paper proposes an industrial Internet ofThings intrusion detection feature selection algorithm based on an improved whale optimization algorithm(GSLDWOA).The aim is to address the problems that feature selection algorithms under high-dimensional data are prone to,such as local optimality,long detection time,and reduced accuracy.First,the initial population’s diversity is increased using the Gaussian Mutation mechanism.Then,Non-linear Shrinking Factor balances global exploration and local development,avoiding premature convergence.Lastly,Variable-step Levy Flight operator and Dynamic Differential Evolution strategy are introduced to improve the algorithm’s search efficiency and convergence accuracy in highdimensional feature space.Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance.Compared to the traditional WOA algorithm,the detection rate and F1-score increased by 3.68%and 4.12%.On the WUSTL-IIoT-2021 dataset,accuracy,recall,and F1-score all exceed 99.9%.展开更多
Software defect prediction(SDP)aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products.Software ...Software defect prediction(SDP)aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products.Software defect prediction can be effectively performed using traditional features,but there are some redundant or irrelevant features in them(the presence or absence of this feature has little effect on the prediction results).These problems can be solved using feature selection.However,existing feature selection methods have shortcomings such as insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset.In order to reduce the impact of these shortcomings,this paper proposes a new feature selection method Cubic TraverseMa Beluga whale optimization algorithm(CTMBWO)based on the improved Beluga whale optimization algorithm(BWO).The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects,improve the accuracy of fault prediction,reduce the number of the selected feature and mitigate the risk of overfitting,thereby achieving more efficient resource utilization and better distribution of test workload.The CTMBWO comprises three main stages:preprocessing the dataset,selecting relevant features,and evaluating the classification performance of the model.The novel feature selection method can effectively improve the performance of SDP.This study performs experiments on two software defect datasets(PROMISE,NASA)and shows the method’s classification performance using four detailed evaluation metrics,Accuracy,F1-score,MCC,AUC and Recall.The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and has significant improvement over the baseline models.展开更多
In recent years,feature selection(FS)optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification.This work reviews FS and classificati...In recent years,feature selection(FS)optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification.This work reviews FS and classification methods that utilize evolutionary algorithms(EAs)for gene expression profiles in cancer or medical applications based on research motivations,challenges,and recommendations.Relevant studies were retrieved from four major academic databases-IEEE,Scopus,Springer,and ScienceDirect-using the keywords‘cancer classification’,‘optimization’,‘FS’,and‘gene expression profile’.A total of 67 papers were finally selected with key advancements identified as follows:(1)The majority of papers(44.8%)focused on developing algorithms and models for FS and classification.(2)The second category encompassed studies on biomarker identification by EAs,including 20 papers(30%).(3)The third category comprised works that applied FS to cancer data for decision support system purposes,addressing high-dimensional data and the formulation of chromosome length.These studies accounted for 12%of the total number of studies.(4)The remaining three papers(4.5%)were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions.This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively.Despite recent advancements,significant limitations remain:the dynamic formulation of chromosome length remains an underexplored area.Thus,further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection techniques.The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.展开更多
With the birth of Software-Defined Networking(SDN),integration of both SDN and traditional architectures becomes the development trend of computer networks.Network intrusion detection faces challenges in dealing with ...With the birth of Software-Defined Networking(SDN),integration of both SDN and traditional architectures becomes the development trend of computer networks.Network intrusion detection faces challenges in dealing with complex attacks in SDN environments,thus to address the network security issues from the viewpoint of Artificial Intelligence(AI),this paper introduces the Crayfish Optimization Algorithm(COA)to the field of intrusion detection for both SDN and traditional network architectures,and based on the characteristics of the original COA,an Improved Crayfish Optimization Algorithm(ICOA)is proposed by integrating strategies of elite reverse learning,Levy flight,crowding factor and parameter modification.The ICOA is then utilized for AI-integrated feature selection of intrusion detection for both SDN and traditional network architectures,to reduce the dimensionality of the data and improve the performance of network intrusion detection.Finally,the performance evaluation is performed by testing not only the NSL-KDD dataset and the UNSW-NB 15 dataset for traditional networks but also the InSDN dataset for SDN-based networks.Experimental results show that ICOA improves the accuracy by 0.532%and 2.928%respectively compared with GWO and COA in traditional networks.In SDN networks,the accuracy of ICOA is 0.25%and 0.3%higher than COA and PSO.These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can promote network intrusion detection for both SDN and traditional architectures.展开更多
To overcome the limitations of traditional monitoring methods, based on vibration parameter image of rotating machinery, this paper presents an abnormality online monitoring method suitable for rotating machinery usin...To overcome the limitations of traditional monitoring methods, based on vibration parameter image of rotating machinery, this paper presents an abnormality online monitoring method suitable for rotating machinery using the negative selection mechanism of biology immune system. This method uses techniques of biology clone and learning mechanism to improve the negative selection algorithm to generate detectors possessing different monitoring radius, covers the abnormality space effectively, and avoids such problems as the low efficiency of generating detectors, etc. The result of an example applying the presented monitoring method shows that this method can solve the difficulty of obtaining fault samples preferably and extract the turbine state character effectively, it also can detect abnormality by causing various fault of the turbine and obtain the degree of abnormality accurately. The exact monitoring precision of abnormality indicates that this method is feasible and has better on-line quality, accuracy and robustness.展开更多
In this paper,negative selection and genetic algorithms are combined and an improved bi-objective optimization scheme is presented to achieve optimized negative selection algorithm detectors.The main aim of the optima...In this paper,negative selection and genetic algorithms are combined and an improved bi-objective optimization scheme is presented to achieve optimized negative selection algorithm detectors.The main aim of the optimal detector generation technique is maximal nonself space coverage with reduced number of diversified detectors.Conventionally,researchers opted clonal selection based optimization methods to achieve the maximal nonself coverage milestone;however,detectors cloning process results in generation of redundant similar detectors and inefficient detector distribution in nonself space.In approach proposed in the present paper,the maximal nonself space coverage is associated with bi-objective optimization criteria including minimization of the detector overlap and maximization of the diversity factor of the detectors.In the proposed methodology,a novel diversity factorbased approach is presented to obtain diversified detector distribution in the nonself space.The concept of diversified detector distribution is studied for detector coverage with 2-dimensional pentagram and spiral self-patterns.Furthermore,the feasibility of the developed fault detection methodology is tested the fault detection of induction motor inner race and outer race bearings.展开更多
A real-valued negative selection algorithm with good mathematical foundation is presented to solve some of the drawbacks of previous approach. Specifically, it can produce a good estimate of the optimal number of dete...A real-valued negative selection algorithm with good mathematical foundation is presented to solve some of the drawbacks of previous approach. Specifically, it can produce a good estimate of the optimal number of detectors needed to cover the non-self space, and the maximization of the non-self coverage is done through an optimization algorithm with proven convergence properties. Experiments are performed to validate the assumptions made while designing the algorithm and to evaluate its performance.展开更多
The negative selection algorithm(NSA)is an adaptive technique inspired by how the biological immune system discriminates the self from nonself.It asserts itself as one of the most important algorithms of the artificia...The negative selection algorithm(NSA)is an adaptive technique inspired by how the biological immune system discriminates the self from nonself.It asserts itself as one of the most important algorithms of the artificial immune system.A key element of the NSA is its great dependency on the random detectors in monitoring for any abnormalities.However,these detectors have limited performance.Redundant detectors are generated,leading to difficulties for detectors to effectively occupy the non-self space.To alleviate this problem,we propose the nature-inspired metaheuristic cuckoo search(CS),a stochastic global search algorithm,which improves the random generation of detectors in the NSA.Inbuilt characteristics such as mutation,crossover,and selection operators make the CS attain global convergence.With the use of Lévy flight and a distance measure,efficient detectors are produced.Experimental results show that integrating CS into the negative selection algorithm elevated the detection performance of the NSA,with an average increase of 3.52%detection rate on the tested datasets.The proposed method shows superiority over other models,and detection rates of 98%and 99.29%on Fisher’s IRIS and Breast Cancer datasets,respectively.Thus,the generation of highest detection rates and lowest false alarm rates can be achieved.展开更多
Parkinson’s disease is a neurodegenerative disorder that inflicts irreversible damage on humans.Some experimental data regarding Parkinson’s patients are redundant and irrelevant,posing significant challenges for di...Parkinson’s disease is a neurodegenerative disorder that inflicts irreversible damage on humans.Some experimental data regarding Parkinson’s patients are redundant and irrelevant,posing significant challenges for disease detection.Therefore,there is a need to devise an effective method for the selective extraction of disease-specific information,ensuring both accuracy and the utilization of fewer features.In this paper,a Binary Hybrid Artificial Hummingbird and Flower Pollination Algorithm(FPA),called BFAHA,is proposed to solve the problem of Parkinson’s disease diagnosis based on speech signals.First,combining FPA with Artificial Hummingbird Algorithm(AHA)can take advantage of the strong global exploration ability possessed by FPA to improve the disadvantages of AHA,such as premature convergence and easy falling into local optimum.Second,the Hemming distance is used to determine the difference between the other individuals in the population and the optimal individual after each iteration,if the difference is too significant,the cross-mutation strategy in the genetic algorithm(GA)is used to induce the population individuals to keep approaching the optimal individual in the random search process to speed up finding the optimal solution.Finally,an S-shaped function converts the improved algorithm into a binary version to suit the characteristics of the feature selection(FS)tasks.In this paper,10 high-dimensional datasets from UCI and the ASU are used to test the performance of BFAHA and apply it to Parkinson’s disease diagnosis.Compared with other state-of-the-art algorithms,BFAHA shows excellent competitiveness in both the test datasets and the classification problem,indicating that the algorithm proposed in this study has apparent advantages in the field of feature selection.展开更多
Feature selection(FS)plays a crucial role in pre-processing machine learning datasets,as it eliminates redundant features to improve classification accuracy and reduce computational costs.This paper presents an enhanc...Feature selection(FS)plays a crucial role in pre-processing machine learning datasets,as it eliminates redundant features to improve classification accuracy and reduce computational costs.This paper presents an enhanced approach to FS for software fault prediction,specifically by enhancing the binary dwarf mongoose optimization(BDMO)algorithm with a crossover mechanism and a modified positioning updating formula.The proposed approach,termed iBDMOcr,aims to fortify exploration capability,promote population diversity,and lastly improve the wrapper-based FS process for software fault prediction tasks.iBDMOcr gained superb performance compared to other well-esteemed optimization methods across 17 benchmark datasets.It ranked first in 11 out of 17 datasets in terms of average classification accuracy.Moreover,iBDMOcr outperformed other methods in terms of average fitness values and number of selected features across all datasets.The findings demonstrate the effectiveness of iBDMOcr in addressing FS problems in software fault prediction,leading to more accurate and efficient models.展开更多
In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature sel...In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate.Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter,but the results obtained depend on the value of the parameter.To eliminate this parameter’s influence,the problem can be reformulated as a multi-objective optimization problem.The Whale Optimization Algorithm(WOA)is widely used in optimization problems because of its simplicity and easy implementation.In this paper,we propose a multi-strategy assisted multi-objective WOA(MSMOWOA)to address feature selection.To enhance the algorithm’s search ability,we integrate multiple strategies such as Levy flight,Grey Wolf Optimizer,and adaptive mutation into it.Additionally,we utilize an external repository to store non-dominant solution sets and grid technology is used to maintain diversity.Results on fourteen University of California Irvine(UCI)datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance.The source code can be accessed from the website:https://github.com/zc0315/MSMOWOA.展开更多
In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selec...In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.展开更多
Heart disease prediction is a critical issue in healthcare,where accurate early diagnosis can save lives and reduce healthcare costs.The problem is inherently complex due to the high dimensionality of medical data,irr...Heart disease prediction is a critical issue in healthcare,where accurate early diagnosis can save lives and reduce healthcare costs.The problem is inherently complex due to the high dimensionality of medical data,irrelevant or redundant features,and the variability in risk factors such as age,lifestyle,andmedical history.These challenges often lead to inefficient and less accuratemodels.Traditional predictionmethodologies face limitations in effectively handling large feature sets and optimizing classification performance,which can result in overfitting poor generalization,and high computational cost.This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm(GA)with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm(TSA).GA selects the most relevant features,reducing dimensionality and improvingmodel efficiency.Theselected features are then used to train an ensemble of deep learning models,where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy.This hybrid approach addresses key challenges in the field,such as high dimensionality,redundant features,and classification performance,by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble.These enhancements result in a model that achieves superior accuracy,generalization,and efficiency compared to traditional methods.The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditionalmodels.Specifically,it achieved an accuracy of 97.5%,a sensitivity of 97.2%,and a specificity of 97.8%.Additionally,with a 60-40 data split and 5-fold cross-validation,the model showed a significant reduction in training time(90 s),memory consumption(950 MB),and CPU usage(80%),highlighting its effectiveness in processing large,complex medical datasets for heart disease prediction.展开更多
Multi-label feature selection(MFS)is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels.However,traditional centralized methods face significant chal...Multi-label feature selection(MFS)is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels.However,traditional centralized methods face significant challenges in privacy-sensitive and distributed settings,often neglecting label dependencies and suffering from low computational efficiency.To address these issues,we introduce a novel framework,Fed-MFSDHBCPSO—federated MFS via dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization(DHBCPSO-MSR).Leveraging the federated learning paradigm,Fed-MFSDHBCPSO allows clients to perform local feature selection(FS)using DHBCPSO-MSR.Locally selected feature subsets are encrypted with differential privacy(DP)and transmitted to a central server,where they are securely aggregated and refined through secure multi-party computation(SMPC)until global convergence is achieved.Within each client,DHBCPSO-MSR employs a dual-layer FS strategy.The inner layer constructs sample and label similarity graphs,generates Laplacian matrices to capture the manifold structure between samples and labels,and applies L2,1-norm regularization to sparsify the feature subset,yielding an optimized feature weight matrix.The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset.The updated weight matrix is then fed back to the inner layer for further optimization.Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.展开更多
The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets,particularly in industrial contexts where efficient data handling and process innovation are c...The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets,particularly in industrial contexts where efficient data handling and process innovation are critical.Feature selection,an essential step in data-driven process innovation,aims to identify the most relevant features to improve model interpretability,reduce complexity,and enhance predictive accuracy.To address the limitations of existing feature selection methods,this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization(APO)algorithm.Specifically,we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete,binary feature selection problems.Moreover,we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB’s Parallel Computing Toolbox.This parallel design significantly improves runtime efficiency and scalability,particularly for high-dimensional feature spaces.Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features.These findings highlight the robustness and effectiveness of APO,validating its potential for advancing process innovation,economic productivity and smart city application in real-world machine learning scenarios.展开更多
To overcome the challenges associated with predicting gas extraction performance and mitigating the gradual decline in extraction volume,which adversely impacts gas utilization efficiency in mines,a gas extraction pur...To overcome the challenges associated with predicting gas extraction performance and mitigating the gradual decline in extraction volume,which adversely impacts gas utilization efficiency in mines,a gas extraction pure volume prediction model was developed using Support Vector Regression(SVR)and Random Forest(RF),with hyperparameters fine-tuned via the Genetic Algorithm(GA).Building upon this,an adaptive control model for gas extraction negative pressure was formulated to maximize the extracted gas volume within the pipeline network,followed by field validation experiments.Experimental results indicate that the GA-SVR model surpasses comparable models in terms of mean absolute error,root mean square error,and mean absolute percentage error.In the extraction process of bedding boreholes,the influence of negative pressure on gas extraction concentration diminishes over time,yet it remains a critical factor in determining the extracted pure volume.In contrast,throughout the entire extraction period of cross-layer boreholes,both extracted pure volume and concentration exhibit pronounced sensitivity to fluctuations in extraction negative pressure.Field experiments demonstrated that the adaptive controlmodel enhanced the average extracted gas volume by 5.08% in the experimental borehole group compared to the control group during the later extraction stage,with a more pronounced increase of 7.15% in the first 15 days.The research findings offer essential technical support for the efficient utilization and long-term sustainable development of mine gas resources.The research findings offer essential technical support for gas disaster mitigation and the sustained,efficient utilization of mine gas.展开更多
This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,...This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,China.Based on randomly generated 40 NTDs,the study developed models for the geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performances using the area under the receiver operating characteristic curve(AUC).Specifically,the means and standard deviations of the AUC values from all models were then utilized to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment,as well as the uncertainty introduced by the NTDs.A risk and return methodology was thus employed to quantify and mitigate the uncertainty,with log odds ratios used to characterize the susceptibility assessment levels.The risk and return values were calculated based on the standard deviations and means of the log odds ratios of various locations.After the mean log odds ratios were converted into probability values,the final susceptibility map was plotted,which accounts for the uncertainty induced by random NTDs.The results indicate that the AUC values of the models ranged from 0.810 to 0.963,with an average of 0.852 and a standard deviation of 0.035,indicating encouraging prediction effects and certain uncertainty.The risk and return analysis reveals that low-risk and high-return areas suggest lower standard deviations and higher means across multiple model-derived assessments.Overall,this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models,aimed at improving their robustness and reliability.Additionally,by identifying low-risk and high-return areas,resource allocation for geologic hazard prevention and control can be optimized,thus ensuring that limited resources are directed toward the most effective prevention and control measures.展开更多
The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challengi...The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging.Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features.The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection,because of its simplicity,efficiency,and similarity to reinforcement learning.Nevertheless,existing methods do not consider crucial correlation information,such as dynamic redundancy and label correlation.To tackle these concerns,the paper proposes a multi-label feature selection technique based on ant colony optimization algorithm(MFACO),focusing on dynamic redundancy and label correlation.Initially,the dynamic redundancy is assessed between the selected feature subset and potential features.Meanwhile,the ant colony optimization algorithm extracts label correlation from the label set,which is then combined into the heuristic factor as label weights.Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of ant colony,outperforming the other algorithms involved in the paper.展开更多
Lung cancer is among the most frequent cancers in the world,with over one million deaths per year.Classification is required for lung cancer diagnosis and therapy to be effective,accurate,and reliable.Gene expression ...Lung cancer is among the most frequent cancers in the world,with over one million deaths per year.Classification is required for lung cancer diagnosis and therapy to be effective,accurate,and reliable.Gene expression microarrays have made it possible to find genetic biomarkers for cancer diagnosis and prediction in a high-throughput manner.Machine Learning(ML)has been widely used to diagnose and classify lung cancer where the performance of ML methods is evaluated to identify the appropriate technique.Identifying and selecting the gene expression patterns can help in lung cancer diagnoses and classification.Normally,microarrays include several genes and may cause confusion or false prediction.Therefore,the Arithmetic Optimization Algorithm(AOA)is used to identify the optimal gene subset to reduce the number of selected genes.Which can allow the classifiers to yield the best performance for lung cancer classification.In addition,we proposed a modified version of AOA which can work effectively on the high dimensional dataset.In the modified AOA,the features are ranked by their weights and are used to initialize the AOA population.The exploitation process of AOA is then enhanced by developing a local search algorithm based on two neighborhood strategies.Finally,the efficiency of the proposed methods was evaluated on gene expression datasets related to Lung cancer using stratified 4-fold cross-validation.The method’s efficacy in selecting the optimal gene subset is underscored by its ability to maintain feature proportions between 10%to 25%.Moreover,the approach significantly enhances lung cancer prediction accuracy.For instance,Lung_Harvard1 achieved an accuracy of 97.5%,Lung_Harvard2 and Lung_Michigan datasets both achieved 100%,Lung_Adenocarcinoma obtained an accuracy of 88.2%,and Lung_Ontario achieved an accuracy of 87.5%.In conclusion,the results indicate the potential promise of the proposed modified AOA approach in classifying microarray cancer data.展开更多
文摘In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
基金supported by the Major Science and Technology Programs in Henan Province(No.241100210100)Henan Provincial Science and Technology Research Project(No.252102211085,No.252102211105)+3 种基金Endogenous Security Cloud Network Convergence R&D Center(No.602431011PQ1)The Special Project for Research and Development in Key Areas of Guangdong Province(No.2021ZDZX1098)The Stabilization Support Program of Science,Technology and Innovation Commission of Shenzhen Municipality(No.20231128083944001)The Key scientific research projects of Henan higher education institutions(No.24A520042).
文摘Existing feature selection methods for intrusion detection systems in the Industrial Internet of Things often suffer from local optimality and high computational complexity.These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy.This paper proposes an industrial Internet ofThings intrusion detection feature selection algorithm based on an improved whale optimization algorithm(GSLDWOA).The aim is to address the problems that feature selection algorithms under high-dimensional data are prone to,such as local optimality,long detection time,and reduced accuracy.First,the initial population’s diversity is increased using the Gaussian Mutation mechanism.Then,Non-linear Shrinking Factor balances global exploration and local development,avoiding premature convergence.Lastly,Variable-step Levy Flight operator and Dynamic Differential Evolution strategy are introduced to improve the algorithm’s search efficiency and convergence accuracy in highdimensional feature space.Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance.Compared to the traditional WOA algorithm,the detection rate and F1-score increased by 3.68%and 4.12%.On the WUSTL-IIoT-2021 dataset,accuracy,recall,and F1-score all exceed 99.9%.
文摘Software defect prediction(SDP)aims to find a reliable method to predict defects in specific software projects and help software engineers allocate limited resources to release high-quality software products.Software defect prediction can be effectively performed using traditional features,but there are some redundant or irrelevant features in them(the presence or absence of this feature has little effect on the prediction results).These problems can be solved using feature selection.However,existing feature selection methods have shortcomings such as insignificant dimensionality reduction effect and low classification accuracy of the selected optimal feature subset.In order to reduce the impact of these shortcomings,this paper proposes a new feature selection method Cubic TraverseMa Beluga whale optimization algorithm(CTMBWO)based on the improved Beluga whale optimization algorithm(BWO).The goal of this study is to determine how well the CTMBWO can extract the features that are most important for correctly predicting software defects,improve the accuracy of fault prediction,reduce the number of the selected feature and mitigate the risk of overfitting,thereby achieving more efficient resource utilization and better distribution of test workload.The CTMBWO comprises three main stages:preprocessing the dataset,selecting relevant features,and evaluating the classification performance of the model.The novel feature selection method can effectively improve the performance of SDP.This study performs experiments on two software defect datasets(PROMISE,NASA)and shows the method’s classification performance using four detailed evaluation metrics,Accuracy,F1-score,MCC,AUC and Recall.The results indicate that the approach presented in this paper achieves outstanding classification performance on both datasets and has significant improvement over the baseline models.
基金funded by the Ministry of Higher Education of Malaysia,grant number FRGS/1/2022/ICT02/UPSI/02/1.
文摘In recent years,feature selection(FS)optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification.This work reviews FS and classification methods that utilize evolutionary algorithms(EAs)for gene expression profiles in cancer or medical applications based on research motivations,challenges,and recommendations.Relevant studies were retrieved from four major academic databases-IEEE,Scopus,Springer,and ScienceDirect-using the keywords‘cancer classification’,‘optimization’,‘FS’,and‘gene expression profile’.A total of 67 papers were finally selected with key advancements identified as follows:(1)The majority of papers(44.8%)focused on developing algorithms and models for FS and classification.(2)The second category encompassed studies on biomarker identification by EAs,including 20 papers(30%).(3)The third category comprised works that applied FS to cancer data for decision support system purposes,addressing high-dimensional data and the formulation of chromosome length.These studies accounted for 12%of the total number of studies.(4)The remaining three papers(4.5%)were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions.This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively.Despite recent advancements,significant limitations remain:the dynamic formulation of chromosome length remains an underexplored area.Thus,further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection techniques.The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
基金supported by the National Natural Science Foundation of China under Grant 61602162the Hubei Provincial Science and Technology Plan Project under Grant 2023BCB041.
文摘With the birth of Software-Defined Networking(SDN),integration of both SDN and traditional architectures becomes the development trend of computer networks.Network intrusion detection faces challenges in dealing with complex attacks in SDN environments,thus to address the network security issues from the viewpoint of Artificial Intelligence(AI),this paper introduces the Crayfish Optimization Algorithm(COA)to the field of intrusion detection for both SDN and traditional network architectures,and based on the characteristics of the original COA,an Improved Crayfish Optimization Algorithm(ICOA)is proposed by integrating strategies of elite reverse learning,Levy flight,crowding factor and parameter modification.The ICOA is then utilized for AI-integrated feature selection of intrusion detection for both SDN and traditional network architectures,to reduce the dimensionality of the data and improve the performance of network intrusion detection.Finally,the performance evaluation is performed by testing not only the NSL-KDD dataset and the UNSW-NB 15 dataset for traditional networks but also the InSDN dataset for SDN-based networks.Experimental results show that ICOA improves the accuracy by 0.532%and 2.928%respectively compared with GWO and COA in traditional networks.In SDN networks,the accuracy of ICOA is 0.25%and 0.3%higher than COA and PSO.These findings collectively indicate that AI-integrated feature selection based on the proposed ICOA can promote network intrusion detection for both SDN and traditional architectures.
基金Sponsored by the National Natural Science Foundation of China(Grant No.50875056)
文摘To overcome the limitations of traditional monitoring methods, based on vibration parameter image of rotating machinery, this paper presents an abnormality online monitoring method suitable for rotating machinery using the negative selection mechanism of biology immune system. This method uses techniques of biology clone and learning mechanism to improve the negative selection algorithm to generate detectors possessing different monitoring radius, covers the abnormality space effectively, and avoids such problems as the low efficiency of generating detectors, etc. The result of an example applying the presented monitoring method shows that this method can solve the difficulty of obtaining fault samples preferably and extract the turbine state character effectively, it also can detect abnormality by causing various fault of the turbine and obtain the degree of abnormality accurately. The exact monitoring precision of abnormality indicates that this method is feasible and has better on-line quality, accuracy and robustness.
文摘In this paper,negative selection and genetic algorithms are combined and an improved bi-objective optimization scheme is presented to achieve optimized negative selection algorithm detectors.The main aim of the optimal detector generation technique is maximal nonself space coverage with reduced number of diversified detectors.Conventionally,researchers opted clonal selection based optimization methods to achieve the maximal nonself coverage milestone;however,detectors cloning process results in generation of redundant similar detectors and inefficient detector distribution in nonself space.In approach proposed in the present paper,the maximal nonself space coverage is associated with bi-objective optimization criteria including minimization of the detector overlap and maximization of the diversity factor of the detectors.In the proposed methodology,a novel diversity factorbased approach is presented to obtain diversified detector distribution in the nonself space.The concept of diversified detector distribution is studied for detector coverage with 2-dimensional pentagram and spiral self-patterns.Furthermore,the feasibility of the developed fault detection methodology is tested the fault detection of induction motor inner race and outer race bearings.
基金Sponsored by the National Natural Science Foundation of China ( Grant No. 60671049 ), the Subject Chief Foundation of Harbin ( Grant No.2003AFXXJ013), the Education Department Research Foundation of Heilongjiang Province(Grant No.10541044,1151G012) and the Postdoctor Founda-tion of Heilongjiang(Grant No.LBH-Z05092).
文摘A real-valued negative selection algorithm with good mathematical foundation is presented to solve some of the drawbacks of previous approach. Specifically, it can produce a good estimate of the optimal number of detectors needed to cover the non-self space, and the maximization of the non-self coverage is done through an optimization algorithm with proven convergence properties. Experiments are performed to validate the assumptions made while designing the algorithm and to evaluate its performance.
文摘The negative selection algorithm(NSA)is an adaptive technique inspired by how the biological immune system discriminates the self from nonself.It asserts itself as one of the most important algorithms of the artificial immune system.A key element of the NSA is its great dependency on the random detectors in monitoring for any abnormalities.However,these detectors have limited performance.Redundant detectors are generated,leading to difficulties for detectors to effectively occupy the non-self space.To alleviate this problem,we propose the nature-inspired metaheuristic cuckoo search(CS),a stochastic global search algorithm,which improves the random generation of detectors in the NSA.Inbuilt characteristics such as mutation,crossover,and selection operators make the CS attain global convergence.With the use of Lévy flight and a distance measure,efficient detectors are produced.Experimental results show that integrating CS into the negative selection algorithm elevated the detection performance of the NSA,with an average increase of 3.52%detection rate on the tested datasets.The proposed method shows superiority over other models,and detection rates of 98%and 99.29%on Fisher’s IRIS and Breast Cancer datasets,respectively.Thus,the generation of highest detection rates and lowest false alarm rates can be achieved.
基金supported by the National Natural Science Foundation of China under Grant Nos.U21A20464,62066005the Innovation Project of Guangxi Graduate Education under Grant No.YCSW2023259.
文摘Parkinson’s disease is a neurodegenerative disorder that inflicts irreversible damage on humans.Some experimental data regarding Parkinson’s patients are redundant and irrelevant,posing significant challenges for disease detection.Therefore,there is a need to devise an effective method for the selective extraction of disease-specific information,ensuring both accuracy and the utilization of fewer features.In this paper,a Binary Hybrid Artificial Hummingbird and Flower Pollination Algorithm(FPA),called BFAHA,is proposed to solve the problem of Parkinson’s disease diagnosis based on speech signals.First,combining FPA with Artificial Hummingbird Algorithm(AHA)can take advantage of the strong global exploration ability possessed by FPA to improve the disadvantages of AHA,such as premature convergence and easy falling into local optimum.Second,the Hemming distance is used to determine the difference between the other individuals in the population and the optimal individual after each iteration,if the difference is too significant,the cross-mutation strategy in the genetic algorithm(GA)is used to induce the population individuals to keep approaching the optimal individual in the random search process to speed up finding the optimal solution.Finally,an S-shaped function converts the improved algorithm into a binary version to suit the characteristics of the feature selection(FS)tasks.In this paper,10 high-dimensional datasets from UCI and the ASU are used to test the performance of BFAHA and apply it to Parkinson’s disease diagnosis.Compared with other state-of-the-art algorithms,BFAHA shows excellent competitiveness in both the test datasets and the classification problem,indicating that the algorithm proposed in this study has apparent advantages in the field of feature selection.
基金supported by the Deanship of Scientific Research and Innovation at Al-Balqa Applied University in Jordan.
文摘Feature selection(FS)plays a crucial role in pre-processing machine learning datasets,as it eliminates redundant features to improve classification accuracy and reduce computational costs.This paper presents an enhanced approach to FS for software fault prediction,specifically by enhancing the binary dwarf mongoose optimization(BDMO)algorithm with a crossover mechanism and a modified positioning updating formula.The proposed approach,termed iBDMOcr,aims to fortify exploration capability,promote population diversity,and lastly improve the wrapper-based FS process for software fault prediction tasks.iBDMOcr gained superb performance compared to other well-esteemed optimization methods across 17 benchmark datasets.It ranked first in 11 out of 17 datasets in terms of average classification accuracy.Moreover,iBDMOcr outperformed other methods in terms of average fitness values and number of selected features across all datasets.The findings demonstrate the effectiveness of iBDMOcr in addressing FS problems in software fault prediction,leading to more accurate and efficient models.
基金supported in part by the Natural Science Youth Foundation of Hebei Province under Grant F2019403207in part by the PhD Research Startup Foundation of Hebei GEO University under Grant BQ2019055+3 种基金in part by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant KLIGIP-2021A06in part by the Fundamental Research Funds for the Universities in Hebei Province under Grant QN202220in part by the Science and Technology Research Project for Universities of Hebei under Grant ZD2020344in part by the Guangxi Natural Science Fund General Project under Grant 2021GXNSFAA075029.
文摘In classification problems,datasets often contain a large amount of features,but not all of them are relevant for accurate classification.In fact,irrelevant features may even hinder classification accuracy.Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate.Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter,but the results obtained depend on the value of the parameter.To eliminate this parameter’s influence,the problem can be reformulated as a multi-objective optimization problem.The Whale Optimization Algorithm(WOA)is widely used in optimization problems because of its simplicity and easy implementation.In this paper,we propose a multi-strategy assisted multi-objective WOA(MSMOWOA)to address feature selection.To enhance the algorithm’s search ability,we integrate multiple strategies such as Levy flight,Grey Wolf Optimizer,and adaptive mutation into it.Additionally,we utilize an external repository to store non-dominant solution sets and grid technology is used to maintain diversity.Results on fourteen University of California Irvine(UCI)datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance.The source code can be accessed from the website:https://github.com/zc0315/MSMOWOA.
基金the Deputyship for Research and Innovation,“Ministry of Education”in Saudi Arabia for funding this research(IFKSUOR3-014-3).
文摘In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
文摘Heart disease prediction is a critical issue in healthcare,where accurate early diagnosis can save lives and reduce healthcare costs.The problem is inherently complex due to the high dimensionality of medical data,irrelevant or redundant features,and the variability in risk factors such as age,lifestyle,andmedical history.These challenges often lead to inefficient and less accuratemodels.Traditional predictionmethodologies face limitations in effectively handling large feature sets and optimizing classification performance,which can result in overfitting poor generalization,and high computational cost.This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm(GA)with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm(TSA).GA selects the most relevant features,reducing dimensionality and improvingmodel efficiency.Theselected features are then used to train an ensemble of deep learning models,where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy.This hybrid approach addresses key challenges in the field,such as high dimensionality,redundant features,and classification performance,by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble.These enhancements result in a model that achieves superior accuracy,generalization,and efficiency compared to traditional methods.The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditionalmodels.Specifically,it achieved an accuracy of 97.5%,a sensitivity of 97.2%,and a specificity of 97.8%.Additionally,with a 60-40 data split and 5-fold cross-validation,the model showed a significant reduction in training time(90 s),memory consumption(950 MB),and CPU usage(80%),highlighting its effectiveness in processing large,complex medical datasets for heart disease prediction.
文摘Multi-label feature selection(MFS)is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels.However,traditional centralized methods face significant challenges in privacy-sensitive and distributed settings,often neglecting label dependencies and suffering from low computational efficiency.To address these issues,we introduce a novel framework,Fed-MFSDHBCPSO—federated MFS via dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization(DHBCPSO-MSR).Leveraging the federated learning paradigm,Fed-MFSDHBCPSO allows clients to perform local feature selection(FS)using DHBCPSO-MSR.Locally selected feature subsets are encrypted with differential privacy(DP)and transmitted to a central server,where they are securely aggregated and refined through secure multi-party computation(SMPC)until global convergence is achieved.Within each client,DHBCPSO-MSR employs a dual-layer FS strategy.The inner layer constructs sample and label similarity graphs,generates Laplacian matrices to capture the manifold structure between samples and labels,and applies L2,1-norm regularization to sparsify the feature subset,yielding an optimized feature weight matrix.The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset.The updated weight matrix is then fed back to the inner layer for further optimization.Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
文摘The exponential growth of data in recent years has introduced significant challenges in managing high-dimensional datasets,particularly in industrial contexts where efficient data handling and process innovation are critical.Feature selection,an essential step in data-driven process innovation,aims to identify the most relevant features to improve model interpretability,reduce complexity,and enhance predictive accuracy.To address the limitations of existing feature selection methods,this study introduces a novel wrapper-based feature selection framework leveraging the recently proposed Arctic Puffin Optimization(APO)algorithm.Specifically,we incorporate a specialized conversion mechanism to effectively adapt APO from continuous optimization to discrete,binary feature selection problems.Moreover,we introduce a fully parallelized implementation of APO in which both the search operators and fitness evaluations are executed concurrently using MATLAB’s Parallel Computing Toolbox.This parallel design significantly improves runtime efficiency and scalability,particularly for high-dimensional feature spaces.Extensive comparative experiments conducted against 14 state-of-the-art metaheuristic algorithms across 15 benchmark datasets reveal that the proposed APO-based method consistently achieves superior classification accuracy while selecting fewer features.These findings highlight the robustness and effectiveness of APO,validating its potential for advancing process innovation,economic productivity and smart city application in real-world machine learning scenarios.
基金funded by the National Key Research and Development Program of China,grant number:2023YFF0615404.
文摘To overcome the challenges associated with predicting gas extraction performance and mitigating the gradual decline in extraction volume,which adversely impacts gas utilization efficiency in mines,a gas extraction pure volume prediction model was developed using Support Vector Regression(SVR)and Random Forest(RF),with hyperparameters fine-tuned via the Genetic Algorithm(GA).Building upon this,an adaptive control model for gas extraction negative pressure was formulated to maximize the extracted gas volume within the pipeline network,followed by field validation experiments.Experimental results indicate that the GA-SVR model surpasses comparable models in terms of mean absolute error,root mean square error,and mean absolute percentage error.In the extraction process of bedding boreholes,the influence of negative pressure on gas extraction concentration diminishes over time,yet it remains a critical factor in determining the extracted pure volume.In contrast,throughout the entire extraction period of cross-layer boreholes,both extracted pure volume and concentration exhibit pronounced sensitivity to fluctuations in extraction negative pressure.Field experiments demonstrated that the adaptive controlmodel enhanced the average extracted gas volume by 5.08% in the experimental borehole group compared to the control group during the later extraction stage,with a more pronounced increase of 7.15% in the first 15 days.The research findings offer essential technical support for the efficient utilization and long-term sustainable development of mine gas resources.The research findings offer essential technical support for gas disaster mitigation and the sustained,efficient utilization of mine gas.
基金supported by a project entitled Loess Plateau Region-Watershed-Slope Geological Hazard Multi-Scale Collaborative Intelligent Early Warning System of the National Key R&D Program of China(2022YFC3003404)a project of the Shaanxi Youth Science and Technology Star(2021KJXX-87)public welfare geological survey projects of Shaanxi Institute of Geologic Survey(20180301,201918,202103,and 202413).
文摘This study investigated the impacts of random negative training datasets(NTDs)on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau,northern Shaanxi Province,China.Based on randomly generated 40 NTDs,the study developed models for the geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performances using the area under the receiver operating characteristic curve(AUC).Specifically,the means and standard deviations of the AUC values from all models were then utilized to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment,as well as the uncertainty introduced by the NTDs.A risk and return methodology was thus employed to quantify and mitigate the uncertainty,with log odds ratios used to characterize the susceptibility assessment levels.The risk and return values were calculated based on the standard deviations and means of the log odds ratios of various locations.After the mean log odds ratios were converted into probability values,the final susceptibility map was plotted,which accounts for the uncertainty induced by random NTDs.The results indicate that the AUC values of the models ranged from 0.810 to 0.963,with an average of 0.852 and a standard deviation of 0.035,indicating encouraging prediction effects and certain uncertainty.The risk and return analysis reveals that low-risk and high-return areas suggest lower standard deviations and higher means across multiple model-derived assessments.Overall,this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models,aimed at improving their robustness and reliability.Additionally,by identifying low-risk and high-return areas,resource allocation for geologic hazard prevention and control can be optimized,thus ensuring that limited resources are directed toward the most effective prevention and control measures.
基金supported by National Natural Science Foundation of China(Grant Nos.62376089,62302153,62302154,62202147)the key Research and Development Program of Hubei Province,China(Grant No.2023BEB024).
文摘The world produces vast quantities of high-dimensional multi-semantic data.However,extracting valuable information from such a large amount of high-dimensional and multi-label data is undoubtedly arduous and challenging.Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features.The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection,because of its simplicity,efficiency,and similarity to reinforcement learning.Nevertheless,existing methods do not consider crucial correlation information,such as dynamic redundancy and label correlation.To tackle these concerns,the paper proposes a multi-label feature selection technique based on ant colony optimization algorithm(MFACO),focusing on dynamic redundancy and label correlation.Initially,the dynamic redundancy is assessed between the selected feature subset and potential features.Meanwhile,the ant colony optimization algorithm extracts label correlation from the label set,which is then combined into the heuristic factor as label weights.Experimental results demonstrate that our proposed strategies can effectively enhance the optimal search ability of ant colony,outperforming the other algorithms involved in the paper.
基金supported by the Deanship of Scientific Research,at Imam Abdulrahman Bin Faisal University.Grant Number:2019-416-ASCS.
文摘Lung cancer is among the most frequent cancers in the world,with over one million deaths per year.Classification is required for lung cancer diagnosis and therapy to be effective,accurate,and reliable.Gene expression microarrays have made it possible to find genetic biomarkers for cancer diagnosis and prediction in a high-throughput manner.Machine Learning(ML)has been widely used to diagnose and classify lung cancer where the performance of ML methods is evaluated to identify the appropriate technique.Identifying and selecting the gene expression patterns can help in lung cancer diagnoses and classification.Normally,microarrays include several genes and may cause confusion or false prediction.Therefore,the Arithmetic Optimization Algorithm(AOA)is used to identify the optimal gene subset to reduce the number of selected genes.Which can allow the classifiers to yield the best performance for lung cancer classification.In addition,we proposed a modified version of AOA which can work effectively on the high dimensional dataset.In the modified AOA,the features are ranked by their weights and are used to initialize the AOA population.The exploitation process of AOA is then enhanced by developing a local search algorithm based on two neighborhood strategies.Finally,the efficiency of the proposed methods was evaluated on gene expression datasets related to Lung cancer using stratified 4-fold cross-validation.The method’s efficacy in selecting the optimal gene subset is underscored by its ability to maintain feature proportions between 10%to 25%.Moreover,the approach significantly enhances lung cancer prediction accuracy.For instance,Lung_Harvard1 achieved an accuracy of 97.5%,Lung_Harvard2 and Lung_Michigan datasets both achieved 100%,Lung_Adenocarcinoma obtained an accuracy of 88.2%,and Lung_Ontario achieved an accuracy of 87.5%.In conclusion,the results indicate the potential promise of the proposed modified AOA approach in classifying microarray cancer data.