DNA microarrays, a cornerstone in biomedicine, measure gene expression across thousands to tens of thousands of genes. Identifying the genes vital for accurate cancer classification is a key challenge. Here, we present Fs-LSA (F-score based Learning Search Algorithm), a novel gene selection algorithm designed to enhance the precision and efficiency of target gene identification from microarray data for cancer classification. The algorithm has two phases: the first leverages F-score values to prioritize and select feature genes with the most significant differential expression; the second introduces our Learning Search Algorithm (LSA), which harnesses swarm intelligence to identify the optimal subset among the remaining genes. Inspired by human social learning, LSA integrates historical data and collective intelligence for a thorough search, with a dynamic control mechanism that balances exploration and refinement, thereby enhancing the gene selection process. We conducted a rigorous validation of Fs-LSA's performance using eight publicly available cancer microarray expression datasets. Fs-LSA achieved accuracy, precision, sensitivity, and F1-score values of 0.9932, 0.9923, 0.9962, and 0.994, respectively. Comparative analyses with state-of-the-art algorithms revealed Fs-LSA's superior simplicity and efficiency. Additionally, we validated the algorithm's efficacy independently using glioblastoma data from the GEO and TCGA databases, where its performance was significantly superior to that of the comparison algorithms. Importantly, the driver genes identified by Fs-LSA were instrumental in developing a predictive model that serves as an independent prognostic indicator for glioblastoma, underscoring Fs-LSA's transformative potential in genomics and personalized medicine.
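As an illustration of the filter phase described above, the sketch below ranks genes by ANOVA F-score with scikit-learn; the synthetic matrix and the `top_k` cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Toy stand-in for a microarray matrix: rows = samples, columns = genes.
X, y = make_classification(n_samples=60, n_features=2000, n_informative=20,
                           random_state=0)

# Phase one of an Fs-LSA-style pipeline: rank genes by the ANOVA F-score
# between expression values and class labels.
f_scores, _ = f_classif(X, y)
top_k = 50  # size of the candidate pool passed to the search phase (assumed)
candidates = np.argsort(f_scores)[::-1][:top_k]
print("Top-ranked gene indices:", candidates[:10])
```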
In recent years, feature selection (FS) optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification. This work reviews FS and classification methods that utilize evolutionary algorithms (EAs) for gene expression profiles in cancer or medical applications, organized by research motivations, challenges, and recommendations. Relevant studies were retrieved from four major academic databases (IEEE, Scopus, Springer, and ScienceDirect) using the keywords 'cancer classification', 'optimization', 'FS', and 'gene expression profile'. A total of 67 papers were selected, with key advancements identified as follows: (1) the majority of papers (44.8%) focused on developing algorithms and models for FS and classification; (2) the second category encompassed studies on biomarker identification by EAs, including 20 papers (30%); (3) the third category comprised works that applied FS to cancer data for decision support system purposes, addressing high-dimensional data and the formulation of chromosome length, accounting for 12% of the studies; (4) the remaining three papers (4.5%) were reviews and surveys on models and developments in prediction and classification optimization for cancer classification under current technical conditions. This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively. Despite recent advancements, significant limitations remain: the dynamic formulation of chromosome length is still underexplored. Further research is therefore needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection. The findings suggest that advances in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
In this study, we address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems; we leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm improves classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
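The wrapper evaluation described here, LOOCV accuracy of a KNN on a candidate gene subset, is the fitness function a GWO/HHO search would repeatedly call. A minimal sketch, assuming synthetic data in place of the six microarray datasets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 40 samples x 500 genes.
X, y = make_classification(n_samples=40, n_features=500, n_informative=15,
                           random_state=1)

def fitness(mask):
    """LOOCV accuracy of a 3-NN classifier on the genes selected by a
    boolean mask: the objective a GWO/HHO hybrid would maximise."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=3),
                           X[:, mask], y, cv=LeaveOneOut()).mean()

rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.05   # one random candidate subset
print(f"{mask.sum()} genes selected, LOOCV accuracy = {fitness(mask):.3f}")
```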
Leukemia is a cancer of the blood or bone marrow in which a person has an overproduction of white blood cells (WBCs). It primarily affects children and more rarely affects adults. Treatment depends on the type of leukemia and the extent to which the cancer has spread throughout the body. Identifying leukemia at an early stage is vital to providing timely patient care. Medical image-analysis approaches offer safer, quicker, and less costly solutions that avoid the difficulties of invasive procedures. Computer vision (CV) and image-processing techniques generalize well and can reduce human error. Many researchers have implemented computer-aided diagnostic methods and machine learning (ML) for laboratory image analysis, aiming to overcome the limitations of late leukemia detection and to determine its subgroups. This study establishes a Marine Predators Algorithm with Deep Learning for Leukemia Cancer Classification (MPADL-LCC) on medical images. The proposed MPADL-LCC system uses a bilateral filtering (BF) technique to pre-process medical images, and Faster SqueezeNet with the Marine Predators Algorithm (MPA) as a hyperparameter optimizer for feature extraction. Lastly, a denoising autoencoder (DAE) is used to accurately detect and classify leukemia. Hyperparameter tuning with MPA helps enhance leukemia classification performance. Simulation results are compared with other recent approaches across various measurements, and the MPADL-LCC algorithm exhibits the best results.
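The bilateral-filtering pre-processing step can be reproduced directly with OpenCV; the filter parameters below are illustrative choices, not values reported by the paper.

```python
import cv2
import numpy as np

# Stand-in for a blood-smear image; any uint8 BGR image works.
img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Bilateral filtering smooths staining noise while preserving cell edges.
# Arguments: neighbourhood diameter, sigmaColor, sigmaSpace (illustrative).
denoised = cv2.bilateralFilter(img, 9, 75, 75)
```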
Background: The nasal alar defect in Asians remains a challenging issue, as do clear classification and algorithmic guidance, despite numerous previously described surgical techniques. The aim of this study is to propose a surgical algorithm that addresses the appropriate surgical procedures for different types of nasal alar defects in Asian patients. Methods: A retrospective case note review was conducted on 32 patients with nasal alar defects who underwent reconstruction between 2008 and 2022. Based on careful analysis and our clinical experience, we proposed a classification system for nasal alar defects and presented a reconstructive algorithm. Patient data, including age, sex, diagnosis, surgical options, and complications, were assessed. The extent of surgical scar formation was evaluated using standard photography based on a 4-grade scar scale. Results: Among the 32 patients, there were 20 males and 12 females with nasal alar defects. The predominant cause of trauma in China was industrial. The majority of alar defects were type Ⅰ, comprising 18 cases (56.2%), of which subtype ⅠC was the most common (n=8, 25%); there were 5 cases (15.6%) of type Ⅱ defects, 7 (21.9%) of type Ⅲ defects, and 2 (6.3%) of type Ⅳ defects. The most common surgical option was the auricular composite graft (n=8, 25%), followed by the bilobed flap (n=6, 18.8%), free auricular composite flap (n=4, 12.5%), and primary closure (n=3, 9.4%). Satisfactory improvements were observed postoperatively. Conclusion: Factors contributing to the classifications were analyzed and defined, providing a framework for the proposed classification system. The reconstructive algorithm offers surgeons appropriate procedures for treating nasal alar defects in Asians.
In this study, eight varieties of maize seeds were used as the research objects. Eighty-one combinations of preprocessing methods were applied to the original spectra, and by comparison, Savitzky-Golay (SG) smoothing followed by multiplicative scattering correction (MSC) and maximum-minimum normalization (MN) was identified as the optimal preprocessing technique. The competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA) methods, and their combination, were employed to extract feature wavelengths. Classification models based on back propagation (BP), support vector machine (SVM), random forest (RF), and partial least squares (PLS) were established using full-band data and the extracted feature wavelengths. Among all models, the (CARS-SPA)-BP model achieved the highest accuracy rate of 98.44%. This study offers novel insights and methodologies for the rapid and accurate identification of corn seeds as well as other crop seeds.
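A minimal sketch of the SG-MSC-MN preprocessing chain, assuming toy spectra and illustrative smoothing parameters:

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_msc_mn(spectra):
    """SG smoothing -> multiplicative scatter correction -> min-max
    normalisation. Window length and polynomial order are illustrative."""
    sg = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
    ref = sg.mean(axis=0)                    # MSC reference spectrum
    msc = np.empty_like(sg)
    for i, s in enumerate(sg):
        b, a = np.polyfit(ref, s, 1)         # fit s ~ b * ref + a
        msc[i] = (s - a) / b                 # remove scatter offset/scale
    lo = msc.min(axis=1, keepdims=True)
    hi = msc.max(axis=1, keepdims=True)
    return (msc - lo) / (hi - lo)            # per-spectrum min-max scaling

# Toy spectra: 8 seeds x 200 wavelengths sharing one underlying shape.
rng = np.random.default_rng(0)
base = rng.random(200)
spectra = np.outer(rng.uniform(0.8, 1.2, 8), base) + rng.normal(0, 0.01, (8, 200))
processed = sg_msc_mn(spectra)
```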
Cardiovascular disease prediction is a significant area of research in healthcare management systems (HMS): the number of deaths can only be reduced if cardiac problems are anticipated in advance. Existing machine learning systems for heart disease detection have not yet produced sufficient results, owing to their reliance on the available data. We present Clustered Butterfly Optimization Techniques (RoughK-means+BOA) as a new hybrid method for predicting heart disease. This method comprises two phases: clustering data using Rough k-means (RKM) and data analysis using the butterfly optimization algorithm (BOA). The benchmark dataset from the UCI repository is used for our experiments, which are divided into three sets: the first involves the RKM clustering technique, the next evaluates the classification outcomes, and the last validates the performance of the proposed hybrid model. The proposed RoughK-means+BOA achieved an accuracy of 97.03% with a minimal error rate of 2.97%, which is better than other combinations of optimization techniques. In addition, the approach effectively enhances data segmentation, optimization, and classification performance.
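A rough sketch of the cluster-then-classify idea, with plain k-means standing in for rough k-means, a random forest standing in for the BOA-driven analysis stage, and a bundled scikit-learn dataset standing in for the UCI heart data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer   # stand-in for UCI heart data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Phase 1: segment the records by clustering and append the cluster label
# as an extra feature (plain k-means standing in for rough k-means).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

# Phase 2: classify; in the paper, BOA optimises this stage rather than
# using fixed defaults as done here.
acc = cross_val_score(RandomForestClassifier(random_state=0), X_aug, y, cv=5)
print(f"CV accuracy with cluster feature: {acc.mean():.3f}")
```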
Due to global warming, extreme weather and climate events are becoming more frequent, highlighting the need to explore the changing characteristics of precipitation in China, including extreme precipitation. A clustering algorithm was developed to classify summer (June, July, and August) daily precipitation in China from 1961 to 2020, considering spatial distribution, standard deviations, and frequency of extreme precipitation events. The results reveal six distinct precipitation climate zones, a classification that differs from previous divisions. While overall precipitation has decreased in most regions, the frequency of extreme precipitation events has increased across all clusters, indicating a shift in precipitation distribution patterns. Analysis shows that the weakened Lake Baikal blocking high and strengthened Mongolian cyclone influence the arid region in northwest China (Cluster 1), which is characterized by the lowest precipitation. The transition zone between the monsoon and arid regions (Cluster 2) is affected by the Mongolian cyclone, water vapor transport from the Indian Ocean, and shifts in the monsoon boundary. Clusters 3 and 4 represent areas associated with the advance and retreat of the summer monsoon. In the Meiyu region, two distinct subregions have been identified. Cluster 4 is primarily influenced by the East Asia-Pacific wave train. Despite sharing similar climate drivers and proximity, Clusters 4 and 5 differ significantly due to topographic variations and disparate levels of urbanization. Cluster 5 exhibits higher average precipitation, greater variability, and more frequent extreme events. Cluster 6 exhibits the highest overall precipitation, in the coastal areas of Guangdong and Guangxi, where abundant water vapor contributes to a higher frequency of extreme precipitation. In addition, anthropogenic activities and urbanization significantly influence precipitation in the Beijing-Tianjin-Hebei and Yangtze River Delta regions. This research proposes a precipitation classification scheme integrating multiple precipitation parameters, providing support for risk management and mitigation strategies in the face of increasing extreme precipitation events.
Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies: their mitigation affects the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and the Smart Grid allows all participants in the distribution grid to store and track electricity consumption. In this research, a machine learning model is developed that analyzes and predicts the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. The model is an ensemble meta-algorithm (stacking) that generalizes random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The superior accuracy of the proposed meta-algorithm over the basic classifiers is experimentally confirmed on the test sample. Owing to its good accuracy (ROC-AUC = 0.88), such a model can serve as the methodological basis for a decision support system whose purpose is to form a sample of suspected NTL sources. Using such a sample will allow the top management of electric distribution companies to increase the efficiency of inspection raids, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
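The stacking meta-algorithm maps naturally onto scikit-learn's StackingClassifier. In the sketch below, synthetic imbalanced data stand in for the consumption readings and a single MLP stands in for the homogeneous ANN ensemble; LightGBM is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced stand-in for per-consumer daily readings.
X, y = make_classification(n_samples=2000, n_features=31, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stacking: base learners generalised by a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nn", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))
```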
The distribution of shear-wave velocities in the subsurface is generally used to assess the potential for seismic liquefaction and soil amplification effects and to classify seismic sites. Newly developed distributed acoustic sensing (DAS) technology enables estimation of the shear-wave distribution as a high-density seismic observation system. This technology is characterized by low maintenance costs, high-resolution outputs, and real-time data transmission capabilities, albeit with the challenge of managing massive data generation. Rapid and efficient interpretation of data is the key to advancing application of the DAS technology. In this study, field tests were carried out to record ambient noise over a short period using DAS technology, from which the surface-wave dispersion curves were extracted. In order to reduce the influence of directional effects on the results, an unsupervised clustering method is used to select appropriate clusters to extract the Green's function. A combination of a genetic algorithm and Monte Carlo (GA-MC) simulation is proposed to invert the subsurface velocity structure. The stratigraphic profiles obtained by the GA-MC method are in agreement with the borehole profiles. Compared to other methods, the proposed optimization method not only improves the solution quality but also reduces the solution time.
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. First, fuzzy set theory and the transitive closure algorithm are introduced. Then four factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to sequencing problems for flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expense of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
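The core of the TCA is the max-min composition of a fuzzy similarity matrix, squared until it stops changing; classes are then read off a lambda-cut. A minimal sketch with an illustrative similarity matrix:

```python
import numpy as np

def transitive_closure(R, tol=1e-9):
    """Max-min transitive closure of a fuzzy similarity matrix R:
    square R by max-min composition until it stops changing."""
    while True:
        # (R o R)[i, j] = max_k min(R[i, k], R[k, j])
        R2 = np.max(np.minimum(R[:, :, None], R[None, :, :]), axis=1)
        if np.allclose(R2, R, atol=tol):
            return R2
        R = R2

# Illustrative fuzzy similarity matrix over four flights.
R = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.4, 0.3],
              [0.2, 0.4, 1.0, 0.9],
              [0.1, 0.3, 0.9, 1.0]])
T = transitive_closure(R)
lam = 0.5                       # lambda-cut threshold (illustrative)
print((T >= lam).astype(int))   # connected blocks = flight classes
```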
A learning algorithm based on a hard limiter for feedforward neural networks (NNs) is presented and applied to classification problems on separable convex sets and disjoint sets. Simulations show that the algorithm has stronger classification ability than the back propagation (BP) algorithm for feedforward NNs using a sigmoid function. Moreover, the models can be implemented with lower-cost hardware than a BP NN.
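The simplest hard-limiter learning rule is the classic perceptron update; the sketch below illustrates the idea on a separable toy set and is not the paper's exact algorithm.

```python
import numpy as np

def hard_limiter(x):
    """Threshold activation: 1 or 0 instead of a smooth sigmoid."""
    return (x >= 0).astype(float)

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Classic perceptron rule for a single hard-limiter unit."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            err = yi - hard_limiter(xi @ w)
            w += lr * err * xi                  # updates only on mistakes
    return w

# Two linearly separable (disjoint, convex) point sets.
X = np.array([[0., 0.], [0., 1.], [2., 2.], [3., 2.]])
y = np.array([0., 0., 1., 1.])
w = train_perceptron(X, y)
print(hard_limiter(np.hstack([X, np.ones((4, 1))]) @ w))  # -> [0. 0. 1. 1.]
```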
Since there are many factors affecting the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers; the results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and a wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution and could be used to classify the wine samples. The QPSO-KM algorithm can also be applied to other clustering problems.
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, were reviewed that employed four classification algorithms as their main statistical tools: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN). The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performance, strengths, and shortcomings of each algorithm were examined, and a conclusion was reached on which has the higher performance. It was evident from the literature reviewed that RF is sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, high computational complexity, the numerous NN architectures to choose from, and the many training algorithms available, most researchers recommend SVM and RF as easier and more widely used methods that repeatedly achieve results with high accuracy and are often faster to implement.
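The kind of head-to-head comparison the reviewed studies perform can be reproduced with cross-validation; the sketch below uses default hyper-parameters on a bundled dataset, so the numbers are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF":  RandomForestClassifier(random_state=0),
    "NN":  MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    clf = make_pipeline(StandardScaler(), model)  # scaling matters for KNN/SVM/NN
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```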
This study presents a framework for predicting geological characteristics by integrating a stacking classification algorithm (SCA) with grid search (GS) and K-fold cross-validation (K-CV). The SCA includes two learner layers: a primary-learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using GS and K-CV: GS tunes the hyper-parameters of complicated problems, while K-CV rotates the validation set within the training set. In general, GS is combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the number of types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
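The GS plus K-CV machinery corresponds to scikit-learn's GridSearchCV wrapped around a stacking classifier. In this sketch the data, base learner, and parameter grid are all illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the four shield parameters (thrust, advance rate,
# cutterhead speed, cutterhead torque) vs. geological class.
X, y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

sca = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression())

# GS over the hyper-parameter grid, each candidate scored by 5-fold K-CV.
grid = GridSearchCV(sca,
                    param_grid={"rf__n_estimators": [50, 100, 200],
                                "rf__max_depth": [3, 5, None]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```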
Clustering is a practical approach to filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. In practice, however, the amount of point cloud data is extremely large, making it impossible to cluster the data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms classify poorly in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which solves the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and the features of ground-object and terrain point clouds, the point clouds are first clustered by elevation, and the planar point clouds are then selected, reducing both the number of samples and the feature dimensions of the data. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground-object point clouds, and terrain point clouds. The experiment uses 15 data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with eight classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm applies well in both urban and rural areas and is significantly better than the classical filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
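The final DBSCAN step, preceded by an elevation-based data reduction, can be sketched as follows; the toy point cloud and the eps/min_samples values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy point cloud in metres: flat ground plus a few elevated objects.
rng = np.random.default_rng(0)
ground = np.column_stack([rng.uniform(0, 100, 5000),
                          rng.uniform(0, 100, 5000),
                          rng.normal(0.0, 0.2, 5000)])
objects = np.column_stack([rng.uniform(0, 100, 500),
                           rng.uniform(0, 100, 500),
                           rng.uniform(3, 10, 500)])
pts = np.vstack([ground, objects])

# Step 1 (data reduction): keep only the low-elevation slice before the
# expensive density clustering, echoing the paper's elevation pre-clustering.
low = pts[pts[:, 2] < 2.0]

# Step 2: DBSCAN; points in no dense cluster (label -1) are noise.
labels = DBSCAN(eps=2.0, min_samples=10).fit_predict(low)
print("clusters:", labels.max() + 1, "noise points:", (labels == -1).sum())
```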
A brain-computer interface (BCI) system is one of the most effective ways of translating brain signals into output commands. Different imagery activities can be classified based on the changes in μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for this classification, and training them well has recently attracted many researchers to the field. Conventional methods for training NNs, such as gradient descent and recursive methods, have disadvantages including low accuracy, slow convergence, and trapping in local minima. In this paper, to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and the gravitational search algorithm (PSOGSA), is proposed for our classification problem. To show the advantages of training NNs with PSOGSA, the algorithm is compared with other meta-heuristics such as particle swarm optimization (PSO), the gravitational search algorithm (GSA), and newer versions of PSO. The metrics discussed are convergence speed and classification accuracy. The results show that for most subjects in the electroencephalography (EEG) dataset, the proposed algorithm performs better than, or at least comparably to, the others.
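A minimal sketch of training an MLP's weights with a population-based optimizer: plain PSO over the flattened weight vector, on XOR as a toy task. PSOGSA adds a gravitational acceleration term to this velocity update, which is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 2 inputs -> 4 hidden (tanh) -> 1 output (sigmoid).
SHAPES = [(2, 4), (4,), (4, 1), (1,)]
DIM = sum(int(np.prod(s)) for s in SHAPES)

def unpack(theta):
    parts, i = [], 0
    for s in SHAPES:
        n = int(np.prod(s))
        parts.append(theta[i:i + n].reshape(s))
        i += n
    return parts

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def loss(theta, X, y):
    return np.mean((forward(theta, X).ravel() - y) ** 2)

# XOR as a toy non-linear classification task.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Plain PSO over the flattened weight vector.
n, w, c1, c2 = 30, 0.7, 1.5, 1.5
pos = rng.normal(0, 1, (n, DIM))
vel = np.zeros((n, DIM))
pbest, pbest_f = pos.copy(), np.array([loss(p, X, y) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
for _ in range(300):
    r1, r2 = rng.random((n, DIM)), rng.random((n, DIM))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([loss(p, X, y) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()
print(forward(gbest, X).ravel().round(2))  # should approach [0, 1, 1, 0]
```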
This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on multi-resolution segmentation (MRS) and the Random Forest classifier (RFC). The first step is to determine the weights of the input features when processing such images with the object-based MRS approach. Given the high number of input features, an automatic method is needed to estimate this parameter; we used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and the other required parameters, the image is segmented into homogenous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three hyperspectral datasets with various spectral and spatial characteristics, acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, APEX, and CASI datasets, respectively. Moreover, the method was more efficient than the spectral-based classifications: its OA was 5.67% and 3.75% higher than that of the conventional RFC and SVM classifiers, respectively.
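Extracting the Variable Importance used to weight the image bands is a one-liner on a fitted random forest; the synthetic pixel samples below are an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: pixel samples (rows) by spectral bands (columns).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Variable Importance: the RFC output used to weight each band before
# multi-resolution segmentation.
vi = rf.feature_importances_
band_weights = vi / vi.max()
print("most informative bands:", np.argsort(vi)[::-1][:5])
```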
In order to improve accuracy and reduce training and testing time in image classification, a novel image classification scheme based on the extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine replaces the original linear SVM classifier. First, the ScSPM algorithm is performed to extract features of the multi-scale image blocks, and each layer's feature vector is connected to an ELM. Finally, the mapped features are concatenated and used as the input of an ELM based on a radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better not only in reducing training time but also in improving classification accuracy.
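A single ELM layer is compact enough to sketch in full: random hidden weights, then a closed-form least-squares solve for the output weights. The toy features below stand in for the ScSPM codes.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden weights, output
    weights solved in closed form via the pseudoinverse."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        T = np.eye(int(y.max()) + 1)[y]               # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)              # random feature map
        self.beta = np.linalg.pinv(H) @ T             # least-squares solve
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

# Toy usage; in the paper the inputs would be ScSPM sparse-coded features.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print("train acc:", (ELM().fit(X, y).predict(X) == y).mean())
```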
AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, covering tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, while minimizing annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established, and each algorithm was combined with each model for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithms were evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed the others. Finally, six models, including EfficientFormer, were used to classify HMM; the best-performing model was selected as the baseline and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms the other algorithms by an average of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. With a complete training set of 4252 images, EfficientFormer achieved the best results among several advanced deep learning models, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. Combining ALFA-Mix+ with EfficientFormer therefore achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy and outperforms the other algorithms in more experimental rounds, effectively selecting valuable samples. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
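ALFA-Mix itself mixes labeled and unlabeled feature representations to pick queries; the sketch below uses much simpler least-confidence sampling, only to illustrate the label-retrain loop of such experiments (100 images per round in the paper; 20 here, on a stand-in dataset).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), 50, replace=False))   # initial seed set
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    clf = LogisticRegression(max_iter=2000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)            # least-confident sampling
    batch = np.argsort(uncertainty)[::-1][:20]     # query 20 samples/round
    labeled += [pool[i] for i in batch]
    pool = [i for j, i in enumerate(pool) if j not in set(batch)]
    print(f"round {round_}: {len(labeled)} labels, "
          f"acc on pool = {clf.score(X[pool], y[pool]):.3f}")
```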
基金supported by the National Natural Science Foundation of China(Grant Number 62341210)Natural Science Foundation of Guangxi Province(Grant Number:2025GXNSFHA069267)Science and Technology Development Plan for Baise City(Grant Number 20233654).
文摘DNA microarrays, a cornerstone in biomedicine, measure gene expression across thousands to tens of thousands of genes. Identifying the genes vital for accurate cancer classification is a key challenge. Here, we present Fs-LSA (F-score based Learning Search Algorithm), a novel gene selection algorithm designed to enhance the precision and efficiency of target gene identification from microarray data for cancer classification. This algorithm is divided into two phases: the first leverages F-score values to prioritize and select feature genes with the most significant differential expression;the second phase introduces our Learning Search Algorithm (LSA), which harnesses swarm intelligence to identify the optimal subset among the remaining genes. Inspired by human social learning, LSA integrates historical data and collective intelligence for a thorough search, with a dynamic control mechanism that balances exploration and refinement, thereby enhancing the gene selection process. We conducted a rigorous validation of Fs-LSA’s performance using eight publicly available cancer microarray expression datasets. Fs-LSA achieved accuracy, precision, sensitivity, and F1-score values of 0.9932, 0.9923, 0.9962, and 0.994, respectively. Comparative analyses with state-of-the-art algorithms revealed Fs-LSA’s superior performance in terms of simplicity and efficiency. Additionally, we validated the algorithm’s efficacy independently using glioblastoma data from GEO and TCGA databases. It was significantly superior to those of the comparison algorithms. Importantly, the driver genes identified by Fs-LSA were instrumental in developing a predictive model as an independent prognostic indicator for glioblastoma, underscoring Fs-LSA’s transformative potential in genomics and personalized medicine.
基金funded by the Ministry of Higher Education of Malaysia,grant number FRGS/1/2022/ICT02/UPSI/02/1.
文摘In recent years,feature selection(FS)optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification.This work reviews FS and classification methods that utilize evolutionary algorithms(EAs)for gene expression profiles in cancer or medical applications based on research motivations,challenges,and recommendations.Relevant studies were retrieved from four major academic databases-IEEE,Scopus,Springer,and ScienceDirect-using the keywords‘cancer classification’,‘optimization’,‘FS’,and‘gene expression profile’.A total of 67 papers were finally selected with key advancements identified as follows:(1)The majority of papers(44.8%)focused on developing algorithms and models for FS and classification.(2)The second category encompassed studies on biomarker identification by EAs,including 20 papers(30%).(3)The third category comprised works that applied FS to cancer data for decision support system purposes,addressing high-dimensional data and the formulation of chromosome length.These studies accounted for 12%of the total number of studies.(4)The remaining three papers(4.5%)were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions.This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively.Despite recent advancements,significant limitations remain:the dynamic formulation of chromosome length remains an underexplored area.Thus,further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection techniques.The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
基金the Deputyship for Research and Innovation,“Ministry of Education”in Saudi Arabia for funding this research(IFKSUOR3-014-3).
文摘In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
基金funded by Researchers Supporting Program at King Saud University,(RSPD2024R809).
文摘In blood or bone marrow,leukemia is a form of cancer.A person with leukemia has an expansion of white blood cells(WBCs).It primarily affects children and rarely affects adults.Treatment depends on the type of leukemia and the extent to which cancer has established throughout the body.Identifying leukemia in the initial stage is vital to providing timely patient care.Medical image-analysis-related approaches grant safer,quicker,and less costly solutions while ignoring the difficulties of these invasive processes.It can be simple to generalize Computer vision(CV)-based and image-processing techniques and eradicate human error.Many researchers have implemented computer-aided diagnosticmethods andmachine learning(ML)for laboratory image analysis,hopefully overcoming the limitations of late leukemia detection and determining its subgroups.This study establishes a Marine Predators Algorithm with Deep Learning Leukemia Cancer Classification(MPADL-LCC)algorithm onMedical Images.The projectedMPADL-LCC system uses a bilateral filtering(BF)technique to pre-process medical images.The MPADL-LCC system uses Faster SqueezeNet withMarine Predators Algorithm(MPA)as a hyperparameter optimizer for feature extraction.Lastly,the denoising autoencoder(DAE)methodology can be executed to accurately detect and classify leukemia cancer.The hyperparameter tuning process using MPA helps enhance leukemia cancer classification performance.Simulation results are compared with other recent approaches concerning various measurements and the MPADL-LCC algorithm exhibits the best results over other recent approaches.
文摘Background:The nasal alar defect in Asians remains a challenging issue,as do clear classification and algorithm guidance,despite numerous previously described surgical techniques.The aim of this study is to propose a surgical algorithm that addresses the appropriate surgical procedures for different types of nasal alar defects in Asian patients.Methods:A retrospective case note review was conducted on 32 patients with nasal alar defect who underwent reconstruction between 2008 and 2022.Based on careful analysis and our clinical experience,we proposed a classification system for nasal alar defects and presented a reconstructive algorithm.Patient data,including age,sex,diagnosis,surgical options,and complications,were assessed.The extent of surgical scar formation was evaluated using standard photography based on a 4-grade scar scale.Results:Among the 32 patients,there were 20 males and 12 females with nasal alar defects.The predominant cause of trauma in China was industrial factors.The majority of alar defects were classified as type Ⅰ C(n=8,25%),comprising 18 cases(56.2%);there were 5 cases(15.6%)of type Ⅱ defect,7(21.9%)of type Ⅲ defect,and 2(6.3%)of type Ⅳ defect.The most common surgical option was auricular composite graft(n=8,25%),followed by bilobed flap(n=6,18.8%),free auricular composite flap(n=4,12.5%),and primary closure(n=3,9.4%).Satisfactory improvements were observed postoperatively.Conclusion:Factors contributing to classifications were analyzed and defined,providing a framework for the proposed classification system.The reconstructive algorithm offers surgeons appropriate procedures for treating nasal alar defect in Asians.
基金supported by the Science and Technology Development Plan Project of Jilin Provincial Department of Science and Technology (No.20220203112S)the Jilin Provincial Department of Education Science and Technology Research Project (No.JJKH20210039KJ)。
文摘In this study,eight different varieties of maize seeds were used as the research objects.Conduct 81 types of combined preprocessing on the original spectra.Through comparison,Savitzky-Golay(SG)-multivariate scattering correction(MSC)-maximum-minimum normalization(MN)was identified as the optimal preprocessing technique.The competitive adaptive reweighted sampling(CARS),successive projections algorithm(SPA),and their combined methods were employed to extract feature wavelengths.Classification models based on back propagation(BP),support vector machine(SVM),random forest(RF),and partial least squares(PLS)were established using full-band data and feature wavelengths.Among all models,the(CARS-SPA)-BP model achieved the highest accuracy rate of 98.44%.This study offers novel insights and methodologies for the rapid and accurate identification of corn seeds as well as other crop seeds.
基金supported by the Research Incentive Grant 23200 of Zayed University,United Arab Emirates.
文摘Cardiovascular disease prediction is a significant area of research in healthcare management systems(HMS).We will only be able to reduce the number of deaths if we anticipate cardiac problems in advance.The existing heart disease detection systems using machine learning have not yet produced sufficient results due to the reliance on available data.We present Clustered Butterfly Optimization Techniques(RoughK-means+BOA)as a new hybrid method for predicting heart disease.This method comprises two phases:clustering data using Roughk-means(RKM)and data analysis using the butterfly optimization algorithm(BOA).The benchmark dataset from the UCI repository is used for our experiments.The experiments are divided into three sets:the first set involves the RKM clustering technique,the next set evaluates the classification outcomes,and the last set validates the performance of the proposed hybrid model.The proposed RoughK-means+BOA has achieved a reasonable accuracy of 97.03 and a minimal error rate of 2.97.This result is comparatively better than other combinations of optimization techniques.In addition,this approach effectively enhances data segmentation,optimization,and classification performance.
基金National Natural Science Foundation of China(U2442202, 42274217, 62441501)Key Innovation Team of China Meteorological Administration (CMA2024ZD01)Scientific Research Foundation of CUIT (376278, KYTZ202158)。
文摘Due to global warming, extreme weather and climate events are becoming more frequent, highlighting the need to explore the changing characteristics of precipitation in China, including extreme precipitation. A clustering algorithm was developed to classify summer(June, July, and August) daily precipitation in China from 1961 to 2020, considering spatial distribution, standard deviations, and frequency of extreme precipitation events. The results reveal six distinct precipitation climate zones, a classification that differs from previous divisions. While overall precipitation has decreased in most regions, the frequency of extreme precipitation events has increased across all clusters, indicating a shift in precipitation distribution patterns. Analysis shows that the weakened Lake Baikal blocking high and strengthened Mongolian cyclone influence the arid region in northwest China(Cluster 1), which is characterized by the lowest precipitation.The transition zone between the monsoon and arid region(Cluster 2) is affected by the Mongolian cyclone, water vapor transport from the Indian Ocean, and shifts in the monsoon boundary. Clusters 3 and 4 represent areas associated with advancement and retreat of the summer monsoon. In the Meiyu region, two distinct subregions have been identified exist.Cluster 4 is primarily influenced by the East Asia-Pacific wave train. Despite sharing similar climate drivers and proximity,Clusters 4 and 5 differ significantly due to topographic variations and disparate levels of urbanization. Cluster 5 exhibits a higher average precipitation, greater variability, and more frequent extreme events. Cluster 6 exhibits the highest overall precipitation in the coastal areas of Guangdong and Guangxi, where abundant water vapor contributes to a higher frequency of extreme precipitation. In addition, anthropogenic activities and urbanization significantly influence precipitation in Beijing-Tianjin-Hebei and Yangtze River Delta regions. This research proposes a precipitation classification scheme integrating multiple precipitation parameters, providing support for risk management and mitigation strategies in the face of increasing extreme precipitation events.
文摘Non-technical losses(NTL)of electric power are a serious problem for electric distribution companies.The solution determines the cost,stability,reliability,and quality of the supplied electricity.The widespread use of advanced metering infrastructure(AMI)and Smart Grid allows all participants in the distribution grid to store and track electricity consumption.During the research,a machine learning model is developed that allows analyzing and predicting the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings.This model is an ensemble meta-algorithm(stacking)that generalizes the algorithms of random forest,LightGBM,and a homogeneous ensemble of artificial neural networks.The best accuracy of the proposed meta-algorithm in comparison to basic classifiers is experimentally confirmed on the test sample.Such a model,due to good accuracy indicators(ROC-AUC-0.88),can be used as a methodological basis for a decision support system,the purpose of which is to form a sample of suspected NTL sources.The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers,making them targeted and accurate,which should contribute to the fight against NTL and the sustainable development of the electric power industry.
基金supported by the National Natural Science Foundation of China(Grant Nos.42225702 and 42077235)the Natural Science Foundation of Jiangsu Province(Grant No.BK20211086)the open fund of the Key Laboratory of Earth Fissures Geological Disaster,Ministry of Natural Resources.
文摘The distribution of shear-wave velocities in the subsurface is generally used to assess the potential forseismic liquefaction and soil amplification effects and to classify seismic sites. Newly developeddistributed acoustic sensing (DAS) technology enables estimation of the shear-wave distribution as ahigh-density seismic observation system. This technology is characterized by low maintenance costs,high-resolution outputs, and real-time data transmission capabilities, albeit with the challenge ofmanaging massive data generation. Rapid and efficient interpretation of data is the key to advancingapplication of the DAS technology. In this study, field tests were carried out to record ambient noise overa short period using DAS technology, from which the surface-wave dispersion curves were extracted. Inorder to reduce the influence of directional effects on the results, an unsupervised clustering method isused to select appropriate clusters to extract the Green's function. A combination of a genetic algorithmand Monte Carlo (GA-MC) simulation is proposed to invert the subsurface velocity structure. Thestratigraphic profiles obtained by the GA-MC method are in agreement with the borehole profiles.Compared to other methods, the proposed optimization method not only improves the solution qualitybut also reduces the solution time.
文摘A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, the fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model and a method is given to calculate the delay cost for each class. Finally, the proposed method is implemented in the sequencing problems of flights in a terminal area, and results are compared with that of the traditional classification method(TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights, and improving the efficiency of air traffic control.
文摘A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented,and is applied in solving classification problems on separable convex sets and disjoint sets.It has been proved that the algorithm has stronger classification ability than that of the back propagation (BP) algorithm for the feedforward NN using sigmoid function by simulation.What is more,the models can be implemented with lower cost hardware than that of the BP NN.LEARNIN
文摘Since there are many factors affecting the quality of wine, total 17 factors were screened out using principle component analysis. The difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable compared with those of the first group. At the same time, the KM algorithm was optimized using the QPSO algorithm. The wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of searching the globally optimum solution, and it could be used to classify the wine samples. In addition,the QPSO-KM algorithm could also be used to solve the issues about clustering.
文摘In this paper, sixty-eight research articles published between 2000 and 2017 as well as textbooks which employed four classification algorithms: K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was arrived at on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset and is occasionally unstable and tends to overfit in the model. KNN is easy to implement and understand but has a major drawback of becoming significantly slow as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of their time-consuming parameter tuning procedure, high level of complexity in computational processing, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and wieldy used methods which repeatedly achieve results with high accuracies and are often faster to implement.
基金funded by“The Pearl River Talent Recruitment Program”of Guangdong Province in 2019(Grant No.2019CX01G338)the Research Funding of Shantou University for New Faculty Member(Grant No.NTF19024-2019).
文摘This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm(SCA) with a grid search(GS) and K-fold cross validation(K-CV). The SCA includes two learner layers: a primary learner’s layer and meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to changing the validation set in a training set. In general, a GS is usually combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index(TPI) and field penetration index(FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method(EM) and silhouette coefficient(Si) are employed to determine the types of geological characteristics(K) in a Kmeans++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
基金The Natural Science Foundation of Hunan Province,China(No.2020JJ4601)Open Fund of the Key Laboratory of Highway Engi-neering of Ministry of Education(No.kfj190203).
文摘Clustering filtering is usually a practical method for light detection and ranging(LiDAR)point clouds filtering according to their characteristic attributes.However,the amount of point cloud data is extremely large in practice,making it impossible to cluster point clouds data directly,and the filtering error is also too large.Moreover,many existing filtering algorithms have poor classification results in discontinuous terrain.This article proposes a new fast classification filtering algorithm based on density clustering,which can solve the problem of point clouds classification in discontinuous terrain.Based on the spatial density of LiDAR point clouds,also the features of the ground object point clouds and the terrain point clouds,the point clouds are clustered firstly by their elevations,and then the plane point clouds are selected.Thus the number of samples and feature dimensions of data are reduced.Using the DBSCAN clustering filtering method,the original point clouds are finally divided into noise point clouds,ground object point clouds,and terrain point clouds.The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing(ISPRS),and the results of the proposed algorithm are compared with the other eight classical filtering algorithms.Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban areas and rural areas,and is significantly better than other classic filtering algorithms in discontinuous terrain,with a total error of about 10%.The results show that the proposed method is feasible and can be used in different terrains.
Abstract: A brain-computer interface (BCI) system is one of the most effective ways of translating brain signals into output commands. Different imagery activities can be classified based on changes in the μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for this classification, and training them well is important enough to have attracted many researchers to the field recently. Conventional methods for training NNs, such as gradient descent and recursive methods, have disadvantages including low accuracy, slow convergence, and trapping in local minima. In this paper, to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and the gravitational search algorithm (PSOGSA), is proposed for the classification problem. To show the advantages of training NNs with PSOGSA, the algorithm is compared with other meta-heuristic algorithms such as particle swarm optimization (PSO), the gravitational search algorithm (GSA), and newer versions of PSO. The metrics discussed are convergence speed and classification accuracy. The results show that, for most subjects in the electroencephalography (EEG) dataset, the proposed algorithm performs better than, or comparably to, the others.
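The sketch below illustrates one common form of the hybrid PSOGSA update on a toy objective; to train an MLP-NN, the objective f would instead evaluate the network's classification error at the flattened weight vector. The constants w, c1, c2, the gravitational schedule, and the mass formula are illustrative assumptions, not the paper's exact settings.

```python
# Hybrid PSOGSA: GSA-style gravitational acceleration plus PSO-style
# attraction to the global best, applied to a toy minimization problem.
import numpy as np

def f(x):                        # sphere function stands in for NN training error
    return np.sum(x**2, axis=1)

rng = np.random.default_rng(0)
n, dim, iters = 30, 10, 200
X = rng.uniform(-5, 5, (n, dim))
V = np.zeros((n, dim))
gbest = X[np.argmin(f(X))].copy()

for t in range(iters):
    fit = f(X)
    if fit.min() < f(gbest[None])[0]:
        gbest = X[np.argmin(fit)].copy()

    # GSA part: masses from fitness; better agents pull harder.
    G = np.exp(-20 * t / iters)                          # decaying constant
    m = (fit.max() - fit) / (fit.max() - fit.min() + 1e-12)
    M = m / (m.sum() + 1e-12)
    acc = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(X[j] - X[i]) + 1e-12
                acc[i] += rng.random() * G * M[j] * (X[j] - X[i]) / d

    # PSO part: inertia plus social attraction to the global best.
    w, c1, c2 = 0.7, 1.0, 1.5
    V = (w * V + c1 * rng.random((n, dim)) * acc
         + c2 * rng.random((n, dim)) * (gbest - X))
    X = X + V

print("best value:", f(gbest[None])[0])
```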
Abstract: This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on the multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when processing such images with the object-based MRS approach. Given the high number of input features, an automatic method is needed to estimate this parameter, so we used the Variable Importance (VI) measure, one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as conventional pixel-based RFC and Support Vector Machine (SVM) classification, was applied to three hyperspectral datasets with various spectral and spatial characteristics, acquired by the HyMap, Airborne Prism Experiment (APEX), and Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping across various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, APEX, and CASI datasets, respectively. Moreover, this method was more efficient than the spectral-based classifications: its OA was 5.67% and 3.75% higher than the conventional RFC and SVM classifiers, respectively.
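The role of Variable Importance in weighting input bands can be sketched as follows; the synthetic pixel data and band count are illustrative assumptions, and the MRS segmentation itself is not reproduced here.

```python
# Random Forest Variable Importance used to weight image bands prior to
# object-based segmentation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_pixels, n_bands = 2000, 20
X = rng.normal(size=(n_pixels, n_bands))        # labelled training pixels
y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)   # classes driven by two bands

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# feature_importances_ plays the role of VI: bands that matter most for
# separating the classes receive the largest weights in the segmentation.
weights = rf.feature_importances_ / rf.feature_importances_.max()
for b in np.argsort(weights)[::-1][:5]:
    print(f"band {b}: weight {weights[b]:.2f}")
```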
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61503424, the Research Project of the State Ethnic Affairs Commission under Grant No. 14ZYZ017, the Jiangsu Future Networks Innovation Institute Prospective Research Project on Future Networks under Grant No. BY2013095-2-14, and the first-class discipline construction transitional funds of Minzu University of China.
Abstract: In order to improve accuracy and reduce training and testing time in image classification, a novel scheme based on the extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine replaces the original linear SVM classifier. Firstly, the ScSPM algorithm is performed to extract features from the multi-scale image blocks, and each layer's feature vector is connected to an ELM. Finally, the mapped features are concatenated and used as the input to one ELM based on a radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better not only in reducing training time but also in improving classification accuracy.
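A single ELM reduces to a random hidden layer followed by a closed-form least-squares solve for the output weights, as the sketch below shows; the ScSPM feature extraction and the paper's two-layer ELM structure are not reproduced, and all sizes are illustrative assumptions.

```python
# A basic extreme learning machine: the hidden layer is random and fixed,
# so training is a single linear least-squares solve.
import numpy as np

def train_elm(X, y_onehot, n_hidden=100, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y_onehot, rcond=None)  # closed-form solve
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 64))                    # stand-in for ScSPM features
y = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[y]                                  # one-hot targets

W, b, beta = train_elm(X, Y)
print("train accuracy:", (predict_elm(X, W, b, beta) == y).mean())
```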
Funding: Supported by the National Natural Science Foundation of China (No. 61906066), the Zhejiang Provincial Philosophy and Social Science Planning Project (No. 21NDJC021Z), the Shenzhen Fund for Guangdong Provincial High-level Clinical Key Specialties (No. SZGSP014), the Sanming Project of Medicine in Shenzhen (No. SZSM202011015), the Shenzhen Science and Technology Planning Project (No. KCXFZ20211020163813019), the Natural Science Foundation of Ningbo City (No. 202003N4072), and the Postgraduate Research and Innovation Project of Huzhou University (No. 2023KYCX52).
Abstract: AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, covering tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, while minimizing annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established, and each algorithm was combined with each model for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithms were evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed the others. Finally, this study employed six models, including EfficientFormer, to classify HMM; the best-performing model was selected as the baseline and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms the other algorithms, leading in an average of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study also classified HMM using several advanced deep learning models with the complete training set of 4252 images; EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required number of samples without compromising accuracy, outperforms the other algorithms in more rounds of experiments, and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
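The experimental protocol is a standard pool-based active-learning loop: 20 rounds with 100 images labelled per round. The sketch below uses a simple least-confidence query rule as a stand-in for ALFA-Mix+, whose feature-mixing acquisition is more involved; the synthetic data and model are illustrative assumptions.

```python
# Pool-based active learning: label 100 samples per round for 20 rounds,
# querying the points the current model is least confident about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4252, n_features=30, random_state=0)
labelled = np.zeros(len(X), dtype=bool)
labelled[np.random.default_rng(0).choice(len(X), 100, replace=False)] = True

model = LogisticRegression(max_iter=1000)
for rnd in range(20):
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[~labelled])
    # Query the 100 unlabelled samples with the lowest top-class probability.
    uncertainty = 1 - proba.max(axis=1)
    pool_idx = np.flatnonzero(~labelled)
    labelled[pool_idx[np.argsort(uncertainty)[-100:]]] = True
    print(f"round {rnd + 1}: {labelled.sum()} labelled, "
          f"accuracy {model.score(X, y):.3f}")
```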