DNA microarrays, a cornerstone in biomedicine, measure gene expression across thousands to tens of thousands of genes. Identifying the genes vital for accurate cancer classification is a key challenge. Here, we present Fs-LSA (F-score based Learning Search Algorithm), a novel gene selection algorithm designed to enhance the precision and efficiency of target gene identification from microarray data for cancer classification. The algorithm is divided into two phases: the first leverages F-score values to prioritize and select feature genes with the most significant differential expression; the second introduces our Learning Search Algorithm (LSA), which harnesses swarm intelligence to identify the optimal subset among the remaining genes. Inspired by human social learning, LSA integrates historical data and collective intelligence for a thorough search, with a dynamic control mechanism that balances exploration and refinement, thereby enhancing the gene selection process. We validated Fs-LSA’s performance on eight publicly available cancer microarray expression datasets. Fs-LSA achieved accuracy, precision, sensitivity, and F1-score values of 0.9932, 0.9923, 0.9962, and 0.994, respectively. Comparative analyses with state-of-the-art algorithms revealed Fs-LSA’s superior performance in terms of simplicity and efficiency. Additionally, we validated the algorithm’s efficacy independently using glioblastoma data from the GEO and TCGA databases, where its performance was significantly superior to that of the comparison algorithms. Importantly, the driver genes identified by Fs-LSA were instrumental in developing a predictive model as an independent prognostic indicator for glioblastoma, underscoring Fs-LSA’s transformative potential in genomics and personalized medicine.
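The first phase described above, ranking genes by F-score, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common two-class F-score (between-class separation divided by the sum of within-class sample variances), since the abstract does not give the exact formula, and the names `f_score`, `rank_genes`, `geneA`, and `geneB` are hypothetical.

```python
from statistics import mean, variance


def f_score(values, labels):
    """Two-class F-score of one gene (assumed definition, see lead-in).

    values: expression of the gene in each sample
    labels: 1 for the positive class, 0 for the negative class
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    # Distance of each class mean from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sum of the per-class sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else float("inf")


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ordered by decreasing F-score."""
    scores = {gene: f_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: three positive samples followed by three negative samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "geneA": [5.1, 4.9, 5.0, 1.0, 1.2, 0.8],  # clearly differential
    "geneB": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],  # classes overlap
}
top = rank_genes(expression, labels, top_k=1)
```

In a two-phase scheme of this kind, the genes kept by the ranking step would then be handed to the subset search (LSA in the paper), which is not sketched here.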
In recent years, feature selection (FS) optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification. This work reviews FS and classification methods that use evolutionary algorithms (EAs) for gene expression profiles in cancer or medical applications, organized by research motivations, challenges, and recommendations. Relevant studies were retrieved from four major academic databases (IEEE, Scopus, Springer, and ScienceDirect) using the keywords ‘cancer classification’, ‘optimization’, ‘FS’, and ‘gene expression profile’. A total of 67 papers were finally selected, with key advancements identified as follows: (1) The largest share of papers (44.8%) focused on developing algorithms and models for FS and classification. (2) The second category encompassed studies on biomarker identification by EAs, including 20 papers (30%). (3) The third category comprised works that applied FS to cancer data for decision support system purposes, addressing high-dimensional data and the formulation of chromosome length. These studies accounted for 12% of the total number of studies. (4) The remaining three papers (4.5%) were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions. This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively. Despite recent advancements, significant limitations remain: the dynamic formulation of chromosome length is an underexplored area. Thus, further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection. The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. First, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the cost of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
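A fuzzy transitive closure of the kind this abstract refers to can be sketched as below. This is an illustration under assumptions, not the paper's model: it uses the standard max-min composition on a reflexive, symmetric similarity matrix and a λ-cut to read off classes. The three-item matrix is made up, and the four flight factors from the paper are not modelled.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(relation):
    """Square a reflexive fuzzy relation until it stops changing.

    For a reflexive relation the sequence R, R^2, R^4, ... never decreases
    and only takes values already present in R, so the loop terminates.
    """
    while True:
        squared = max_min_compose(relation, relation)
        if squared == relation:
            return relation
        relation = squared


def lambda_cut_classes(closure, lam):
    """Group items whose closure membership is at least lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes


# Hypothetical pairwise similarities between three flights.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
closure = transitive_closure(similarity)
classes = lambda_cut_classes(closure, 0.7)
```

Lowering the threshold `lam` merges classes (at 0.5 all three items fall into one class), so the threshold controls how coarse the classification is.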
A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back propagation (BP) algorithm for feedforward NNs using the sigmoid function. Moreover, the models can be implemented with lower-cost hardware than the BP NN.
Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and the wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm (SCA) with a grid search (GS) and K-fold cross validation (K-CV). The SCA includes two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to vary the validation set within a training set. In general, a GS is combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the number of types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms, K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN), as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each algorithm were examined, and a conclusion was reached on which one performs best. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but becomes significantly slower as the size of the data grows, and the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, its high computational complexity, the numerous types of NN architectures to choose from, and the large number of algorithms used for training, most researchers recommend SVM and RF as easier and widely used methods which repeatedly achieve high accuracy and are often faster to implement.
Clustering filtering is usually a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster the point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms give poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and the features of the ground object and terrain point clouds, the point clouds are first clustered by their elevations, and then the plane point clouds are selected. Thus the number of samples and the feature dimensions of the data are reduced. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground object point clouds, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas and is significantly better than other classic filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
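The DBSCAN step mentioned in this abstract can be sketched in a few lines. This is a generic, brute-force DBSCAN on toy 3D points, offered only as an illustration. The elevation pre-clustering and plane selection described in the abstract are not reproduced, and the parameter values `eps=1.5` and `min_pts=3` are arbitrary assumptions.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Brute-force radius query; a point counts as its own neighbor.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


# Toy (x, y, z) points: a ground patch, a raised patch, and one stray return.
points = [
    (0, 0, 0.0), (1, 0, 0.1), (0, 1, 0.0), (1, 1, 0.1),
    (0, 0, 10.0), (1, 0, 10.1), (0, 1, 10.0),
    (5, 5, 30.0),
]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The brute-force neighbor query is quadratic in the number of points, which is one reason a reduction step such as the elevation clustering in the abstract matters for real LiDAR data.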
A brain-computer interface (BCI) system is one of the most effective ways of translating brain signals into output commands. Different imagery activities can be classified based on the changes in μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for classification. Training such MLP-NNs is important and has attracted many researchers to this field recently. Conventional methods for training NNs, such as gradient descent and recursive methods, have disadvantages including low accuracy, slow convergence, and trapping in local minima. In this paper, to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and gravitational search algorithm (PSOGSA), is proposed for our classification problem. To show the advantages of using PSOGSA to train NNs, this algorithm is compared with other meta-heuristic algorithms such as particle swarm optimization (PSO), gravitational search algorithm (GSA), and new versions of PSO. The metrics discussed in this paper are convergence speed and classification accuracy. The results show that, for most subjects of the electroencephalography (EEG) dataset, the proposed algorithm performs much better than or comparably to the others.
This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on the multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when using the object-based approach with MRS to process such images. Given the high number of input features, an automatic method is needed to estimate this parameter. We therefore used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, APEX, and CASI datasets, respectively. Moreover, this method was more efficient than the spectral-based classifications: the OAs of the proposed method were 5.67% and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
To improve accuracy and reduce training and testing time in image classification, a novel image classification scheme based on the extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine is constructed in place of the original linear SVM classifier. First, the ScSPM algorithm is performed to extract features of the multi-scale image blocks, and each layer's feature vector is connected to an ELM. Finally, the mapped features are concatenated and used as the input of one ELM based on a radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better both in reducing training time and in improving classification accuracy.
AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, while minimizing annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with the four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed the other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperformed the other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. ALFA-Mix+ outperforms the other algorithms in more rounds of experiments and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology optimally and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, which is derived from the hybrid algorithms mentioned above, is demonstrated effectively on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier and the BP-MLP (multi-layer perceptron) classifier, the VGA-BP classifier is shown to perform better on high-resolution land cover classification.
In this research, an integrated classification method based on principal component analysis, simulated annealing genetic algorithm, and fuzzy c-means (PCA-SAGA-FCM) was proposed for the unsupervised classification of tight sandstone reservoirs that lack prior information and core experiments. A variety of evaluation parameters were selected, including lithology characteristic parameters, poro-permeability quality characteristic parameters, engineering quality characteristic parameters, and pore structure characteristic parameters. PCA was used to reduce the dimension of the evaluation parameters, and the low-dimensional data was used as input. The unsupervised classification of the tight sandstone reservoir was carried out by SAGA-FCM, and the characteristics of the reservoir in different categories were analyzed and compared with the lithological profiles. The analysis of numerical simulation and actual logging data shows that: 1) compared with the FCM algorithm, SAGA-FCM has stronger stability and higher accuracy; 2) the proposed method can cluster the reservoir flexibly and effectively according to the degree of membership; 3) the results of the integrated reservoir classification match well with the lithologic profile, which demonstrates the reliability of the classification method.
The Feedforward Neural Network (FNN) is one of the most popular neural network models and is used to solve a wide range of nonlinear and complex problems. Several methods, such as stochastic gradient descent, have been developed to train FNNs. However, they mainly suffer from falling into local optima, which reduces the accuracy of FNNs. Moreover, the convergence speed of the training process depends on the initial values of the weights and biases in FNNs. Generally, these values are randomly determined by most training methods. To deal with these issues, in this paper we develop a novel evolutionary algorithm by modifying the original version of the Whale Optimization Algorithm (WOA). To this end, a nonlinear function is introduced to improve the exploration and exploitation phases in the search process of WOA. The modified WOA is then applied to automatically obtain the initial values of the weights and biases in the FNN, which reduces the probability of falling into local optima. In addition, the FNN model trained by the modified WOA is used to develop a classification approach for medical diagnosis problems. Ten medical diagnosis datasets are used to evaluate the efficiency of the proposed method. Four evaluation metrics, accuracy, AUC, specificity, and sensitivity, are used in the experiments to compare the performance of the classification models. The experimental results demonstrate that the proposed method is better than the competing classification models, achieving higher values of accuracy, AUC, specificity, and sensitivity on the datasets used.
In this study, eight different varieties of maize seeds were used as the research objects. A total of 81 combined preprocessing schemes were applied to the original spectra. Through comparison, Savitzky-Golay (SG)-multivariate scattering correction (MSC)-maximum-minimum normalization (MN) was identified as the optimal preprocessing technique. Competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), and their combined methods were employed to extract feature wavelengths. Classification models based on back propagation (BP), support vector machine (SVM), random forest (RF), and partial least squares (PLS) were established using full-band data and feature wavelengths. Among all models, the (CARS-SPA)-BP model achieved the highest accuracy of 98.44%. This study offers novel insights and methodologies for the rapid and accurate identification of corn seeds as well as other crop seeds.
The classification algorithm is one of the key techniques affecting the performance of an automatic text classification system and plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the performance of the hybrid algorithm presented by our group is superior to that of the other two algorithms.
The accurate identification and classification of various power quality disturbances are keys to ensuring high-quality electrical energy. In this study, the statistical characteristics of the disturbance signal of wavelet transform coefficients and wavelet transform energy distribution constitute feature vectors. These vectors are then trained and tested using SVM multi-class algorithms. Experimental results demonstrate that the SVM multi-class algorithms, which use the Gaussian radial basis function, exponential radial basis function, and hyperbolic tangent function as basis functions, are suitable methods for power quality disturbance classification.
In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for using GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
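The evaluation protocol named in this abstract, leave-one-out cross-validation of a KNN classifier, can be sketched as follows. This is a minimal illustration on made-up two-feature samples, not the authors' pipeline: the GWO-HHO gene selection step is omitted, and the function names, the `tumor`/`normal` labels, and `k=3` are assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loocv_accuracy(x, y, k=3):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_x, train_y, x[i], k) == y[i]
    return correct / len(x)


# Each sample holds the expression of two (hypothetical) selected genes.
samples = [
    (5.0, 1.0), (5.2, 0.9), (4.8, 1.1),
    (1.0, 4.0), (1.1, 4.2), (0.9, 3.9),
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
accuracy = loocv_accuracy(samples, labels, k=3)
```

LOOCV suits microarray data because the number of samples is small relative to the number of genes. In a wrapper-style search, a score like `accuracy` would typically serve as part of the fitness of each candidate gene subset.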
Remote sensing image classification has attracted considerable interest as an effective method for retrieving information from the rapidly growing volume of complex, distributed, large-scale and multi-temporal satellite remote sensing data, driven by increases in the quantity and resolution of remote sensing images. In this paper, genetic algorithms were employed to solve for the weights of radial basis function networks in order to improve the precision of remote sensing image classification. Remote sensing image classification was also introduced into GIS spatial analysis and spatial online analytical processing (OLAP), and its effectiveness was demonstrated in the analysis of land use change in Daqing city.
Funding (Fs-LSA gene selection study): supported by the National Natural Science Foundation of China (Grant Number 62341210), the Natural Science Foundation of Guangxi Province (Grant Number 2025GXNSFHA069267), and the Science and Technology Development Plan for Baise City (Grant Number 20233654).
Funding (review of evolutionary-algorithm feature selection): funded by the Ministry of Higher Education of Malaysia, grant number FRGS/1/2022/ICT02/UPSI/02/1.
Abstract: A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing problem of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
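The transitive closure step mentioned above can be illustrated with a small sketch. It assumes the standard fuzzy-clustering construction (repeated max-min squaring of a reflexive, symmetric similarity matrix, followed by a lambda-cut); the abstract does not give the four factors or how similarities are computed, so the matrix below is invented for illustration.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation under max-min composition until it stops changing."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_classes(closure, lam):
    """Group items whose closure membership is at least lam."""
    classes, assigned = [], set()
    for i in range(len(closure)):
        if i in assigned:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        classes.append(group)
        assigned.update(group)
    return classes


# Hypothetical similarity between three flights (reflexive and symmetric).
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
closure = transitive_closure(similarity)
print(lambda_cut_classes(closure, 0.8))  # [[0, 1], [2]]
print(lambda_cut_classes(closure, 0.5))  # [[0, 1, 2]]
```

Lowering the threshold merges classes, so the choice of lambda controls how coarse the resulting flight classification is.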
Abstract: A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back propagation (BP) algorithm for feedforward NNs using the sigmoid function. What is more, the models can be implemented with lower-cost hardware than the BP NN. LEARNIN
Abstract: Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. At the same time, the KM algorithm was optimized using the QPSO algorithm, and the wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
Funding: Funded by “The Pearl River Talent Recruitment Program” of Guangdong Province in 2019 (Grant No. 2019CX01G338) and the Research Funding of Shantou University for New Faculty Member (Grant No. NTF19024-2019).
Abstract: This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm (SCA) with a grid search (GS) and K-fold cross validation (K-CV). The SCA includes two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to vary the validation set within a training set. In general, a GS is combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the number of types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
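How a grid search is combined with K-fold cross validation to pick a hyper-parameter can be sketched as below. This is a simplified illustration, not the study's stacking model: a one-dimensional k-nearest-neighbour classifier stands in for the learner, the only hyper-parameter searched is the neighbour count `k`, and the data and function names are made up.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx); fold f validates on samples f, f + n_folds, ..."""
    for f in range(n_folds):
        val = [i for i in range(n_samples) if i % n_folds == f]
        train = [i for i in range(n_samples) if i % n_folds != f]
        yield train, val


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D features)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs, ys, k, n_folds):
    """Fraction of samples classified correctly while held out for validation."""
    correct = 0
    for train, val in k_fold_indices(len(xs), n_folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in val)
    return correct / len(xs)


def grid_search(xs, ys, k_grid, n_folds):
    """Score every candidate with K-fold CV and return the best one."""
    scores = {k: cv_accuracy(xs, ys, k, n_folds) for k in k_grid}
    return max(scores, key=scores.get), scores


# Toy data: two groups of values, with one deliberately noisy label (index 2).
xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
ys = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
best_k, scores = grid_search(xs, ys, k_grid=[1, 3], n_folds=5)
```

Here the cross-validated score, rather than training accuracy, decides between candidates, which is why the larger `k` that smooths over the noisy label wins on this toy data.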
Abstract: In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms, K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN), as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was reached on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, the high computational complexity, the numerous types of NN architectures to choose from and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and widely used methods which repeatedly achieve results with high accuracies and are often faster to implement.
Funding: The Natural Science Foundation of Hunan Province, China (No. 2020JJ4601); Open Fund of the Key Laboratory of Highway Engineering of Ministry of Education (No. kfj190203).
Abstract: Clustering filtering is usually a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms have poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and the features of the ground object point clouds and the terrain point clouds, the point clouds are first clustered by their elevations, and then the plane point clouds are selected. Thus the number of samples and feature dimensions of the data are reduced. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground object point clouds, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with those of eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas, and is significantly better than other classic filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
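The density-clustering idea behind DBSCAN, which separates dense groups of points from isolated noise, can be sketched in a few lines. This is a plain textbook DBSCAN on toy 2-D points with brute-force neighbour search; the elevation pre-clustering and plane selection described in the abstract are not reproduced, and the `eps` and `min_pts` values are arbitrary assumptions.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search: fine for a sketch, too slow for real LiDAR data.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(nj)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

For real point clouds a spatial index (for example a k-d tree or grid) would replace the quadratic neighbour search, which is one reason the abstract reduces the sample count before clustering.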
Abstract: A brain-computer interface (BCI) system is one of the most effective ways of translating brain signals into output commands. Different imagery activities can be classified based on the changes in μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for classification. Training such MLP-NNs is of great importance and has recently attracted many researchers to this field. Conventional methods for training NNs, such as gradient descent and recursive methods, have some disadvantages, including low accuracy, slow convergence speed and trapping in local minima. In this paper, in order to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and gravitational search algorithm (PSOGSA), is proposed for our classification problem. To show the advantages of using PSOGSA to train NNs, this algorithm is compared with other meta-heuristic algorithms such as particle swarm optimization (PSO), gravitational search algorithm (GSA) and new versions of PSO. The metrics discussed in this paper are the speed of convergence and classification accuracy. The results show that, for most subjects of the electroencephalography (EEG) dataset, the proposed algorithm performs better than or comparably to the others.
Abstract: This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when using the object-based approach with MRS to process such images. Given the high number of input features, an automatic method is needed for estimating this parameter. Therefore, we used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is carried out based on the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48, 86.57, and 84.29% for the HyMap, the APEX, and the CASI datasets, respectively. Moreover, this method showed better efficiency in comparison to the spectral-based classifications because the OAs of the proposed method were 5.67 and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61503424; the Research Project by The State Ethnic Affairs Commission under Grant No. 14ZYZ017; the Jiangsu Future Networks Innovation Institute-Prospective Research Project on Future Networks under Grant No. BY2013095-2-14; and the first-class discipline construction transitional funds of Minzu University of China.
Abstract: In order to improve the accuracy and reduce the training and testing time of image classification algorithms, a novel image classification scheme based on extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine is constructed in place of the original linear SVM classifier. Firstly, the ScSPM algorithm is performed to extract features of the multi-scale image blocks, and then each layer's feature vector is connected to an ELM. Finally, the mapped features are concatenated and used as the input of one ELM based on a radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better not only in reducing the training time but also in improving the accuracy of classification.
Funding: Supported by the National Natural Science Foundation of China (No. 61906066); the Zhejiang Provincial Philosophy and Social Science Planning Project (No. 21NDJC021Z); Shenzhen Fund for Guangdong Provincial High-level Clinical Key Specialties (No. SZGSP014); Sanming Project of Medicine in Shenzhen (No. SZSM202011015); Shenzhen Science and Technology Planning Project (No. KCXFZ20211020163813019); the Natural Science Foundation of Ningbo City (No. 202003N4072); and the Postgraduate Research and Innovation Project of Huzhou University (No. 2023KYCX52).
Abstract: AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, and minimize annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these models was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. The EfficientFormer achieved the best results with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. Therefore, by combining ALFA-Mix+ with EfficientFormer, this study achieved results with an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. Compared to other algorithms, ALFA-Mix+ outperforms in more rounds of experiments. It effectively selects valuable samples compared to other algorithms. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Abstract: Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology optimally and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, which is derived from the hybrid algorithms mentioned above, is demonstrated to classify CIR images effectively. Comparison with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier and the BP-MLP (multi-layer perceptron) classifier, shows that the VGA-BP classifier performs better on high-resolution land cover classification.
Funding: Funded by the National Natural Science Foundation of China (42174131) and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-03).
Abstract: In this research, an integrated classification method based on principal component analysis-simulated annealing genetic algorithm-fuzzy cluster means (PCA-SAGA-FCM) was proposed for the unsupervised classification of tight sandstone reservoirs that lack prior information and core experiments. A variety of evaluation parameters were selected, including lithology characteristic parameters, poro-permeability quality characteristic parameters, engineering quality characteristic parameters, and pore structure characteristic parameters. PCA was used to reduce the dimension of the evaluation parameters, and the low-dimensional data was used as input. The unsupervised classification of the tight sandstone reservoir was carried out by SAGA-FCM, and the characteristics of the reservoir in different categories were analyzed and compared with the lithological profiles. The analysis results of numerical simulation and actual logging data show that: 1) compared with the FCM algorithm, SAGA-FCM has stronger stability and higher accuracy; 2) the proposed method can cluster the reservoir flexibly and effectively according to the degree of membership; 3) the results of the integrated reservoir classification match well with the lithologic profile, which demonstrates the reliability of the classification method.
Abstract: The Feedforward Neural Network (FNN) is one of the most popular neural network models and is utilized to solve a wide range of nonlinear and complex problems. Several methods, such as stochastic gradient descent, have been developed to train FNNs. However, they tend to fall into local optima, which reduces the accuracy of FNNs. Moreover, the convergence speed of the training process depends on the initial values of weights and biases in FNNs. Generally, these values are randomly determined by most training methods. To deal with these issues, in this paper, we develop a novel evolutionary algorithm by modifying the original version of the Whale Optimization Algorithm (WOA). To this end, a nonlinear function is introduced to improve the exploration and exploitation phases in the search process of WOA. Then, the modified WOA is applied to automatically obtain the initial values of weights and biases in the FNN, which reduces the probability of falling into local optima. In addition, the FNN model trained by the modified WOA is used to develop a classification approach for medical diagnosis problems. Ten medical diagnosis datasets are utilized to evaluate the efficiency of the proposed method. Also, four evaluation metrics, namely accuracy, AUC, specificity, and sensitivity, are used in the experiments to compare the performance of classification models. The experimental results demonstrate that the proposed method outperforms the competing classification models, achieving higher accuracy, AUC, specificity, and sensitivity on the datasets used.
Funding: Supported by the Science and Technology Development Plan Project of Jilin Provincial Department of Science and Technology (No. 20220203112S) and the Jilin Provincial Department of Education Science and Technology Research Project (No. JJKH20210039KJ).
Abstract: In this study, eight different varieties of maize seeds were used as the research objects. A total of 81 combined preprocessing schemes were applied to the original spectra. Through comparison, Savitzky-Golay (SG)-multivariate scattering correction (MSC)-maximum-minimum normalization (MN) was identified as the optimal preprocessing technique. The competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA), and their combined methods were employed to extract feature wavelengths. Classification models based on back propagation (BP), support vector machine (SVM), random forest (RF), and partial least squares (PLS) were established using full-band data and feature wavelengths. Among all models, the (CARS-SPA)-BP model achieved the highest accuracy rate of 98.44%. This study offers novel insights and methodologies for the rapid and accurate identification of corn seeds as well as other crop seeds.
Abstract: The classification algorithm is one of the key techniques affecting the performance of an automatic text classification system and plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the performance of the hybrid algorithm presented by our group is superior to that of the other two algorithms.
Abstract: The accurate identification and classification of various power quality disturbances are key to ensuring high-quality electrical energy. In this study, the statistical characteristics of the wavelet transform coefficients of the disturbance signal and the wavelet transform energy distribution constitute the feature vectors. These vectors are then used to train and test SVM multi-class algorithms. Experimental results demonstrate that the SVM multi-class algorithms, which use the Gaussian radial basis function, exponential radial basis function, and hyperbolic tangent function as basis functions, are suitable methods for power quality disturbance classification.
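For reference, the three kernel families named above are commonly written as follows. The abstract does not state the exact parameterization used in the study, so the width $\sigma$, the scale $\kappa$ and the offset $c$ are assumed hyper-parameter names, and the forms shown are the usual textbook ones.

$$
\begin{aligned}
K_{\text{Gaussian RBF}}(\mathbf{x}, \mathbf{y}) &= \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}{2\sigma^{2}}\right) \\
K_{\text{exponential RBF}}(\mathbf{x}, \mathbf{y}) &= \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{y} \rVert}{2\sigma^{2}}\right) \\
K_{\text{tanh}}(\mathbf{x}, \mathbf{y}) &= \tanh\!\left(\kappa\, \mathbf{x}^{\top}\mathbf{y} + c\right)
\end{aligned}
$$

The two radial kernels differ only in whether the distance is squared, which makes the exponential form less smooth at $\mathbf{x} = \mathbf{y}$; the hyperbolic tangent kernel is positive semi-definite only for some choices of $\kappa$ and $c$.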
Funding: Funded by the Deputyship for Research and Innovation, “Ministry of Education” in Saudi Arabia (IFKSUOR3-014-3).
Abstract: In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 40271044), Natural Science Foundation (Grant No. TK2005-17) and Project of Science Backbone of Heilongjiang Province (Grant No. 1151G021).
Abstract: Remote sensing image classification has attracted considerable interest as an effective method for retrieving information from the rapidly growing volume of complex, distributed, large-scale and multi-temporal satellite imaging data, driven by the increase in image quantities and image resolutions. In this paper, genetic algorithms were employed to solve for the weights of radial basis function networks in order to improve the precision of remote sensing image classification. Remote sensing image classification was also introduced for GIS spatial analysis and spatial online analytical processing (OLAP), and the resulting effectiveness was demonstrated in the analysis of land utilization variation of Daqing city.