Cyberbullying on social media poses significant psychological risks, yet most detection systems oversimplify the task by focusing on binary classification, ignoring nuanced categories like passive-aggressive remarks or indirect slurs. To address this gap, we propose a hybrid framework combining Term Frequency-Inverse Document Frequency (TF-IDF), word-to-vector (Word2Vec), and Bidirectional Encoder Representations from Transformers (BERT) based models for multi-class cyberbullying detection. Our approach integrates TF-IDF for lexical specificity and Word2Vec for semantic relationships, fused with BERT’s contextual embeddings to capture syntactic and semantic complexities. We evaluate the framework on a publicly available dataset of 47,000 annotated social media posts across five cyberbullying categories: age, ethnicity, gender, religion, and indirect aggression. Among the BERT variants tested, BERT Base Uncased achieved the highest performance with 93% accuracy (standard deviation of ±1% across 5-fold cross-validation) and an average AUC of 0.96, outperforming standalone TF-IDF (78%) and Word2Vec (82%) models. Notably, it achieved near-perfect AUC scores (0.99) for age- and ethnicity-based bullying. A comparative analysis with state-of-the-art benchmarks, including Generative Pre-trained Transformer 2 (GPT-2) and Text-to-Text Transfer Transformer (T5) models, highlights BERT’s superiority in handling ambiguous language. This work advances cyberbullying detection by demonstrating how hybrid feature extraction and transformer models improve multi-class classification, offering a scalable solution for moderating nuanced harmful content.
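The fusion of sparse lexical features with dense embeddings described in this abstract can be illustrated with a minimal sketch. It assumes simple concatenation as the fusion step, a smoothed IDF variant, and a tiny hand-written embedding table (`toy_table`) standing in for Word2Vec or BERT vectors. None of these details are given in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; this is one common convention, assumed here.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors

def mean_embedding(doc, table, dim):
    """Average the dense vectors of the tokens found in `table`."""
    hits = [table[tok] for tok in doc if tok in table]
    if not hits:
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]

def fuse(sparse_vec, dense_vec):
    """Early fusion by concatenation (an assumption, not the paper's stated method)."""
    return list(sparse_vec) + list(dense_vec)

docs = [["you", "are", "too", "old"], ["nice", "post"], ["too", "old", "for", "this"]]
# Hypothetical 2-dimensional embeddings standing in for learned vectors.
toy_table = {"old": [1.0, 0.0], "nice": [0.0, 1.0], "too": [0.5, 0.5]}

vocab, sparse = tfidf_vectors(docs)
fused = [fuse(s, mean_embedding(d, toy_table, 2)) for s, d in zip(sparse, docs)]
```

Each fused vector has one entry per vocabulary term plus the embedding dimensions, and would be passed to a multi-class classifier in a full pipeline.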
DNA microarrays, a cornerstone in biomedicine, measure gene expression across thousands to tens of thousands of genes. Identifying the genes vital for accurate cancer classification is a key challenge. Here, we present Fs-LSA (F-score based Learning Search Algorithm), a novel gene selection algorithm designed to enhance the precision and efficiency of target gene identification from microarray data for cancer classification. This algorithm is divided into two phases: the first leverages F-score values to prioritize and select feature genes with the most significant differential expression; the second phase introduces our Learning Search Algorithm (LSA), which harnesses swarm intelligence to identify the optimal subset among the remaining genes. Inspired by human social learning, LSA integrates historical data and collective intelligence for a thorough search, with a dynamic control mechanism that balances exploration and refinement, thereby enhancing the gene selection process. We conducted a rigorous validation of Fs-LSA’s performance using eight publicly available cancer microarray expression datasets. Fs-LSA achieved accuracy, precision, sensitivity, and F1-score values of 0.9932, 0.9923, 0.9962, and 0.994, respectively. Comparative analyses with state-of-the-art algorithms revealed Fs-LSA’s superior performance in terms of simplicity and efficiency. Additionally, we validated the algorithm’s efficacy independently using glioblastoma data from the GEO and TCGA databases, where its performance was significantly superior to that of the comparison algorithms. Importantly, the driver genes identified by Fs-LSA were instrumental in developing a predictive model as an independent prognostic indicator for glioblastoma, underscoring Fs-LSA’s transformative potential in genomics and personalized medicine.
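The first phase described above, ranking genes by F-score, can be sketched as follows. The abstract does not give the formula, so this assumes the commonly used two-class F-score (between-class separation of the means divided by the sum of within-class sample variances). The gene names and expression values are made up for illustration.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """Two-class F-score of one feature; needs at least two samples per class."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")

def rank_genes(expr, labels, top_k):
    """Return the top_k gene names ordered by descending F-score."""
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = f_score(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical expression values: three tumour samples (1), three normal samples (0).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # strongly differential
    "geneB": [2.0, 3.0, 1.0, 2.0, 1.0, 3.0],  # identical class means
    "geneC": [3.0, 3.5, 2.5, 2.0, 2.5, 1.5],  # mildly differential
}
selected = rank_genes(expr, labels, top_k=2)
```

In a two-phase scheme of this kind, only the genes that survive this filter would be handed to the subsequent subset search.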
In recent years, feature selection (FS) optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification. This work reviews FS and classification methods that utilize evolutionary algorithms (EAs) for gene expression profiles in cancer or medical applications based on research motivations, challenges, and recommendations. Relevant studies were retrieved from four major academic databases (IEEE, Scopus, Springer, and ScienceDirect) using the keywords ‘cancer classification’, ‘optimization’, ‘FS’, and ‘gene expression profile’. A total of 67 papers were finally selected, with key advancements identified as follows: (1) The largest share of papers (44.8%) focused on developing algorithms and models for FS and classification. (2) The second category encompassed studies on biomarker identification by EAs, including 20 papers (30%). (3) The third category comprised works that applied FS to cancer data for decision support system purposes, addressing high-dimensional data and the formulation of chromosome length. These studies accounted for 12% of the total number of studies. (4) The remaining three papers (4.5%) were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions. This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively. Despite recent advancements, significant limitations remain: the dynamic formulation of chromosome length is still underexplored. Thus, further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection. The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
The basic idea of multi-class classification is a disassembly method, which decomposes a multi-class classification task into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method combining K-means and multi-task relationship learning (MTRL). The method first uses the One-vs-Rest split to disassemble the multi-class classification task into binary classification tasks. K-means is then used to downsample the dataset of each task, which can prevent over-fitting of the model while reducing training costs. Finally, the sampled dataset is applied to the MTRL, and multiple binary classifiers are trained together. With the help of MTRL, this method can utilize the inter-task association to train the model and improve the classification accuracy of each binary classifier. The effectiveness of the proposed approach is demonstrated by experimental results on the Iris dataset, Wine dataset, Multiple Features dataset, Wireless Indoor Localization dataset, and Avila dataset.
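The One-vs-Rest decomposition named in this abstract can be sketched in a few lines. The binary scorer below is a plain centroid-distance rule chosen only to keep the example self-contained. It is a stand-in and not the K-means downsampling or MTRL training the paper proposes, and all function names are hypothetical.

```python
def one_vs_rest_tasks(labels):
    """Map each class to a binary label vector (1 = this class, 0 = rest)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary_scorers(X, labels):
    """One (positive centroid, rest centroid) pair per class; a stand-in classifier."""
    scorers = {}
    for c, binary in one_vs_rest_tasks(labels).items():
        pos = centroid([x for x, b in zip(X, binary) if b == 1])
        rest = centroid([x for x, b in zip(X, binary) if b == 0])
        scorers[c] = (pos, rest)
    return scorers

def predict(scorers, x):
    """Pick the class whose binary task is most confident that x is positive."""
    def score(c):
        pos, rest = scorers[c]
        return sq_dist(x, rest) - sq_dist(x, pos)
    return max(scorers, key=score)

X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["a", "a", "b", "b", "c", "c"]
scorers = train_binary_scorers(X, y)
```

Combining the binary tasks by taking the most confident positive score is the usual way to turn One-vs-Rest outputs back into a single multi-class prediction.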
The accurate identification and classification of various power quality disturbances are key to ensuring high-quality electrical energy. In this study, the statistical characteristics of the disturbance signal of wavelet transform coefficients and wavelet transform energy distribution constitute feature vectors. These vectors are then trained and tested using SVM multi-class algorithms. Experimental results demonstrate that the SVM multi-class algorithms, which use the Gaussian radial basis function, exponential radial basis function, and hyperbolic tangent function as basis functions, are suitable methods for power quality disturbance classification.
A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to the sequencing problems of flights in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
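The transitive closure step used for fuzzy classification can be sketched as below. It assumes the standard max-min composition and a reflexive, symmetric similarity matrix. The abstract does not list the four factors or how similarity is computed, so the matrix values here are purely illustrative.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square the relation repeatedly until it stops changing.

    Assumes r is reflexive, so each squaring can only raise entries and the
    loop terminates because entries are drawn from a finite set of values.
    """
    current = r
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared

def lambda_cut_classes(closure, lam):
    """Group items whose closure similarity is at least the threshold lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes

# Hypothetical similarity between three flights.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
closure = transitive_closure(similarity)
```

Lowering the threshold `lam` merges classes, so the number of flight classes can be tuned by choosing the cut level.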
A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to solving classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back propagation (BP) algorithm for feedforward NNs using a sigmoid function. Moreover, the models can be implemented with lower-cost hardware than the BP NN.
Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and the wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
Aiming at the limitations of rapid fault diagnosis of blast furnaces, a novel strategy based on a cost-conscious least squares support vector machine (LS-SVM) is proposed. Firstly, modified discrete particle swarm optimization is applied to optimize the feature selection and the LS-SVM parameters. Secondly, a cost-conscious formula is presented for the fitness function, which accounts for training time, recognition accuracy, and feature selection. The CLS-SVM algorithm is presented to increase the performance of the LS-SVM classifier. The new method can select the best fault features in much shorter time and has fewer support vectors and better generalization performance in blast furnace fault diagnosis. Thirdly, a gradual change binary tree is established for blast furnace fault diagnosis. It is a multi-class classification method based on the center-of-gravity formula distance of clusters. A gradual change classification percentage is used to select samples randomly. The proposed method raises the speed of diagnosis, optimizes the classification accuracy, and has good generalization ability for blast furnace fault diagnosis.
This study presents a framework for predicting geological characteristics based on integrating a stacking classification algorithm (SCA) with a grid search (GS) and K-fold cross validation (K-CV). The SCA includes two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS was developed to match the hyper-parameters and optimise complicated problems. The K-CV is commonly applied to changing the validation set in a training set. In general, a GS is usually combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
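The combination of grid search with K-fold cross validation mentioned in this abstract can be sketched as follows. A one-dimensional nearest-neighbour classifier stands in for the stacking classifier, and the data, the parameter name `n_neighbors`, and the interleaved fold assignment are all assumptions made for illustration.

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; sample i goes to fold i % k."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the nearest training points (stand-in model)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = [train_y[i] for i in order[:n_neighbors]]
    return max(sorted(set(votes)), key=votes.count)

def cv_accuracy(xs, ys, k, n_neighbors):
    """Mean validation accuracy over the k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores)

def grid_search(xs, ys, grid, k):
    """Try every parameter combination; keep the first one with the best score."""
    best_params, best_score = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cv_accuracy(xs, ys, k, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

xs = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
best_params, best_score = grid_search(xs, ys, {"n_neighbors": [1, 3]}, k=4)
```

The cross-validated score is the evaluation index that the grid search maximizes, which is the role the abstract assigns to combining GS with K-CV.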
In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms, K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF) and Neural Network (NN), as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths and shortcomings of each of the algorithms were examined, and finally, a conclusion was reached on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data in use grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability in dealing with unbalanced data. Larger input datasets will lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, its high computational complexity, the numerous types of NN architectures to choose from, and the high number of algorithms used for training, most researchers recommend SVM and RF as easier and widely used methods which repeatedly achieve results with high accuracies and are often faster to implement.
Considering strip steel surface defect samples, a multi-class classification method was proposed based on enhanced least squares twin support vector machines (ELS-TWSVMs) and a binary tree. Firstly, a pruning region samples center method with adjustable pruning scale was used to prune data samples. This method could reduce the classifier’s training time and testing time. Secondly, ELS-TWSVM was proposed to classify the data samples. By introducing an error variable contribution parameter and a weight parameter, ELS-TWSVM could restrain the impact of noise samples and achieve better classification accuracy. Finally, multi-class classification algorithms of ELS-TWSVM were proposed by combining ELS-TWSVM and a complete binary tree. Experiments were conducted on two-dimensional datasets and strip steel surface defect datasets. The experiments showed that the multi-class classification methods of ELS-TWSVM had higher classification speed and accuracy for datasets with large-scale, unbalanced and noisy samples.
Clustering filtering is usually a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms have poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds, as well as the features of the ground object point clouds and the terrain point clouds, the point clouds are first clustered by their elevations, and then the plane point clouds are selected. Thus the number of samples and feature dimensions of the data are reduced. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground object point clouds, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas, and is significantly better than other classic filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
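The DBSCAN step named in this abstract can be illustrated with a minimal textbook implementation. It covers only plain density clustering with noise labelling. The elevation pre-clustering and plane selection that the paper applies beforehand are not reproduced, and the points and parameter values are hypothetical.

```python
from math import dist

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels

# Two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The brute-force neighbour search is quadratic in the number of points, which is one reason a reduction step before clustering matters for large LiDAR data.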
A brain-computer interface (BCI) system is one of the most effective ways to translate brain signals into output commands. Different imagery activities can be classified based on the changes in μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for classification. Training such MLP-NNs is of great importance and has recently attracted many researchers to this field. Conventional methods for training NNs, such as gradient descent and recursive methods, have some disadvantages, including low accuracy, slow convergence speed and trapping in local minima. In this paper, to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and gravitational search algorithm (PSOGSA), is proposed for our classification problem. To show the advantages of using PSOGSA to train NNs, this algorithm is compared with other meta-heuristic algorithms such as particle swarm optimization (PSO), gravitational search algorithm (GSA) and new versions of PSO. The metrics discussed in this paper are the speed of convergence and classification accuracy. The results show that, for most subjects of the electroencephalography (EEG) dataset, the proposed algorithm has much better or acceptable performance compared to the others.
This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when using the object-based approach with MRS to process such images. Given the high number of input features, an automatic method is needed for estimating this parameter. We therefore used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is carried out based on the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, the APEX, and the CASI datasets, respectively. Moreover, this method showed better efficiency in comparison to the spectral-based classifications because the OAs of the proposed method were 5.67% and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
In order to improve the accuracy and reduce the training and testing time of image classification algorithms, a novel image classification scheme based on extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine is constructed in place of the original linear SVM classifier. Firstly, the ScSPM algorithm is performed to extract features of the multi-scale image blocks, and then each layer’s feature vector is connected to an ELM. Finally, the mapping features are connected together and used as the input of one ELM based on a radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better not only in reducing the training time, but also in improving the accuracy of classification.
Focusing on strip steel surface defect classification, a novel support vector machine with adjustable hyper-sphere (AHSVM) is formulated. Meanwhile, a new multi-class classification method is proposed. Originating from support vector data description, AHSVM adopts a hyper-sphere to solve the classification problem. AHSVM obeys two principles: margin maximization and inner-class dispersion minimization. Moreover, the hyper-sphere of AHSVM is adjustable, which makes the final classification hyper-sphere optimal for the training dataset. In addition, AHSVM is combined with a binary tree to solve multi-class classification for steel surface defects. A scheme for pruning samples in the mapped feature space is provided, which can reduce the number of training samples while preserving classification accuracy, resulting in improved classification speed. Finally, testing experiments are carried out for eight types of strip steel surface defects. Experimental results show that the multi-class AHSVM classifier exhibits satisfactory results in classification accuracy and efficiency.
Defect classification is the key task of a steel surface defect detection system. Current defect classification algorithms have not taken feature noise into consideration. In order to reduce the adverse impact of feature noise, an anti-noise multi-class classification method was proposed for steel surface defects. On the one hand, a novel anti-noise support vector hyper-spheres (ASVHs) classifier was formulated. For N types of defects, the ASVHs classifier built N hyper-spheres. These hyper-spheres were insensitive to feature and label noise. On the other hand, in order to reduce the costs of online time and storage space, the defect samples were pruned by support vector data description with a parameter iteration adjustment strategy. In the end, the ASVHs classifier was built with a sparse defect sample set and auxiliary information. Experimental results show that the novel multi-class classification method has high efficiency and accuracy for corrupted defect samples on steel surfaces.
AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, while minimizing annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with the four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. Compared to other algorithms, ALFA-Mix+ performs better in more rounds of experiments and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology optimally and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, which is derived from the hybrid algorithms mentioned above, is demonstrated effectively on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, VGA classifier and BP-MLP (multi-layer perceptron) classifier, the VGA-BP classifier is shown to perform better on high-resolution land cover classification.
Funding (hybrid TF-IDF, Word2Vec and BERT cyberbullying detection study): Funded by the Scientific Research Deanship at the University of Hail, Saudi Arabia, through Project Number RG-23092.
Funding (Fs-LSA gene selection study): Supported by the National Natural Science Foundation of China (Grant Number 62341210), the Natural Science Foundation of Guangxi Province (Grant Number 2025GXNSFHA069267), and the Science and Technology Development Plan for Baise City (Grant Number 20233654).
Funding: Funded by the Ministry of Higher Education of Malaysia, grant number FRGS/1/2022/ICT02/UPSI/02/1.
Abstract: In recent years, feature selection (FS) optimization of high-dimensional gene expression data has become one of the most promising approaches for cancer prediction and classification. This work reviews FS and classification methods that utilize evolutionary algorithms (EAs) for gene expression profiles in cancer or medical applications, organized by research motivations, challenges, and recommendations. Relevant studies were retrieved from four major academic databases (IEEE, Scopus, Springer, and ScienceDirect) using the keywords ‘cancer classification’, ‘optimization’, ‘FS’, and ‘gene expression profile’. A total of 67 papers were finally selected, with key advancements identified as follows: (1) The majority of papers (44.8%) focused on developing algorithms and models for FS and classification. (2) The second category encompassed studies on biomarker identification by EAs, including 20 papers (30%). (3) The third category comprised works that applied FS to cancer data for decision support system purposes, addressing high-dimensional data and the formulation of chromosome length. These studies accounted for 12% of the total number of studies. (4) The remaining three papers (4.5%) were reviews and surveys focusing on models and developments in prediction and classification optimization for cancer classification under current technical conditions. This review highlights the importance of optimizing FS in EAs to manage high-dimensional data effectively. Despite recent advancements, significant limitations remain: the dynamic formulation of chromosome length is still an underexplored area. Thus, further research is needed on dynamic-length chromosome techniques for more sophisticated biomarker gene selection. The findings suggest that further advancements in dynamic chromosome length formulations and adaptive algorithms could enhance cancer classification accuracy and efficiency.
Funding: Supported by the National Natural Science Foundation of China (61703131, 61703129, 61701148, 61703128).
Abstract: The basic idea of multi-class classification is decomposition: a multi-class classification task is split into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method combining K-means and multi-task relationship learning (MTRL). The method first uses the One vs. Rest split to decompose the multi-class classification task into binary classification tasks. K-means is then used to downsample the dataset of each task, which can prevent over-fitting of the model while reducing training costs. Finally, the sampled dataset is applied to the MTRL, and multiple binary classifiers are trained together. With the help of MTRL, this method can exploit the association between tasks to train the model and thereby improve the classification accuracy of each binary classifier. The effectiveness of the proposed approach is demonstrated by experimental results on the Iris dataset, Wine dataset, Multiple Features dataset, Wireless Indoor Localization dataset, and Avila dataset.
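The One vs. Rest decomposition mentioned above can be illustrated with a short sketch. It shows only the split into binary tasks and the final arg-max over the binary scores; the K-means downsampling and the joint MTRL training are not reproduced. The nearest-centroid learner is a stand-in chosen for brevity, and all names and data points are hypothetical.

```python
def one_vs_rest_tasks(labels):
    """Return {class: binary label list}, 1 for the class and 0 for the rest."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_binary(X, y):
    """Stand-in binary learner: store the centroid of each side of the split."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def score(model, x):
    """Positive when x is closer to the 'one' centroid than to the 'rest' centroid."""
    pos, neg = model
    return sq_dist(x, neg) - sq_dist(x, pos)

def predict(models, x):
    """Pick the class whose binary classifier is most confident."""
    return max(models, key=lambda c: score(models[c], x))

# Toy three-class data (hypothetical).
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
y = ["a", "a", "b", "b", "c", "c"]
tasks = one_vs_rest_tasks(y)
models = {c: fit_binary(X, t) for c, t in tasks.items()}
pred = predict(models, [4.8, 5.1])
print(tasks["a"], pred)
```

Here each binary task is trained independently; a multi-task method would instead share information across the tasks during training.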
Abstract: The accurate identification and classification of various power quality disturbances are key to ensuring high-quality electrical energy. In this study, the statistical characteristics of the wavelet transform coefficients of the disturbance signal, together with the wavelet transform energy distribution, constitute the feature vectors. SVM multi-class algorithms are then trained and tested on these vectors. Experimental results demonstrate that the SVM multi-class algorithms, which use the Gaussian radial basis function, exponential radial basis function, and hyperbolic tangent function as basis functions, are suitable methods for power quality disturbance classification.
Abstract: A new arrival and departure flight classification method based on the transitive closure algorithm (TCA) is proposed. Firstly, fuzzy set theory and the transitive closure algorithm are introduced. Then four different factors are selected to establish the flight classification model, and a method is given to calculate the delay cost for each class. Finally, the proposed method is applied to flight sequencing problems in a terminal area, and the results are compared with those of the traditional classification method (TCM). Results show that the new classification model is effective in reducing the expenses of flight delays, thus optimizing the sequences of arrival and departure flights and improving the efficiency of air traffic control.
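As a rough illustration of classification by fuzzy transitive closure, the sketch below repeatedly squares a reflexive, symmetric fuzzy similarity matrix under max-min composition until it stabilizes, then groups items with a threshold (a lambda-cut). The abstract does not specify the four factors or how similarity is computed, so the matrix values, the threshold, and the function names are hypothetical.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square the relation (R <- R o R) until it stops changing."""
    while True:
        nxt = max_min_compose(r, r)
        if nxt == r:
            return r
        r = nxt

def lambda_cut_classes(closure, lam):
    """Group indices whose closure similarity is at least lam."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        classes.append(group)
    return classes

# Hypothetical similarity between four flights (reflexive and symmetric).
similarity = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
closure = transitive_closure(similarity)
classes = lambda_cut_classes(closure, lam=0.6)
print(closure[0][3], classes)
```

Lowering the threshold merges classes: with these numbers a cut at 0.4 would place all four flights in a single class.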
Abstract: A learning algorithm based on a hard limiter for feedforward neural networks (NN) is presented and applied to classification problems on separable convex sets and disjoint sets. Simulation shows that the algorithm has stronger classification ability than the back propagation (BP) algorithm for feedforward NNs using the sigmoid function. What is more, the models can be implemented with lower-cost hardware than the BP NN.
Abstract: Since many factors affect the quality of wine, a total of 17 factors were screened out using principal component analysis. A difference test was conducted on the evaluation data of the two groups of testers. The results showed that the evaluation data of the second group were more reliable than those of the first group. The KM algorithm was then optimized using the QPSO algorithm, and the wine classification model was established. Compared with the other two algorithms, the QPSO-KM algorithm was more capable of finding the globally optimal solution, and it could be used to classify the wine samples. In addition, the QPSO-KM algorithm could also be used to solve clustering problems.
Funding: Sponsored by the National Natural Science Foundation of China (60843007, 61050006).
Abstract: To address the limitations of rapid fault diagnosis of blast furnaces, a novel strategy based on a cost-conscious least squares support vector machine (LS-SVM) is proposed. Firstly, modified discrete particle swarm optimization is applied to optimize the feature selection and the LS-SVM parameters. Secondly, a cost-conscious formula is presented for the fitness function; it accounts for training time, recognition accuracy, and feature selection. The CLS-SVM algorithm is presented to increase the performance of the LS-SVM classifier. The new method can select the best fault features in much shorter time and has fewer support vectors and better generalization performance when applied to fault diagnosis of the blast furnace. Thirdly, a gradual change binary tree is established for blast furnace fault diagnosis. It is a multi-class classification method based on the center-of-gravity distance formula of clusters. A gradual change classification percentage is used to select samples randomly. The proposed method raises the speed of diagnosis, optimizes the classification accuracy, and has good generalization ability for fault diagnosis of the blast furnace.
Funding: Funded by “The Pearl River Talent Recruitment Program” of Guangdong Province in 2019 (Grant No. 2019CX01G338) and the Research Funding of Shantou University for New Faculty Member (Grant No. NTF19024-2019).
Abstract: This study presents a framework for predicting geological characteristics that integrates a stacking classification algorithm (SCA) with a grid search (GS) and K-fold cross validation (K-CV). The SCA includes two learner layers: a primary learner layer and a meta-classifier layer. The accuracy of the SCA can be improved by using the GS and K-CV. The GS is used to search for matching hyper-parameters and to optimise complicated problems. The K-CV rotates the validation set within the training set. In general, a GS is combined with K-CV to produce a corresponding evaluation index and select the best hyper-parameters. The torque penetration index (TPI) and field penetration index (FPI) are proposed based on shield parameters to express the geological characteristics. The elbow method (EM) and silhouette coefficient (Si) are employed to determine the number of types of geological characteristics (K) in a K-means++ algorithm. A case study on mixed ground in Guangzhou is adopted to validate the applicability of the developed model. The results show that with the developed framework, the four selected parameters, i.e. thrust, advance rate, cutterhead rotation speed, and cutterhead torque, can be used to effectively predict the corresponding geological characteristics.
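The combination of grid search and K-fold cross validation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a small k-nearest-neighbour classifier stands in for the stacking classifier, its neighbour count is the only hyper-parameter in the grid, folds are contiguous and unshuffled, and the data are hypothetical one-dimensional points.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train, validation) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def knn_predict(X_train, y_train, x, n_neighbors):
    """Majority vote among the nearest training points (ties go to the smaller label)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(X_train[i], x))
    order = sorted(range(len(X_train)), key=dist)
    votes = [y_train[i] for i in order[:n_neighbors]]
    return max(sorted(set(votes)), key=votes.count)

def cv_accuracy(X, y, n_neighbors, k):
    """Mean validation accuracy over the k folds for one hyper-parameter value."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        hits = sum(knn_predict(Xt, yt, X[i], n_neighbors) == y[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores)

def grid_search(X, y, grid, k):
    """Evaluate every grid value with K-fold CV and return the best one."""
    results = {p: cv_accuracy(X, y, p, k) for p in grid}
    best = max(results, key=results.get)
    return best, results

# Hypothetical, well-separated two-class data, interleaved so each fold sees both classes.
X = [[0.0], [10.0], [0.5], [10.5], [1.0], [11.0], [1.5], [11.5]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
best, results = grid_search(X, y, grid=[1, 3, 5], k=4)
print(best, results)
```

On real data the folds would normally be shuffled or stratified, and the grid would span several hyper-parameters of both learner layers.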
Abstract: In this paper, sixty-eight research articles published between 2000 and 2017, as well as textbooks, which employed four classification algorithms (K-Nearest-Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), and Neural Network (NN)) as the main statistical tools were reviewed. The aim was to examine and compare these nonparametric classification methods on the following attributes: robustness to training data, sensitivity to changes, data fitting, stability, ability to handle large data sizes, sensitivity to noise, time invested in parameter tuning, and accuracy. The performances, strengths, and shortcomings of each algorithm were examined, and finally a conclusion was reached on which one has higher performance. It was evident from the literature reviewed that RF is too sensitive to small changes in the training dataset, is occasionally unstable, and tends to overfit. KNN is easy to implement and understand but has the major drawback of becoming significantly slower as the size of the data grows, while the ideal value of K for the KNN classifier is difficult to set. SVM and RF are insensitive to noise or overtraining, which shows their ability to deal with unbalanced data. Larger input datasets lengthen classification times for NN and KNN more than for SVM and RF. Among these nonparametric classification methods, NN has the potential to become a more widely used classification algorithm, but because of its time-consuming parameter tuning procedure, the high computational complexity, the numerous types of NN architectures to choose from, and the large number of training algorithms, most researchers recommend SVM and RF as easier and widely used methods which repeatedly achieve high accuracies and are often faster to implement.
Funding: Sponsored by the National Natural Science Foundation of China (61050006).
Abstract: For strip steel surface defect samples, a multi-class classification method was proposed based on enhanced least squares twin support vector machines (ELS-TWSVMs) and a binary tree. Firstly, a pruning region samples center method with adjustable pruning scale was used to prune data samples. This method could reduce the classifier's training time and testing time. Secondly, ELS-TWSVM was proposed to classify the data samples. By introducing an error variable contribution parameter and a weight parameter, ELS-TWSVM could restrain the impact of noise samples and achieve better classification accuracy. Finally, multi-class classification algorithms of ELS-TWSVM were proposed by combining ELS-TWSVM with a complete binary tree. Experiments were carried out on two-dimensional datasets and strip steel surface defect datasets. The experiments showed that the multi-class classification methods of ELS-TWSVM had higher classification speed and accuracy for large-scale, unbalanced datasets containing noise samples.
Funding: The Natural Science Foundation of Hunan Province, China (No. 2020JJ4601) and the Open Fund of the Key Laboratory of Highway Engineering of Ministry of Education (No. kfj190203).
Abstract: Clustering filtering is usually a practical method for filtering light detection and ranging (LiDAR) point clouds according to their characteristic attributes. However, the amount of point cloud data is extremely large in practice, making it impossible to cluster the point cloud data directly, and the filtering error is also too large. Moreover, many existing filtering algorithms give poor classification results in discontinuous terrain. This article proposes a new fast classification filtering algorithm based on density clustering, which can solve the problem of point cloud classification in discontinuous terrain. Based on the spatial density of LiDAR point clouds and on the features of the ground object point clouds and the terrain point clouds, the point clouds are first clustered by their elevations, and then the plane point clouds are selected. Thus the number of samples and the feature dimensions of the data are reduced. Using the DBSCAN clustering filtering method, the original point clouds are finally divided into noise point clouds, ground object point clouds, and terrain point clouds. The experiment uses 15 sets of data samples provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), and the results of the proposed algorithm are compared with those of eight other classical filtering algorithms. Quantitative and qualitative analysis shows that the proposed algorithm has good applicability in urban and rural areas, and is significantly better than other classic filtering algorithms in discontinuous terrain, with a total error of about 10%. The results show that the proposed method is feasible and can be used in different terrains.
Abstract: A brain-computer interface (BCI) system is one of the most effective ways to translate brain signals into output commands. Different imagery activities can be classified based on the changes in μ and β rhythms and their spatial distributions. Multi-layer perceptron neural networks (MLP-NNs) are commonly used for classification. Training such MLP-NNs is of great importance and has recently attracted many researchers to this field. Conventional methods for training NNs, such as gradient descent and recursive methods, have some disadvantages, including low accuracy, slow convergence speed, and trapping in local minima. In this paper, to overcome these issues, an MLP-NN trained by a hybrid population-physics-based algorithm, the combination of particle swarm optimization and gravitational search algorithm (PSOGSA), is proposed for our classification problem. To show the advantages of using PSOGSA to train NNs, this algorithm is compared with other meta-heuristic algorithms such as particle swarm optimization (PSO), gravitational search algorithm (GSA), and new versions of PSO. The metrics discussed in this paper are the speed of convergence and the classification accuracy. The results show that, for most subjects of the electroencephalography (EEG) dataset, the proposed algorithm has much better or acceptable performance compared to the others.
Abstract: This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when using the object-based approach with MRS to process such images. Given the high number of input features, an automatic method is needed to estimate this parameter. We therefore used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is carried out based on the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three different hyperspectral datasets with various spectral and spatial characteristics. These data were acquired by the HyMap, the Airborne Prism Experiment (APEX), and the Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping in various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, the APEX, and the CASI datasets, respectively. Moreover, this method showed better efficiency than the spectral-based classifications: the OAs of the proposed method were 5.67% and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61503424; the Research Project by The State Ethnic Affairs Commission under Grant No. 14ZYZ017; the Jiangsu Future Networks Innovation Institute-Prospective Research Project on Future Networks under Grant No. BY2013095-2-14; and the first-class discipline construction transitional funds of Minzu University of China.
Abstract: To improve accuracy and reduce training and testing time in image classification, a novel image classification scheme based on the extreme learning machine (ELM) and linear spatial pyramid matching using sparse coding (ScSPM) is proposed. A new structure based on a two-layer extreme learning machine is constructed to replace the original linear SVM classifier. Firstly, the ScSPM algorithm is performed to extract features of the multi-scale image blocks, and each layer's feature vector is then connected to an ELM. Finally, the mapping features are concatenated and used as the input of one ELM based on the radial basis kernel function. Experimental evaluations on well-known benchmark datasets demonstrate that the proposed algorithm performs better both in reducing the training time and in improving the classification accuracy.
Abstract: Focusing on the classification of strip steel surface defects, a novel support vector machine with adjustable hyper-sphere (AHSVM) is formulated, and a new multi-class classification method is proposed. Originating from support vector data description, AHSVM adopts a hyper-sphere to solve the classification problem. AHSVM obeys two principles: margin maximization and inner-class dispersion minimization. Moreover, the hyper-sphere of AHSVM is adjustable, which makes the final classification hyper-sphere optimal for the training dataset. In addition, AHSVM is combined with a binary tree to solve multi-class classification for steel surface defects. A scheme for pruning samples in the mapped feature space is provided, which reduces the number of training samples while preserving classification accuracy and thus improves classification speed. Finally, testing experiments are carried out on eight types of strip steel surface defects. Experimental results show that the multi-class AHSVM classifier achieves satisfactory classification accuracy and efficiency.
Funding: This work was supported by the National Natural Science Foundation of China (No. 51674140); the Natural Science Foundation of Liaoning Province, China (No. 20180550067); the Department of Education of Liaoning Province, China (Nos. 2017LNQN11 and 2020LNZD06); University of Science and Technology Liaoning Talent Project Grants (No. 601011507-20); and University of Science and Technology Liaoning Team Building Grants (No. 601013360-17).
Abstract: Defect classification is the key task of a steel surface defect detection system. Current defect classification algorithms have not taken feature noise into consideration. To reduce the adverse impact of feature noise, an anti-noise multi-class classification method was proposed for steel surface defects. On the one hand, a novel anti-noise support vector hyper-spheres (ASVHs) classifier was formulated. For N types of defects, the ASVHs classifier built N hyper-spheres. These hyper-spheres were insensitive to feature and label noise. On the other hand, to reduce the costs of online time and storage space, the defect samples were pruned by support vector data description with a parameter iteration adjustment strategy. In the end, the ASVHs classifier was built with the sparse defect sample set and auxiliary information. Experimental results show that the novel multi-class classification method has high efficiency and accuracy for corrupted steel surface defect samples.
Funding: Supported by the National Natural Science Foundation of China (No. 61906066); the Zhejiang Provincial Philosophy and Social Science Planning Project (No. 21NDJC021Z); Shenzhen Fund for Guangdong Provincial High-level Clinical Key Specialties (No. SZGSP014); Sanming Project of Medicine in Shenzhen (No. SZSM202011015); Shenzhen Science and Technology Planning Project (No. KCXFZ20211020163813019); the Natural Science Foundation of Ningbo City (No. 202003N4072); and the Postgraduate Research and Innovation Project of Huzhou University (No. 2023KYCX52).
Abstract: AIM: To conduct a classification study of high myopic maculopathy (HMM), including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, using limited datasets while minimizing annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with the four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed the other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing of these models was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperformed the other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the number of required samples without compromising accuracy. Compared with the other algorithms, ALFA-Mix+ performs better in more rounds of experiments and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
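For readers unfamiliar with the four reported metrics, the sketch below computes accuracy, sensitivity, specificity, and Cohen's Kappa from a multi-class confusion matrix. The abstract does not say how per-class sensitivity and specificity were averaged, so macro-averaging is an assumption here, and the confusion matrix is made-up toy data, not results from the study.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(map(sum, cm))
    row = [sum(cm[i]) for i in range(n)]                       # true counts per class
    col = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts per class
    accuracy = sum(cm[i][i] for i in range(n)) / total
    # One-vs-rest sensitivity TP / (TP + FN) and specificity TN / (TN + FP) per class.
    sens = [cm[i][i] / row[i] for i in range(n)]
    spec = [(total - row[i] - col[i] + cm[i][i]) / (total - row[i]) for i in range(n)]
    # Cohen's Kappa: agreement beyond what the class marginals would give by chance.
    expected = sum(row[i] * col[i] for i in range(n)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / n,   # macro-average (assumption)
        "specificity": sum(spec) / n,   # macro-average (assumption)
        "kappa": kappa,
    }

# Hypothetical three-class confusion matrix (rows: true class, columns: predicted class).
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
m = classification_metrics(cm)
print(m)
```

Kappa is lower than accuracy here because part of the raw agreement is attributable to chance given the class proportions.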
Abstract: Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, derived from the hybrid of the algorithms mentioned above, is demonstrated to be effective on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier, and the BP-MLP (multi-layer perceptron) classifier, the VGA-BP classifier shows better performance on high-resolution land cover classification.