One-class support vector machine (OCSVM) and support vector data description (SVDD) are the two main domain-based one-class (kernel) classifiers. To reveal their relationship with density estimation in the case of the Gaussian kernel, OCSVM and SVDD are first unified into the framework of kernel density estimation, and the essential relationship between them is made explicit. The result then proves that the density estimate induced by OCSVM or SVDD agrees with the true density and that it can also reduce the integrated squared error (ISE). Finally, experiments on several simulated datasets verify the revealed relationships.
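To make the link to kernel density estimation concrete, here is a minimal sketch of a weighted Gaussian kernel density estimate and a numerical ISE against a known density. It is not the paper's derivation: the sample points, the bandwidth `h`, and the uniform weights are illustrative assumptions, and in an actual OCSVM or SVDD the weights would come from solving the dual problem rather than being set by hand.

```python
import math

def gaussian_kernel(u, h):
    """Normalized 1-D Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def weighted_kde(x, samples, weights, h):
    """Density estimate sum_i w_i * K_h(x - x_i); the weights should sum to 1."""
    return sum(w * gaussian_kernel(x - s, h) for s, w in zip(samples, weights))

def integrated_squared_error(estimate, truth, lo, hi, steps=2000):
    """Approximate the integral of (estimate - truth)^2 on [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (estimate(x) - truth(x)) ** 2
    return total * dx

def standard_normal_pdf(x):
    """Reference density used as the 'true' density in this toy example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Hypothetical training points and placeholder uniform weights (assumptions).
samples = [-1.5, -0.8, -0.3, 0.0, 0.2, 0.7, 1.1, 1.6]
weights = [1.0 / len(samples)] * len(samples)

ise = integrated_squared_error(
    lambda x: weighted_kde(x, samples, weights, h=0.6),
    standard_normal_pdf,
    -6.0,
    6.0,
)
```

With uniform weights this reduces to the ordinary Parzen-window estimate; non-uniform weights that still sum to one give the more general form in which a kernel one-class classifier can be read as a density estimator.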
Fuel injectors are an important component of combustion engines. An operational weakness can lead to complete machine malfunction, decreasing reliability and causing loss of production. To avoid these circumstances, various condition monitoring techniques can be applied. The use of acoustic signals is common in fault diagnosis of rotating machinery. Advanced signal processing is used to construct features that are specialized in detecting fuel injector faults. A performance comparison between novelty detection algorithms in the form of one-class classifiers is presented. The one-class classifiers tested were the One-Class Support Vector Machine (OCSVM) and the One-Class Self Organizing Map (OCSOM). The acoustic signals of fuel injectors in different operational conditions were processed for feature extraction. Features from all the signals were used as input to the one-class classifiers. The one-class classifiers were trained only on healthy fuel injector conditions and evaluated on new experimental data from operational conditions that were not included in the training set, so as to assess generalization. The results demonstrate the effectiveness of one-class classifiers for detecting faults in fuel injectors.
microRNAs (miRNAs) are short nucleotide sequences expressed by a genome that are involved in post-transcriptional modulation of gene expression. Since miRNAs need to be co-expressed with their target mRNA to observe an effect, and since miRNA-target interactions can be cooperative, it is currently not possible to develop a comprehensive experimental atlas of miRNAs and their targets. To overcome this limitation, machine learning has been applied to miRNA detection. In general, binary (two-class) learning approaches are applied to miRNA discovery. These learners consider both positive (miRNA) and negative (non-miRNA) examples during the training process. One-class classifiers, on the other hand, use only the information for the target class (miRNA). The one-class approach in machine learning is gradually receiving more attention, particularly for solving problems where the negative class is not well defined. This is especially true for miRNAs, where the positive class can be experimentally confirmed relatively easily, but where it is not currently possible to call any part of a genome a non-miRNA. To do that, it would have to be co-expressed with all other possible transcripts of the genome, which is currently a futile endeavor. For machine learning, miRNAs need to be transformed into a feature vector, and some currently used features, such as minimum free energy, vary widely in the case of plant miRNAs. In this study, our aim was to analyze different methods applying one-class approaches and the effectiveness of motif-based features for the prediction of plant miRNA genes. We show that the application of these one-class classifiers, which rely only on sequence-based features such as k-mers and motifs, is promising and useful for this kind of problem when compared with the results from two-class classification. In some cases the results of one-class classifiers are, to our surprise, more accurate than the results from two-class classifiers.
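The sequence-based k-mer features mentioned above can be sketched as follows. This is an illustrative assumption about the encoding, not the study's actual pipeline: the function name `kmer_features`, the RNA alphabet `ACGU`, and the normalization to relative frequencies are all choices made for this example.

```python
from collections import Counter
from itertools import product

def kmer_features(sequence, k=3, alphabet="ACGU"):
    """Return relative k-mer frequencies in a fixed lexicographic order.

    The vector has len(alphabet) ** k entries, so every sequence maps to a
    feature vector of the same length regardless of sequence length.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only k-mers made of alphabet symbols are counted in the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical toy sequence, for illustration only.
features = kmer_features("ACGUACGU", k=2)
```

A fixed-length vector of this kind can be fed to either a one-class or a two-class learner, which is what makes the comparison described in the abstract possible on the same inputs.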
Background: In the field of genetic diagnostics, DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the identification of risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis, taking advantage of the power of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. The study then uses a selected collection of machine learning classifiers combined through a strict voting mechanism, which joins the predictions made by multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested in an empirical analysis on a variants dataset of DNA sequences taken from patients affected by breast cancer, compared with a control group composed of healthy people. The integration of CNNs with a voting-based ensemble of classifiers achieved accuracy, precision, recall, and F1-score of 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the potential that integrating deep learning techniques with ensemble machine learning might provide for genetic diagnostics and prognostics. The results of this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and point to future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
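The abstract does not say which voting rule the ensemble uses, so the following is only a sketch of one common choice, plurality (hard) voting over the labels predicted by several classifiers. The function name and the toy predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of the same
    length. Ties go to the label that appears first among the tied votes.
    """
    combined = []
    for sample_votes in zip(*predictions):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of three classifiers on three samples (1 = case, 0 = control).
votes = majority_vote([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
])
```

Soft voting, which averages predicted class probabilities instead of counting labels, would be an equally plausible reading of the abstract.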
Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Omics datasets (transcriptomic, epigenomic, proteomic, etc.) generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), as well as accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. Through the integration of bioinformatics and an ML approach on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
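For readers unfamiliar with how mass functions from two classifiers are fused in the DST, here is a minimal sketch of Dempster's rule of combination. It shows only the standard rule, not the paper's generalized rough set framework; the two-class frame and the mass values are made up for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of class labels (a focal
    element) to its mass, with the masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Hypothetical outputs of two base classifiers over the frame {water, soil}.
WATER, SOIL = frozenset({"water"}), frozenset({"soil"})
FRAME = WATER | SOIL
fused = dempster_combine(
    {WATER: 0.6, FRAME: 0.4},
    {WATER: 0.5, SOIL: 0.3, FRAME: 0.2},
)
```

In this toy case the conflicting mass is 0.6 × 0.3 = 0.18, so the fused mass on `water` is 0.62 / 0.82, roughly 0.756, which is higher than either source assigned on its own.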
The turbo air classifier is a widely used piece of powder classification equipment in a variety of fields. The flow field characteristics of the turbo air classifier are an important basis for improving its structural design. The flow field characteristics of the rotor cage in turbo air classifiers were investigated under different operating conditions by laser Doppler velocimeter (LDV), and a measure for diminishing the axial velocity is proposed. The results show that the tangential velocity of the air flow inside the rotor cage differs from the rotary speed of the rotor cage at the same measurement point, due to the influences of both the negative pressure at the exit and the rotation of the rotor cage. At a low rotary speed of the rotor cage, the tangential velocity of the air flow decreases as the radius decreases. In contrast, at a high rotary speed of the rotor cage, the tangential velocity of the air flow increases as the radius decreases. Meanwhile, the vortex inside the rotor cage is found to occur near the pressure side of the blade when the rotor cage's rotary speed is less than the tangential velocity of the air flow. Conversely, the vortex is found to occur near the suction side of the blade once the rotor cage's rotary speed is higher than the tangential velocity of the air flow. Inside the rotor cage, the axial velocity cannot be disregarded and is largely determined by the distance between the measurement point and the exit.
Predicting stock price movements is a challenging task for academicians and practitioners. In particular, forecasting price movements in emerging markets seems to be more elusive because these markets are usually more volatile, are often accompanied by thin trading volumes, and are more susceptible to manipulation than mature markets. Technical analysis of stocks and commodities has become a science of its own; quantitative methods and techniques have been applied by many practitioners to forecast price movements. Lagging and sometimes leading technical indicators provide rich quantitative tools for traders and investors in their attempt to gain an advantage when making investment or trading decisions. Artificial Neural Networks (ANN) have been used widely in predicting stock prices because of their capability to capture the non-linearity that often exists in price movements. Recently, Polynomial Classifiers (PC) have been applied to various recognition and classification applications and have shown favorable results in terms of recognition rates and computational complexity compared with ANN. In this paper, we present two models for predicting securities' prices. The first model was developed using back-propagation feed-forward neural networks. The second model was developed using polynomial classifiers (PC), as a first application of PC to stock price prediction. The inputs to both models were identical, and both models were trained and tested on the same data. The study was conducted on the Dubai Financial Market as an emerging market and applied to two of the market's leading stocks. In general, both models achieved very good results in terms of mean absolute error percentage. Both models show an average error of around 1.5% when predicting the next-day price, 2.5% when predicting the second-day price, and 4% when predicting the third-day price.
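The abstract reports results as a "mean absolute error percentage" without giving the formula. A minimal sketch follows, assuming the usual mean absolute percentage error (MAPE) definition; the prices used are invented for illustration and are not from the study.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the MAPE in percent: 100 * mean(|actual - predicted| / |actual|).

    Assumes every actual value is non-zero, which holds for stock prices.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical closing prices and next-day forecasts.
mape = mean_absolute_percentage_error([100.0, 200.0], [98.0, 206.0])
```

Here the two relative errors are 2% and 3%, so the average is 2.5%. Because the error is relative to the actual price, the metric allows comparison across stocks trading at different price levels.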
The classification performance of model coal mill classifiers with different bottom incoming flow inlets was studied experimentally and numerically. The flow field adjacent to two neighboring impeller blades was measured using the particle image velocimetry technique. The results showed that the flow field adjacent to two neighboring blades with the swirling inlet was significantly different from that with the non-swirling inlet. With the swirling inlet, there was a vortex located between two neighboring blades, while with the non-swirling inlet, the vortex was attached to the blade tip. The vorticity of the vortex with the non-swirling inlet was much lower than that with the swirling inlet. The classifier with the non-swirling inlet demonstrated a larger cut size than that with the swirling inlet when the impeller was stationary (~0 r·min⁻¹). As the impeller rotational speed increased, the cut size decreased for both the non-swirling and swirling inlets, and it decreased more dramatically with the non-swirling inlet. The cut sizes of the two classifiers were close to each other at a high impeller rotational speed (≥120 r·min⁻¹). The overall separation efficiency of the classifier with the non-swirling inlet was lower than that with the swirling inlet, and it increased monotonically as the impeller rotational speed increased. With the swirling inlet, the overall separation efficiency first increased with the impeller rotational speed and then decreased when the rotational speed was above 120 r·min⁻¹, and the variation in the separation efficiency was more moderate. As the initial particle concentration increased, the cut sizes of both the swirling and non-swirling inlet cases first decreased and then barely changed. At a low initial particle concentration (<0.04 kg·m⁻³), the classifier with the swirling inlet had a larger cut size than that with the non-swirling inlet.
This study investigated the efficiency of learning Chinese numeral classifiers by L2 Chinese learners by means of an alignment-oriented task. Participants were a total of 96 intermediate learners of L2 Chinese, who were randomly assigned to two experimental groups and one control group, with each group consisting of 32 participants. The continuation task used in this study consisted of a picture-based Chinese text depicting a room with an array of objects, which necessitates the use of classifiers. The two experimental groups were both required to first read the text and then write a description of their own rooms in comparison with the one in the text. One group was instructed to use the classifiers from the text as much as possible in their writing, whereas the other was not required to do so. Participants in the control group were first given the picture to look at in the absence of the text and then asked to describe their own rooms. The results showed that the continuation task significantly enhanced participants' retention of the Chinese numeral classifiers, suggesting that the alignment-based approach is an effective way to learn difficult linguistic categories such as the Chinese classifiers.
Biometric recognition refers to the identification of individuals through their unique behavioral features (e.g., fingerprint, face, and iris). We need distinguishing characteristics to identify people, such as fingerprints, which are world-renowned as the most reliable method of identifying people. The recognition of fingerprints has become a standard procedure in forensics, and different techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to generate classification models. Therefore, we propose an effective fingerprint classification method for classifying a fingerprint image as authentic or altered, since criminals and hackers routinely change their fingerprints to generate fake ones. To improve fingerprint classification accuracy, the proposed method uses the most effective texture features and classifiers. Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradient (HOG) and Segmentation-based Feature Texture Analysis (SFTA) feature vectors as inputs. The performance of the classifiers is determined by assessing a range of feature sets, and the most accurate results are reported. The proposed method is tested on the Sokoto Coventry Fingerprint Dataset (SOCOFing). The SOCOFing project includes 6,000 fingerprint images collected from 600 African people whose fingerprints were taken ten times. Three distinct alterations (obliteration, central rotation, and z-cut) were applied to obtain synthetically altered replicas of the genuine fingerprints. The proposed method achieved a classification accuracy of 99%. The experimental results indicate that the proposed method for fingerprint classification is feasible and effective. The experiments also showed that the proposed SFTA-based GDA method outperformed state-of-the-art approaches in feature dimension and classification accuracy.
The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index, and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: low-level features extracted by computers often fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for the detection of video semantic concepts based on machine learning with multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method equips the concept detection system to explore the potential of these features. We then combine the different features and sub-systems through both classifier-level and kernel-level fusion, which contributes to a more robust system. Our proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine develops a generic model and performs well on both object-type and scene-type concepts.
The methods for combining multiple classifiers based on belief functions require working with a common and complete (closed) Frame of Discernment (FoD) on which the belief functions are defined before they are combined. This theoretical requirement is, however, difficult to satisfy in practice, because some abnormal (or unknown) objects that do not belong to any predefined class of the FoD can appear in real classification applications. Classifiers learnt using different attribute information can provide complementary knowledge, which is very useful for classification, but they are usually based on different FoDs. In order to clearly identify the specific class of the abnormal objects, we propose a new method for the combination of classifiers working with incomplete frames of discernment, named CCIF for short. This is a progressive detection method that selects the detected abnormal objects and adds them to the training data set. Because one pattern can be considered an abnormal object by one classifier and be committed to a specific class by another, a weighted evidence combination method is proposed to fuse the classification results of multiple classifiers. This new method offers the advantage of making a refined classification of abnormal objects and of improving the classification accuracy thanks to the complementarity of the classifiers. Some experimental results are given to validate the effectiveness of the proposed method using real data sets.
Radio Frequency Identification (RFID) is a wireless technology that has been designed to automatically identify tagged objects using a reader. Several applications of this technology have been introduced in past literature, such as pet identification and luggage tracking, which have increased the efficiency and effectiveness of each environment into which it was integrated. However, due to the ambiguous nature of the captured information, with the existence of missing, wrong, and duplicate readings, the wide-scale adoption of the architecture is limited to commercial sectors where the integrity of the observations can tolerate ambiguity. In this work, we propose an application of RFID to record class attendance and to integrate a predictive classifier that extracts high-level meaningful information that can be used in diverse areas such as scheduling and addressing low student retention. We conclude by providing an analysis of the core strengths and opportunities that exist for this concept and how we might extend it in future research.
This study modeled the effects of structural and dimensional manipulations on the hydrodynamic behavior of a bench vertical current classifier. A computational fluid dynamics (CFD) approach was used as the modeling method, and turbulent intensity and fluid velocity were used as system responses to predict the variations in overflow cut size. These investigations showed that the cut size would decrease with increasing diameter and height of the separation column and increasing cone section depth, due to the decrease in turbulent intensity and fluid velocity. As the size of the discharge gate increases, the overflow cut size would decrease because the fluid streams freely out of the column. The overflow cut size was significantly larger in the downward-fed classifier than in the one fed by an upward fluid stream. In addition, reshaping the weir of the angular overflow outlet into a curved form prevented the stream from returning inside and, consequently, an unselective decrease in cut size.
Wind energy is considered an alternative renewable energy source due to its low operating cost compared with other sources. The wind turbine is an essential system used to convert kinetic energy into electrical energy. Wind turbine blades, in particular, require a competitive condition inspection approach, as the blade is a significant component of the wind turbine system that accounts for around 20-25 percent of the total turbine cost. The main objective of this study is to differentiate between various blade faults that affect the wind turbine blade under operating conditions using a machine learning approach based on histogram features. In this study, blade bend, hub-blade loose connection, blade erosion, pitch angle twist, and blade cracks were simulated on the blade. The problem is formulated as a machine learning problem consisting of three phases, namely feature extraction, feature selection, and feature classification. Histogram features are extracted from vibration signals, and feature selection was carried out using the J48 decision tree algorithm. Feature classification was performed using 15 tree classifiers. The results of the machine learning classifiers were compared with respect to their accuracy percentage, and a better model is suggested for real-time monitoring of a wind turbine blade.
Word Sense Disambiguation (WSD) is the task of deciding the sense of an ambiguous word in a particular context. Most current studies on WSD use only a few ambiguous words as test samples, which limits their practical application. In this paper, we perform a WSD study on a large-scale real-world corpus using two unsupervised learning algorithms, based on a ±n-improved Bayesian model and a Dependency Grammar (DG)-improved Bayesian model. The ±n-improved classifier reduces the context window size of ambiguous words with a close-distance feature extraction method and decreases the interference of useless features, thus clearly improving the accuracy, which reaches 83.18% (in the open test). The DG-improved classifier can more effectively overcome the noise effect that exists in the Naive Bayesian classifier. Experimental results show that this approach performs better on Chinese WSD, and the open test achieved an accuracy of 86.27%.
Human action recognition has become one of the most active research topics in human-computer interaction and artificial intelligence, and has attracted much attention. Here, we employ a low-cost optical sensor, Kinect, to capture the action information of the human skeleton. We then propose a two-level hierarchical human action recognition model with self-selecting classifiers based on skeleton data. In particular, different optimal classifiers are selected by a probability voting mechanism and 10 times 10-fold cross-validation at different coarse-grained levels. Extensive simulations on a well-known open dataset demonstrate that our proposed method is efficient in human action recognition, achieving an average recognition rate of 94.19% and a best rate of 95.61%.
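The "10 times 10-fold cross-validation" protocol used for classifier selection can be sketched as an index generator. This is a generic illustration, not the authors' code: the function name, the seed, and the round-robin fold assignment are assumptions, and stratification by class is omitted for brevity.

```python
import random

def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and partitions them into
    n_splits disjoint folds; every fold serves once as the test set.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Round-robin assignment keeps fold sizes within one of each other.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for fold, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield repeat, fold, train_idx, test_idx

# 10 repeats x 10 folds gives 100 train/test splits for a hypothetical 50-sample set.
splits = list(repeated_kfold_indices(50))
```

Averaging a candidate classifier's accuracy over all 100 splits gives a more stable estimate than a single 10-fold run, which is presumably why the protocol is used to pick the best classifier at each level of the hierarchy.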
The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.
In various application areas of pattern recognition, combining multiple classifiers is regarded as a new method for achieving a substantial gain in system performance. This paper discusses the properties of classifier diversity and its applications. It also presents a novel method for combining multiple classifiers based on diversity. Fusion strategies are discussed to provide a basis for combining classifiers. These combination strategies are experimentally tested on an online handwritten Chinese character recognition system, and their effectiveness is assessed.
Funding: Supported by the National Natural Science Foundation of China (60603029), the Natural Science Foundation of Jiangsu Province (BK2007074), and the Natural Science Foundation for Colleges and Universities in Jiangsu Province (06KJB520132).
文摘One-class support vector machine (OCSVM) and support vector data description (SVDD) are two main domain-based one-class (kernel) classifiers. To reveal their relationship with density estimation in the case of the Gaussian kernel, OCSVM and SVDD are firstly unified into the framework of kernel density estimation, and the essential relationship between them is explicitly revealed. Then the result proves that the density estimation induced by OCSVM or SVDD is in agreement with the true density. Meanwhile, it can also reduce the integrated squared error (ISE). Finally, experiments on several simulated datasets verify the revealed relationships.
文摘Fuel injectors are considered as an important component of combustion engines. Operational weakness can possibly lead to the complete machine malfunction, decreasing reliability and leading to loss of production. To overcome these circumstances, various condition monitoring techniques can be applied. The application of acoustic signals is common in the field of fault diagnosis of rotating machinery. Advanced signal processing is utilized for the construction of features that are specialized in detecting fuel injector faults. A performance comparison between novelty detection algorithms in the form of one-class classifiers is presented. The one-class classifiers that were tested included One-Class Support Vector Machine (OCSVM) and One-Class Self Organizing Map (OCSOM). The acoustic signals of fuel injectors in different operational conditions were processed for feature extraction. Features from all the signals were used as input to the one-class classifiers. The one-class classifiers were trained only with healthy fuel injector conditions and compared with new experimental data which belonged to different operational conditions that were not included in the training set so as to contribute to generalization. The results present the effectiveness of one-class classifiers for detecting faults in fuel injectors.
Abstract: microRNAs (miRNAs) are short nucleotide sequences expressed by a genome that are involved in post-transcriptional modulation of gene expression. Since miRNAs need to be co-expressed with their target mRNA to observe an effect, and since miRNA-target interactions can be cooperative, it is currently not possible to develop a comprehensive experimental atlas of miRNAs and their targets. To overcome this limitation, machine learning has been applied to miRNA detection. In general, binary (two-class) learning approaches are applied to miRNA discovery. These learners consider both positive (miRNA) and negative (non-miRNA) examples during the training process. One-class classifiers, on the other hand, use only the information for the target class (miRNA). The one-class approach in machine learning is gradually receiving more attention, particularly for solving problems where the negative class is not well defined. This is especially true for miRNAs, where the positive class can be experimentally confirmed relatively easily, but where it is not currently possible to call any part of a genome a non-miRNA. To do that, it would have to be co-expressed with all other possible transcripts of the genome, which is currently a futile endeavor. For machine learning, miRNAs need to be transformed into a feature vector, and some currently used features, such as minimum free energy, vary widely in the case of plant miRNAs. In this study our aim was to analyze different methods applying one-class approaches and the effectiveness of motif-based features for the prediction of plant miRNA genes. We show that the application of these one-class classifiers is promising and useful for this kind of problem, which relies only on sequence-based features such as k-mers and motifs, compared to the results from two-class classification. In some cases the one-class results are, to our surprise, more accurate than the results from two-class classifiers.
Abstract: Background: In the field of genetic diagnostics, DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the identification of risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis, taking advantage of the power of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. The study then uses a selected collection of machine learning classifiers combined through a strict voting mechanism, which joins the predictions of multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested in an empirical analysis on a variants dataset of DNA sequences taken from patients affected by breast cancer, compared with a control group of healthy people. The integration of CNNs with a voting-based ensemble of classifiers returned strong outcomes, with the performance metrics accuracy, precision, recall, and F1-score reaching 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the potential of integrating deep learning techniques with ensemble machine learning to add value to genetic diagnostics and prognostics. The results of this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and support future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University, KSA, through the Large Research Project under grant number RGP2/164/46.
Abstract: Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), as well as accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
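The likelihood ratios (LR+, LR−) and the Youden index listed among the evaluation measures above all follow from sensitivity and specificity. The sketch below uses the standard textbook definitions; the confusion-matrix counts are made-up numbers for illustration and are not taken from the study.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Derive standard diagnostic measures from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_plus = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # LR- = (1 - sensitivity) / specificity; infinite when specificity is zero.
    lr_minus = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_plus": lr_plus,
        "lr_minus": lr_minus,
        "youden": sensitivity + specificity - 1,  # Youden's J statistic
    }


# Hypothetical counts for illustration only.
m = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```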
Abstract: To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
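For readers unfamiliar with mass functions, the following is a minimal sketch of Dempster's rule of combination, the basic fusion operator of DST. It is not the paper's fusion framework (which also relies on decision tables and a generalized rough set model); the two mass functions are invented for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (focal element) to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical outputs of two base classifiers over the frame {A, B}.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
fused = dempster_combine(m1, m2)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Mass left on the whole frame (here `AB`) expresses ignorance, which is what distinguishes this kind of fusion from simply multiplying class probabilities.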
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50474035).
Abstract: The turbo air classifier is powder classification equipment widely used in a variety of fields. The flow field characteristics of the turbo air classifier are an important basis for improving its structural design. The flow field characteristics of the rotor cage in turbo air classifiers were investigated under different operating conditions by laser Doppler velocimeter (LDV), and a measure for diminishing the axial velocity is proposed. The results show that the tangential velocity of the air flow inside the rotor cage differs from the rotary speed of the rotor cage at the same measurement point, due to the influences of both the negative pressure at the exit and the rotation of the rotor cage. The tangential velocity of the air flow decreases as the radius decreases when the rotary speed of the rotor cage is low. In contrast, the tangential velocity of the air flow increases as the radius decreases when the rotary speed of the rotor cage is high. Meanwhile, the vortex inside the rotor cage is found to occur near the pressure side of the blade when the rotary speed of the rotor cage is less than the tangential velocity of the air flow. Conversely, the vortex is found to occur near the suction side of the blade once the rotary speed of the rotor cage is higher than the tangential velocity of the air flow. Inside the rotor cage, the axial velocity cannot be disregarded and is largely determined by the distance between the measurement point and the exit.
Abstract: Predicting stock price movements is a challenging task for academicians and practitioners. In particular, forecasting price movements in emerging markets seems to be more elusive because they are usually more volatile, often accompanied by thin trading volumes, and more susceptible to manipulation than mature markets. Technical analysis of stocks and commodities has become a science of its own; quantitative methods and techniques have been applied by many practitioners to forecast price movements. Lagging and sometimes leading technical indicators provide rich quantitative tools for traders and investors in their attempt to gain an advantage when making investment or trading decisions. Artificial Neural Networks (ANN) have been widely used in predicting stock prices because of their capability to capture the non-linearity that often exists in price movements. Recently, Polynomial Classifiers (PC) have been applied to various recognition and classification applications and have shown favorable results in terms of recognition rates and computational complexity compared to ANN. In this paper, we present two models for predicting securities' prices. The first model was developed using back-propagation feed-forward neural networks. The second model was developed using polynomial classifiers (PC), as a first application of PC to stock price prediction. The inputs to both models were identical, and both models were trained and tested on the same data. The study was conducted on the Dubai Financial Market as an emerging market and applied to two of the market's leading stocks. In general, both models achieved very good results in terms of mean absolute error percentage. Both models show an average error of around 1.5% when predicting the next-day price, 2.5% when predicting the second-day price, and 4% when predicting the third-day price.
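The "mean absolute error percentage" quoted above is most naturally read as the mean absolute percentage error (MAPE); that reading is an assumption, since the abstract does not give the formula. A minimal sketch with invented prices:

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: mean of |actual - predicted| / |actual| * 100.

    Assumes no actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical closing prices and next-day forecasts.
actual_prices = [100.0, 200.0, 50.0]
forecasts = [98.0, 204.0, 50.5]
print(round(mean_absolute_percentage_error(actual_prices, forecasts), 3))
```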
Funding: Financial support from the National Key Technologies R&D Program of China (2018YFF0216002).
Abstract: The classification performance of model coal mill classifiers with different bottom incoming flow inlets was experimentally and numerically studied. The flow field adjacent to two neighboring impeller blades was measured using the particle image velocimetry technique. The results showed that the flow field adjacent to two neighboring blades with the swirling inlet was significantly different from that with the non-swirling inlet. With the swirling inlet, there was a vortex located between two neighboring blades, while with the non-swirling inlet, the vortex was attached to the blade tip. The vorticity of the vortex with the non-swirling inlet was much lower than that with the swirling inlet. The classifier with the non-swirling inlet demonstrated a larger cut size than that with the swirling inlet when the impeller was stationary (~0 r·min⁻¹). As the impeller rotational speed increased, the cut size decreased for both the non-swirling and swirling inlets, and the decrease was more dramatic for the non-swirling inlet. The cut sizes of the two classifiers were close to each other at a high impeller rotational speed (≥120 r·min⁻¹). The overall separation efficiency of the classifier with the non-swirling inlet was lower than that with the swirling inlet, and increased monotonically with the impeller rotational speed. With the swirling inlet, the overall separation efficiency first increased with the impeller rotational speed and then decreased when the rotational speed was above 120 r·min⁻¹, and the variation of the separation efficiency was more moderate. As the initial particle concentration increased, the cut sizes for both swirling and non-swirling inlets first decreased and then barely changed. At a low initial particle concentration (< 0.04 kg·m⁻³), the classifier with the swirling inlet had a larger cut size than that with the non-swirling inlet.
Abstract: This study investigated the efficiency of learning Chinese numeral classifiers by L2 Chinese learners by means of an alignment-oriented task. Participants were a total of 96 intermediate learners of L2 Chinese, who were randomly assigned to two experimental groups and one control group, with each group consisting of 32 participants. The continuation task used in this study consisted of a picture-based Chinese text depicting a room with an array of objects, which necessitates the use of classifiers. The two experimental groups were both required to first read the text and then write a description of their own rooms in comparison with the one in the text. One group was instructed to use the classifiers from the text as much as possible in their writing, whereas the other was not required to do so. Participants in the control group were first given the picture to look at in the absence of the text and then asked to describe their own rooms. The results showed that the continuation task significantly enhanced participants' retention of the Chinese numeral classifiers, suggesting that the alignment-based approach is an effective way to learn difficult linguistic categories such as Chinese classifiers.
Abstract: Biometric recognition refers to the identification of individuals through their unique behavioral features (e.g., fingerprint, face, and iris). We need distinguishing characteristics to identify people, such as fingerprints, which are world-renowned as the most reliable method of identifying people. The recognition of fingerprints has become a standard procedure in forensics, and different techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to generate classification models. Therefore, we propose an effective fingerprint classification method for classifying a fingerprint image as authentic or altered, since criminals and hackers routinely alter their fingerprints to generate fake ones. In order to improve fingerprint classification accuracy, our proposed method uses the most effective texture features and classifiers. Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradient (HOG) and Segmentation-based Feature Texture Analysis (SFTA) feature vectors as inputs. The performance of the classifiers is determined by assessing a range of feature sets, and the most accurate results are retained. The proposed method is tested using the Sokoto Coventry Fingerprint Dataset (SOCOFing). The SOCOFing project includes 6,000 fingerprint images collected from 600 African people whose fingerprints were taken ten times. Three distinct degrees of obliteration, central rotation, and z-cut have been applied to obtain synthetically altered replicas of the genuine fingerprints. The proposed method achieved a classification accuracy of up to 99%. The experimental results indicate that the proposed method for fingerprint classification is feasible and effective. The experiments also showed that the proposed SFTA-based GDA method outperformed state-of-the-art approaches in feature dimension and classification accuracy.
Funding: This paper was supported by the collaborative Research Project SEV under Grant No. 01100474 between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing; the National Natural Science Foundation of China under Grant No. 90920001; and the Graduate Innovation Fund of SICE, BUPT, 2011.
Abstract: The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: low-level features extracted by computers often fail to coincide with high-level concepts interpreted by humans. In this paper, we present a generic scheme for the detection of video semantic concepts based on machine learning over multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method equips the concept detection system to explore the potential of these features. We then combine the different features and sub-systems through both classifier-level and kernel-level fusion, which contributes to a more robust system. Our proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine develops a generic model and performs well on both object and scene type concepts.
Funding: Partially supported by the National Natural Science Foundation of China (Nos. U20B2067, 61790552, 61790554) and the Shaanxi Science Fund for Distinguished Young Scholars, China (No. 2018JC-006).
Abstract: Methods for combining multiple classifiers based on belief functions require a common and complete (closed) Frame of Discernment (FoD) on which the belief functions are defined before they are combined. This theoretical requirement is, however, difficult to satisfy in practice because some abnormal (or unknown) objects that do not belong to any predefined class of the FoD can appear in real classification applications. Classifiers learnt using different attribute information can provide complementary knowledge, which is very useful for classification, but they are usually based on different FoDs. In order to clearly identify the specific class of the abnormal objects, we propose a new method for the combination of classifiers working with incomplete frames of discernment, named CCIF for short. This is a progressive detection method that selects and adds the detected abnormal objects to the training data set. Because one pattern can be considered an abnormal object by one classifier and be committed to a specific class by another, a weighted evidence combination method is proposed to fuse the classification results of multiple classifiers. This new method offers the advantage of making a refined classification of abnormal objects and of improving the classification accuracy thanks to the complementarity of the classifiers. Some experimental results on real data sets are given to validate the effectiveness of the proposed method.
Abstract: Radio Frequency Identification (RFID) is a wireless technology designed to automatically identify tagged objects using a reader. Several applications of this technology have been introduced in past literature, such as pet identification and luggage tracking, which have increased the efficiency and effectiveness of each environment into which it was integrated. However, due to the ambiguous nature of the captured information, with missing, wrong and duplicate readings, the wide-scale adoption of the architecture is limited to commercial sectors where the integrity of the observations can tolerate ambiguity. In this work, we propose an application of RFID to the reporting of class attendance, integrating a predictive classifier to extract high-level meaningful information that can be used in diverse areas such as scheduling and low student retention. We conclude by providing an analysis of the core strengths and opportunities that exist for this concept and how we might extend it in future research.
Funding: Financially supported by the INVENTIVE Mineral Processing Research Center of Iran.
Abstract: This study modeled the effects of structural and dimensional manipulations on the hydrodynamic behavior of a bench vertical current classifier. A computational fluid dynamics (CFD) approach was used as the modeling method, and turbulent intensity and fluid velocity were used as system responses to predict the overflow cut size variations. These investigations showed that the cut size decreases with increasing diameter and height of the separation column and increasing depth of the cone section, due to the decrease in turbulent intensity and fluid velocity. As the size of the discharge gate increases, the overflow cut size decreases because the fluid streams freely out of the column. The overflow cut size was significantly larger in the downward-fed classifier than in the one fed by an upward fluid stream. In addition, reshaping the weir of the angular overflow outlet into a curved form prevented the stream from returning inside and, consequently, the unselective decrease of the cut size.
Abstract: Wind energy is considered an alternative renewable energy source due to its low operating cost compared with other sources. The wind turbine is an essential system used to convert kinetic energy into electrical energy. Wind turbine blades, in particular, require a competitive condition inspection approach, as they are a significant component of the wind turbine system, accounting for around 20-25 percent of the total turbine cost. The main objective of this study is to differentiate between various faults that affect the wind turbine blade under operating conditions using a machine learning approach based on histogram features. In this study, blade bend, hub-blade loose connection, blade erosion, pitch angle twist, and blade cracks were simulated on the blade. The problem is formulated as a machine learning problem consisting of three phases, namely feature extraction, feature selection and feature classification. Histogram features were extracted from vibration signals, and feature selection was carried out using the J48 decision tree algorithm. Feature classification was performed using 15 tree classifiers. The results of the machine learning classifiers were compared with respect to their accuracy percentage, and a better model is suggested for real-time monitoring of a wind turbine blade.
Funding: Supported by the National Natural Science Foundation of China (No. 60435020).
Abstract: Word Sense Disambiguation (WSD) is the task of deciding the sense of an ambiguous word in a particular context. Most current studies on WSD use only a few ambiguous words as test samples, which limits their practical application. In this paper, we study WSD on a large-scale real-world corpus using two unsupervised learning algorithms, based on a ±n-improved Bayesian model and a Dependency Grammar (DG)-improved Bayesian model. The ±n-improved classifier reduces the context window size of ambiguous words with a close-distance feature extraction method and decreases the interference of useless features, thus clearly improving the accuracy, which reaches 83.18% (in open test). The DG-improved classifier can more effectively overcome the noise effect existing in the Naive Bayesian classifier. Experimental results show that this approach does better on Chinese WSD, and the open test achieved an accuracy of 86.27%.
Funding: Supported by the National Nature Science Foundation of China under Grant Nos. 11475003, 61603003, and 11471093; the Key Project of Cultivation of Leading Talents in Universities of Anhui Province under Grant No. gxfxZD2016174; Funds of Integration of Cloud Computing and Big Data, Innovation of Science and Technology of Ministry of Education of China under Grant No. 2017A09116; and the Anhui Provincial Department of Education Outstanding Top-Notch Talent-Funded Project under Grant No. gxbjZD26.
Abstract: Human action recognition has become one of the most active research topics in human-computer interaction and artificial intelligence, and has attracted much attention. Here, we employ a low-cost optical sensor, Kinect, to capture the action information of the human skeleton. We then propose a two-level hierarchical human action recognition model with self-selection classifiers based on skeleton data. In particular, different optimal classifiers are selected by a probability voting mechanism and 10 times 10-fold cross-validation at different coarse-grained levels. Extensive simulations on a well-known open dataset demonstrate that our proposed method is efficient in human action recognition, achieving an average recognition rate of 94.19% and a best rate of 95.61%.
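The "10 times 10-fold cross validation" mentioned above repeats a 10-fold split ten times with a fresh shuffle each time, giving 100 train/test partitions. The sketch below shows only the index bookkeeping, using the standard library; the function name and seed are illustrative choices, and the classifier selection step of the paper is not reproduced.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repeat
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield train_idx, test_idx


splits = list(repeated_kfold_indices(n_samples=50))
print(len(splits))  # number of train/test partitions
```

Averaging a classifier's accuracy over all partitions gives a more stable estimate than a single 10-fold run, which is presumably why it is used here to choose between candidate classifiers.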
Abstract: The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.
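Converting a multiclass problem into a pairwise ensemble usually means training one binary classifier per class pair and letting them vote. The sketch below shows that voting step only; the nearest-mean pairwise classifiers and the class means are hypothetical stand-ins, and the adaptive per-pair feature sets and sub-classifiers of the paper are not modeled.

```python
from collections import Counter
from itertools import combinations


def nearest_mean_classifier(a, b, means):
    """Toy binary classifier: pick whichever of classes a, b has the closer mean."""
    return lambda x: a if abs(x - means[a]) <= abs(x - means[b]) else b


def pairwise_vote(sample, classes, pairwise_classifiers):
    """Predict by majority vote over all one-vs-one classifiers."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_classifiers[(a, b)](sample)] += 1
    # Ties are resolved in favor of the class that received a vote first.
    return votes.most_common(1)[0][0]


# Hypothetical 1-D class means for three conditions.
classes = ["a", "b", "c"]
means = {"a": 0.0, "b": 1.0, "c": 2.0}
classifiers = {
    (a, b): nearest_mean_classifier(a, b, means)
    for a, b in combinations(classes, 2)
}
print(pairwise_vote(0.9, classes, classifiers))
print(pairwise_vote(1.8, classes, classifiers))
```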
Abstract: In various application areas of pattern recognition, combining multiple classifiers is regarded as a new method for achieving a substantial gain in system performance. This paper discusses the properties of classifier diversity and its applications. It also presents a novel method for combining multiple classifiers based on diversity. Fusion strategies are discussed to provide a basis for combining classifiers. These combination strategies are experimentally tested on an online handwritten Chinese character recognition system, and their effectiveness is assessed.