Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite advances in sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), and we report accuracy, precision, F1-score, recall, cross-validation, specificity, positive and negative likelihood ratios (LR+, LR−), and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
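The evaluation measures named in the stomach cancer abstract (specificity, LR+, LR−, and the Youden index) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `diagnostic_metrics` and the example counts are illustrative assumptions and do not come from the RankXLAN study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from binary confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' code.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    false_positive_rate = fp / (tn + fp)  # equals 1 - specificity
    false_negative_rate = fn / (tp + fn)  # equals 1 - sensitivity
    # LR+ : how much a positive result raises the odds of disease.
    lr_plus = (sensitivity / false_positive_rate
               if false_positive_rate > 0 else float("inf"))
    # LR- : how much a negative result lowers the odds of disease.
    lr_minus = (false_negative_rate / specificity
                if specificity > 0 else float("inf"))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_plus": lr_plus,
        "lr_minus": lr_minus,
        "youden_index": sensitivity + specificity - 1,
    }


# Made-up counts, chosen only to make the arithmetic easy to follow.
example = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these made-up counts, sensitivity is 0.9 and specificity is 0.8, so LR+ is 0.9 / 0.2, LR− is 0.1 / 0.8, and the Youden index is 0.9 + 0.8 − 1.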
Driven by both the “new engineering” initiative and the energy revolution, the traditional engineering education model can hardly meet the demand of the energy and electric power industry for diversified, interdisciplinary, outstanding engineers. Based on the “industry-university-research-application” four-in-one collaborative education concept, this paper constructs a new training system centered on classified cultivation and classified evaluation. The system aims to solve core problems such as homogeneous training, disconnection between industry and academia, a single evaluation method, and insufficient faculty. Through measures including modular courses, a dual-tutor system, and diversified practical platforms, it realizes differentiated and precise talent training, so as to deliver outstanding engineers with the ability to “define problems, break through technologies, and create value” for the energy and electric power industry.
Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan, containing 93,925 samples per feature (seven well logs and one facies log), were used to classify four facies. Data preprocessing and preparation involved two processes: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB) for classification, were tested using different iterations and various combinations of features and parameters. The support vector machine with a radial kernel achieved a training accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree (DT) hyperparameter optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of 67.89%. The model was validated using the blind-well approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. The Gradient Boosting classifier achieved an accuracy of 71.57% during training and 69.89% during testing. The Grid Search model achieved a higher accuracy of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy, with only 66.13% for training and 66.12% for testing. For validation, the Gradient Boosting (GB) classifier achieved an accuracy of 75.41% on the blind-well test, while Gradient Boosting with Grid Search achieved 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating potential overfitting. Random Forest (RF) and Gradient Boosting (GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and the nature of the data.
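Blind-well validation, as used in the facies abstract, holds out every sample from one well so that the model is scored on a well it never saw during training. The abstract does not say how the blind well was chosen, so the following is only a minimal sketch of the general idea as a leave-one-well-out split; the function name and the well labels are hypothetical.

```python
def leave_one_well_out(well_ids):
    """Yield (held_out_well, train_indices, test_indices) for each well.

    `well_ids` holds one well label per sample. All samples of the
    held-out well go to the test set, so no well is split across sets.
    """
    for held_out in sorted(set(well_ids)):
        train_idx = [i for i, w in enumerate(well_ids) if w != held_out]
        test_idx = [i for i, w in enumerate(well_ids) if w == held_out]
        yield held_out, train_idx, test_idx


# Toy labels for six samples from three hypothetical wells.
sample_wells = ["W1", "W1", "W2", "W3", "W3", "W3"]
splits = list(leave_one_well_out(sample_wells))
```

Splitting by well rather than by random sample avoids leaking depth-adjacent, highly correlated log readings from the test well into training.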
Background: In genetic diagnostics, DNA sequencing is an important tool, because the depth and complexity of this field have major implications for understanding the genetic architecture of diseases and for identifying risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through CNN-based feature analysis, taking advantage of the ability of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. It then employs a selected set of machine learning classifiers combined through a strict voting mechanism, which joins the predictions of multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested in an empirical analysis on a variant dataset of DNA sequences taken from patients affected by breast cancer, compared against a control group of healthy people. The integration of CNNs with a voting-based ensemble of classifiers achieved accuracy, precision, recall, and F1-score of 0.88, outperforming previous models. Conclusions: These results underline the potential of integrating deep learning techniques with ensemble machine learning to add value to genetic diagnostics and prognostics. The results set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and motivate future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
Human Activity Recognition (HAR) in drone-captured videos has become popular because of interest from fields such as video surveillance, sports analysis, and human-robot interaction. However, recognizing actions from such videos poses several challenges: variations in human motion, complex backgrounds, motion blur, occlusions, and restricted camera angles. This research presents a human activity recognition system that addresses these challenges using drones' red-green-blue (RGB) videos. The first step in the proposed system partitions videos into frames and applies bilateral filtering to improve the quality of object foregrounds while reducing background interference, before converting the RGB images to grayscale. The YOLO (You Only Look Once) algorithm detects and extracts humans from each frame, and their skeletons are obtained for further processing. The extracted features include joint angles, displacement and velocity, histogram of oriented gradients (HOG), 3D points, and geodesic distance. These features are optimized using Quadratic Discriminant Analysis (QDA) and fed to a Neuro-Fuzzy Classifier (NFC) for activity classification. Real-world evaluations on the Drone-Action, Unmanned Aerial Vehicle (UAV)-Gesture, and Okutama-Action datasets show that the proposed system achieves higher accuracy than existing methods. In particular, the system obtains recognition rates of 93% for Drone-Action, 97% for UAV-Gesture, and 81% for Okutama-Action, demonstrating its reliability and ability to learn human activity from drone videos.
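Two of the skeleton features listed in the drone HAR abstract, joint angles and displacement/velocity, can be computed directly from 2D keypoint coordinates. The sketch below assumes keypoints are `(x, y)` tuples in pixel units and that consecutive frames are `dt` seconds apart; the function names and the sample coordinates are illustrative and not taken from the paper.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("joint coincides with a neighbouring keypoint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def displacement(p_prev, p_curr):
    """Euclidean distance a keypoint moved between two frames."""
    return math.hypot(p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])


def velocity(p_prev, p_curr, dt):
    """Displacement per unit time; dt is the frame interval in seconds."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return displacement(p_prev, p_curr) / dt


# Hypothetical shoulder, elbow, and wrist keypoints forming a right angle.
elbow_angle = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
```

Per-frame values like these would be concatenated with the other descriptors before any feature optimization step.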
Machinery condition monitoring is beneficial to equipment maintenance and has been receiving much attention from academia and industry. Machine learning, especially deep learning, has become popular for machinery condition monitoring because it can make full use of available data and computational power. Since wrong fault alarms in machine condition monitoring might cause significant accidents, interpretable machine learning models, which integrate signal processing knowledge to enhance model trustworthiness, are gradually becoming a research hotspot. A spectrum-based, interpretable optimized weights method was previously proposed to indicate faulty and fundamental frequencies when the analyzed data contain only one healthy type and one fault type. Considering that multiclass fault types are naturally met in practice, this work explores the interpretable optimized weights method for multiclass fault scenarios. A new multiclass optimized weights spectrum (OWS) is therefore proposed and studied theoretically and numerically. It is found that the multiclass OWS is capable of capturing the characteristic components associated with different conditions and clearly indicating the specific fault characteristic frequencies (FCFs) corresponding to each fault condition. This work can provide new insights into spectrum-based fault classification models, and the new multiclass OWS also shows great potential for practical applications.
The categorization of brain tumors is a significant issue for healthcare applications. Accurate and timely identification of brain tumors is important for effective treatment of this disease. Brain tumors vary widely in size, shape, and number, which makes classification a difficult research problem. This paper proposes a deep learning model for magnetic resonance imaging that overcomes the limitations of existing classification methods. The effectiveness of the proposed method depends on the coyote optimization algorithm, also known as the LOBO algorithm, which optimizes the weights of the deep convolutional neural network classifier. The accuracy, sensitivity, and specificity, which are 92.40%, 94.15%, and 91.92%, respectively, are used to validate the effectiveness of the proposed method. The results suggest that the proposed strategy is superior for classifying brain tumors.
The increasing risk of ground pressure disasters resulting from deep well mining highlights the urgent need for advanced monitoring and early warning systems. Ground pressure monitoring, supported by microseismic technology, plays a pivotal role in ensuring mine safety by enabling real-time identification and accurate classification of vibration signals such as microseismic signals, blasting signals, and noise. These classifications are critical for improving the efficacy of ground pressure monitoring systems, conducting stability analyses of deep rock masses, and implementing timely and precise roadway support measures. Such efforts are essential for mitigating ground pressure disasters and ensuring safe mining operations. This study proposes an artificial intelligence-based automatic classification network model for mine vibration signals. Building on conventional convolutional neural networks, the proposed model further incorporates long short-term memory (LSTM) networks and attention mechanisms. The LSTM component effectively captures temporal correlations in time-series mining vibration data, while the attention mechanism enhances the model's ability to focus on critical features within the data. To validate the effectiveness of the proposed model, a dataset comprising 480,526 waveform records collected in 2022 by the microseismic monitoring system at Guangxi Shanhu Tungsten Mine was used for training, validation, and testing. Results demonstrate that the proposed artificial intelligence-based classification method achieves a recognition accuracy of 92.21%, significantly outperforming traditional manual classification methods. The proposed model represents a significant advancement in ground pressure monitoring and disaster mitigation.
Biometric recognition refers to the identification of individuals through their unique physiological or behavioral features (e.g., fingerprint, face, and iris). Identifying people requires distinguishing characteristics such as fingerprints, which are world-renowned as the most reliable method of identification. Fingerprint recognition has become a standard procedure in forensics, and different techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to generate classification models. Therefore, we propose an effective fingerprint classification method that classifies a fingerprint image as authentic or altered, since criminals and hackers routinely change their fingerprints to generate fake ones. To improve fingerprint classification accuracy, the proposed method uses the most effective texture features and classifiers. Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradient (HOG) and Segmentation-based Feature Texture Analysis (SFTA) feature vectors as inputs. The performance of the classifiers is determined by assessing a range of feature sets to find the most accurate results. The proposed method is tested on the Sokoto Coventry Fingerprint Dataset (SOCOFing). SOCOFing includes 6,000 fingerprint images collected from 600 African people, with ten fingerprints taken from each subject. Obliteration, central rotation, and z-cut alterations were each applied at three distinct degrees to obtain synthetically altered replicas of the genuine fingerprints. The proposed method achieved a classification accuracy of up to 99%. The experimental results indicate that the proposed method for fingerprint classification is feasible and effective. The experiments also showed that the proposed SFTA-based GDA method outperformed state-of-the-art approaches in feature dimension and classification accuracy.
The rise of fake news on social media has had a detrimental effect on society. Researchers in this area have previously undertaken numerous performance evaluations of classifiers that can detect fake news. In this study, we first assessed the performance of 14 different classifiers. Secondly, we looked at how soft voting and hard voting classifiers performed when combining distinct individual classifiers. Finally, heuristics were used to create 9 models of stacking classifiers. The F1 score, precision, recall, and accuracy were all used to assess performance. Models 6 and 7 achieved the best accuracy of 96.13% while having a larger computational complexity. For benchmarking purposes, other individual classifiers were also tested.
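The difference between the hard and soft voting schemes mentioned in the fake news abstract can be shown with three hypothetical classifiers. Hard voting counts predicted labels, while soft voting averages predicted class probabilities, so one confident classifier can outweigh two lukewarm ones. The function names and the probability values below are assumptions for illustration, not data from the study.

```python
from collections import Counter


def hard_vote(labels):
    """Return the most frequent label; ties go to the earliest label."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def soft_vote(probas):
    """Average per-class probabilities and return the top class.

    `probas` is a list of dicts mapping class name to probability,
    one dict per classifier, all sharing the same class names.
    """
    classes = list(probas[0])
    mean = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(mean, key=mean.get)


# Three hypothetical classifiers scoring one news item.
probas = [
    {"fake": 0.9, "real": 0.1},  # very confident "fake"
    {"fake": 0.4, "real": 0.6},  # mildly "real"
    {"fake": 0.4, "real": 0.6},  # mildly "real"
]
labels = [max(p, key=p.get) for p in probas]

hard_result = hard_vote(labels)  # two of three labels say "real"
soft_result = soft_vote(probas)  # mean P(fake) = 1.7 / 3 exceeds 0.5
```

Here the two schemes disagree: hard voting returns "real" and soft voting returns "fake", which is why the two are usually evaluated separately.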
To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
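The fusion step in the abstract above relies on mass functions from Dempster-Shafer theory. As a minimal sketch, the code below implements only the classical Dempster rule of combination for two mass functions; it does not reproduce the paper's generalized rough set model or its decision tables. Mass functions are represented as dicts from `frozenset` hypotheses to masses, and the class names and values are made up.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to conflicting (disjoint) pairs is discarded and the
    remainder is renormalised by 1 - K, where K is the total conflict.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


# Two hypothetical base classifiers judging one pixel over classes A and B.
A, B = frozenset({"A"}), frozenset({"B"})
AB = A | B  # the whole frame: "unsure between A and B"
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.3, AB: 0.7}
fused = dempster_combine(m1, m2)
```

In this example the conflict is K = 0.6 × 0.3 = 0.18, so the fused mass of A is 0.42 / 0.82, of B is 0.12 / 0.82, and of the whole frame is 0.28 / 0.82.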
This paper proposes an algorithm in which the maximum probability and the weighted average strategy are used to combine member classifiers. Using parallel computing, we test the algorithm on a China-Brazil Earth Resources Satellite (CBERS) image for land cover classification. The results show that using three computers in parallel can reduce the classification time by 30% compared with using only one computer with a dual-core processor. The accuracy of the final image is 93.34%, and Kappa is 0.92. Multiple classifier combination can enhance the precision of image classification, and parallel computing can increase the speed of calculation, so that it becomes possible to process remote sensing images with high efficiency and accuracy.
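The overall accuracy and Kappa coefficient reported for the CBERS classification are both derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name and the 2×2 matrix are illustrative assumptions and are unrelated to the paper's data.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    size = len(confusion)
    observed = sum(confusion[i][i] for i in range(size)) / n
    # Chance agreement: product of matching row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up two-class matrix with 100 samples.
overall_accuracy, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```

For this made-up matrix the observed agreement is 0.85 and the chance agreement is 0.5, giving kappa = 0.35 / 0.5. Kappa discounts the agreement expected by chance, which is why it is reported alongside raw accuracy.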
Suitable process parameters for a two-stage turbo air classifier are important for obtaining ultrafine powder with a narrow particle-size distribution; however, little has been published internationally on the classification process for two-stage turbo air classifiers in series. The influence of the process parameters of a two-stage turbo air classifier in series on classification performance is studied empirically, using aluminum oxide powders as the experimental material. The experimental results show the following: 1) When the rotor cage rotary speed of the first-stage classifier is increased from 2 300 r/min to 2 500 r/min with a constant rotor cage rotary speed of the second-stage classifier, classification precision increases from 0.64 to 0.67. However, in this case, the final ultrafine powder yield decreases from 79% to 74%, which means the classification precision and the final ultrafine powder yield can be regulated by adjusting the rotor cage rotary speed of the first-stage classifier. 2) When the rotor cage rotary speed of the second-stage classifier is increased from 2 500 r/min to 3 100 r/min with a constant rotor cage rotary speed of the first-stage classifier, the cut size decreases from 13.16 μm to 8.76 μm, which means the cut size of the ultrafine powder can be regulated by adjusting the rotor cage rotary speed of the second-stage classifier. 3) When the feeding speed is increased from 35 kg/h to 50 kg/h, the 'fish-hook' effect is strengthened, which decreases the ultrafine powder yield. 4) To weaken the 'fish-hook' effect, either equal wind speeds in the two stages or a high first-stage wind speed combined with a low second-stage wind speed should be selected. This empirical study provides a criterion for configuring the process parameters of a two-stage or multi-stage classifier in series, which offers a theoretical basis for practical production.
Automatic modulation classification is the process of identifying the modulation type of a signal in a general environment. This paper proposes a new method to evaluate the tracking performance of a large margin classifier against signal-to-noise ratio (SNR), and classifies all forms of primary users' signals in a cognitive radio environment. To achieve this objective, two large margin structures are developed for additive white Gaussian noise (AWGN) channels with a priori unknown SNR. A combination of higher-order statistics and instantaneous characteristics is selected as the feature set. Simulation results show that the classification rates of the proposed structures are robust against environmental SNR changes.
Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak subcontinent. People from the subcontinent may speak the same Urdu language but use different scripts for writing. Communication in Urdu written with Roman characters is now considered the most typical standard of communication on social media in the Indian subcontinent, which makes it a rich source of information. English text classification is a solved problem, but there have been only a few efforts to examine the rich information available in Roman Urdu. This is due to the numerous complexities involved in processing Roman Urdu data, which include the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and social media websites such as Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier that classifies news into five categories (health, business, technology, sports, international), on which further analysis and modeling can be done. First, we build the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data and by classifying the news. After applying the above-mentioned machine learning algorithms, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy of 90.17%, which is attributed to the handling of noisy lexical variation.
This paper presents a supervised classification method for sonar images, which takes advantage of both multi-fractal theory and wavelet analysis. For feature extraction, image transformation and wavelet decomposition are combined, and a feature set based on multi-fractal dimension is obtained. For classifier construction, the Learning Vector Quantization (LVQ) network is adopted as the classifier. Experiments on sonar image classification were carried out with satisfactory results, which verify the effectiveness of this method.
Leukemia is a blood cancer affecting the bone marrow and lymphatic tissues, typically involving white blood cells. Leukemia produces an abnormal amount of white blood cells compared to normal blood. Deoxyribonucleic acid (DNA) microarrays provide reliable medical diagnostic services to help more patients find the proposed treatment for infections. DNA microarrays, also known as biochips, consist of microscopic DNA spots attached to a solid glass surface. Currently, it is difficult to classify cancers using microarray data. Many data mining techniques have failed because of the small sample size, which has become more critical for organizations. Although such techniques are frequently employed by doctors for cancer diagnosis, they are not highly effective in improving results. This study proposes a novel method using machine learning algorithms based on microarrays of leukemia GSE9476 cells. The main aim was to predict leukemia at its initial stage. The machine learning algorithms evaluated were decision tree (DT), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), linear regression (LinR), and support vector machine (SVM), together with a novel ensemble, named LDSVM, that combines Logistic Regression (LR), DT, and SVM. The k-fold cross-validation and grid search optimization methods were used with the LDSVM model to classify leukemia in patients and comparatively analyze their impacts. The proposed approach achieved better accuracy, precision, recall, and F1 scores than the other algorithms. Furthermore, the results were assessed comparatively, which showed the performance of LDSVM. This study aims to successfully predict leukemia in patients and enhance prediction accuracy in minimum time. Moreover, the Synthetic Minority Oversampling Technique (SMOTE) and Principal Component Analysis (PCA) were implemented, which makes the records more generalized and improves the evaluation of the outcomes. SMOTE deals with class-imbalanced datasets, while PCA reduces the feature count while retaining most of the information, giving faster model execution at lower computational cost. In this study, a novel process was used to reduce the number of columns, allowing faster experiment execution.
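The core idea behind SMOTE, as used in the leukemia abstract, is to create new minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The following is a simplified, standard-library sketch of that interpolation step only; it is not the authors' pipeline, and the function name, the parameter `k`, and the toy points are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # All other minority samples, nearest first (squared distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
        )
    return synthetic


# Three toy minority samples with two features each.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like_oversample(minority_points, n_new=5)
```

Because every synthetic point is an interpolation, it stays inside the range spanned by the original minority samples. Oversampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into validation data.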
Obstructive Sleep Apnea (OSA) is a respiratory syndrome caused by insufficient airflow through the respiratory tract or by respiratory arrest during sleep, and sometimes by reduced oxygen saturation. The aim of this paper is to analyze a person's respiratory signal to detect normal breathing activity and sleep apnea (SA) activity. In the proposed method, time-domain and frequency-domain features are extracted from the respiration signal obtained from a photoplethysmography (PPG) device. These features are applied to the Classification and Regression Tree (CART)-Particle Swarm Optimization (PSO) classifier, which classifies the signal as a normal breathing signal or a sleep apnea signal. The proposed method is validated by measuring performance metrics such as sensitivity, specificity, accuracy, and F1 score, applying the time-domain and frequency-domain features separately. Additionally, the performance of the CART-PSO (CPSO) classification algorithm is evaluated by comparing its measures with those of existing classification algorithms. The effect of the PSO algorithm in the classifier is also validated by varying the PSO parameters.
Funding: Deanship of Research and Graduate Studies at King Khalid University, KSA, through the Large Research Project under grant number RGP2/164/46.
Abstract: Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment options. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite advances in sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis) and evaluated the model using accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and the Youden index. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
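The diagnostic measures named in the abstract above (specificity, LR+, LR−, Youden index) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the RankXLAN study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from binary confusion-matrix counts.

    The counts passed in are illustrative assumptions, not study data.
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    # LR+ = sensitivity / (1 - specificity); undefined when specificity is 1
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - sensitivity) / specificity; undefined when specificity is 0
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    youden = sensitivity + specificity - 1  # Youden's J statistic
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "youden": youden,
    }


if __name__ == "__main__":
    # Hypothetical counts for a tumor-vs-normal classifier
    for name, value in diagnostic_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")
```

A larger LR+ and a smaller LR− both indicate a more informative test, while the Youden index summarizes sensitivity and specificity in a single number between −1 and 1.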
Abstract: Driven by both the “new engineering” initiative and the energy revolution, the traditional engineering education model can hardly meet the demand of the energy and electric power industry for diversified, interdisciplinary, outstanding engineers. Based on the “industry-university-research-application” four-in-one collaborative education concept, this paper constructs a new training system centered on classified cultivation and classified evaluation. The system aims to solve core problems such as homogeneous training, the disconnect between industry and academia, a single evaluation method, and insufficient faculty. Through measures including modular courses, a dual-tutor system, and diversified practical platforms, it achieves differentiated and precise talent training, so as to deliver outstanding engineers with the ability to “define problems, break through technologies, and create value” for the energy and electric power industry.
Abstract: Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan containing 93,925 samples per feature (seven well logs and one facies log) were used to classify four facies. Data preprocessing and preparation involved two processes: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB) for classification, were tested using different iterations and various combinations of features and parameters. The support vector radial kernel training model achieved an accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree (DT) Hyperparameter Optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of 67.89%. The model's validation was carried out using the blind-well validation approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. During training, the Gradient Boosting classifier achieved an accuracy score of 71.57%, and during testing, it achieved 69.89%. The Grid Search model achieved a higher accuracy score of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy score, with only 66.13% for training and 66.12% for testing. For validation, the Gradient Boosting (GB) classifier model achieved an accuracy score of 75.41% on the blind-well test, while the Gradient Boosting with Grid Search achieved an accuracy score of 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating the potential for overfitting. Random Forest (RF) and Gradient Boosting (GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and the nature of the data.
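Blind-well validation, as mentioned above, holds out every sample from one well so that the test data come from a location the model has never seen. The abstract does not say how the blind well was chosen, so the sketch below assumes the simplest variant, in which each well is held out in turn; the function name `leave_one_well_out` and the well labels are hypothetical.

```python
def leave_one_well_out(well_ids):
    """Yield (held_out_well, train_indices, test_indices) for each distinct well.

    well_ids is a per-sample list of well labels. All samples from the held-out
    well go to the test set, so no well contributes to both sets.
    """
    for well in sorted(set(well_ids)):
        train = [i for i, w in enumerate(well_ids) if w != well]
        test = [i for i, w in enumerate(well_ids) if w == well]
        yield well, train, test


if __name__ == "__main__":
    # Hypothetical labels: five samples drawn from three wells
    sample_wells = ["W1", "W1", "W2", "W3", "W3"]
    for well, train_idx, test_idx in leave_one_well_out(sample_wells):
        print(f"blind well {well}: train={train_idx} test={test_idx}")
```

Splitting by well rather than by random sample avoids leaking depth-adjacent, highly correlated log readings from the same well into both the training and test sets.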
Abstract: Background: In genetic diagnostics, DNA sequencing is an important tool because of its major implications for understanding the genetic architectures of diseases and for identifying risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis, taking advantage of the ability of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. The study then combines a selected collection of machine learning classifiers through a strict voting mechanism, which joins the predictions of multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested in an empirical analysis on a variant dataset of DNA sequences taken from patients affected by breast cancer, compared with a control group of healthy people. The integration of CNNs with a voting-based ensemble of classifiers yielded accuracy, precision, recall, and F1-score of 0.88, outperforming previous models. Conclusions: This result underlines the potential of integrating deep learning techniques with ensemble machine learning to add real value to genetic diagnostics and prognostics. The results of this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and point to future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
Funding: Funded by the Open Access Initiative of the University of Bremen and the DFG via SuUB Bremen, and by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R348), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Human Activity Recognition (HAR) in drone-captured videos has become popular because of interest from fields such as video surveillance, sports analysis, and human-robot interaction. However, recognizing actions from such videos poses the following challenges: variations in human motion, complex backgrounds, motion blur, occlusions, and restricted camera angles. This research presents a human activity recognition system that addresses these challenges by working with drones’ red-green-blue (RGB) videos. The first step in the proposed system partitions videos into frames and then applies bilateral filtering to improve the quality of object foregrounds while reducing background interference, before converting the RGB images to grayscale. The YOLO (You Only Look Once) algorithm detects and extracts humans from each frame, and their skeletons are obtained for further processing. The extracted features include joint angles, displacement and velocity, histogram of oriented gradients (HOG), 3D points, and geodesic distance. These features are optimized using Quadratic Discriminant Analysis (QDA) and passed to a Neuro-Fuzzy Classifier (NFC) for activity classification. Real-world evaluations on the Drone-Action, Unmanned Aerial Vehicle (UAV)-Gesture, and Okutama-Action datasets substantiate the proposed system’s superiority in accuracy over existing methods. In particular, the system obtains recognition rates of 93% for Drone-Action, 97% for UAV-Gesture, and 81% for Okutama-Action, demonstrating the system’s reliability and ability to learn human activity from drone videos.
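Two of the skeleton features listed above, joint angles and displacement/velocity, can be computed directly from keypoint coordinates. The sketch below is a generic geometric illustration, not the authors' implementation; the function names, the keypoint coordinates, and the frame interval are all assumptions.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b formed by the points a-b-c.

    Points may be 2D or 3D tuples, for example shoulder-elbow-wrist keypoints.
    """
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0:
        raise ValueError("joint angle is undefined for coincident points")
    # Clamp to [-1, 1] to guard against floating-point rounding
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def displacement_and_velocity(prev_point, curr_point, dt):
    """Return (displacement, velocity) of one keypoint between two frames."""
    displacement = math.dist(prev_point, curr_point)
    return displacement, displacement / dt


if __name__ == "__main__":
    # Hypothetical pixel coordinates for shoulder, elbow, and wrist
    print(joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0)))
    # Hypothetical wrist positions in consecutive frames, 0.5 s apart
    print(displacement_and_velocity((0.0, 0.0), (3.0, 4.0), dt=0.5))
```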
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 523B2043 and 52475112.
Abstract: Machinery condition monitoring is beneficial to equipment maintenance and has been receiving much attention from academia and industry. Machine learning, especially deep learning, has become popular for machinery condition monitoring because it can make full use of available data and computational power. Since wrong fault alarms in machine condition monitoring can cause significant accidents, interpretable machine learning models, which integrate signal processing knowledge to enhance model trustworthiness, are gradually becoming a research hotspot. A spectrum-based, interpretable optimized weights method was previously proposed to indicate faulty and fundamental frequencies when the analyzed data contains only one healthy type and one fault type. Considering that multiple fault types are naturally encountered in practice, this work explores the interpretable optimized weights method for multiclass fault scenarios. To this end, a new multiclass optimized weights spectrum (OWS) is proposed and studied theoretically and numerically. It is found that the multiclass OWS is capable of capturing the characteristic components associated with different conditions and clearly indicating the specific fault characteristic frequencies (FCFs) corresponding to each fault condition. This work can provide new insights into spectrum-based fault classification models, and the new multiclass OWS also shows great potential for practical applications.
Abstract: The categorization of brain tumors is a significant issue for healthcare applications. Accurate and timely identification of brain tumors is important for treating the disease effectively. Brain tumors vary widely in size, shape, and number, which makes classification a difficult research problem. This paper proposes a deep learning model for magnetic resonance imaging that overcomes the limitations of existing classification methods. The effectiveness of the proposed method depends on the coyote optimization algorithm, also known as the LOBO algorithm, which optimizes the weights of the deep convolutional neural network classifier. The accuracy, sensitivity, and specificity, which are 92.40%, 94.15%, and 91.92%, respectively, are used to validate the effectiveness of the proposed method. The results suggest that the proposed strategy is superior for effectively classifying brain tumors.
Funding: Supported in part by the National Science Fund for Distinguished Young Scholars under Grant (42025403), the National Key Research and Development Plan of China (2021YFA0716800), and the National Key Research and Development Plan of China (2022YFC2903804).
Abstract: The increasing risk of ground pressure disasters resulting from deep well mining highlights the urgent need for advanced monitoring and early warning systems. Ground pressure monitoring, supported by microseismic technology, plays a pivotal role in ensuring mine safety by enabling real-time identification and accurate classification of vibration signals such as microseismic signals, blasting signals, and noise. These classifications are critical for improving the efficacy of ground pressure monitoring systems, conducting stability analyses of deep rock masses, and implementing timely and precise roadway support measures. Such efforts are essential for mitigating ground pressure disasters and ensuring safe mining operations. This study proposes an artificial intelligence-based automatic classification network model for mine vibration signals. Based on conventional convolutional neural networks, the proposed model further incorporates long short-term memory (LSTM) networks and attention mechanisms. The LSTM component effectively captures temporal correlations in time-series mining vibration data, while the attention mechanism enhances the model’s ability to focus on critical features within the data. To validate the effectiveness of the proposed model, a dataset comprising 480,526 waveform records collected in 2022 by the microseismic monitoring system at Guangxi Shanhu Tungsten Mine was used for training, validation, and testing. Results demonstrate that the proposed artificial intelligence-based classification method achieves a higher recognition accuracy of 92.21%, significantly outperforming traditional manual classification methods. The proposed model represents a significant advancement in ground pressure monitoring and disaster mitigation.
Abstract: Biometric recognition refers to the identification of individuals through their unique physiological or behavioral features (e.g., fingerprint, face, and iris). Identifying people requires distinguishing characteristics such as fingerprints, which are widely regarded as the most reliable means of identification. Fingerprint recognition has become a standard procedure in forensics, and different techniques are available for this purpose. Most current techniques pay little attention to image enhancement and rely on high-dimensional features to generate classification models. Therefore, we propose an effective fingerprint classification method that classifies a fingerprint image as authentic or altered, since criminals and hackers routinely alter their fingerprints to generate fake ones. To improve fingerprint classification accuracy, the proposed method uses the most effective texture features and classifiers. Discriminant Analysis (DCA) and Gaussian Discriminant Analysis (GDA) are employed as classifiers, with Histogram of Oriented Gradient (HOG) and Segmentation-based Feature Texture Analysis (SFTA) feature vectors as inputs. The performance of the classifiers is determined by assessing a range of feature sets, and the most accurate results are reported. The proposed method is tested using the Sokoto Coventry Fingerprint Dataset (SOCOFing). The SOCOFing project includes 6,000 fingerprint images collected from 600 African people whose fingerprints were taken ten times. Three distinct degrees of obliteration, central rotation, and z-cut were applied to obtain synthetically altered replicas of the genuine fingerprints. The proposed method achieved a classification accuracy reaching 99%. The experimental results indicate that the proposed method for fingerprint classification is feasible and effective. The experiments also showed that the proposed SFTA-based GDA method outperformed state-of-the-art approaches in feature dimension and classification accuracy.
Abstract: The rise of fake news on social media has had a detrimental effect on society. Researchers in this area have previously undertaken numerous performance evaluations of classifiers that can detect fake news. In this study, we first assessed the performance of 14 different classifiers. Secondly, we looked at how soft voting and hard voting classifiers performed when combining distinct individual classifiers. Finally, heuristics were used to create 9 models of stacking classifiers. The F1 score, precision, recall, and accuracy were all used to assess performance. Models 6 and 7 achieved the best accuracy of 96.13 while having a larger computational complexity. For benchmarking purposes, other individual classifiers were also tested.
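The difference between the hard and soft voting schemes mentioned above is easiest to see on a case where they disagree. The sketch below is a minimal pure-Python illustration under assumed inputs; the function names and the three classifiers' probabilities are hypothetical and do not come from the study.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by the most classifiers (majority vote)."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probas):
    """Return the label with the highest average predicted probability.

    predicted_probas is a list with one dict per classifier, each mapping
    every class label to that classifier's predicted probability.
    """
    classes = predicted_probas[0].keys()
    average = {
        label: sum(p[label] for p in predicted_probas) / len(predicted_probas)
        for label in classes
    }
    return max(average, key=average.get)


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for one news article
    probas = [
        {"fake": 0.55, "real": 0.45},
        {"fake": 0.60, "real": 0.40},
        {"fake": 0.10, "real": 0.90},
    ]
    labels = [max(p, key=p.get) for p in probas]
    print("hard vote:", hard_vote(labels))
    print("soft vote:", soft_vote(probas))
```

In this example two weakly confident classifiers outvote one highly confident classifier under hard voting, whereas soft voting lets the confident prediction dominate the average.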
Abstract: To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
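The fusion step above relies on mass functions from Dempster-Shafer theory. The sketch below shows only Dempster's standard rule of combination for two mass functions; it does not reproduce how the paper derives the masses from decision tables or the rough set model. The function name `dempster_combine`, the land-cover classes, and the mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of class labels (a focal
    element) to its mass; the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


if __name__ == "__main__":
    water, soil = frozenset({"water"}), frozenset({"soil"})
    either = water | soil  # ignorance: the pixel could be either class
    # Hypothetical masses from two base classifiers for one pixel
    m1 = {water: 0.6, soil: 0.1, either: 0.3}
    m2 = {water: 0.5, soil: 0.2, either: 0.3}
    for focal, mass in dempster_combine(m1, m2).items():
        print(sorted(focal), round(mass, 4))
```

Unlike plurality voting, the rule lets a classifier express partial ignorance by placing mass on a set of classes rather than on a single label.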
Funding: Supported by the National Natural Science Foundation of China (70873117).
Abstract: This paper proposes an algorithm in which the maximum probability and the weighted average strategy are used to combine member classifiers. Using parallel computing, we test the algorithm on a China-Brazil Earth Resources Satellite (CBERS) image for land cover classification. The results show that using three computers in parallel can reduce the classification time by 30% compared with using only one computer with a dual-core processor. The accuracy of the final image is 93.34%, and Kappa is 0.92. Multiple classifier combination can enhance the precision of image classification, and parallel computing can increase the speed of calculation, so that it becomes possible to process remote sensing images with high efficiency and accuracy.
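The two quality figures reported above, overall accuracy and Kappa, are both derived from the classification confusion matrix. The sketch below shows the standard Cohen's kappa computation on a made-up two-class matrix; the function name and the counts are assumptions and are unrelated to the CBERS experiment.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column marginals
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical two-class land-cover confusion matrix
    matrix = [[45, 5], [10, 40]]
    accuracy, kappa = accuracy_and_kappa(matrix)
    print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy for the same matrix.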
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51074012, 51204009).
Abstract: Suitable process parameters for a two-stage turbo air classifier are important for obtaining ultrafine powder with a narrow particle-size distribution; however, little has been published internationally on the classification process for two-stage turbo air classifiers in series. The influence of the process parameters of a two-stage turbo air classifier in series on classification performance is empirically studied using aluminum oxide powders as the experimental material. The experimental results show the following: 1) When the rotor cage rotary speed of the first-stage classifier is increased from 2 300 r/min to 2 500 r/min with a constant rotor cage rotary speed of the second-stage classifier, classification precision increases from 0.64 to 0.67. However, in this case, the final ultrafine powder yield decreases from 79% to 74%, which means that the classification precision and the final ultrafine powder yield can be regulated by adjusting the rotor cage rotary speed of the first-stage classifier. 2) When the rotor cage rotary speed of the second-stage classifier is increased from 2 500 r/min to 3 100 r/min with a constant rotor cage rotary speed of the first-stage classifier, the cut size decreases from 13.16 μm to 8.76 μm, which means that the cut size of the ultrafine powder can be regulated by adjusting the rotor cage rotary speed of the second-stage classifier. 3) When the feeding speed is increased from 35 kg/h to 50 kg/h, the 'fish-hook' effect is strengthened, which decreases the ultrafine powder yield. 4) To weaken the 'fish-hook' effect, either equal wind speeds in the two stages or a high first-stage wind speed combined with a low second-stage wind speed should be selected. This empirical study provides a criterion for process parameter configurations for a two-stage or multi-stage classifier in series, which offers a theoretical basis for practical production.
Abstract: Automatic modulation classification is the process of identifying the modulation type of a signal in a general environment. This paper proposes a new method to evaluate the tracking performance of a large margin classifier against signal-to-noise ratio (SNR), and classifies all forms of primary users' signals in a cognitive radio environment. To achieve this objective, two large margin structures are developed for additive white Gaussian noise (AWGN) channels with a priori unknown SNR. A combination of higher order statistics and instantaneous characteristics is selected as the effective feature set. Simulation results show that the classification rates of the proposed structures are robust against changes in environmental SNR.
Funding: This work is supported by the KIAS (Research Number: CG076601) and in part by the Sejong University Faculty Research Fund.
Abstract: Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak subcontinent. People from the subcontinent may speak the same Urdu language but use different scripts for writing. Writing Urdu in Roman characters on social media is now considered the most typical form of communication in the Indian subcontinent, which makes it a rich source of information. English text classification is a solved problem, but there have been only a few efforts to examine the rich information supply of Roman Urdu. This is due to the numerous complexities involved in processing Roman Urdu data, including the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and social media websites such as Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier, which helps to classify news into relevant categories on which further analysis and modeling can be done. This research aims to develop a Roman Urdu news classifier that classifies news into five categories (health, business, technology, sports, international). First, we develop the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms, namely Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data and by classifying the news. After applying the above-mentioned machine learning algorithms, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy of 90.17%, a result the authors relate to the noise from lexical variation.
Abstract: This paper presents a supervised classification method for sonar images that takes advantage of both multi-fractal theory and wavelet analysis. In the feature extraction stage, image transformation and wavelet decomposition are combined, and a feature set based on the multi-fractal dimension is obtained. In the classifier construction stage, a Learning Vector Quantization (LVQ) network is adopted as the classifier. Sonar image classification experiments were carried out with satisfactory results, which verify the effectiveness of the method.
Abstract: Leukemia is a cancer of the blood-forming tissues, including bone marrow and lymphatic tissues, and typically involves white blood cells. Leukemia produces an abnormal amount of white blood cells compared to normal blood. Deoxyribonucleic acid (DNA) microarrays provide reliable medical diagnostic services to help more patients find the proposed treatment for infections. DNA microarrays, also known as biochips, consist of microscopic DNA spots attached to a solid glass surface. Currently, it is difficult to classify cancers using microarray data. Many data mining techniques have failed because of the small sample size, which has become more critical for organizations. Although they are frequently employed by doctors for cancer diagnosis, they are not highly effective in improving results. This study proposes a novel method using machine learning algorithms based on microarrays of leukemia GSE9476 cells. The main aim was to predict leukemia at its initial stage. The machine learning algorithms considered were decision tree (DT), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), linear regression (LinR), support vector machine (SVM), and a novel approach based on the combination of Logistic Regression (LR), DT, and SVM, named the ensemble LDSVM model. The k-fold cross-validation and grid search optimization methods were used with the LDSVM model to classify leukemia in patients and comparatively analyze their impacts. The proposed approach achieved better accuracy, precision, recall, and F1 scores than the other algorithms. Furthermore, the results were assessed comparatively to show the performance of LDSVM. This study aims to successfully predict leukemia in patients and enhance prediction accuracy in minimum time. Moreover, the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA) were implemented, which makes the records more general and allows the outcomes to be evaluated well. PCA reduces the feature count while retaining most of the information, SMOTE deals with class-imbalanced datasets, and together they enable faster model execution at lower computational cost. In this study, a novel process was used to reduce the number of columns, resulting in faster experiment execution.
Abstract: Obstructive Sleep Apnea (OSA) is a respiratory syndrome that occurs due to insufficient airflow through the respiratory tract or respiratory arrest while sleeping, and sometimes due to reduced oxygen saturation. The aim of this paper is to analyze the respiratory signal of a person to detect normal breathing activity and Sleep Apnea (SA) activity. In the proposed method, the time domain and frequency domain features of the respiration signal obtained from the PPG device are extracted. These features are applied to the Classification and Regression Tree (CART)-Particle Swarm Optimization (PSO) classifier, which classifies the signal as either a normal breathing signal or a sleep apnea signal. The proposed method is validated by measuring performance metrics such as sensitivity, specificity, accuracy, and F1 score, applying the time domain and frequency domain features separately. Additionally, the performance of the CART-PSO (CPSO) classification algorithm is evaluated by comparing its measures with those of existing classification algorithms. The effect of the PSO algorithm in the classifier is also validated by varying the parameters of PSO.