Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. The transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), and we report accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and machine learning (ML) analyses of the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
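The diagnostic metrics named in the abstract above (specificity, LR+, LR−, and the Youden index) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `diagnostic_metrics` and the example counts are illustrative assumptions and are not taken from the RankXLAN study.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration; not code from the paper.
    """
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    false_negative_rate = fn / (fn + tp)  # equals 1 - sensitivity

    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_pos = math.inf if fp == 0 else sensitivity / false_positive_rate
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    lr_neg = math.inf if tn == 0 else false_negative_rate / specificity
    # Youden index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "youden": youden,
    }


# Made-up counts, chosen only to make the arithmetic easy to follow.
example = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
print(example)
```

With these made-up counts, sensitivity is 0.9 and specificity is 0.8, so LR+ is about 4.5, LR− about 0.125, and the Youden index about 0.7.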
In this paper, we consider the Fisher informations among three classical type β-ensembles when β>0 scales with n satisfying lim βn=∞. We offer the exact order of the corresponding two Fisher informations, which indicates that the β-Laguerre ensembles do not satisfy the logarithmic Sobolev inequality. We also give some limit theorems on the extremals of β-Jacobi ensembles for β>0 fixed.
The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, addressing challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions/decisions. With the growth in popularity of deep learning and ensemble learning algorithms, they have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has exhibited significant performance in enhancing learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has large quantities of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains, medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, in this review, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n=445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n=1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER's Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combined Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models' performance indicated that the DG-RBFN model, with an Area Under ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
Typically, the relationship between well logs and lithofacies is complex, which leads to low accuracy of lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, these methods have accuracy limits to some extent. To further improve their accuracy, a practical and novel ensemble learning strategy and principles are proposed in this work, which allow geologists not familiar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, which aims to reduce the variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make the sub-classifiers just overfit through algorithm parameter setting and sub-dataset sampling. The principles can help reduce the bias errors in the prediction. Two issues are discussed, namely (1) whether only a relatively simple single-classifier method can be used as a sub-classifier and how to select proper ML methods as sub-classifiers; (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. In order to test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost and LightGBM). In this work, the experiments used a published dataset of lithofacies from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models using the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers of homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategy are effective in promoting ML in lithofacies identification; (3) in practice, a heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures the aging trend. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me is endowed with a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Smart manufacturing and Industry 4.0 are transforming traditional manufacturing processes by utilizing innovative technologies such as artificial intelligence (AI) and the internet of things (IoT) to enhance efficiency, reduce costs, and ensure product quality. In light of the recent advancement of Industry 4.0, identifying defects has become important for ensuring the quality of products during the manufacturing process. In this research, we present an ensemble methodology for accurately classifying hot rolled steel surface defects by combining the strengths of four pre-trained convolutional neural network (CNN) architectures, VGG16, VGG19, Xception, and MobileNetV2, compensating for their individual weaknesses. We evaluated our methodology on the Xsteel surface defect dataset (XSDD), which comprises seven different classes. The ensemble methodology integrated the predictions of individual models through two methods: model averaging and weighted averaging. Our evaluation showed that the model averaging ensemble achieved an accuracy of 98.89%, a recall of 98.92%, a precision of 99.05%, and an F1-score of 98.97%, while the weighted averaging ensemble reached an accuracy of 99.72%, a recall of 99.74%, a precision of 99.67%, and an F1-score of 99.70%. The proposed weighted averaging ensemble model outperformed the model averaging method and the individual models in detecting defects in terms of accuracy, recall, precision, and F1-score. Comparative analysis with recent studies also showed the superior performance of our methodology.
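The difference between the two combination rules in the abstract above, model averaging and weighted averaging, can be seen in a small sketch. The probability vectors and weights below are invented for illustration (three hypothetical models over three classes); the study itself combines four CNNs over seven classes and does not state its weights here.

```python
def combine_probabilities(prob_sets, weights=None):
    """Combine per-model class-probability vectors for one sample.

    prob_sets: list of probability vectors, one per model.
    weights:   optional per-model weights; None means plain model averaging.
    Hypothetical helper for illustration only.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    if len(weights) != len(prob_sets):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_classes = len(prob_sets[0])
    # Weighted mean of the models' probabilities, class by class.
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict_class(prob_sets, weights=None):
    """Return the index of the class with the highest combined probability."""
    combined = combine_probabilities(prob_sets, weights)
    return max(range(len(combined)), key=lambda c: combined[c])


# Invented outputs of three hypothetical models for a single image.
model_outputs = [
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]

print(predict_class(model_outputs))                   # plain model averaging
print(predict_class(model_outputs, [0.4, 0.1, 0.5]))  # weighted averaging
```

In this toy case plain averaging favours class 1, while down-weighting the second model moves the decision to class 0. This is the mechanism by which weighting can change, and potentially improve, the ensemble's predictions when the weights reflect each model's reliability.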
Ensemble learning, a pivotal branch of machine learning, amalgamates multiple base models to enhance the overarching performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer research framework is established for the research of ensemble learning, which can offer a comprehensive and structured review of ensemble learning from bottom to top. Firstly, this survey commences by introducing fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring the ensemble's diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the utilisation of ensemble learning techniques to navigate challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies. The solution determines the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and Smart Grid allows all participants in the distribution grid to store and track electricity consumption. In this research, a machine learning model is developed that analyzes and predicts the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. This model is an ensemble meta-algorithm (stacking) that generalizes the algorithms of random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The superior accuracy of the proposed meta-algorithm in comparison to the base classifiers is experimentally confirmed on the test sample. Such a model, due to its good accuracy indicators (ROC-AUC of 0.88), can be used as a methodological basis for a decision support system, the purpose of which is to form a sample of suspected NTL sources. The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Pangu-Weather (PGW), trained with deep learning-based methods (DL-based model), shows significant potential for global medium-range weather forecasting. However, the interpretability and trustworthiness of global medium-range DL-based models raise many concerns. This study uses the singular vector (SV) initial condition (IC) perturbations of the China Meteorological Administration's Global Ensemble Prediction System (CMA-GEPS) as inputs of PGW for global ensemble prediction (PGW-GEPS) to investigate the ensemble forecast sensitivity of DL-based models to the IC errors. Meanwhile, the CMA-GEPS forecasts serve as benchmarks for comparison and verification. The spatial structures and prediction performance of PGW-GEPS are discussed and compared to CMA-GEPS based on seasonal ensemble experiments. The results show that the ensemble mean and dispersion of PGW-GEPS are similar to those of CMA-GEPS in the medium range but with smoother forecasts. Meanwhile, PGW-GEPS is sensitive to the SV IC perturbations. Specifically, PGW-GEPS can generate realistic ensemble spread beyond the sub-synoptic scale (wavenumbers ≤ 64) with SV IC perturbations. However, PGW's kinetic energy is significantly reduced at the sub-synoptic scale, leading to error growth behavior inconsistent with CMA-GEPS at that scale. Thus, this behavior indicates that the effective resolution of PGW-GEPS is beyond the sub-synoptic scale and that its ability to predict mesoscale atmospheric motions is limited. In terms of the global medium-range ensemble prediction performance, the probability prediction skill of PGW-GEPS is comparable to that of CMA-GEPS in the extratropics when they use the same IC perturbations. This means that PGW has a general ability to provide skillful global medium-range forecasts with different ICs from numerical weather prediction.
Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. We can find several anti-phishing solutions, such as heuristic detection, virtual similarity detection, black and white lists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve the highest levels of accuracy on a given dataset, their methods suffer from an increased number of false positives. These methods are ineffective against zero-hour attacks. Phishing sites with a high False Positive Rate (FPR) are considered genuine because they can cause people to lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy; however, duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different machine learning classifiers, namely Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, and Support Vector Machine (SVM), and with two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model's interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
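The final layer of the framework above is described as a voting-based ensemble of three classifiers. The abstract does not say whether the vote is hard (majority of labels) or soft (averaged probabilities); the sketch below assumes hard majority voting, and the per-model label lists are invented rather than outputs of real SVM, RF, or GBC models.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard-voting ensemble: pick the most common label for each sample.

    predictions: list of label sequences, one per model, all of equal length.
    Assumes an odd number of voters on a binary task, so ties cannot occur.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for labels in zip(*predictions):  # labels = one sample's votes across models
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Invented labels (1 = fraudulent, 0 = legitimate) from three hypothetical models.
svm_labels = [0, 1, 1, 0]
rf_labels = [0, 1, 0, 0]
gbc_labels = [1, 1, 0, 0]

print(majority_vote([svm_labels, rf_labels, gbc_labels]))
```

A single model's false alarm (the first transaction, flagged only by the third model) is outvoted by the other two, which is one way such an ensemble can lower the false positive rate relative to its members.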
The biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP processes. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process demonstrates exceptional performance in accurately predicting product profit and CO₂ emissions (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profitability and CO₂ emissions of the BCP processes. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing the product profit of the BCP process while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profits (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice that requires specialized expertise, and its results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. The paper uses an ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models: ResNet-50, VGG-16, and InceptionV3. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions from the individual models, leveraging their complementary strengths. The objective of this work is to assess the model’s capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. For Normal vs. Advanced glaucoma classification, the proposed technique achieves a validation accuracy of 98.04% and a testing accuracy of 98.03%, with a specificity of 100%, which outperforms state-of-the-art methods. For multiclass classification, the ensemble approach achieved a precision and sensitivity of 97%, a specificity of 98.57%, and a testing accuracy of 96.82%. The proposed E-GlauNet model has significant potential to assist ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by its evaluation on a single dataset. Future work will focus on external validation across diverse datasets and on enhancing interpretability using explainable AI techniques.
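The ensemble-averaging step described in the glaucoma abstract can be illustrated with a minimal sketch. The probability vectors and the three class names below are hypothetical placeholders, not values from the paper, and the paper's actual fusion code is not shown in the abstract.

```python
def ensemble_average(prob_sets):
    """Average the class-probability vectors that several models give one sample."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict_class(probs):
    """Return the index of the class with the highest averaged probability."""
    return max(range(len(probs)), key=lambda c: probs[c])


# Hypothetical softmax outputs for one image (assumed classes: normal, early, advanced).
resnet50 = [0.10, 0.30, 0.60]
vgg16 = [0.20, 0.50, 0.30]
inception_v3 = [0.05, 0.25, 0.70]

avg = ensemble_average([resnet50, vgg16, inception_v3])
label = predict_class(avg)
```

Here VGG-16 alone would pick the second class, but the averaged vector follows the two models that agree on the third class, which is the sense in which averaging draws on complementary strengths.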
Intrusion detection in healthcare networks is an urgent issue because of the critical consequences of cyber threats and the extreme sensitivity of medical information. This study proposes Auto-Stack ID, a stacked ensemble of encoder-enhanced auctions, to improve intrusion detection in healthcare networks. The WUSTL-EHMS 2020 dataset, which has an imbalanced class distribution (87.46% normal traffic and 12.53% intrusion attacks), is used to train and evaluate the model. To address this imbalance, the study reduces training bias through Stratified K-fold cross-validation (K=5), so that each class is represented similarly in the training and validation splits. The Auto-Stack ID method combines several base classifiers: TabNet, LightGBM, Gaussian Naive Bayes, Histogram-Based Gradient Boosting (HGB), and Logistic Regression. Training proceeds in two stages: in the first stage, the base classifiers produce out-of-fold (OOF) predictions, which serve as inputs to the second-stage meta-learner, XGBoost. The meta-learner learns to refine predictions by capturing complicated interactions between base models, thus improving detection accuracy without introducing bias, overfitting, or requiring domain knowledge of the meta-data. The Auto-Stack ID model achieved 98.41% accuracy and a 93.45% F1 score, better than the individual classifiers. Its 90.55% recall and 96.53% precision indicate that it identifies intrusions with minimal false positives. These findings support its suitability for securing healthcare networks through ensemble learning. Ongoing efforts will focus on real-time deployment to improve the response to evolving threats.
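The first training stage described above (stratified folds producing out-of-fold predictions) can be sketched in plain Python. This is an illustration under stated assumptions: the nearest-centroid learner is a toy stand-in rather than one of the paper's base classifiers, and the helper names `stratified_folds` and `out_of_fold_predictions` are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample to one of k folds, spreading every class evenly (round-robin)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            fold_of[i] = pos % k
    return fold_of


def out_of_fold_predictions(X, y, fit, predict, k=5):
    """Return one prediction per sample, each made by a model that never saw that sample."""
    fold_of = stratified_folds(y, k)
    oof = [None] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        held_out = [i for i in range(len(y)) if fold_of[i] == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(model, X[i])
    return oof


def _mean(values):
    return sum(values) / len(values)


def fit_centroid(X, y):
    """Toy base learner: remember the mean feature value of each class."""
    return (_mean([x for x, t in zip(X, y) if t == 0]),
            _mean([x for x, t in zip(X, y) if t == 1]))


def predict_centroid(model, x):
    """Score 1.0 if x is closer to the class-1 mean than to the class-0 mean."""
    m0, m1 = model
    return 1.0 if abs(x - m1) < abs(x - m0) else 0.0


# Imbalanced toy data: eight "normal" samples and two "attack" samples.
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 2.0, 2.2]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
oof = out_of_fold_predictions(X, y, fit_centroid, predict_centroid, k=2)
```

In a stacking setup, one such OOF column per base classifier would form the feature matrix for the second-stage meta-learner. Because every OOF value comes from a model that did not train on that sample, the meta-learner does not see leaked labels.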
Breast cancer is among the leading causes of cancer mortality globally, and its diagnosis through histopathological image analysis is often prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, necessitating more robust classification models. This study presents a hybrid model that combines deep feature extraction using deep learning (DL), an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) for hyperparameter optimization to improve breast cancer histopathology (BCH) image classification. A dataset of 804 Hematoxylin and Eosin (H&E) stained images classified into Benign, in situ, Invasive, and Normal categories (ICIAR2018_BACH_Challenge) was used. ResNet50 was used for feature extraction, while Support Vector Machine (SVM), Random Forest (RF), XGBoost (XGB), Decision Tree (DT), and AdaBoost (ADB) were used for classification. BSO was used for hyperparameter optimization in a soft voting ensemble approach. Model performance was measured with accuracy, precision, recall, specificity, F1-score, and Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. The ensemble model outperformed the individual classifiers, with greater accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). The robustness of the model was verified by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for Benign, Invasive, and in situ instances, respectively. This ensemble model delivers a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
The potential applications of multimodal physiological signals in healthcare, pain monitoring, and clinical decision support systems have garnered significant attention in biomedical research. Conventional pain assessment methods rely on subjective self-reporting, which may be unreliable. Deep learning is a promising alternative that addresses this limitation through automated pain classification. This paper proposes an ensemble deep-learning framework for pain assessment. The framework makes use of features collected from electromyography (EMG), skin conductance level (SCL), and electrocardiography (ECG) signals. We integrate Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Gated Recurrent Unit (BiGRU), and Deep Neural Network (DNN) models. We then aggregate their predictions using a weighted averaging ensemble technique to increase the robustness of the classification. To improve computing efficiency and remove redundant features, we use Particle Swarm Optimization (PSO) for feature selection. This enables us to reduce the dimensionality of the features without sacrificing classification accuracy. The experimental results show that the proposed ensemble model performs better than individual deep learning classifiers, with improved accuracy, precision, recall, and F1-score across all pain levels. In our experiments, the proposed model achieved over 98% accuracy, suggesting promising automated pain assessment performance. However, due to differences in validation protocols, comparisons with previous studies are still limited. Combining deep learning and feature selection techniques significantly improves model generalization, reducing overfitting and enhancing classification performance. The evaluation was conducted using the BioVid Heat Pain Dataset, confirming the model’s effectiveness in distinguishing between different pain intensity levels.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University, KSA, through the Large Research Project under grant number RGP2/164/46.
Abstract: Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), and we report accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
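The likelihood ratios and the Youden index named in the Methods follow directly from a binary confusion matrix. The sketch below uses their standard definitions. The counts are hypothetical and are not taken from the RankXLAN results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, LR+, LR- and the Youden index J."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    # LR+ is unbounded when there are no false positives; LR- when there are no true negatives.
    lr_plus = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_minus = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_plus": lr_plus,
        "lr_minus": lr_minus,
        "youden_j": sensitivity + specificity - 1,
    }


# Hypothetical counts: 100 tumour samples and 100 normal samples.
m = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these counts, sensitivity is 0.90 and specificity is 0.95, so LR+ is about 18, LR− is about 0.105, and J is 0.85. A larger LR+ and a smaller LR− both indicate a more informative test.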
Funding: Supported by the NSFC (12171038) and 985 Projects.
Abstract: In this paper, we consider the Fisher informations among three classical type β-ensembles when β>0 scales with n satisfying lim βn=∞. We offer the exact order of the corresponding two Fisher informations, which indicates that the β-Laguerre ensembles do not satisfy the logarithmic Sobolev inequality. We also give some limit theorems on the extremals of β-Jacobi ensembles for β>0 fixed.
Funding: National Natural Science Foundation of China (52075420); Fundamental Research Funds for the Central Universities (xzy022023049); National Key Research and Development Program of China (2023YFB3408600).
Abstract: The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, the paper emphasizes the need to address challenges in practical applications, such as base model selection, the measurement of model robustness and uncertainty, and the interpretability of ensemble models.
Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Funding: Funded by Taif University, Saudi Arabia, project No. (TU-DSPP-2024-263).
Abstract: Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions and decisions. As deep learning and ensemble learning algorithms have grown in popularity, they have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has exhibited significant performance in enhancing learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has a large number of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains, medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
Abstract: Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n=445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n=1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER’s Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
Funding: Funded by the University of Transport Technology under the project entitled “Application of Machine Learning Algorithms in Landslide Susceptibility Mapping in Mountainous Areas” with grant number DTTD2022-16.
Abstract: This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combine Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models’ performance indicated that the DG-RBFN model, with an Area Under ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 42002134), the China Postdoctoral Science Foundation (Grant No. 2021T140735), and the Science Foundation of China University of Petroleum, Beijing (Grant Nos. 2462020XKJS02 and 2462020YXZZ004).
Abstract: The relationship between well logs and lithofacies is typically complex, which leads to low accuracy in lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, these methods have accuracy limits. To further improve their accuracy, this work proposes a practical and novel ensemble learning strategy and set of principles, which allow geologists not familiar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, which aims to reduce the variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make each sub-classifier just overfit through algorithm parameter settings and sub-dataset sampling. The principles can help reduce the bias errors in the prediction. Two issues are discussed: (1) whether only a relatively simple single-classifier method can be used as a sub-classifier, and how to select proper ML methods as sub-classifiers; (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost and LightGBM). The experiments used a published dataset of lithofacies from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models using the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers of homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategies are effective in promoting ML in lithofacies identification; (3) in practice, a heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
Funding: Supported by the National Natural Science Foundation of China (22379021 and 22479021).
Abstract: As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and the limited availability of labeled data pose significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures the aging trend. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery, without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me gains a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2022R1I1A3063493).
Abstract: Smart manufacturing and Industry 4.0 are transforming traditional manufacturing processes by utilizing innovative technologies such as artificial intelligence (AI) and the internet of things (IoT) to enhance efficiency, reduce costs, and ensure product quality. With the recent advancement of Industry 4.0, identifying defects has become important for ensuring the quality of products during the manufacturing process. In this research, we present an ensemble methodology for accurately classifying hot rolled steel surface defects by combining the strengths of four pre-trained convolutional neural network (CNN) architectures (VGG16, VGG19, Xception, and MobileNetV2) and compensating for their individual weaknesses. We evaluated our methodology on the Xsteel surface defect dataset (XSDD), which comprises seven different classes. The ensemble methodology integrated the predictions of the individual models through two methods: model averaging and weighted averaging. Our evaluation showed that the model averaging ensemble achieved an accuracy of 98.89%, a recall of 98.92%, a precision of 99.05%, and an F1-score of 98.97%, while the weighted averaging ensemble reached an accuracy of 99.72%, a recall of 99.74%, a precision of 99.67%, and an F1-score of 99.70%. The proposed weighted averaging ensemble outperformed the model averaging method and the individual models in detecting defects in terms of accuracy, recall, precision, and F1-score. Comparative analysis with recent studies also showed the superior performance of our methodology.
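The difference between model averaging and weighted averaging can be made concrete with a small sketch. The abstract does not say how the weights were chosen, so the weights below are hypothetical, as are the two-class probability vectors (the XSDD dataset itself has seven classes).

```python
def weighted_average(prob_sets, weights):
    """Combine per-model class probabilities using non-negative model weights."""
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
            for c in range(n_classes)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical outputs of four models for one image, over two classes.
model_probs = [
    [0.60, 0.40],
    [0.60, 0.40],
    [0.30, 0.70],
    [0.55, 0.45],
]

plain = weighted_average(model_probs, [1, 1, 1, 1])     # model averaging
weighted = weighted_average(model_probs, [1, 1, 4, 1])  # third model trusted more
plain_label = argmax(plain)
weighted_label = argmax(weighted)
```

With equal weights the ensemble picks the first class. Giving the third model four times the weight flips the decision to the second class, which shows how weighting lets a more reliable model override a weak majority.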
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 92467109 and U21A20478), the National Key R&D Program of China (2023YFA1011601), and the Major Key Project of PCL (Grant PCL2024A05).
Abstract: Ensemble learning, a pivotal branch of machine learning, combines multiple base models to enhance the overall performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer framework is established for the study of ensemble learning, offering a comprehensive and structured review from bottom to top. First, the survey introduces fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring ensemble diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the use of ensemble learning techniques to handle challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
Abstract: Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies. Solving this problem affects the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and Smart Grid allows all participants in the distribution grid to store and track electricity consumption. In this research, a machine learning model is developed that analyzes and predicts the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. This model is an ensemble meta-algorithm (stacking) that generalizes the algorithms of random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The proposed meta-algorithm is experimentally confirmed on the test sample to be more accurate than the base classifiers. With a ROC-AUC of 0.88, such a model can be used as a methodological basis for a decision support system whose purpose is to form a sample of suspected NTL sources. The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
Abstract: Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Funding: Supported by the joint funds of the Chinese National Natural Science Foundation (NSFC) (Grant No. U2242213), the funds of the NSFC (Grant No. 42341209), the National Key Research and Development (R&D) Program of the Ministry of Science and Technology of China (Grant No. 2021YFC3000902), the National Science Foundation for Young Scholars (Grant No. 42205166), and the Joint Research Project for Meteorological Capacity Improvement (Grant No. 22NLTSQ008).
Abstract: Pangu-Weather (PGW), trained with deep learning–based methods (DL-based model), shows significant potential for global medium-range weather forecasting. However, the interpretability and trustworthiness of global medium-range DL-based models raise many concerns. This study uses the singular vector (SV) initial condition (IC) perturbations of the China Meteorological Administration's Global Ensemble Prediction System (CMA-GEPS) as inputs of PGW for global ensemble prediction (PGW-GEPS) to investigate the ensemble forecast sensitivity of DL-based models to the IC errors. Meanwhile, the CMA-GEPS forecasts serve as benchmarks for comparison and verification. The spatial structures and prediction performance of PGW-GEPS are discussed and compared to CMA-GEPS based on seasonal ensemble experiments. The results show that the ensemble mean and dispersion of PGW-GEPS are similar to those of CMA-GEPS in the medium range but with smoother forecasts. Meanwhile, PGW-GEPS is sensitive to the SV IC perturbations. Specifically, PGW-GEPS can generate realistic ensemble spread beyond the sub-synoptic scale (wavenumbers ≤ 64) with SV IC perturbations. However, PGW's kinetic energy is significantly reduced at the sub-synoptic scale, leading to error growth behavior inconsistent with CMA-GEPS at that scale. This behavior indicates that the effective resolution of PGW-GEPS is beyond the sub-synoptic scale and that PGW-GEPS is limited in predicting mesoscale atmospheric motions. In terms of global medium-range ensemble prediction performance, the probability prediction skill of PGW-GEPS is comparable to that of CMA-GEPS in the extratropics when they use the same IC perturbations. This means that PGW has a general ability to provide skillful global medium-range forecasts with different ICs from numerical weather prediction.
Funding: Financially supported by the Deanship of Scientific Research and Graduate Studies at King Khalid University under research grant number (R.G.P.2/21/46), and in part by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, under Grant KFU253116.
Abstract: Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Several anti-phishing solutions exist, such as heuristic detection, virtual similarity detection, black and white lists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve the highest levels of accuracy on a given dataset, their methods suffer from an increased number of false positives, and they are ineffective against zero-hour attacks. Under a high False Positive Rate (FPR), phishing sites are considered genuine, and people can lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy, whereas duplicate features can increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different machine learning classifiers, namely Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, and Support Vector Machine (SVM), and with two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
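The first filter steps listed in the phishing abstract (constant, quasi-constant, and duplicate feature removal) can be sketched as follows. This is an illustration only: it reads "quasi-feature removal" as quasi-constant feature removal, which is an assumption. The feature names and the threshold are hypothetical, and the correlation, mutual information, and ANOVA steps are not covered.

```python
from collections import Counter


def filter_features(columns, quasi_threshold=0.99):
    """Drop constant, quasi-constant and duplicate columns.

    columns: dict mapping feature name -> list of values (one per sample).
    A column is dropped when its most frequent value covers at least
    quasi_threshold of the samples, or when it repeats an earlier column.
    """
    kept, seen = [], set()
    for name, values in columns.items():
        top_share = Counter(values).most_common(1)[0][1] / len(values)
        if top_share >= quasi_threshold:  # constant (share 1.0) or quasi-constant
            continue
        key = tuple(values)
        if key in seen:                   # exact duplicate of a kept column
            continue
        seen.add(key)
        kept.append(name)
    return kept


# Hypothetical URL features for five sites.
columns = {
    "url_length":   [54, 23, 91, 40, 77],
    "has_ip":       [0, 0, 0, 0, 0],       # constant
    "url_len_copy": [54, 23, 91, 40, 77],  # duplicate of url_length
    "uses_https":   [1, 1, 1, 1, 0],       # quasi-constant at threshold 0.8
    "num_dots":     [2, 5, 1, 3, 4],
}
selected = filter_features(columns, quasi_threshold=0.8)
```

Only `url_length` and `num_dots` survive in this toy example. These filters need no labels, so they can run before any classifier is trained.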
Funding: Funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241683].
Abstract: This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model’s interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
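The abstract does not state whether the voting layer uses hard or soft voting. Assuming hard (majority) voting, the final classification step could look like the sketch below. The per-model predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per model. Returns the voted label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical labels for four transactions (1 = fraudulent, 0 = legitimate).
svm = [0, 1, 0, 1]
rf = [0, 1, 1, 0]
gbc = [0, 0, 1, 1]

voted = majority_vote([svm, rf, gbc])
```

With three voters and two classes a tie cannot occur, so each transaction takes the label that at least two models agree on.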
Funding: Support from the National Natural Science Foundation of China (22108052).
Abstract: Biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP process. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process accurately predicts product profit and CO₂ emissions (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profit and CO₂ emissions of the BCP process. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing product profit while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profit (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
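The final optimization step trades off two objectives, profit (maximized) and CO₂ emissions (minimized). The reference vector guided evolutionary algorithm itself is not reproduced here. The sketch below shows only the Pareto-dominance test that such a multi-objective search relies on. The names `dominates` and `pareto_front` and all candidate values are hypothetical, not taken from the study.

```python
from typing import List, Tuple

Candidate = Tuple[float, float]  # (profit, co2_emission)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.

    Profit is maximized; CO2 emission is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (profit, CO2) pairs for five operating conditions.
candidates = [
    (5000.0, 9.0),
    (5200.0, 8.0),
    (4800.0, 7.5),
    (5100.0, 8.5),
    (4700.0, 8.0),
]
front = pareto_front(candidates)
```

Here only `(5200.0, 8.0)` and `(4800.0, 7.5)` survive: each of the other three is beaten on profit without any gain in emissions.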
Funding: Funded by the Department of Robotics and Mechatronics Engineering, Kennesaw State University, Marietta, GA 30060, USA.
Abstract: Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice that requires specialized expertise, and its results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. An ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models (ResNet-50, VGG-16, and InceptionV3) is utilized in this paper. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions from individual models, leveraging their complementary strengths. The objective of this work is to assess the model's capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. With the proposed technique, for Normal vs. Advanced glaucoma classification, a validation accuracy of 98.04% and a testing accuracy of 98.03% are achieved, with a specificity of 100%, which outperforms state-of-the-art methods. For multiclass classification, the suggested ensemble approach achieved a precision and sensitivity of 97%, and a specificity and testing accuracy of 98.57% and 96.82%, respectively. The proposed E-GlauNet model has significant potential in assisting ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by its evaluation on a single dataset. Future work will focus on external validation across diverse datasets and enhancing interpretability using explainable AI techniques.
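Ensemble averaging of model predictions can be sketched as follows, assuming each network outputs a probability vector over the same classes and that the three models are weighted equally (the abstract does not state the weights). The function names, the class order, and the probability values are hypothetical.

```python
from typing import List, Sequence


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors that several models give one image."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must output the same number of classes")
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def predict_class(model_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = average_probabilities(model_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical softmax outputs for one image; class order is assumed to be
# (normal, early, advanced).
resnet50 = [0.6, 0.3, 0.1]
vgg16 = [0.2, 0.5, 0.3]
inception_v3 = [0.3, 0.5, 0.2]

averaged = average_probabilities([resnet50, vgg16, inception_v3])
label = predict_class([resnet50, vgg16, inception_v3])
```

In this toy case the first model alone would pick class 0, but the averaged vector favours class 1, which is how averaging lets the other two models outvote a single confident outlier.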
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R319), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; Prince Sultan University, for covering the article processing charges (APC) associated with this publication; and Researchers Supporting Project Number (RSPD2025R1107), King Saud University, Riyadh, Saudi Arabia.
Abstract: Intrusion detection in healthcare networks is an urgent issue because of the critical consequences of cyber threats and the extreme sensitivity of medical information. The Auto-Stack ID proposed in this study is a stacked ensemble of encoder-enhanced auctions that can be used to improve intrusion detection in healthcare networks. The model is trained and evaluated on the WUSTL-EHMS 2020 dataset, which has an imbalanced class distribution (87.46% normal traffic and 12.53% intrusion attacks). To address this imbalance, the study mitigates training bias through stratified K-fold cross-validation (K=5), so that each class is represented similarly in the training and validation splits. The Auto-Stack ID method combines several base classifiers: TabNet, LightGBM, Gaussian Naive Bayes, Histogram-Based Gradient Boosting (HGB), and Logistic Regression. We apply a two-stage training process: in the first stage, the base classifiers produce out-of-fold (OOF) predictions, which serve as inputs to the second-stage meta-learner, XGBoost. The meta-learner learns to refine predictions by capturing complicated interactions between base models, thus improving detection accuracy without introducing bias, overfitting, or requiring domain knowledge of the meta-data. The Auto-Stack ID model achieved 98.41% accuracy and a 93.45% F1 score, outperforming the individual classifiers. With 90.55% recall and 96.53% precision, it identifies intrusions with few false positives. These findings indicate its suitability for securing healthcare networks through ensemble learning. Ongoing work targets real-time deployment to improve response to evolving threats.
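The first training stage, stratified 5-fold splitting followed by out-of-fold (OOF) prediction, can be sketched in plain Python. This is an illustration under stated assumptions, not the study's implementation: the two toy base learners (`train_threshold`, `train_majority`) stand in for TabNet, LightGBM, and the other base classifiers, the single-feature data are synthetic, and the XGBoost meta-learner that would consume the OOF matrix is omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Model = Callable[[float], int]  # maps one feature value to a label
Trainer = Callable[[Sequence[float], Sequence[int]], Model]


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Assign sample indices to k folds, dealing each class out round-robin."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def out_of_fold_predictions(
    x: Sequence[float], y: Sequence[int], trainers: Sequence[Trainer], k: int
) -> List[List[int]]:
    """Return one row of base-model predictions per sample.

    Each prediction comes from a model that never saw that sample in training.
    """
    oof = [[0] * len(trainers) for _ in x]
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train_idx = [i for i in range(len(x)) if i not in held]
        x_train = [x[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for m, train in enumerate(trainers):
            model = train(x_train, y_train)
            for i in held_out:
                oof[i][m] = model(x[i])
    return oof


def train_threshold(x: Sequence[float], y: Sequence[int]) -> Model:
    """Toy base learner: threshold halfway between the two class means."""
    neg = [v for v, t in zip(x, y) if t == 0]
    pos = [v for v, t in zip(x, y) if t == 1]
    cut = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
    return lambda v: int(v > cut)


def train_majority(x: Sequence[float], y: Sequence[int]) -> Model:
    """Toy base learner: always predict the majority training class."""
    majority = int(sum(y) * 2 > len(y))
    return lambda v: majority


# Synthetic, imbalanced toy data: 20 normal samples and 5 intrusions.
features = [i / 10 for i in range(20)] + [5.0 + i for i in range(5)]
labels = [0] * 20 + [1] * 5

folds = stratified_folds(labels, k=5)
oof_matrix = out_of_fold_predictions(
    features, labels, [train_threshold, train_majority], k=5
)
```

Each row of `oof_matrix` would become one training example for the meta-learner. Because every fold receives exactly one of the five intrusion samples, no validation split ends up without the minority class.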
Abstract: Breast cancer is among the leading causes of cancer mortality globally, and its diagnosis through histopathological image analysis is often prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, necessitating more robust classification models. This study presents a hybrid model that combines deep learning (DL) feature extraction, an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) for hyperparameter optimization to improve breast cancer histopathology (BCH) image classification. A dataset of 804 Hematoxylin and Eosin (H&E) stained images classified into Benign, in situ, Invasive, and Normal categories (ICIAR2018_BACH_Challenge) was used. ResNet50 was used for feature extraction, while Support Vector Machine (SVM), Random Forest (RF), XGBoost (XGB), Decision Tree (DT), and AdaBoost (ADB) classifiers were used for classification. BSO was applied for hyperparameter optimization in a soft voting ensemble approach. Model performance was measured with accuracy, precision, recall, specificity, F1-score, Receiver Operating Characteristic (ROC) curves, and Precision-Recall (PR) curves. The ensemble model outperformed individual classifiers, with higher accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). The robustness of the model was verified by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for Benign, Invasive, and in situ instances, respectively. This ensemble model delivers a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
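For a four-class problem, precision, recall, and specificity are commonly computed per class in a one-vs-rest fashion from the confusion matrix. The abstract does not state how its figures were aggregated, so the sketch below is one plausible reading only. The function name `one_vs_rest_metrics` and the confusion matrix are hypothetical and do not reproduce the study's results.

```python
from typing import Dict, List


def one_vs_rest_metrics(confusion: List[List[int]], cls: int) -> Dict[str, float]:
    """Precision, recall, and specificity for one class.

    confusion[t][p] counts samples of true class t predicted as class p.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[t][cls] for t in range(n)) - tp
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical confusion matrix; rows and columns are assumed to be ordered
# (Benign, in situ, Invasive, Normal), with 20 test images per class.
confusion = [
    [18, 1, 1, 0],
    [1, 17, 2, 0],
    [0, 2, 18, 0],
    [1, 0, 0, 19],
]

benign = one_vs_rest_metrics(confusion, cls=0)
```

Specificity tends to exceed recall in multi-class settings because the true-negative count pools all other classes, which is consistent with the pattern of the reported figures (specificity above precision and recall).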
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2023-02-02341).
Abstract: The potential applications of multimodal physiological signals in healthcare, pain monitoring, and clinical decision support systems have garnered significant attention in biomedical research. Subjective self-reporting is the foundation of conventional pain assessment methods, which may be unreliable. Deep learning is a promising alternative to resolve this limitation through automated pain classification. This paper proposes an ensemble deep-learning framework for pain assessment. The framework makes use of features collected from electromyography (EMG), skin conductance level (SCL), and electrocardiography (ECG) signals. We integrate Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Gated Recurrent Unit (BiGRU), and Deep Neural Network (DNN) models. We then aggregate their predictions using a weighted averaging ensemble technique to increase the robustness of the classification. To improve computing efficiency and remove redundant features, we use Particle Swarm Optimization (PSO) for feature selection. This enables us to reduce the dimensionality of the features without sacrificing classification accuracy. With improved accuracy, precision, recall, and F1-score across all pain levels, the experimental results show that the suggested ensemble model performs better than individual deep learning classifiers. In our experiments, the suggested model achieved over 98% accuracy, suggesting promising automated pain assessment performance. However, due to differences in validation protocols, comparisons with previous studies are still limited. Combining deep learning and feature selection techniques significantly improves model generalization, reducing overfitting and enhancing classification performance. The evaluation was conducted using the BioVid Heat Pain Dataset, confirming the model's effectiveness in distinguishing between different pain intensity levels.
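A weighted averaging ensemble can be sketched as a convex combination of the per-model probability vectors. The abstract does not say how the weights are chosen, so the weights below, the three pain levels, the probability values, and the function name `weighted_average` are all hypothetical.

```python
from typing import List, Sequence


def weighted_average(
    model_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Fuse per-model class probabilities with normalized model weights."""
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(norm, model_probs))
        for c in range(n_classes)
    ]


# Hypothetical outputs of the four networks for one signal window,
# over three assumed pain levels (low, medium, high).
cnn = [0.7, 0.2, 0.1]
lstm = [0.2, 0.6, 0.2]
bigru = [0.3, 0.5, 0.2]
dnn = [0.1, 0.7, 0.2]

# Hypothetical weights; in practice they might reflect validation performance.
fused = weighted_average([cnn, lstm, bigru, dnn], weights=[4, 3, 2, 1])
pain_level = max(range(len(fused)), key=fused.__getitem__)
```

Normalizing the weights keeps the fused vector a valid probability distribution, so its entries still sum to one regardless of the scale of the raw weights.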