Automated classification of gas flow states in blast furnaces using top-camera imagery typically demands a large volume of labeled data, whose manual annotation is both labor-intensive and cost-prohibitive. To mitigate this challenge, we present an enhanced semi-supervised learning approach based on the Mean Teacher framework, incorporating a novel feature loss module to maximize classification performance with limited labeled samples. Model studies show that the proposed model surpasses both the baseline Mean Teacher model and the fully supervised method in accuracy. Specifically, for datasets with 20%, 30%, and 40% label ratios, a single training iteration yields accuracies of 78.61%, 82.21%, and 85.2%, respectively, while multiple-cycle training iterations achieve 82.09%, 81.97%, and 81.59%, respectively. Furthermore, scenario-specific training schemes are introduced to support diverse deployment needs. These findings highlight the potential of the proposed technique in minimizing labeling requirements and advancing intelligent blast furnace diagnostics.
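The two core pieces of the generic Mean Teacher framework named in this abstract can be sketched as follows: the teacher weights track an exponential moving average (EMA) of the student weights, and a consistency loss penalizes disagreement between the two models' predictions on unlabeled samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the smoothing factor `alpha`, and the toy numbers are hypothetical, and the paper's feature loss module is not reproduced because the abstract does not define it.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight (EMA)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy values, chosen only for illustration (not from the paper).
teacher = [0.0, 1.0]
student = [1.0, 3.0]
teacher = ema_update(teacher, student, alpha=0.9)

# Predictions of both models on one unlabeled image with two classes.
loss = consistency_loss([0.7, 0.3], [0.6, 0.4])
```

In a full training loop, the consistency term would typically be added to the supervised loss on the labeled fraction, and `ema_update` would run after every optimizer step of the student.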
Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), as well as accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. Through the integration of bioinformatics and ML approaches for the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely, UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
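The likelihood ratios and Youden index listed among the evaluation metrics follow directly from sensitivity and specificity. The sketch below uses the standard definitions; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from binary confusion-matrix counts.

    Assumes specificity is neither 0 nor 1, so neither ratio divides by zero.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_plus": sensitivity / (1.0 - specificity),   # LR+
        "lr_minus": (1.0 - sensitivity) / specificity,  # LR-
        "youden": sensitivity + specificity - 1.0,      # Youden index J
    }


# Illustrative counts only (not results from the paper).
m = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

A larger LR+ and a smaller LR− both indicate a more informative classifier, while the Youden index summarizes sensitivity and specificity in one number between −1 and 1.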
The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, addressing challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Accurate prediction of the remaining useful life (RUL) is crucial for the design and management of lithium-ion batteries. Although various machine learning models offer promising predictions, one critical but often overlooked challenge is their demand for considerable run-to-failure data for training. Collection of such training data leads to prohibitive testing efforts, as run-to-failure tests can last for years. Here, we propose a semi-supervised representation learning method to enhance prediction accuracy by learning from data without RUL labels. Our approach builds on a sophisticated deep neural network that comprises an encoder and three decoder heads to extract time-dependent representation features from short-term battery operating data regardless of the existence of RUL labels. The approach is validated using three datasets collected from 34 batteries operating under various conditions, encompassing over 19,900 charge and discharge cycles. Our method achieves a root mean squared error (RMSE) within 25 cycles, even when only 1/50 of the training dataset is labelled, representing a reduction of 48% compared to the conventional approach. We also demonstrate the method's robustness with varying numbers of labelled data and different weights assigned to the three decoder heads. The projection of extracted features in a low-dimensional space reveals that our method effectively learns degradation features from unlabelled data. Our approach highlights the promise of utilising semi-supervised learning to reduce the data demand for reliability monitoring of energy devices.
Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions/decisions. With the growth in popularity of deep learning and ensemble learning algorithms, they have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has exhibited significant performance in enhancing learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has large quantities of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains, including medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, in this review, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures aging trends. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me is endowed with a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Ensemble learning, a pivotal branch of machine learning, amalgamates multiple base models to enhance the overarching performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer research framework is established for the research of ensemble learning, which can offer a comprehensive and structured review of ensemble learning from bottom to top. Firstly, this survey commences by introducing fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring the ensemble's diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the utilisation of ensemble learning techniques to navigate challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER’s Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice requiring specialized expertise, and its results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. This paper uses an ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models: ResNet-50, VGG-16, and InceptionV3. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions from individual models, leveraging their complementary strengths. The objective of this work is to assess the model’s capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. With the proposed technique, for Normal vs. Advanced glaucoma classification, a validation accuracy of 98.04% and a testing accuracy of 98.03% are achieved, with a specificity of 100%, which outperforms state-of-the-art methods. For multiclass classification, the suggested ensemble approach achieved a precision and sensitivity of 97%, and a specificity and testing accuracy of 98.57% and 96.82%, respectively. The proposed E-GlauNet model has significant potential in assisting ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by its evaluation on a single dataset. Future work will focus on external validation across diverse datasets and enhancing interpretability using explainable AI techniques.
The potential applications of multimodal physiological signals in healthcare, pain monitoring, and clinical decision support systems have garnered significant attention in biomedical research. Subjective self-reporting is the foundation of conventional pain assessment methods, which may be unreliable. Deep learning is a promising alternative to resolve this limitation through automated pain classification. This paper proposes an ensemble deep-learning framework for pain assessment. The framework makes use of features collected from electromyography (EMG), skin conductance level (SCL), and electrocardiography (ECG) signals. We integrate Convolutional Neural Networks (CNN), Long Short-Term Memory Networks (LSTM), Bidirectional Gated Recurrent Units (BiGRU), and Deep Neural Networks (DNN) models. We then aggregate their predictions using a weighted averaging ensemble technique to increase the classification’s robustness. To improve computing efficiency and remove redundant features, we use Particle Swarm Optimization (PSO) for feature selection. This enables us to reduce the features’ dimensionality without sacrificing the classification’s accuracy. With improved accuracy, precision, recall, and F1-score across all pain levels, the experimental results show that the suggested ensemble model performs better than individual deep learning classifiers. In our experiments, the suggested model achieved over 98% accuracy, suggesting promising automated pain assessment performance. However, due to differences in validation protocols, comparisons with previous studies are still limited. Combining deep learning and feature selection techniques significantly improves model generalization, reducing overfitting and enhancing classification performance. The evaluation was conducted using the BioVid Heat Pain Dataset, confirming the model’s effectiveness in distinguishing between different pain intensity levels.
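Weighted averaging of classifier outputs, the aggregation step named in this abstract, can be sketched in a few lines. This is an illustrative sketch only: the function names, the four probability vectors (standing in for the CNN, LSTM, BiGRU, and DNN outputs), and the weights are hypothetical, and the abstract does not state how the actual weights were chosen.

```python
def weighted_average(prob_sets, weights):
    """Fuse per-model class probabilities with a normalized weighted mean."""
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict(prob_sets, weights):
    """Return the index of the class with the highest fused probability."""
    fused = weighted_average(prob_sets, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical class probabilities from four base models for one sample.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
    [0.3, 0.4, 0.3],
]
weights = [0.4, 0.2, 0.2, 0.2]  # assumed weights, e.g. from validation accuracy

fused = weighted_average(model_probs, weights)
label = predict(model_probs, weights)
```

Here the first model alone would pick class 0, but the fused distribution favors class 1, which shows how the ensemble can overrule a single confident base model.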
Accurately evaluating the safety status of lithium-ion battery systems in electric vehicles is imperative due to the challenges in effectively predicting potential battery failure risks under stochastic profiles. Complex battery fault mechanisms and limited, poor-quality data collection impede fault detection for battery systems under real-world conditions. This paper proposes a novel graph-guided fault detection method designed to recognize concealed anomalies in realistic data. Graphs guided by physical relationships are constructed for learning the dynamic evolution of physical quantities under normal conditions and their potential change characteristics in fault scenarios. An ensemble Graph Sample and Aggregate Network model is developed to tackle sample distribution imbalances and non-uniform battery system specifications across vehicles. Failure risk probabilities for diverse battery charging and discharging segments are derived. An ablation study verifies the necessity of ensemble learning in addressing imbalanced datasets. Analysis of 102,095 segments across 86 vehicles with different battery material systems, battery capacities, and numbers of cells and temperature sensors confirms the robustness and generalization of the proposed method, yielding a recall of 98.37%. By introducing the graph, spatio-temporal global fault characteristics of battery systems are automatically extracted. The coupling relationship and evolution of physical quantities under both normal and faulty states are established, effectively uncovering fault information hidden in collected battery data without observable anomalies. The safety state of battery systems is reflected in terms of failure risk probability, providing reliable data support for battery system maintenance.
Breast cancer is among the leading causes of cancer mortality globally, and its diagnosis through histopathological image analysis is often prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, necessitating more robust classification models. This study presents a hybrid model that combines deep feature extraction using deep learning (DL), an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) for hyperparameter optimization to improve breast cancer histopathology (BCH) image classification. A dataset of 804 Hematoxylin and Eosin (H&E) stained images classified into Benign, in situ, Invasive, and Normal categories (ICIAR2018_BACH_Challenge) was used. ResNet50 was used for feature extraction, while Support Vector Machine (SVM), Random Forest (RF), XGBoost (XGB), Decision Tree (DT), and AdaBoost (ADB) classifiers were used for classification. BSO was used for hyperparameter optimization in a soft voting ensemble approach. Accuracy, precision, recall, specificity, F1-score, Receiver Operating Characteristic (ROC), and Precision-Recall (PR) were used as model performance metrics. The ensemble model outperformed individual classifiers, with greater accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). The robustness of the model was verified by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for Benign, Invasive, and in situ instances, respectively. This ensemble model delivers a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
The assessment of beach quality is an important prerequisite for beach development and serves as the foundation for coastal zone management and sustainable development. This topic has attracted widespread attention, and various evaluation systems have been established. Given that beach quality assessment (BQA) involves multidimensional and nonlinear indicators, machine learning methods are well-suited to handling complex data relationships. However, current research utilizing machine learning for BQA often faces challenges such as limited evaluation indicators and difficulties in obtaining relevant data. In this study, a machine learning-based model for beach quality evaluation is proposed to address the limitations of existing evaluation frameworks, particularly under conditions of data scarcity. Simulated data were generated, and the analytic hierarchy process was integrated to extract features from 21 beach evaluation factors. A comparative analysis was conducted using the following four machine learning models: decision tree, random forest, XGBoost, and MLP. Results indicate that XGBoost (mean squared error (MSE) = 0.1825, weighted F1 = 0.7513) and MLP (Pearson coefficient = 0.6053) outperform traditional models. Furthermore, an ensemble learning model combining XGBoost and MLP was developed, substantially improving predictive performance (reducing MSE to 0.0753, increasing the Pearson coefficient to 0.8002, and achieving an F1 score of 0.783). Validation using real data from Yangkou Beach demonstrated that the model maintained an accuracy of 58% even when 5–10 evaluation factors had randomly missing values.
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies. Solving this problem affects the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and Smart Grid allows all participants in the distribution grid to store and track electricity consumption. In this research, a machine learning model is developed that analyzes and predicts the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. This model is an ensemble meta-algorithm (stacking) that generalizes the algorithms of random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The superior accuracy of the proposed meta-algorithm over the base classifiers is experimentally confirmed on the test sample. Owing to its good accuracy (ROC-AUC of 0.88), such a model can be used as a methodological basis for a decision support system whose purpose is to form a sample of suspected NTL sources. The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of inspection raids, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
The biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP process. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process demonstrates exceptional performance in accurately predicting product profit and CO₂ emissions (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profitability and CO₂ emissions of the BCP process. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing the product profit of the BCP process while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profits (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
Essential proteins are crucial for biological processes and can be identified through both experimental and computational methods. While experimental approaches are highly accurate, they often demand extensive time and resources. To address these challenges, we present a computational ensemble learning framework designed to identify essential proteins more efficiently. Our method begins by using node2vec to transform proteins in the protein–protein interaction (PPI) network into continuous, low-dimensional vectors. We also extract a range of features from protein sequences, including graph-theory-based, information-based, compositional, and physicochemical attributes. Additionally, we leverage deep learning techniques to analyze high-dimensional position-specific scoring matrices (PSSMs) and capture evolutionary information. We then combine these features for classification using various machine learning algorithms. To enhance performance, we integrate the outputs of these algorithms through ensemble methods such as voting, weighted averaging, and stacking. This approach effectively addresses data imbalances and improves both robustness and accuracy. Our ensemble learning framework achieves an AUC of 0.960 and an accuracy of 0.9252, outperforming other computational methods. These results demonstrate the effectiveness of our approach in accurately identifying essential proteins and highlight its superior feature extraction capabilities.
A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 are selected as research subjects. Among them, 72 patients died during the two-year follow-up. Models are established with an ensemble learning framework and machine learning algorithms based on 51 physiological indicators of the patients. The Shapley additive explanations algorithm and univariate tests with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on 5-fold cross-validation and external validation, the prediction model with the self-paced ensemble framework and random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) score of 0.911 and a recall of 0.864. Both feature ranking methods show that ejection fraction, serum creatinine (admission), hemoglobin, and Killip class are the most important features. With these top-ranked features, the simplified prediction model achieves a comparable result, with an AUROC score of 0.872 and a recall of 0.818. This work proposes a new method to establish mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance with a small patient cohort and a low death rate. It can assist medical decision-making and prognosis as a new reference.
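For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how recall and AUROC can be computed. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative.

    A tie between a positive score and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data (not from the study): 1 = deceased, 0 = survived.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.55, 0.90]

# Hypothetical decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_recall = recall(y_true, y_pred)
toy_auroc = auroc(y_true, scores)
print(toy_recall, toy_auroc)
```

With a rare positive class, as in a cohort with a low death rate, recall and AUROC are more informative than plain accuracy, because a model that predicts "survived" for every patient would still score a high accuracy.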
Pangu-Weather (PGW), trained with deep learning–based methods (DL-based model), shows significant potential for global medium-range weather forecasting. However, the interpretability and trustworthiness of global medium-range DL-based models raise many concerns. This study uses the singular vector (SV) initial condition (IC) perturbations of the China Meteorological Administration's Global Ensemble Prediction System (CMA-GEPS) as inputs of PGW for global ensemble prediction (PGW-GEPS) to investigate the ensemble forecast sensitivity of DL-based models to the IC errors. Meanwhile, the CMA-GEPS forecasts serve as benchmarks for comparison and verification. The spatial structures and prediction performance of PGW-GEPS are discussed and compared to CMA-GEPS based on seasonal ensemble experiments. The results show that the ensemble mean and dispersion of PGW-GEPS are similar to those of CMA-GEPS in the medium range but with smoother forecasts. Meanwhile, PGW-GEPS is sensitive to the SV IC perturbations. Specifically, PGW-GEPS can generate realistic ensemble spread beyond the sub-synoptic scale (wavenumbers ≤ 64) with SV IC perturbations. However, PGW's kinetic energy is significantly reduced at the sub-synoptic scale, leading to error growth behavior inconsistent with CMA-GEPS at that scale. Thus, this behavior indicates that the effective resolution of PGW-GEPS is beyond the sub-synoptic scale and is limited to predicting mesoscale atmospheric motions. In terms of the global medium-range ensemble prediction performance, the probability prediction skill of PGW-GEPS is comparable to CMA-GEPS in the extratropics when they use the same IC perturbations. That means that PGW has a general ability to provide skillful global medium-range forecasts with different ICs from numerical weather prediction.
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model’s interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
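The third layer described above combines three classifiers by voting. The abstract does not say whether hard (label) or soft (probability) voting is used, so the sketch below assumes plain hard majority voting; the function name `majority_vote` and the per-model predictions are hypothetical and invented purely for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote.

    `predictions` is a list with one entry per model; each entry is the list
    of labels that model predicted for the same samples, in the same order.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Made-up predictions for five transactions (1 = fraud, 0 = legitimate).
svm_preds = [1, 0, 0, 1, 0]
rf_preds = [1, 0, 1, 0, 0]
gbc_preds = [0, 0, 1, 1, 0]

ensemble_preds = majority_vote([svm_preds, rf_preds, gbc_preds])
print(ensemble_preds)
```

With three binary classifiers a tie cannot occur, which is one practical reason for using an odd number of voters.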
Funding: financial support provided by the Natural Science Foundation of Hebei Province, China (No. E2024105036); the Tangshan Talent Funding Project, China (Nos. B202302007 and A2021110015); the National Natural Science Foundation of China (No. 52264042); and the Australian Research Council (No. IH230100010).
Abstract: Automated classification of gas flow states in blast furnaces using top-camera imagery typically demands a large volume of labeled data, whose manual annotation is both labor-intensive and cost-prohibitive. To mitigate this challenge, we present an enhanced semi-supervised learning approach based on the Mean Teacher framework, incorporating a novel feature loss module to maximize classification performance with limited labeled samples. The model studies show that the proposed model surpasses both the baseline Mean Teacher model and the fully supervised method in accuracy. Specifically, for datasets with 20%, 30%, and 40% label ratios, the model yields accuracies of 78.61%, 82.21%, and 85.2%, respectively, using a single training iteration, while multiple-cycle training iterations achieve 82.09%, 81.97%, and 81.59%, respectively. Furthermore, scenario-specific training schemes are introduced to support diverse deployment needs. These findings highlight the potential of the proposed technique in minimizing labeling requirements and advancing intelligent blast furnace diagnostics.
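As background for the Mean Teacher framework named above, the sketch below shows its two standard ingredients: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency loss penalizes disagreement between student and teacher predictions on the same (possibly unlabeled) input. This is a generic illustration, not the paper's model: the feature loss module it proposes is not specified in the abstract and is not reproduced here, and the function names, the `alpha` values, and all numbers are assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


# Toy weights (made up): the teacher starts at zero and drifts toward the student.
teacher_w = [0.0, 0.0]
student_w = [1.0, -2.0]
teacher_w = ema_update(teacher_w, student_w, alpha=0.9)

# Toy class probabilities for one unlabeled image (made up).
loss = consistency_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
print(teacher_w, loss)
```

Because the consistency term needs no labels, it can be evaluated on every image, while the usual supervised loss is applied only to the labeled fraction.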
Funding: the Deanship of Research and Graduate Studies at King Khalid University, KSA, funded this work through the Large Research Project under grant number RGP2/164/46.
Abstract: Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. The transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), as well as accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. Through the integration of bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely, UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
Funding: National Natural Science Foundation of China (52075420); Fundamental Research Funds for the Central Universities (xzy022023049); National Key Research and Development Program of China (2023YFB3408600).
Abstract: The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, addressing challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Funding: supported by the National Natural Science Foundation of China (No. 52207229); the Key Research and Development Program of Ningxia Hui Autonomous Region of China (No. 2024BEE02003); financial support from the AEGiS Research Grant 2024, University of Wollongong (No. R6254); and financial support from the China Scholarship Council (No. 202207550010).
Abstract: Accurate prediction of the remaining useful life (RUL) is crucial for the design and management of lithium-ion batteries. Although various machine learning models offer promising predictions, one critical but often overlooked challenge is their demand for considerable run-to-failure data for training. Collection of such training data leads to prohibitive testing efforts, as the run-to-failure tests can last for years. Here, we propose a semi-supervised representation learning method to enhance prediction accuracy by learning from data without RUL labels. Our approach builds on a sophisticated deep neural network that comprises an encoder and three decoder heads to extract time-dependent representation features from short-term battery operating data regardless of the existence of RUL labels. The approach is validated using three datasets collected from 34 batteries operating under various conditions, encompassing over 19,900 charge and discharge cycles. Our method achieves a root mean squared error (RMSE) within 25 cycles, even when only 1/50 of the training dataset is labelled, representing a reduction of 48% compared to the conventional approach. We also demonstrate the method's robustness with varying numbers of labelled data and different weights assigned to the three decoder heads. The projection of extracted features in a low-dimensional space reveals that our method effectively learns degradation features from unlabelled data. Our approach highlights the promise of utilising semi-supervised learning to reduce the data demand for reliability monitoring of energy devices.
Funding: funded by Taif University, Saudi Arabia, project No. (TU-DSPP-2024-263).
Abstract: Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions/decisions. With the growth in popularity of deep learning and ensemble learning algorithms, they have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has exhibited significant performance in enhancing learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has large quantities of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains, medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, in this review, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
Funding: supported by the National Natural Science Foundation of China (22379021 and 22479021).
Abstract: As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures aging trends. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me is endowed with a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Funding: supported in part by the National Natural Science Foundation of China (Nos. 92467109 and U21A20478); the National Key R&D Program of China (2023YFA1011601); and the Major Key Project of PCL (Grant PCL2024A05).
Abstract: Ensemble learning, a pivotal branch of machine learning, amalgamates multiple base models to enhance the overarching performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer research framework is established for the research of ensemble learning, which can offer a comprehensive and structured review of ensemble learning from bottom to top. Firstly, this survey commences by introducing fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring the ensemble's diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the utilisation of ensemble learning techniques to navigate challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
Abstract: Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER’s Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
Funding: funded by the Department of Robotics and Mechatronics Engineering, Kennesaw State University, Marietta, GA 30060, USA.
Abstract: Glaucoma, a chronic eye disease affecting millions worldwide, poses a substantial threat to eyesight and can result in permanent vision loss if left untreated. Manual identification of glaucoma is a complicated and time-consuming practice that requires specialized expertise, and the results may be subjective. To address these challenges, this research proposes a computer-aided diagnosis (CAD) approach using Artificial Intelligence (AI) techniques for binary and multiclass classification of glaucoma stages. An ensemble fusion mechanism that combines the outputs of three pre-trained convolutional neural network (ConvNet) models (ResNet-50, VGG-16, and InceptionV3) is utilized in this paper. This fusion technique enhances diagnostic accuracy and robustness by ensemble-averaging the predictions from individual models, leveraging their complementary strengths. The objective of this work is to assess the model’s capability for early-stage glaucoma diagnosis. Classification is performed on a dataset collected from the Harvard Dataverse repository. With the proposed technique, for Normal vs. Advanced glaucoma classification, a validation accuracy of 98.04% and a testing accuracy of 98.03% are achieved, with a specificity of 100%, which outperforms state-of-the-art methods. For multiclass classification, the suggested ensemble approach achieved a precision and sensitivity of 97%, and a specificity and testing accuracy of 98.57% and 96.82%, respectively. The proposed E-GlauNet model has significant potential in assisting ophthalmologists in the screening and fast diagnosis of glaucoma, leading to more reliable, efficient, and timely diagnosis, particularly for early-stage detection and staging of the disease. While the proposed method demonstrates high accuracy and robustness, the study is limited by its evaluation on a single dataset. Future work will focus on external validation across diverse datasets and enhancing interpretability using explainable AI techniques.
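The ensemble-averaging step mentioned above can be illustrated with a short sketch: each network outputs a probability per class, the probabilities are averaged across networks, and the class with the highest mean is chosen. This is only an illustration under stated assumptions, not the E-GlauNet implementation: the class names, the helper functions, and every probability value below are made up, and the real models' outputs and any weighting they use are not given in the abstract.

```python
def average_probabilities(model_probs):
    """Average class-probability vectors from several models, class by class."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical class order and made-up softmax outputs for one fundus image.
CLASSES = ["normal", "early", "advanced"]
resnet50_probs = [0.2, 0.5, 0.3]
vgg16_probs = [0.1, 0.3, 0.6]
inception_v3_probs = [0.2, 0.3, 0.5]

fused = average_probabilities([resnet50_probs, vgg16_probs, inception_v3_probs])
label = CLASSES[argmax(fused)]
print(fused, label)
```

In this toy case one model alone would have picked "early", but the averaged probabilities favor "advanced", which shows how averaging lets the other two models outvote a single outlier.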
Funding: funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2023-02-02341).
Abstract: The potential applications of multimodal physiological signals in healthcare, pain monitoring, and clinical decision support systems have garnered significant attention in biomedical research. Subjective self-reporting is the foundation of conventional pain assessment methods, which may be unreliable. Deep learning is a promising alternative to resolve this limitation through automated pain classification. This paper proposes an ensemble deep-learning framework for pain assessment. The framework makes use of features collected from electromyography (EMG), skin conductance level (SCL), and electrocardiography (ECG) signals. We integrate Convolutional Neural Networks (CNN), Long Short-Term Memory Networks (LSTM), Bidirectional Gated Recurrent Units (BiGRU), and Deep Neural Networks (DNN) models. We then aggregate their predictions using a weighted averaging ensemble technique to increase the classification’s robustness. To improve computing efficiency and remove redundant features, we use Particle Swarm Optimization (PSO) for feature selection. This enables us to reduce the features’ dimensionality without sacrificing the classification’s accuracy. With improved accuracy, precision, recall, and F1-score across all pain levels, the experimental results show that the suggested ensemble model performs better than individual deep learning classifiers. In our experiments, the suggested model achieved over 98% accuracy, suggesting promising automated pain assessment performance. However, due to differences in validation protocols, comparisons with previous studies are still limited. Combining deep learning and feature selection techniques significantly improves model generalization, reducing overfitting and enhancing classification performance. The evaluation was conducted using the BioVid Heat Pain Dataset, confirming the model’s effectiveness in distinguishing between different pain intensity levels.
Funding: funded by the National Natural Science Foundation of China (Grant No. 52222708).
Abstract: Accurately evaluating the safety status of lithium-ion battery systems in electric vehicles is imperative due to the challenges in effectively predicting potential battery failure risks under stochastic profiles. Complex battery fault mechanisms and limited, poor-quality data collection impede fault detection for battery systems under real-world conditions. This paper proposes a novel graph-guided fault detection method designed to recognize concealed anomalies in realistic data. Graphs guided by physical relationships are constructed for learning the dynamic evolution of physical quantities under normal conditions and their potential change characteristics in fault scenarios. An ensemble Graph Sample and Aggregate Network model is developed to tackle sample distribution imbalances and non-uniform battery system specifications across vehicles. Failure risk probabilities for diverse battery charging and discharging segments are derived. An ablation study verifies the necessity of ensemble learning in addressing imbalanced datasets. Analysis of 102,095 segments across 86 vehicles with different battery material systems, battery capacities, and numbers of cells and temperature sensors confirms the robustness and generalization of the proposed method, yielding a recall of 98.37%. By introducing the graph, spatio-temporal global fault characteristics of battery systems are automatically extracted. The coupling relationship and evolution of physical quantities under both normal and faulty states are established, effectively uncovering fault information hidden in collected battery data without observable anomalies. The safety state of battery systems is reflected in terms of failure risk probability, providing reliable data support for battery system maintenance.
Abstract: Breast cancer is among the leading causes of cancer mortality globally, and its diagnosis through histopathological image analysis is often prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, necessitating more robust classification models. This study presents a hybrid model that combines deep feature extraction with deep learning (DL), an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) hyperparameter optimization to improve breast cancer histopathology (BCH) image classification. A dataset of 804 Hematoxylin and Eosin (H&E) stained images classified into Benign, in situ, Invasive, and Normal categories (ICIAR2018_BACH_Challenge) was used. ResNet50 was used for feature extraction, while Support Vector Machines (SVM), Random Forests (RF), XGBoost (XGB), Decision Trees (DT), and AdaBoost (ADB) were used for classification. BSO was used for hyperparameter optimization in a soft voting ensemble approach. Accuracy, precision, recall, specificity, F1-score, Receiver Operating Characteristic (ROC), and Precision-Recall (PR) were used as model performance metrics. The ensemble model outperformed individual classifiers, with greater accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). The robustness of the model was verified by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for Benign, Invasive, and in situ instances, respectively. This ensemble model delivers a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
Funding: supported by the National Natural Science Foundation of China (Nos. 82202299, 62203060, 62403492).
Abstract: The assessment of beach quality is an important prerequisite for beach development and serves as the foundation for coastal zone management and sustainable development. This topic has attracted widespread attention, and various evaluation systems have been established. Given that beach quality assessment (BQA) involves multidimensional and nonlinear indicators, machine learning methods are well-suited to handling complex data relationships. However, current research utilizing machine learning for BQA often faces challenges such as limited evaluation indicators and difficulties in obtaining relevant data. In this study, a machine learning-based model for beach quality evaluation is proposed to address the limitations of existing evaluation frameworks, particularly under conditions of data scarcity. Simulated data were generated, and the analytic hierarchy process was integrated to extract features from 21 beach evaluation factors. A comparative analysis was conducted using the following four machine learning models: decision tree, random forest, XGBoost, and MLP. Results indicate that XGBoost (mean squared error (MSE) = 0.1825, weighted F1 = 0.7513) and MLP (Pearson coefficient = 0.6053) outperform traditional models. Furthermore, an ensemble learning model combining XGBoost and MLP was developed, substantially improving predictive performance (reducing MSE to 0.0753, increasing the Pearson coefficient to 0.8002, and achieving an F1 score of 0.783). Validation using real data from Yangkou Beach demonstrated that the model maintained an accuracy of 58% even when 5–10 evaluation factors had randomly missing values.
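The two regression metrics quoted above, and the reason a combined model can beat both of its members, can be sketched as follows. The abstract does not state how XGBoost and MLP are combined, so the equal-weight average in `blend` is an assumption, as are the function names and all of the toy numbers, which were deliberately chosen so that the two models' errors cancel exactly.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two models' predictions (assumed combination rule)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Made-up beach quality scores and predictions from two hypothetical models.
y_true = [1.0, 2.0, 3.0, 4.0]
xgb_pred = [1.5, 1.5, 3.5, 3.5]
mlp_pred = [0.5, 2.5, 2.5, 4.5]
ens_pred = blend(xgb_pred, mlp_pred)

mse_xgb = mean_squared_error(y_true, xgb_pred)
mse_mlp = mean_squared_error(y_true, mlp_pred)
mse_ens = mean_squared_error(y_true, ens_pred)
r_ens = pearson(y_true, ens_pred)
print(mse_xgb, mse_mlp, mse_ens, r_ens)
```

Real models rarely have perfectly opposite errors, but the same effect applies in weaker form: the less correlated the members' errors are, the more an average reduces the MSE.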
Abstract: Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Abstract: Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies. Its solution determines the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and Smart Grid allows all participants in the distribution grid to store and track electricity consumption. In this research, a machine learning model is developed that analyzes and predicts the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. This model is an ensemble meta-algorithm (stacking) that generalizes the algorithms of random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The superior accuracy of the proposed meta-algorithm compared with the base classifiers is experimentally confirmed on the test sample. Owing to its good accuracy (ROC-AUC of 0.88), such a model can be used as a methodological basis for a decision support system whose purpose is to form a sample of suspected NTL sources. The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of inspection raids by field personnel, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
Funding: Supported by the National Natural Science Foundation of China (22108052).
Abstract: The biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP process. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process predicts product profit and CO₂ emissions accurately (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profit and CO₂ emissions of the BCP process. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing the product profit of the BCP process while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profits (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
Funding: Financially supported by the National Key R&D Program of China (Grant No. 2022YFF1202600); the National Natural Science Foundation of China (Grant No. 82301158); Science and Technology Innovation Action Plan of Shanghai Science and Technology Committee (Grant No. 22015820100); Two-hundred Talent Support (Grant No. 20152224); Translational Medicine Innovation Project of Shanghai Jiao Tong University School of Medicine (Grant No. TM201915); Clinical Research Project of Multi-Disciplinary Team, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (Grant No. 201914); China Postdoctoral Science Foundation (Grant No. 2023M742332).
Abstract: Essential proteins are crucial for biological processes and can be identified through both experimental and computational methods. While experimental approaches are highly accurate, they often demand extensive time and resources. To address these challenges, we present a computational ensemble learning framework designed to identify essential proteins more efficiently. Our method begins by using node2vec to transform proteins in the protein–protein interaction (PPI) network into continuous, low-dimensional vectors. We also extract a range of features from protein sequences, including graph-theory-based, information-based, compositional, and physicochemical attributes. Additionally, we leverage deep learning techniques to analyze high-dimensional position-specific scoring matrices (PSSMs) and capture evolutionary information. We then combine these features for classification using various machine learning algorithms. To enhance performance, we integrate the outputs of these algorithms through ensemble methods such as voting, weighted averaging, and stacking. This approach effectively addresses data imbalances and improves both robustness and accuracy. Our ensemble learning framework achieves an AUC of 0.960 and an accuracy of 0.9252, outperforming other computational methods. These results demonstrate the effectiveness of our approach in accurately identifying essential proteins and highlight its superior feature extraction capabilities.
Funding: The National Natural Science Foundation of China (No. 81900308).
Abstract: A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 are selected as study subjects. Among them, 72 patients died during the two-year follow-up. Models are established with ensemble learning frameworks and machine learning algorithms based on 51 physiological indicators of the patients. The Shapley additive explanations algorithm and univariate tests with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on 5-fold cross-validation and external validation, the prediction model combining the self-paced ensemble framework with the random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.911 and a recall of 0.864. Both feature ranking methods show that ejection fraction, serum creatinine (admission), hemoglobin, and Killip class are the most important features. With these top-ranked features, the simplified prediction model achieves a comparable result, with an AUROC of 0.872 and a recall of 0.818. This work proposes a new method for establishing mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance on a small patient cohort with a low death rate. It can serve as a new reference for medical decision-making and prognosis.
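The point-biserial coefficient mentioned in this abstract measures the association between a binary outcome (such as death during follow-up) and a continuous indicator. A minimal sketch of the standard formula follows, using the population standard deviation, under which the coefficient equals the Pearson correlation of the same two variables. The patient values are invented for illustration and are not from the study.

```python
import math


def point_biserial(binary, values):
    """r_pb = (mean_1 - mean_0) / s * sqrt(p * (1 - p)), with s the population std."""
    g1 = [v for b, v in zip(binary, values) if b == 1]
    g0 = [v for b, v in zip(binary, values) if b == 0]
    if not g1 or not g0:
        raise ValueError("both groups must be non-empty")
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("values must not be constant")
    p = len(g1) / n  # proportion of samples in group 1
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / std * math.sqrt(p * (1 - p))


def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: 1 = died during follow-up, ejection fraction in percent.
died = [0, 0, 0, 1, 0, 1, 0, 1]
ejection_fraction = [60, 55, 58, 35, 50, 30, 62, 40]
r_pb = point_biserial(died, ejection_fraction)
```

In this toy data the coefficient is negative, reflecting that lower ejection fraction goes with the positive (death) label; features would then be ranked by the absolute value of the coefficient.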
Funding: Supported by the joint funds of the Chinese National Natural Science Foundation (NSFC) (Grant No. U2242213); the funds of the NSFC (Grant No. 42341209); the National Key Research and Development (R&D) Program of the Ministry of Science and Technology of China (Grant No. 2021YFC3000902); the National Science Foundation for Young Scholars (Grant No. 42205166); the Joint Research Project for Meteorological Capacity Improvement (Grant No. 22NLTSQ008).
Abstract: Pangu-Weather (PGW), trained with deep learning-based methods (DL-based model), shows significant potential for global medium-range weather forecasting. However, the interpretability and trustworthiness of global medium-range DL-based models raise many concerns. This study uses the singular vector (SV) initial condition (IC) perturbations of the China Meteorological Administration's Global Ensemble Prediction System (CMA-GEPS) as inputs of PGW for global ensemble prediction (PGW-GEPS) to investigate the ensemble forecast sensitivity of DL-based models to the IC errors. Meanwhile, the CMA-GEPS forecasts serve as benchmarks for comparison and verification. The spatial structures and prediction performance of PGW-GEPS are discussed and compared to CMA-GEPS based on seasonal ensemble experiments. The results show that the ensemble mean and dispersion of PGW-GEPS are similar to those of CMA-GEPS in the medium range but with smoother forecasts. Meanwhile, PGW-GEPS is sensitive to the SV IC perturbations. Specifically, PGW-GEPS can generate realistic ensemble spread beyond the sub-synoptic scale (wavenumbers ≤ 64) with SV IC perturbations. However, PGW's kinetic energy is significantly reduced at the sub-synoptic scale, leading to error growth behavior inconsistent with CMA-GEPS at that scale. This behavior indicates that the effective resolution of PGW-GEPS is beyond the sub-synoptic scale and that it is limited in predicting mesoscale atmospheric motions. In terms of global medium-range ensemble prediction performance, the probability prediction skill of PGW-GEPS is comparable to that of CMA-GEPS in the extratropics when they use the same IC perturbations. This means that PGW has a general ability to provide skillful global medium-range forecasts with different ICs from numerical weather prediction.
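The ensemble mean and spread (dispersion) compared in this abstract can be sketched as follows. The sketch assumes each ensemble member is a list of forecast values on the same grid points, and defines spread as the sample standard deviation of the members about the ensemble mean; the divisor (n - 1 here) is a convention, and the abstract does not say which one the study uses. The numbers are toy values.

```python
import math


def ensemble_mean_and_spread(members):
    """Per-grid-point ensemble mean and spread (sample standard deviation)."""
    n = len(members)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    means, spreads = [], []
    # zip(*members) iterates grid point by grid point across all members.
    for point_values in zip(*members):
        m = sum(point_values) / n
        var = sum((v - m) ** 2 for v in point_values) / (n - 1)
        means.append(m)
        spreads.append(math.sqrt(var))
    return means, spreads


# Toy example: three perturbed forecasts at two grid points.
members = [
    [500.0, 510.0],
    [502.0, 510.0],
    [504.0, 510.0],
]
means, spreads = ensemble_mean_and_spread(members)
# means == [502.0, 510.0]; spreads == [2.0, 0.0]
```

A grid point where all members agree has zero spread, which is the situation a model insensitive to IC perturbations would produce everywhere.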
Funding: Funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241683].
Abstract: This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy and reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model's interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
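The third layer and the reported metrics can be illustrated with a small sketch. It assumes hard (majority) voting over three classifiers' 0/1 predictions, which is one common reading of "voting-based"; the abstract does not state whether hard or soft voting is used. The prediction vectors are hypothetical, and the metric definitions are the standard ones.

```python
def majority_vote(*model_preds):
    """Hard voting: predict 1 when more than half of the models predict 1."""
    return [
        1 if 2 * sum(votes) > len(votes) else 0
        for votes in zip(*model_preds)
    ]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and false positive rate from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,  # harmonic mean of precision and recall
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical predictions from three classifiers on ten transactions (1 = fraud).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
svm_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
rf_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
gbc_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

voted = majority_vote(svm_pred, rf_pred, gbc_pred)
metrics = binary_metrics(y_true, voted)
```

On heavily imbalanced data such as a 5% fraud rate, accuracy alone says little, which is why precision, recall, and the false positive rate are reported alongside it.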