Diabetes is increasingly common in people’s daily lives and represents an extraordinary threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines. Several ML models have been developed around different datasets for diabetes prediction. It is essential for ML models to predict diabetes accurately. Highly informative features of the dataset are vital in determining the capability of the model to predict diabetes. Feature engineering (FE) is a way of obtaining highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features in ML models is tested and analyzed for the prediction of diabetes. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used for the impact analysis of the proposed FE approach. The ensemble model performs well when the derived features are included, yielding a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% when compared with the ensemble model with no derived features (DOR = 96.306) included in the experiment. Including the derived features with the FE approach of the current state of the art made the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves the state-of-the-art results.
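As a rough illustration of the metrics quoted in the abstract above, the following sketch computes sensitivity, specificity, the Diagnostic Odds Ratio, and the False Omission Rate from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, DOR and FOR from confusion-matrix counts.

    The DOR is undefined when fp or fn is zero; this sketch does not handle that case.
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    dor = (tp * tn) / (fp * fn)            # diagnostic odds ratio
    false_omission_rate = fn / (fn + tn)   # share of negative predictions that are wrong
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "false_omission_rate": false_omission_rate,
    }


# Hypothetical counts chosen only for illustration.
metrics = diagnostic_metrics(tp=40, fn=10, fp=5, tn=95)
print(metrics)
```

Because the DOR is a ratio of odds, it grows very quickly as sensitivity and specificity approach 1, which is one reason small changes in either rate can produce large changes in the reported DOR.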
This research proposes a new stacked generalization ensemble model to forecast the number of incidences of conjunctivitis. In addition to forecasting conjunctivitis incidences, the proposed ensemble model also improves forecasting performance. The weekly rate of acute conjunctivitis per 1000 for Hong Kong is collected from the first week of January 2010 to the last week of December 2019. Pre-processing techniques such as imputation of missing values and logarithmic transformation are applied to the data sets. A stacked generalization ensemble model based on Auto-ARIMA (Autoregressive Integrated Moving Average), NNAR (Neural Network Autoregression), ETS (Exponential Smoothing), and HW (Holt-Winters) is proposed and applied to the dataset. Predictive analysis is conducted on the collected conjunctivitis dataset, and the models are compared on different performance measures. The results show that the RMSE (Root Mean Square Error), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and ACF1 (Auto Correlation Function) of the proposed ensemble decrease significantly. Considering the RMSE, for instance, error values are reduced by 39.23%, 9.13%, 20.42%, and 17.13% in comparison to the Auto-ARIMA, NNAR, ETS, and HW models, respectively. This research concludes that the accuracy of disease forecasting can be significantly increased by applying the proposed stacked generalization ensemble model, as it minimizes the prediction error and hence provides better prediction trends than the Auto-ARIMA, NNAR, ETS, and HW models applied separately.
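The error measures and the "reduced by X%" comparison in the abstract above can be sketched as follows. This is a minimal illustration with made-up numbers; an equal-weight average of two hypothetical base forecasts stands in for the ensemble, which is simpler than the stacked generalization the study describes.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def percent_reduction(baseline_error, ensemble_error):
    """Relative error reduction of the ensemble versus a baseline, in percent."""
    return 100.0 * (baseline_error - ensemble_error) / baseline_error


# Hypothetical weekly rates and base forecasts, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
model_a = [11.0, 13.0, 15.0, 17.0]   # over-forecasts by 1.0
model_b = [9.5, 11.5, 13.5, 15.5]    # under-forecasts by 0.5
combined = [(a + b) / 2 for a, b in zip(model_a, model_b)]  # equal-weight average

reduction_vs_a = percent_reduction(rmse(actual, model_a), rmse(actual, combined))
reduction_vs_b = percent_reduction(rmse(actual, model_b), rmse(actual, combined))
print(reduction_vs_a, reduction_vs_b)
```

In this toy case the two base models err in opposite directions, so combining them cancels part of the error; a stacked model would learn the combination weights from held-out forecasts instead of fixing them at one half each.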
Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Transcriptomic, epigenomic, proteomic, and other omics datasets generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), as well as accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
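The likelihood ratios and Youden index listed among the evaluation measures above follow directly from sensitivity and specificity. The sketch below uses the standard definitions; the function name and the input values are hypothetical and are not results from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, Youden J) for a binary classifier.

    Assumes specificity is strictly between 0 and 1 so both ratios are defined.
    """
    lr_plus = sensitivity / (1.0 - specificity)    # how much a positive result raises the odds
    lr_minus = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    youden_j = sensitivity + specificity - 1.0     # 0 = uninformative, 1 = perfect
    return lr_plus, lr_minus, youden_j


# Hypothetical operating point, for illustration only.
lr_plus, lr_minus, youden_j = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(lr_plus, lr_minus, youden_j)
```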
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that incorporates both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks’ hunting strategies. HHO identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was an accuracy of 99.44% on UNSW-NB15, demonstrating the model’s effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Processes supported by process-aware information systems are subject to continuous and often subtle changes due to evolving operational, organizational, or regulatory factors. These changes, referred to as incremental concept drift, gradually alter the behavior or structure of processes, making their detection and localization a challenging task. Traditional process mining techniques frequently assume process stationarity and are limited in their ability to detect such drift, particularly from a control-flow perspective. The objective of this research is to develop an interpretable and robust framework capable of detecting and localizing incremental concept drift in event logs, with a specific emphasis on the structural evolution of control-flow semantics in processes. We propose DriftXMiner, a control-flow-aware hybrid framework that combines statistical, machine learning, and process model analysis techniques. The approach comprises three key components: (1) a Cumulative Drift Scanner that tracks directional statistical deviations to detect early drift signals; (2) a Temporal Clustering and Drift-Aware Forest Ensemble (DAFE) to capture distributional and classification-level changes in process behavior; and (3) Petri net-based process model reconstruction, which enables the precise localization of structural drift using transition deviation metrics and replay fitness scores. Experimental validation on the BPI Challenge 2017 event log demonstrates that DriftXMiner effectively identifies and localizes gradual and incremental process drift over time. The framework achieves a detection accuracy of 92.5%, a localization precision of 90.3%, and an F1-score of 0.91, outperforming competitive baselines such as CUSUM+Histograms and ADWIN+Alpha Miner. Visual analyses further confirm that identified drift points align with transitions in control-flow models and behavioral cluster structures. DriftXMiner offers a novel and interpretable solution for incremental concept drift detection and localization in dynamic, process-aware systems. By integrating statistical signal accumulation, temporal behavior profiling, and structural process mining, the framework enables fine-grained drift explanation and supports adaptive process intelligence in evolving environments. Its modular architecture supports extension to streaming data and real-time monitoring contexts.
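To make the idea of accumulating directional statistical deviations concrete, here is a minimal sketch of a textbook one-sided CUSUM detector. It is a generic illustration, not the Cumulative Drift Scanner of DriftXMiner; the function name, the parameters (`target`, `slack`, `threshold`), and the sample series are all assumptions.

```python
def cusum_first_alarm(values, target, slack, threshold):
    """Return the index of the first upward-drift alarm, or None if none occurs.

    One-sided CUSUM: deviations above (target + slack) accumulate, and the
    running sum is reset to zero whenever it would become negative.
    """
    cumulative = 0.0
    for index, value in enumerate(values):
        cumulative = max(0.0, cumulative + (value - target - slack))
        if cumulative > threshold:
            return index
    return None


# Hypothetical per-window statistic: stable at 0.0, then shifted to 1.0.
series = [0.0] * 5 + [1.0] * 5
alarm_index = cusum_first_alarm(series, target=0.0, slack=0.5, threshold=1.4)
print(alarm_index)
```

The slack term keeps small fluctuations from accumulating, while the threshold trades detection delay against false alarms; a gradual drift raises the running sum slowly, which is why cumulative schemes suit incremental change better than single-window tests.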
The surge in smishing attacks underscores the urgent need for robust, real-time detection systems powered by advanced deep learning models. This paper introduces PhishNet, a novel ensemble learning framework that integrates transformer-based models (RoBERTa) and large language models (LLMs) (GPT-OSS 120B, LLaMA 3.3 70B, and Qwen3 32B) to significantly enhance smishing detection performance. To mitigate class imbalance, we apply synthetic data augmentation using T5 and leverage various text preprocessing techniques. Our system employs a dual-layer voting mechanism: weighted majority voting among LLMs and a final ensemble vote to classify messages as ham, spam, or smishing. Experimental results show an average accuracy improvement from 96% to 98.5% compared to the best standalone transformer, and from 93% to 98.5% when compared to LLMs across datasets. Furthermore, we present a real-time, user-friendly application to operationalize our detection model for practical use. PhishNet demonstrates superior scalability, usability, and detection accuracy, filling critical gaps in current smishing detection methodologies.
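A dual-layer vote of the kind described above might be sketched as follows. This is only an illustration under stated assumptions: the weights, the alphabetical tie-breaking rule, and the way the transformer's prediction is pooled with the LLM layer are hypothetical and are not taken from PhishNet.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Return the label with the largest total weight.

    `votes` is a list of (label, weight) pairs; ties are broken alphabetically
    so that the result is deterministic.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return min(totals, key=lambda label: (-totals[label], label))


def classify(transformer_label, llm_labels, llm_weights,
             transformer_weight=0.4, llm_layer_weight=0.6):
    """Layer 1: weighted vote among LLMs. Layer 2: vote between the two layers."""
    llm_layer_label = weighted_vote(list(zip(llm_labels, llm_weights)))
    return weighted_vote([
        (transformer_label, transformer_weight),
        (llm_layer_label, llm_layer_weight),
    ])


# Hypothetical predictions for one message.
final_label = classify(
    transformer_label="ham",
    llm_labels=["smishing", "smishing", "ham"],
    llm_weights=[0.4, 0.35, 0.25],
)
print(final_label)
```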
Optical non-reciprocity is a fundamental phenomenon in photonics. It is crucial for developing devices that rely on directional signal control, such as optical isolators and circulators. However, most research in this field has focused on systems in equilibrium or steady states. In this work, we demonstrate a room-temperature Rydberg atomic platform where the unidirectional propagation of light acts as a switch to mediate time-crystalline-like collective oscillations through atomic synchronization.
This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combined Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models’ performance indicated that the DG-RBFN model, with an Area Under the ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into six classes based on their combination strategies. Different realizations and underlying connections are analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are compared in terms of base models, combination strategies, and publication trends. Evaluations across six dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, the need to address challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
In this article, our nonlinear theory and technology for reducing the uncertainties of high-impact ocean-atmosphere event predictions, with the conditional nonlinear optimal perturbation (CNOP) method as its core, are reviewed, and the “spring predictability barrier” problem for El Niño-Southern Oscillation events and targeted observation issues for tropical cyclone forecasts are taken as two representative examples. Nonlinear theory reveals that initial errors of particular spatial structures, environmental conditions, and nonlinear processes contribute to significant prediction errors, whereas nonlinear technology provides a pioneering approach for reducing observational and forecast errors via targeted observations through the application of the CNOP method. Follow-up research further validates the scientific rigor of the theory in revealing the nonlinear mechanism of significant prediction errors, and relevant practical field campaigns for targeted observations verify the effectiveness of the technology in reducing prediction uncertainties. The CNOP method has achieved international recognition; furthermore, its applications extend to ensemble forecasts for weather and climate and enrich the nonlinear technology for reducing prediction uncertainties. It is expected that this nonlinear theory and technology will play an important role in reducing prediction uncertainties for high-impact weather and climate events.
Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions and decisions. With their growth in popularity, deep learning and ensemble learning algorithms have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has exhibited significant performance in enhancing learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has large numbers of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains: medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures the aging trend. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me is endowed with a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Three common Miniopterus species, M. fuliginosus, M. magnater and M. pusillus, are known to inhabit China. However, M. fuliginosus and M. magnater are so similar in external morphology as to pose great challenges for accurate classification. Furthermore, the taxonomic statuses, distribution ranges and taxonomic keys of these three species have remained controversial. To address these outstanding issues, the authors integrated molecular phylogenetic analyses, ensemble species distribution models (ESDMs), multiple morphological comparisons and decision tree algorithms to reassess their taxonomy and distribution in China. Mitochondrial cytochrome c oxidase subunit I (COI) gene phylogeny revealed three distinct monophyletic groups corresponding to M. fuliginosus, M. magnater and M. pusillus. The observed distribution patterns indicated that M. fuliginosus had a broad distribution across China, while M. magnater and M. pusillus exhibited a more restricted distribution, overlapping with M. fuliginosus in South China. Cranial morphometry indicated that M. magnater was slightly larger than M. fuliginosus and significantly larger than M. pusillus. Three-dimensional (3D) skull geomorphometry also uncovered distinct features for each species in rostrum, braincase, tympanic bullae and mandibular shape. Decision tree algorithms helped to identify forearm length, braincase breadth and width across the third upper molars as three major taxonomic keys for assisting species identification. This study corroborated the importance of integrative approaches for identifying Miniopterus species and validated a methodological approach applicable to other cryptic species complexes.
Smart manufacturing and Industry 4.0 are transforming traditional manufacturing processes by utilizing innovative technologies such as artificial intelligence (AI) and the internet of things (IoT) to enhance efficiency, reduce costs, and ensure product quality. In light of the recent advancement of Industry 4.0, identifying defects has become important for ensuring the quality of products during the manufacturing process. In this research, we present an ensemble methodology for accurately classifying hot rolled steel surface defects by combining the strengths of four pre-trained convolutional neural network (CNN) architectures, VGG16, VGG19, Xception, and MobileNetV2, while compensating for their individual weaknesses. We evaluated our methodology on the Xsteel surface defect dataset (XSDD), which comprises seven different classes. The ensemble methodology integrated the predictions of individual models through two methods: model averaging and weighted averaging. Our evaluation showed that the model averaging ensemble achieved an accuracy of 98.89%, a recall of 98.92%, a precision of 99.05%, and an F1-score of 98.97%, while the weighted averaging ensemble reached an accuracy of 99.72%, a recall of 99.74%, a precision of 99.67%, and an F1-score of 99.70%. The proposed weighted averaging ensemble model outperformed the model averaging method and the individual models in detecting defects in terms of accuracy, recall, precision, and F1-score. Comparative analysis with recent studies also showed the superior performance of our methodology.
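The difference between model averaging and weighted averaging of class probabilities can be sketched in a few lines. The probabilities and weights below are hypothetical and involve two models and three classes only; the study itself combines four CNNs over seven classes, and how its weights were chosen is not stated in the abstract.

```python
def combine(probabilities, weights=None):
    """Combine per-model class probabilities into one score per class.

    With weights=None every model counts equally (model averaging);
    otherwise the weights are normalized and applied (weighted averaging).
    """
    n_models = len(probabilities)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs of two models for one image.
model_probs = [
    [0.6, 0.3, 0.1],   # model 1 favours class 0
    [0.2, 0.7, 0.1],   # model 2 favours class 1
]
plain_class = argmax(combine(model_probs))                         # equal weights
weighted_class = argmax(combine(model_probs, weights=[0.8, 0.2]))  # trust model 1 more
print(plain_class, weighted_class)
```

In this toy case the two schemes disagree: equal weighting follows the more confident model, whereas the weighted scheme follows the model assigned the larger weight, which is how weighting can change ensemble accuracy.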
As an essential candidate for environment-friendly luminescent quantum dots (QDs), CuInS-based QDs have attracted increasing attention in recent years. However, several drawbacks still hamper their industrial applications, such as lower photoluminescence quantum yield (PLQY), complex synthetic pathways, uncontrollable emission spectra, and insufficient photostability. In this study, CuInZnS@ZnS core/shell QDs were prepared via a one-pot/three-step synthetic scheme with accurate and tunable control of PL spectra. Their ensemble spectroscopic properties during the nucleation, alloying, and ZnS shell growth processes were then systematically investigated. PL peaks of these QDs can be precisely manipulated from 530 to 850 nm by controlling the stoichiometric ratio of Cu/In, Zn²⁺ doping, and ZnS shell growth. In particular, CuInZnS@ZnS QDs possess a significantly long emission lifetime (up to 750 ns), high PLQY (up to 85%), and excellent crystallinity. Their spectroscopic evolution is well validated by a Cu-deficient related intragap emission model. By controlling the stoichiometric ratio of Cu/In, two distinct Cu-deficient related emission pathways are established based on the differing oxidation states of Cu defects. Therefore, this work provides deeper insights for fabricating highly luminescent ternary or quaternary-alloyed QDs.
On a compact Riemann surface X with finitely many punctures P_1, …, P_k, we define toric curves as multivalued, totally unramified holomorphic maps to P^n with monodromy in a maximal torus of PSU(n+1). Toric solutions to SU(n+1) Toda systems on X \ {P_1, …, P_k} are recognized by the associated toric curves in P^n. We introduce character n-ensembles as n-tuples of meromorphic one-forms with simple poles and purely imaginary periods, generating toric curves on X minus finitely many points. On X, we establish a correspondence between character n-ensembles and toric solutions to the SU(n+1) Toda system with finitely many cone singularities. Our approach not only broadens seminal solutions with two cone singularities on the Riemann sphere, as classified by Jost-Wang (Int. Math. Res. Not., 2002, (6): 277-290) and Lin-Wei-Ye (Invent. Math., 2012, 190(1): 169-207), but also advances beyond the limits of Lin-Yang-Zhong’s existence theorems (J. Differential Geom., 2020, 114(2): 337-391) by introducing a new solution class.
The paper “Ensemble Conformal Predictor (EnCP): A New Conformal Predictor with Robustness Guarantees Against Data Poisoning Attacks”, whose first author is Yang Yuxin (杨雨欣), a doctoral student of the 2022 cohort at the College of Computer Science and Technology, Jilin University, has been accepted by the IEEE Symposium on Security and Privacy (IEEE S&P 2026). The other authors include Yang Yuxin’s advisor, Professor Li Qiang (李强), and Feng Runyang (封润洋), a doctoral student at the School of Artificial Intelligence, Jilin University. The co-corresponding authors are Professor Liren Shan of the Toyota Technological Institute at Chicago, USA, and Professor Binghui Wang of the Illinois Institute of Technology, USA.
In this paper, we consider the Fisher informations among three classical type β-ensembles when β > 0 scales with n satisfying lim βn = ∞. We offer the exact order of the corresponding two Fisher informations, which indicates that the β-Laguerre ensembles do not satisfy the logarithmic Sobolev inequality. We also give some limit theorems on the extremals of β-Jacobi ensembles for β > 0 fixed.
Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER’s Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
Abstract: Diabetes is increasingly common in daily life and poses a serious threat to human well-being. Machine learning (ML) in the healthcare industry has recently made headlines, and several ML models have been developed around different datasets for diabetes prediction. It is essential that ML models predict diabetes accurately. Highly informative features of the dataset are vital in determining how well a model can predict diabetes, and feature engineering (FE) is the way to obtain such features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models for diabetes prediction is tested and analyzed. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used for the impact analysis of the proposed FE approach. With the derived features included, the ensemble model performs well, yielding a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% when compared with the ensemble model with no derived features (DOR = 96.306). Including the derived features alongside the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves on the state-of-the-art results.
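The metrics named in the abstract above (Sensitivity, Specificity, DOR, and False Omission Rate) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    dor = (tp * tn) / (fp * fn)           # diagnostic odds ratio
    false_omission_rate = fn / (fn + tn)  # share of negative calls that are wrong
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "false_omission_rate": false_omission_rate,
    }


# Hypothetical counts for illustration only (not the paper's results).
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=90)
print(metrics)
```

The DOR is undefined when either `fp` or `fn` is zero, so a practical version would need a continuity correction or an explicit guard for that case.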
Funding: The authors would like to express their gratitude to Taif University, Taif, Saudi Arabia, for providing administrative and technical support. This work was supported by the Taif University Researchers Supporting Project number (TURSP-2020/254).
Abstract: This research work proposes a new stacked generalization ensemble model to forecast the number of incidences of conjunctivitis disease. In addition to forecasting conjunctivitis incidences, the proposed model improves performance through the use of the ensemble. The weekly rate of acute conjunctivitis per 1000 for Hong Kong is collected from the first week of January 2010 to the last week of December 2019. Pre-processing techniques such as imputation of missing values and logarithmic transformation are applied to the data sets. A stacked generalization ensemble model based on Auto-ARIMA (Autoregressive Integrated Moving Average), NNAR (Neural Network Autoregression), ETS (Exponential Smoothing), and HW (Holt Winter) is proposed and applied to the dataset. Predictive analysis is conducted on the collected conjunctivitis dataset, and the models are compared on different performance measures. The results show that the RMSE (Root Mean Square Error), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and ACF1 (Auto Correlation Function) of the proposed ensemble decrease significantly. Considering the RMSE, for instance, error values are reduced by 39.23%, 9.13%, 20.42%, and 17.13% in comparison to the Auto-ARIMA, NNAR, ETS, and HW models, respectively. This research concludes that the accuracy of disease forecasting can be significantly increased by applying the proposed stacked generalization ensemble model, as it minimizes the prediction error and hence provides better prediction trends than the Auto-ARIMA, NNAR, ETS, and HW models applied individually.
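The error measures compared in the abstract above, and the "reduced by x%" figures, can be computed as in the following sketch. It uses the usual definitions of RMSE, MAE, and MAPE; the series values and the helper name `percent_reduction` are illustrative assumptions, not data or code from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def percent_reduction(baseline_error, ensemble_error):
    """Relative error reduction of the ensemble versus a baseline, in percent."""
    return 100.0 * (baseline_error - ensemble_error) / baseline_error


# Hypothetical weekly rates and forecasts, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
print(percent_reduction(baseline_error=2.0, ensemble_error=1.5))
```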
Funding: The authors thank the Deanship of Research and Graduate Studies at King Khalid University, KSA, for funding this work through the Large Research Project under grant number RGP2/164/46.
Abstract: Background: Stomach cancer (SC) is one of the most lethal malignancies worldwide due to late-stage diagnosis and limited treatment. Omics datasets (transcriptomic, epigenomic, proteomic, etc.) generated by high-throughput sequencing technology have become prominent in biomedical research, and they reveal molecular aspects of cancer diagnosis and therapy. Despite the development of advanced sequencing technology, the high dimensionality of multi-omics data makes the data challenging to interpret. Methods: In this study, we introduce RankXLAN, an explainable ensemble-based multi-omics framework that integrates feature selection (FS), ensemble learning, bioinformatics, and in-silico validation for robust biomarker detection, identification of potential therapeutic drug-repurposing candidates, and classification of SC. To enhance the interpretability of the model, we incorporated explainable artificial intelligence (SHapley Additive exPlanations analysis), and we report accuracy, precision, F1-score, recall, cross-validation, specificity, likelihood ratio (LR)+, LR−, and Youden index results. Results: The experimental results showed that the top four FS algorithms achieved improved results when applied to the ensemble learning classification model. The proposed ensemble model produced an area under the curve (AUC) score of 0.994 for gene expression, 0.97 for methylation, and 0.96 for miRNA expression data. By integrating bioinformatics and ML approaches on the transcriptomic and epigenomic multi-omics dataset, we identified potential marker genes, namely UBE2D2, HPCAL4, IGHA1, DPT, and FN3K. In-silico molecular docking revealed a strong binding affinity between ANKRD13C and the FDA-approved drug Everolimus (binding affinity −10.1 kcal/mol), identifying ANKRD13C as a potential therapeutic drug-repurposing target for SC. Conclusion: The proposed framework RankXLAN outperforms other existing frameworks for serum biomarker identification, therapeutic target identification, and SC classification with multi-omics datasets.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Critical dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Abstract: Processes supported by process-aware information systems are subject to continuous and often subtle changes due to evolving operational, organizational, or regulatory factors. These changes, referred to as incremental concept drift, gradually alter the behavior or structure of processes, making their detection and localization a challenging task. Traditional process mining techniques frequently assume process stationarity and are limited in their ability to detect such drift, particularly from a control-flow perspective. The objective of this research is to develop an interpretable and robust framework capable of detecting and localizing incremental concept drift in event logs, with a specific emphasis on the structural evolution of control-flow semantics in processes. We propose DriftXMiner, a control-flow-aware hybrid framework that combines statistical, machine learning, and process model analysis techniques. The approach comprises three key components: (1) a Cumulative Drift Scanner that tracks directional statistical deviations to detect early drift signals; (2) a Temporal Clustering and Drift-Aware Forest Ensemble (DAFE) to capture distributional and classification-level changes in process behavior; and (3) Petri net-based process model reconstruction, which enables the precise localization of structural drift using transition deviation metrics and replay fitness scores. Experimental validation on the BPI Challenge 2017 event log demonstrates that DriftXMiner effectively identifies and localizes gradual and incremental process drift over time. The framework achieves a detection accuracy of 92.5%, a localization precision of 90.3%, and an F1-score of 0.91, outperforming competitive baselines such as CUSUM+Histograms and ADWIN+Alpha Miner. Visual analyses further confirm that identified drift points align with transitions in control-flow models and behavioral cluster structures. DriftXMiner offers a novel and interpretable solution for incremental concept drift detection and localization in dynamic, process-aware systems. By integrating statistical signal accumulation, temporal behavior profiling, and structural process mining, the framework enables fine-grained drift explanation and supports adaptive process intelligence in evolving environments. Its modular architecture supports extension to streaming data and real-time monitoring contexts.
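The abstract above describes a scanner that accumulates directional deviations to flag early drift signals, and names CUSUM as a baseline. The sketch below is a generic two-sided CUSUM detector, offered only to illustrate that idea; it is not DriftXMiner's Cumulative Drift Scanner, and the function name, parameters, and data are assumptions.

```python
def cusum_drift_points(values, target, slack, threshold):
    """Return indices where accumulated upward or downward deviation exceeds threshold."""
    pos, neg = 0.0, 0.0
    alarms = []
    for i, x in enumerate(values):
        # Accumulate deviations beyond the allowed slack in each direction.
        pos = max(0.0, pos + (x - target - slack))
        neg = max(0.0, neg + (target - x - slack))
        if pos > threshold or neg > threshold:
            alarms.append(i)
            pos, neg = 0.0, 0.0  # restart accumulation after an alarm
    return alarms


# Hypothetical per-window statistic: stable at first, then drifting upward gradually.
stat = [1.0] * 5 + [1.2, 1.4, 1.6, 1.8, 2.0]
alarms = cusum_drift_points(stat, target=1.0, slack=0.1, threshold=1.0)
print(alarms)
```

Because small deviations are summed rather than judged one at a time, a gradual change that never produces a single large jump can still trigger an alarm, which is what makes this family of detectors a natural fit for incremental drift.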
Funding: Funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. (GPIP:1074-612-2024).
Abstract: The surge in smishing attacks underscores the urgent need for robust, real-time detection systems powered by advanced deep learning models. This paper introduces PhishNet, a novel ensemble learning framework that integrates transformer-based models (RoBERTa) and large language models (LLMs) (GPT-OSS 120B, LLaMA 3.3 70B, and Qwen3 32B) to enhance smishing detection performance significantly. To mitigate class imbalance, we apply synthetic data augmentation using T5 and leverage various text preprocessing techniques. Our system employs a dual-layer voting mechanism: weighted majority voting among LLMs and a final ensemble vote to classify messages as ham, spam, or smishing. Experimental results show an average accuracy improvement from 96% to 98.5% compared to the best standalone transformer, and from 93% to 98.5% when compared to LLMs across datasets. Furthermore, we present a real-time, user-friendly application to operationalize our detection model for practical use. PhishNet demonstrates superior scalability, usability, and detection accuracy, filling critical gaps in current smishing detection methodologies.
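The dual-layer voting described in the abstract above can be pictured roughly as follows. The abstract does not give the weights or the exact rule for the final vote, so this sketch assumes that the LLM consensus and the transformer prediction are combined in a second weighted vote; every name and weight here is hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the highest total weight (ties broken alphabetically)."""
    scores = defaultdict(float)
    for voter, label in predictions.items():
        scores[label] += weights[voter]
    return max(sorted(scores), key=scores.get)


def classify(transformer_label, llm_predictions, llm_weights, final_weights):
    """Layer 1: weighted vote among LLMs. Layer 2: vote between transformer and LLM layer."""
    llm_label = weighted_vote(llm_predictions, llm_weights)
    final_inputs = {"transformer": transformer_label, "llm_layer": llm_label}
    return weighted_vote(final_inputs, final_weights)


# Hypothetical predictions and weights, for illustration only.
llm_predictions = {"llm_a": "smishing", "llm_b": "spam", "llm_c": "smishing"}
llm_weights = {"llm_a": 0.4, "llm_b": 0.35, "llm_c": 0.25}
final_weights = {"transformer": 0.4, "llm_layer": 0.6}

label = classify("spam", llm_predictions, llm_weights, final_weights)
print(label)
```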
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12274131) and the Innovation Program for Quantum Science and Technology (Grant No. 2024ZD0300101).
Abstract: Optical non-reciprocity is a fundamental phenomenon in photonics. It is crucial for developing devices that rely on directional signal control, such as optical isolators and circulators. However, most research in this field has focused on systems in equilibrium or steady states. In this work, we demonstrate a room-temperature Rydberg atomic platform where the unidirectional propagation of light acts as a switch to mediate time-crystalline-like collective oscillations through atomic synchronization.
Funding: Supported by the University of Transport Technology under the project entitled “Application of Machine Learning Algorithms in Landslide Susceptibility Mapping in Mountainous Areas” with grant number DTTD2022-16.
Abstract: This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combine Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models' performance indicated that the DG-RBFN model, with an Area Under the ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
Funding: National Natural Science Foundation of China (52075420); Fundamental Research Funds for the Central Universities (xzy022023049); National Key Research and Development Program of China (2023YFB3408600).
Abstract: The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, addressing challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 42330111).
Abstract: This article reviews our nonlinear theory and technology for reducing the uncertainties of high-impact ocean-atmosphere event predictions, which have the conditional nonlinear optimal perturbation (CNOP) method as their core. The “spring predictability barrier” problem for El Niño-Southern Oscillation events and targeted observation issues for tropical cyclone forecasts are taken as two representative examples. Nonlinear theory reveals that initial errors of particular spatial structures, environmental conditions, and nonlinear processes contribute to significant prediction errors, whereas nonlinear technology provides a pioneering approach for reducing observational and forecast errors via targeted observations through the application of the CNOP method. Follow-up research further validates the scientific rigor of the theory in revealing the nonlinear mechanism of significant prediction errors, and relevant practical field campaigns for targeted observations verify the effectiveness of the technology in reducing prediction uncertainties. The CNOP method has achieved international recognition; furthermore, its applications extend to ensemble forecasts for weather and climate and further enrich the nonlinear technology for reducing prediction uncertainties. It is expected that this nonlinear theory and technology will play a considerably important role in reducing prediction uncertainties for high-impact weather and climate events.
Funding: Funded by Taif University, Saudi Arabia, project No. (TU-DSPP-2024-263).
Abstract: Deep learning algorithms have been rapidly incorporated into many different applications due to the increase in computational power and the availability of massive amounts of data. Recently, both deep learning and ensemble learning have been used to recognize underlying structures and patterns from high-level features to make predictions and decisions. As deep learning and ensemble learning algorithms have grown in popularity, they have received significant attention from both scientists and the industrial community due to their superior ability to learn features from big data. Ensemble deep learning has shown significant gains in learning generalization through the use of multiple deep learning algorithms. Although ensemble deep learning has large numbers of training parameters, which results in time and space overheads, it performs much better than traditional ensemble learning. Ensemble deep learning has been successfully used in several areas, such as bioinformatics, finance, and health care. In this paper, we review and investigate recent ensemble deep learning algorithms and techniques in health care domains, including medical imaging, health care data analytics, genomics, diagnosis, disease prevention, and drug discovery. We cover several widely used deep learning algorithms along with their architectures, including deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Common healthcare tasks, such as medical imaging, electronic health records, and genomics, are also demonstrated. Furthermore, the challenges inherent in reducing the burden on the healthcare system are discussed and explored. Finally, future directions and opportunities for enhancing healthcare model performance are discussed.
Funding: Supported by the National Natural Science Foundation of China (22379021 and 22479021).
Abstract: As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures the aging trend. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery, without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me gains a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
Funding: The National Natural Sciences Foundation of China (32192421); the Special Grant Foundations for National Science and Technology Basic Research Program of China (2021FY100303); the DFGP Project of Fauna of Guangdong Province (202115).
Abstract: Three common Miniopterus species, M. fuliginosus, M. magnater and M. pusillus, are known to inhabit China. However, M. fuliginosus and M. magnater are so similar in external morphology that accurate classification is very difficult. Furthermore, the taxonomic statuses, distribution ranges and taxonomic keys of these three species have remained controversial. To address these outstanding issues, the authors integrated molecular phylogenetic analyses, ensemble species distribution models (ESDMs), multiple morphological comparisons and decision tree algorithms to reassess their taxonomy and distribution in China. Mitochondrial cytochrome c oxidase subunit I (COI) gene phylogeny revealed three distinct monophyletic groups corresponding to M. fuliginosus, M. magnater and M. pusillus. The observed distribution patterns indicated that M. fuliginosus had a broad distribution across China, while M. magnater and M. pusillus exhibited more restricted distributions, overlapping with M. fuliginosus in South China. Cranial morphometry indicated that M. magnater was slightly larger than M. fuliginosus and significantly larger than M. pusillus. Three-dimensional (3D) skull geomorphometry also uncovered distinct features for each species in rostrum, braincase, tympanic bullae and mandibular shape. Decision tree algorithms helped to identify forearm length, braincase breadth and width across the third upper molars as three major taxonomic keys for species identification. This study corroborated the importance of integrative approaches for identifying Miniopterus species and validated a methodological approach applicable to other cryptic species complexes.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2022R1I1A3063493).
Abstract: Smart manufacturing and Industry 4.0 are transforming traditional manufacturing processes by utilizing innovative technologies such as artificial intelligence (AI) and the internet of things (IoT) to enhance efficiency, reduce costs, and ensure product quality. In light of the recent advancement of Industry 4.0, identifying defects has become important for ensuring the quality of products during the manufacturing process. In this research, we present an ensemble methodology for accurately classifying hot rolled steel surface defects by combining the strengths of four pre-trained convolutional neural network (CNN) architectures (VGG16, VGG19, Xception, and MobileNetV2) while compensating for their individual weaknesses. We evaluated our methodology on the Xsteel surface defect dataset (XSDD), which comprises seven different classes. The ensemble methodology integrated the predictions of individual models through two methods: model averaging and weighted averaging. Our evaluation showed that the model averaging ensemble achieved an accuracy of 98.89%, a recall of 98.92%, a precision of 99.05%, and an F1-score of 98.97%, while the weighted averaging ensemble reached an accuracy of 99.72%, a recall of 99.74%, a precision of 99.67%, and an F1-score of 99.70%. The proposed weighted averaging ensemble model outperformed the model averaging method and the individual models in detecting defects in terms of accuracy, recall, precision, and F1-score. Comparative analysis with recent studies also showed the superior performance of our methodology.
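The difference between model averaging and weighted averaging in the abstract above can be shown with a small sketch that combines per-class probability vectors from several models. The probabilities and weights are made up for illustration; the abstract does not state how the study chose its weights.

```python
def average_ensemble(prob_vectors, weights=None):
    """Combine per-model class probabilities; equal weights give plain model averaging."""
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=combined.__getitem__)
    return combined, predicted


# Hypothetical softmax outputs of four models for one image over three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]

plain_probs, plain_label = average_ensemble(outputs)
# Hypothetical weights favouring the first two models (e.g. by validation accuracy).
weighted_probs, weighted_label = average_ensemble(outputs, weights=[0.4, 0.4, 0.1, 0.1])
print(plain_label, weighted_label)
```

In this toy case the two schemes disagree: equal weighting picks class 1, while the weighted scheme picks class 0 because it trusts the first two models more.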
Funding: Fund Project for Transformation of Scientific and Technological Achievements of Jiangsu Province of China (BA2023020).
Abstract: As an essential candidate for environment-friendly luminescent quantum dots (QDs), CuInS-based QDs have attracted increasing attention in recent years. However, several drawbacks still hamper their industrial applications, such as lower photoluminescence quantum yield (PLQY), complex synthetic pathways, uncontrollable emission spectra, and insufficient photostability. In this study, CuInZnS@ZnS core/shell QDs were prepared via a one-pot/three-step synthetic scheme with accurate and tunable control of PL spectra. Their ensemble spectroscopic properties during the nucleation formation, alloying, and ZnS shell growth processes were then systematically investigated. PL peaks of these QDs can be precisely tuned from 530 to 850 nm by controlling the stoichiometric ratio of Cu/In, Zn²⁺ doping, and ZnS shell growth. In particular, CuInZnS@ZnS QDs possess a significantly long emission lifetime (up to 750 ns), high PLQY (up to 85%), and excellent crystallinity. Their spectroscopic evolution is well validated by a Cu-deficient related intragap emission model. By controlling the stoichiometric ratio of Cu/In, two distinct Cu-deficient related emission pathways are established based on the differing oxidation states of Cu defects. Therefore, this work provides deeper insights for fabricating highly luminescent ternary or quaternary-alloyed QDs.
Funding: Supported by the National Natural Science Foundation of China (11931009, 12271495, 11971450, and 12071449), the Anhui Initiative in Quantum Information Technologies (AHY150200), and the Project of Stable Support for Youth Team in Basic Research Field, Chinese Academy of Sciences (YSBR-001).
Abstract: On a compact Riemann surface $X$ with finitely many punctures $P_1,\ldots,P_k$, we define toric curves as multivalued, totally unramified holomorphic maps to $\mathbb{P}^n$ with monodromy in a maximal torus of $\mathrm{PSU}(n+1)$. Toric solutions to $\mathrm{SU}(n+1)$ Toda systems on $X\setminus\{P_1,\ldots,P_k\}$ are recognized by the associated toric curves in $\mathbb{P}^n$. We introduce character $n$-ensembles as $n$-tuples of meromorphic one-forms with simple poles and purely imaginary periods, generating toric curves on $X$ minus finitely many points. On $X$, we establish a correspondence between character $n$-ensembles and toric solutions to the $\mathrm{SU}(n+1)$ Toda system with finitely many cone singularities. Our approach not only broadens seminal solutions with two cone singularities on the Riemann sphere, as classified by Jost-Wang (Int. Math. Res. Not., 2002, (6): 277-290) and Lin-Wei-Ye (Invent. Math., 2012, 190(1): 169-207), but also advances beyond the limits of Lin-Yang-Zhong's existence theorems (J. Differential Geom., 2020, 114(2): 337-391) by introducing a new solution class.
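Abstract: The paper “Ensemble Conformal Predictor (EnCP): A New Conformal Predictor with Robustness Guarantees Against Data Poisoning Attacks”, whose first author is Yang Yuxin (杨雨欣), a doctoral student of the 2022 cohort in the College of Computer Science and Technology at Jilin University, has been accepted by the IEEE Symposium on Security and Privacy (IEEE S&P 2026). The other authors include Yang Yuxin's advisor, Professor Li Qiang (李强), and Feng Runyang (封润洋), a doctoral student in the School of Artificial Intelligence at Jilin University. The co-corresponding authors are Professor Liren Shan of the Toyota Technological Institute at Chicago, USA, and Professor Binghui Wang of the Illinois Institute of Technology, USA.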
Funding: Supported by the NSFC (12171038) and 985 Projects.
Abstract: In this paper, we consider the Fisher informations among three classical type $\beta$-ensembles when $\beta>0$ scales with $n$ satisfying $\lim_{n\to\infty}\beta n=\infty$. We offer the exact order of the corresponding two Fisher informations, which indicates that the $\beta$-Laguerre ensembles do not satisfy the logarithmic Sobolev inequality. We also give some limit theorems on the extremals of $\beta$-Jacobi ensembles for $\beta>0$ fixed.
Abstract: Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER's Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.