Cyclohexene is an important raw material in the production of nylon. Selective hydrogenation of benzene is a key method for preparing cyclohexene. However, the Ru catalysts used in current industrial processes still face challenges, including high metal usage, high process costs, and low cyclohexene yield. This study combines existing literature data with machine learning methods to analyze the factors influencing benzene conversion, cyclohexene selectivity, and cyclohexene yield in the hydrogenation of benzene to cyclohexene. Predictive models are constructed with the XGBoost and Random Forest algorithms. The analysis found that reaction time, Ru content, and space velocity are key factors influencing cyclohexene yield, selectivity, and benzene conversion. Shapley Additive Explanations (SHAP) analysis and feature importance analysis further revealed the contribution of each variable to the reaction outcomes. Additionally, we randomly generated one million variable combinations using the Dirichlet distribution in an attempt to predict high-yield catalyst formulations. This paper provides new insights into the application of machine learning in heterogeneous catalysis and offers a reference for further research.
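The Dirichlet sampling step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the number of components, the uniform concentration parameters, and the function names are assumptions, and only 1,000 candidates are drawn here instead of one million. It relies on the standard fact that independent gamma draws normalized by their sum follow a Dirichlet distribution, so each candidate is a set of non-negative fractions summing to 1.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_candidates(n_candidates, n_components, seed=0):
    """Generate random composition vectors (fractions that sum to 1)."""
    rng = random.Random(seed)
    # Assumption: alpha = 1 for every component, i.e. uniform over the simplex.
    alpha = [1.0] * n_components
    return [sample_dirichlet(alpha, rng) for _ in range(n_candidates)]


# Hypothetical setup: 4 formulation components, 1,000 candidates.
candidates = generate_candidates(n_candidates=1000, n_components=4)
```

Each candidate vector could then be passed to a trained yield model and the top-scoring formulations kept for inspection.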
The uplift resistance of the soil overlying shield tunnels significantly affects their anti-floating stability. However, research on uplift resistance for special-shaped shield tunnels is limited. This study combines numerical simulation with machine learning techniques to explore this issue. It summarizes special-shaped tunnel geometries and introduces a shape coefficient. Using the finite element software Plaxis3D, the study simulates how six key parameters (shape coefficient, burial depth ratio, the tunnel’s longest horizontal length, internal friction angle, cohesion, and soil submerged bulk density) affect uplift resistance under different conditions. The feature importance of each parameter was then analyzed with XGBoost and ANN methods based on the numerical simulation results. The findings show that a tunnel shape closer to a circle leads to lower uplift resistance in the overlying soil, whereas increasing any of the other parameters has the opposite effect. Furthermore, feature importance for uplift resistance decreases in the following order: burial depth ratio, internal friction angle, tunnel longest horizontal length, cohesion, soil submerged bulk density, and shape coefficient.
The advantages of genomic selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing datasets and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. Prediction accuracy, mean squared error (MSE), and mean absolute error (MAE) were used as evaluation indicators for comparison with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that, in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB, and the single kernel function models. For instance, for T1 in the pig dataset, the predictive accuracy of SVR_GS is 10% higher than that of GBLUP, and approximately 4.4% and 18.6% higher than that of SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among single kernel functions, the Laplacian and Gaussian kernels yield similar results, with the Gaussian kernel performing slightly better. The mixed kernel functions notably reduce the MSE and MAE compared with all single kernel functions. Furthermore, in terms of runtime, the SVR_GS and SVR_GP mixed kernel models run approximately three times faster than GBLUP on the pig dataset, with only a slight increase in runtime compared with the single kernel function models. In summary, the mixed kernel function SVR models are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
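The idea of a mixed kernel can be illustrated with a short sketch. The abstract does not state how the kernels are combined, so the convex combination below, the weight of 0.7, the kernel hyperparameters, and the reading of SVR_GS as "Gaussian plus sigmoid" are all assumptions made for illustration only.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def mixed_kernel(x, y, weight=0.7):
    """Assumed mixing rule: weight * Gaussian + (1 - weight) * sigmoid."""
    return weight * gaussian_kernel(x, y) + (1.0 - weight) * sigmoid_kernel(x, y)


def gram_matrix(samples, kernel):
    """Pairwise kernel values, the matrix an SVR solver would consume."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical genotype-like feature vectors.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = gram_matrix(samples, mixed_kernel)
```

A precomputed matrix like `K` is the usual way to hand a custom kernel to an SVR implementation; the mixing weight would normally be tuned alongside the other hyperparameters.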
Accurate assessment of snowpack volumetric liquid water content and bulk density is essential for understanding snow hydrology, managing avalanche risk, and monitoring cryosphere changes. This study presents a novel dual-parameter inversion framework that integrates synthetic electromagnetic modelling, dimensionality reduction, and machine learning algorithms to extract relative permittivity and log-resistivity from ground-penetrating radar (GPR) data. Traditional snowpack measurements are invasive, labor-intensive, and limited to point observations. To overcome these limitations, we developed a non-invasive, scalable, and data-driven framework that uses synthetic GPR datasets representing diverse snowpack conditions with variable moisture and density profiles. Synthetic 1D time-series reflections (A-scans) are generated using finite-difference time-domain simulations in the electromagnetic simulator gprMax. Principal component analysis (PCA) is applied to compress each A-scan while preserving key features, which significantly improves model training efficiency. Four machine learning models (random forest, neural network, support vector machine, and eXtreme gradient boosting) are trained on the PCA-reduced features. Among these, the neural network model achieved the best performance, with R^(2)>0.97 for permittivity and R^(2)>0.92 for resistivity. Gaussian noise (signal-to-noise ratio of 6 dB) is added to the synthetic data, and targeted domain adaptation is then employed to enhance generalization to field data. The framework is validated on two contrasting GPR transects in the Altay Mountains of the Chinese mainland, representing moist (T750) and wet (G125) snowpack conditions. The neural network predictions are the most consistent with the GPR-derived estimates, Snowfork measurements, and snow pit data, achieving a volumetric liquid water content deviation of ≤1.5% and a bulk density error in the range of 30-84 kg m^(-3). The results demonstrate that machine learning-based inversion, supported by realistic simulations and data augmentation, enables scalable, non-invasive snowpack characterization with significant applications in hydrological forecasting, snow monitoring, and water resource management.
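The noise-augmentation step described above (Gaussian noise at a signal-to-noise ratio of 6 dB) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the sine trace is a hypothetical stand-in for a simulated A-scan, and the function names are assumptions. The noise variance is set from the mean signal power using SNR(dB) = 10·log10(P_signal / P_noise).

```python
import math
import random


def add_gaussian_noise(trace, snr_db, rng):
    """Return a noisy copy of `trace` with the requested signal-to-noise ratio in dB."""
    signal_power = sum(v * v for v in trace) / len(trace)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in trace]


def measured_snr_db(clean, noisy):
    """Empirical SNR of `noisy` relative to `clean`, in dB."""
    signal_power = sum(v * v for v in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a synthetic A-scan: 50 periods of a sine wave.
rng = random.Random(42)
clean_trace = [math.sin(2.0 * math.pi * 50.0 * i / 20000) for i in range(20000)]
noisy_trace = add_gaussian_noise(clean_trace, snr_db=6.0, rng=rng)
```

Calling `measured_snr_db(clean_trace, noisy_trace)` gives a quick check that the realized noise level is close to the 6 dB target.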
In the pharmaceutical field, machine learning can play an important role in drug development, production, and treatment. Co-crystallization techniques have shown promising potential to enhance properties of active pharmaceutical ingredients (APIs) such as solubility, permeability, and bioavailability, all without altering their chemical structure. This approach opens new avenues for developing natural products into effective drugs, especially those that have previously been difficult to formulate. Emodin, an anthraquinone-based natural product, is a notable example because of its diverse biological activities; however, its physicochemical limitations, such as poor solubility and easy sublimation, restrict its clinical application. While various methods have improved emodin's physicochemical properties, research on its bioavailability remains limited. In our study, we summarize cocrystals and salts produced through co-crystallization technology and identify piperazine as a favorable coformer. Computational chemistry and molecular modeling methods and machine learning methods gave conflicting conclusions on whether emodin and piperazine form a cocrystal or a salt, which led us to validate these possibilities experimentally. Ultimately, we successfully obtained the emodin-piperazine cocrystal, which was characterized and evaluated by several in vitro methods and pharmacokinetic studies. In addition, because experiments have shown that emodin has a certain therapeutic effect on sepsis, we also evaluated the biological activity of the emodin-piperazine cocrystal in a sepsis model. The results demonstrate that co-crystallization significantly enhances emodin's solubility, permeability, and bioavailability. Pharmacodynamic studies indicate that the emodin-piperazine cocrystal improves sepsis symptoms and protects against the liver and kidney damage associated with sepsis. By advancing co-crystallization as a viable development approach, this study offers renewed hope for natural products that have broad biological activities but are hindered by physicochemical limitations.
Nasopharyngeal carcinoma (NPC) is a malignant tumor prevalent in southern China and Southeast Asia, where early detection is crucial for improving patient prognosis and reducing mortality. However, existing screening methods are limited in accuracy and accessibility, which hinders their use in large-scale population screening. In this work, a surface-enhanced Raman spectroscopy (SERS)-based method was established to explore the profiles of different fractionated components of saliva from NPC patients and healthy subjects. The findings indicate that all fractionated samples exhibit disease-associated molecular signal differences, with the small-molecule fraction (molecular weight cut-off of 10 kDa) showing the best classification capability: a sensitivity of 90.5%, a specificity of 75.6%, and an area under the receiver operating characteristic (ROC) curve of 0.925±0.031. The primary objective of this study was to qualitatively explore patterns in saliva composition across groups. The proposed SERS detection strategy for fractionated saliva offers novel insights for enhancing the sensitivity and reliability of noninvasive NPC screening, laying the foundation for translational application in large-scale clinical settings.
Accurate land surface temperature (LST) assessment is crucial for understanding and reducing the impacts of climate change and for understanding land use evolution. This study presents an innovative method that uses ensemble models, advanced correlation analysis, and trend analysis to investigate the environmental influences on LST. Google Earth Engine (GEE) was used to process Landsat-7 and Landsat-8 datasets for the five big cities of Punjab, Pakistan, from 2001 to 2023. The results show significant urban warming trends and a strong correlation between environmental variables and LST. Three ensemble-based machine learning models, XGBoost, AdaBoost, and random forest (RF), were adopted to improve the accuracy of LST evaluation. XGBoost and AdaBoost attained modest accuracy, with R^(2) values of 0.767 and 0.706, respectively, while the RF model outperformed them with an R^(2) of 0.796 and an RMSE of 0.476. Moreover, Pearson correlation analysis revealed a negative relationship between LST and the normalized difference latent heat index (NDLI, r=-0.67), the normalized difference vegetation index (NDVI, r=-0.6), and the modified normalized difference water index (MNDWI, r=-0.57). In addition, wavelet analysis showed that vegetation and water provide long-term LST cooling, lasting up to 64 months, while built-up areas and bare soil contribute to short-term warming, lasting 4 to 8 months. Latent heat indicated variable cooling periods, surpassing 60 months in cities. These findings enhance the understanding of LST changes and the impact of climate change on the environment.
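The Pearson correlation coefficients reported above follow the standard formula r = cov(x, y) / (σ_x σ_y). A minimal sketch of that computation is given below; the NDVI and LST values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical pixel samples: higher vegetation index, lower surface temperature.
ndvi = [0.10, 0.25, 0.40, 0.55, 0.70]
lst_celsius = [41.0, 39.5, 36.0, 35.5, 32.0]
r_ndvi_lst = pearson_r(ndvi, lst_celsius)
```

A negative `r_ndvi_lst` corresponds to the cooling association between vegetation and LST described in the abstract.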
As urbanization continues to accelerate, the challenges of managing transportation in metropolitan areas become increasingly complex. The surge in population density contributes to traffic congestion, degrading travel experiences and posing safety risks. Smart urban transportation management emerges as a strategic solution, conceptualized here as a multidimensional big data problem. The success of this strategy hinges on the effective collection of information from diverse, extensive, and heterogeneous data sources, which requires full-stack Information and Communication Technology (ICT) solutions. The main idea of the work is to investigate current Intelligent Transportation Systems (ITS) technologies and to enhance the safety of urban transportation systems. Machine learning models trained on historical data can predict traffic congestion, allowing preventive measures to be implemented. Deep learning architectures, with their ability to handle complex data representations, further refine traffic predictions, contributing to more accurate and dynamic transportation management. The background of this research underscores the challenges posed by traffic congestion in metropolitan areas and emphasizes the need for advanced technological solutions. By integrating GPS and GIS technologies with machine learning algorithms, this work focuses on developing intelligent transportation systems that not only address current challenges but also pave the way for future advances in urban transportation management.
Replicating the chaotic characteristics inherent in nonlinear dynamical systems via machine learning (ML) is a key challenge in this rapidly advancing interdisciplinary field. In this work, we explore the potential of variational quantum circuits (VQC) for learning the stochastic properties of classical nonlinear dynamical systems. Specifically, we focus on the one- and two-dimensional logistic maps, which, while simple, remain under-explored in the context of learning dynamical characteristics. Our findings reveal that, even for such simple dynamical systems, accurately replicating long-term characteristics is hindered by a pronounced sensitivity to overfitting. While increasing the parameter complexity of the ML model typically enhances short-term prediction accuracy, it also degrades the model’s ability to replicate long-term characteristics, primarily because overfitting harms generalization. By comparing the VQC with two widely recognized classical ML techniques, long short-term memory (LSTM) networks for time-series processing and reservoir computing, we demonstrate that the VQC outperforms these methods in replicating long-term characteristics. Our results suggest that, for the ML of dynamics, it is necessary to develop more compact and efficient models (such as VQC) rather than more complicated and larger-scale ones.
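For readers unfamiliar with the system studied above, the one-dimensional logistic map is x_{n+1} = r·x_n·(1 − x_n). The sketch below iterates it and estimates the Lyapunov exponent as one example of a long-term characteristic. The choice of the Lyapunov exponent, the parameter values, and the function names are assumptions for illustration; the abstract does not say which long-term quantities the authors evaluate.

```python
import math


def logistic_map(r, x):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def trajectory(r, x0, n_steps):
    """Return [x0, x1, ..., x_n] generated by iterating the map."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(logistic_map(r, xs[-1]))
    return xs


def lyapunov_exponent(r, x0=0.2, n_steps=20000, burn_in=1000):
    """Average of log|f'(x)| along the orbit, with f'(x) = r * (1 - 2x)."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = logistic_map(r, x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = logistic_map(r, x)
    return total / n_steps


chaotic = lyapunov_exponent(4.0)   # fully chaotic regime
periodic = lyapunov_exponent(3.2)  # stable period-2 regime
```

A positive exponent indicates chaos and a negative one a stable cycle. A learned surrogate model could be judged by whether orbits it generates reproduce such statistics, rather than only by its one-step prediction error.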
The rapid advancement of machine-learning-based tight-binding Hamiltonian (MLTB) methods has opened new avenues for efficient and accurate electronic structure simulations, particularly for large-scale systems and long-time scenarios. This review begins with a concise overview of traditional tight-binding (TB) models, including both (semi-)empirical and first-principles approaches, establishing the foundation for understanding MLTB developments. We then present a systematic classification of existing MLTB methodologies, grouped into two major categories: direct prediction of TB Hamiltonian elements and inference of empirical parameters. A comparative analysis with other ML-based electronic structure models is also provided, highlighting the advances of MLTB approaches. Finally, we explore the emerging MLTB application ecosystem, showing how integrating MLTB models with a diverse suite of post-processing tools, from linear-scaling solvers to quantum transport frameworks and molecular dynamics interfaces, is essential for tackling complex scientific problems across different domains. The continued advancement of this integrated paradigm promises to accelerate materials discovery and open new frontiers in the predictive simulation of complex quantum phenomena.
The water hammer problem is an important issue in the dynamics of liquid propulsion systems. This paper uses the Lattice Boltzmann Method (LBM) with an entropy limiter to study water hammer problems in propellant feedlines. The dynamic characteristics of valve-closing water hammer and filling water hammer are investigated with this method, and the sensitivity of filling water hammer is analyzed with a single-factor sensitivity analysis (8 factors, 9 levels) and a multi-factor sensitivity analysis using an L_(27)(3^(13)) orthogonal experiment based on the range method. The results of LBM with the entropy limiter are in good agreement with those of the finite volume method, and the entropy limiter eliminates numerical oscillations when solving valve-closing water hammer problems and prevents numerical "blow-up" when solving filling water hammer problems. The dynamic characteristics of valve-closing water hammer are relatively simple, whereas filling water hammer is affected by many factors to varying degrees. Skewness analysis shows that the effects on the maximum water hammer pressure are relatively uniform, while those on the water hammer response time vary greatly.
The viscosity of refining slags plays a critical role in metallurgical processes. However, obtaining accurate viscosity data remains challenging because of the complexities of high-temperature experiments, and predictions often rely on empirical models with limited predictive capability. This study focuses on the influence of optical basicity on viscosity in CaO-Al_(2)O_(3)-based refining slags, leveraging machine learning to address data scarcity and improve prediction accuracy. An automated framework for algorithm integration, parameter tuning, and evaluation ranking (Auto-APE) is employed to develop customized data-driven models for various slag systems, including CaO-Al_(2)O_(3)-SiO_(2), CaO-Al_(2)O_(3)-CaF_(2), CaO-Al_(2)O_(3)-SiO_(2)-MgO, and CaO-Al_(2)O_(3)-SiO_(2)-MgO-CaF_(2). By incorporating optical basicity as a key feature, the models achieve an average validation error of 8.0% to 15.1%, significantly outperforming traditional empirical models. Additionally, symbolic regression is introduced to rapidly construct domain-specific features, such as optical basicity-like descriptors, offering a potential breakthrough in performance prediction for small datasets. This work highlights the critical role of domain-specific knowledge in understanding and predicting viscosity, providing a robust machine learning-based approach for optimizing refining slag properties.
Lithium-ion batteries (LIBs) are widely deployed, from grid-scale storage to electric vehicles. LIBs remain at rest for most of their service life, during which calendar aging degrades capacity. Understanding the mechanisms of LIB calendar aging is crucial for extending battery lifespan. However, LIB calendar aging is influenced by multiple factors, including battery material, battery state, and storage environment. Calendar aging experiments are also time-consuming and costly, and they lack standardized testing conditions. This study employs a data-driven approach to establish a cross-scale database linking materials, side-reaction mechanisms, and calendar aging of LIBs. MELODI (Mechanism-informed, Explainable, Learning-based Optimization for Degradation Identification) is proposed to identify calendar aging mechanisms and quantify the effects of multi-scale factors. Results reveal that cathode material loss drives up to 91.42% of calendar aging degradation in high-nickel (Ni) batteries, while solid electrolyte interphase growth dominates in lithium iron phosphate (LFP) and low-Ni batteries, contributing up to 82.43% of degradation in LFP batteries and up to 99.10% in low-Ni batteries. This study systematically quantifies calendar aging in commercial LIBs under varying materials, states of charge, and temperatures. These findings offer quantitative guidance for experimental design and battery use, with implications for emerging applications such as aerial robotics, vehicle-to-grid, and embodied intelligence systems.
The glass transition temperature (T_(g)) of styrene-butadiene rubber (SBR) is a key parameter determining its low-temperature flexibility and processing performance. Accurate prediction of T_(g) is crucial for material design and application optimisation. To address the limitations of traditional experimental measurements and theoretical models in efficiency, cost, and accuracy, this study proposes a machine learning prediction framework that integrates a multi-model ensemble with Bayesian optimization, built on a multi-component feature dataset and an algorithm optimization strategy. Using the constructed high-quality dataset of 96 SBR samples, nine machine learning models were employed to predict the T_(g) of SBR, and their prediction performance was compared. Ultimately, a GPR-XGBoost mixed model was constructed through model ensembling, achieving high-precision prediction with R^(2) values greater than 0.9 on both the training and test sets. Further feature attribution and local effect analysis were conducted using SHAP and ALE, revealing the nonlinear influence of the various components on T_(g) and providing a theoretical basis for SBR formulation design and T_(g) regulation. The machine learning prediction framework established in this study combines high-precision prediction with interpretability, significantly improving T_(g) prediction for SBR. It offers an efficient tool for SBR molecular design and holds great potential for wider application.
Landslides pose a formidable natural hazard across the Qinghai-Tibet Plateau (QTP), endangering both ecosystems and human life. Identifying the driving factors behind landslides and accurately assessing susceptibility are key to mitigating disaster risk. This study integrated multi-source historical landslide data with 15 predictive factors and used several machine learning models (Random Forest (RF), Gradient Boosting Regression Trees (GBRT), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost)) to generate susceptibility maps. The Shapley additive explanation (SHAP) method was applied to quantify factor importance and explore nonlinear effects. The results showed that: (1) CatBoost was the best-performing model (CA=0.938, AUC=0.980) in assessing landslide susceptibility, with altitude emerging as the most significant factor, followed by distance to roads and earthquake sites, precipitation, and slope; (2) the SHAP method revealed critical nonlinear thresholds, demonstrating that historical landslides were concentrated at mid-altitudes (1400-4000 m) and decreased markedly above 4000 m, with a parallel reduction in probability beyond 700 m from roads; and (3) landslide-prone areas, comprising 13% of the QTP, were concentrated in the southeastern and northeastern parts of the plateau. By integrating machine learning and SHAP analysis, this study revealed landslide hazard-prone areas and their driving factors, providing insights to support disaster management strategies and sustainable regional planning.
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that incorporates both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized, model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks’ hunting strategies. HHO identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was an accuracy of 99.44% for UNSW-NB15, demonstrating the model’s effectiveness. After balancing, the model showed a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated to provide both global and local insight into feature contributions and to offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Lithium manganese silicate (Li-Mn-Si-O) cathodes are key components of lithium-ion batteries, and their physical and mechanical properties are strongly influenced by their underlying crystal structures. In this study, a range of machine learning (ML) algorithms were developed and compared to predict the crystal systems of Li-Mn-Si-O cathode materials using density functional theory (DFT) data obtained from the Materials Project database. The dataset comprised 211 compositions characterized by key descriptors, including formation energy, energy above the hull, bandgap, atomic site number, density, and unit cell volume. These features were used to classify the materials into monoclinic (0) and triclinic (1) crystal systems. A comprehensive comparison of classification algorithms was conducted, including Decision Tree, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbor, Stochastic Gradient Descent, Gaussian Naive Bayes, Gaussian Process, and Artificial Neural Network (ANN). Among these, the optimized ANN architecture (6-14-14-14-1) exhibited the highest predictive performance, achieving an accuracy of 95.3%, a Matthews correlation coefficient (MCC) of 0.894, and an F-score of 0.963, demonstrating excellent consistency with DFT-predicted crystal structures. The Random Forest and Gaussian Process models also showed reliable and consistent predictive capability, indicating their potential as complementary approaches, particularly when data are limited or computational efficiency is required. This comparative framework provides valuable insights into model selection for crystal system classification in complex cathode materials.
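The three metrics quoted above can all be computed from a binary confusion matrix. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and are not the study's results, and reading "F-score" as the F1-score is an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, treating triclinic (1) as the positive class.
tp, tn, fp, fn = 50, 40, 5, 5
acc = accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes are not equally represented.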
Funding (benzene hydrogenation study): Supported by the CAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300, XDB1190302); the Youth Innovation Promotion Association CAS (Y2021056); the Joint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (YLU-DNL Fund 2022007); and the special fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007).
Funding: Guangzhou Metro Scientific Research Project (No. JT204-100111-23001); Chongqing Municipal Special Project for Technological Innovation and Application Development (No. CSTB2022TIAD-KPX0101); Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (No. N2023G045).
Abstract: The uplift resistance of the soil overlying shield tunnels significantly impacts their anti-floating stability. However, research on the uplift resistance of special-shaped shield tunnels is limited. This study combines numerical simulation with machine learning techniques to explore this issue. It presents a summary of special-shaped tunnel geometries and introduces a shape coefficient. Using the finite element software Plaxis3D, the study simulates how six key parameters (shape coefficient, burial depth ratio, the tunnel's longest horizontal length, internal friction angle, cohesion, and soil submerged bulk density) affect uplift resistance under different conditions. Employing XGBoost and ANN methods, the feature importance of each parameter was analyzed based on the numerical simulation results. The findings demonstrate that a tunnel shape more closely resembling a circle leads to reduced uplift resistance in the overlying soil, whereas the other parameters have the opposite effect. Furthermore, the feature importance for uplift resistance decreases in the following order: burial depth ratio, internal friction angle, the tunnel's longest horizontal length, cohesion, soil submerged bulk density, and shape coefficient.
Funding: Supported by the China Agriculture Research System of MOF and MARA; the National Natural Science Foundation of China (31872337 and 31501919); and the Agricultural Science and Technology Innovation Project, China (ASTIP-IAS02).
Abstract: The advantages of genomic selection (GS) in animal and plant breeding are self-evident. Traditional parametric models struggle to fit increasingly large sequencing data and to capture complex effects accurately. Machine learning models have demonstrated remarkable potential in addressing these challenges. In this study, we introduced the concept of mixed kernel functions to explore the performance of support vector machine regression (SVR) in GS. Six single kernel functions (SVR_L, SVR_C, SVR_G, SVR_P, SVR_S, SVR_L) and four mixed kernel functions (SVR_GS, SVR_GP, SVR_LS, SVR_LP) were used to predict genomic breeding values. The prediction accuracy, mean squared error (MSE), and mean absolute error (MAE) were used as evaluation indicators for comparison with two traditional parametric models (GBLUP, BayesB) and two popular machine learning models (RF, KcRR). The results indicate that in most cases, the mixed kernel function models significantly outperform GBLUP, BayesB, and the single kernel function models. For instance, for T1 in the pig dataset, the predictive accuracy of SVR_GS is improved by 10% compared to GBLUP, and by approximately 4.4% and 18.6% compared to SVR_G and SVR_S, respectively. For E1 in the wheat dataset, SVR_GS achieves 13.3% higher prediction accuracy than GBLUP. Among single kernel functions, the Laplacian and Gaussian kernel functions yield similar results, with the Gaussian kernel function performing better. The mixed kernel functions notably reduce the MSE and MAE compared to all single kernel functions. Furthermore, regarding runtime, the SVR_GS and SVR_GP mixed kernel functions run approximately three times faster than GBLUP in the pig dataset, with only a slight increase in runtime compared to the single kernel function models. In summary, the mixed kernel SVR models are competitive in both speed and accuracy, and models such as SVR_GS have important application potential for GS.
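The abstract does not say how the single kernels are combined. One common construction, assumed here purely for illustration, is a weighted sum of two kernels. The sketch below builds such a mixed Gaussian-sigmoid Gram matrix in plain Python. The weight, the `gamma` and `coef0` values, and the toy genotype rows are all hypothetical.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def mixed_kernel(x, y, weight=0.7):
    """Assumed mixing rule: weight * Gaussian + (1 - weight) * sigmoid."""
    return weight * gaussian_kernel(x, y) + (1.0 - weight) * sigmoid_kernel(x, y)


def gram_matrix(rows, kernel):
    """Pairwise kernel values between all rows."""
    return [[kernel(a, b) for b in rows] for a in rows]


# Toy genotype-like rows coded 0/1/2 (made-up data).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
K = gram_matrix(genotypes, mixed_kernel)
```

A Gram matrix like `K` is the kind of input accepted by SVR implementations that support precomputed kernels, such as scikit-learn's `SVR(kernel="precomputed")`. Whether the study used this route is not stated.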
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2023YFC3008300 and 2023YFC3008305); the National Natural Science Foundation of China (Grant No. 42172320); the Key Laboratory of Mountain Hazards and Engineering Resilience, Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (Grant Nos. KLMHER-Z06 and KLMHER-T07); and the Science and Technology Research Program of Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (Grant No. IMHE-CXTD.04).
Abstract: Accurate assessment of snowpack volumetric liquid water content and bulk density is essential for understanding snow hydrology, avalanche risk management, and monitoring cryosphere changes. This study presents a novel dual-parameter inversion framework that integrates synthetic electromagnetic modelling, dimensionality reduction, and machine learning algorithms to extract relative permittivity and log-resistivity from ground-penetrating radar (GPR) data. Traditional snowpack measurements are invasive, labor-intensive, and limited to point observations. To overcome these limitations, we developed a non-invasive, scalable, and data-driven framework that uses synthetic GPR datasets representing diverse snowpack conditions with variable moisture and density profiles. Synthetic 1D time series reflections (A-scans) are generated using finite-difference time-domain simulations in the electromagnetic simulator gprMax. Principal component analysis (PCA) is applied to compress each A-scan while preserving key features, which significantly improved model training efficiency. Four machine learning models (random forest, neural network, support vector machine, and eXtreme gradient boosting) are trained on PCA-reduced features. Among these, the neural network model achieved the best performance, with R^(2) > 0.97 for permittivity and R^(2) > 0.92 for resistivity. Gaussian noise (signal-to-noise ratio of 6 dB) is introduced to the synthetic data, and targeted domain adaptation is then employed to enhance generalization to field data. The framework is validated on two contrasting GPR transects in the Altay Mountains of the Chinese mainland, representing moist (T750) and wet (G125) snowpack conditions. The neural network model predictions are the most consistent with the GPR-derived estimates, Snowfork measurements, and snow pit data, achieving a volumetric liquid water content deviation of ≤1.5% and a bulk density error in the range of 30-84 kg m^(-3). The results demonstrate that machine learning-based inversion, supported by realistic simulations and data augmentation, enables scalable, non-invasive snowpack characterization with significant applications in hydrological forecasting, snow monitoring, and water resource management.
Funding: Funded by the National Natural Science Foundation of China (No. 22278443); the CAMS Innovation Fund for Medical Sciences (No. 2022-I2M-1-015); the Key R&D Program of Shandong Province (No. 2021ZDSYS26); the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Fund and Technology Innovation Base Construction Key Laboratory Open Project (No. 2023D04065); the 2023 Xinjiang Uygur Autonomous Region Innovation Tianchi Talent Introduction Program; and the Key Project of Natural Science of Bengbu Medical University (No. 2024byzd138).
Abstract: In the pharmaceutical field, machine learning can play an important role in drug development, production, and treatment. Co-crystallization techniques have shown promising potential to enhance the properties of active pharmaceutical ingredients (APIs), such as solubility, permeability, and bioavailability, without altering their chemical structure. This approach opens new avenues for developing natural products into effective drugs, especially those that were previously challenging to formulate. Emodin, an anthraquinone-based natural product, is a notable example due to its diverse biological activities; however, its physicochemical limitations, such as poor solubility and easy sublimation, have restricted its clinical application. While various methods have improved emodin's physicochemical properties, research on its bioavailability remains limited. In our study, we summarize cocrystals and salts produced through co-crystallization technology and identify piperazine as a favorable coformer. Because computational chemistry and molecular modeling methods and machine learning methods gave conflicting conclusions about whether emodin and piperazine form a cocrystal or a salt, we validated these possibilities experimentally. Ultimately, we successfully obtained the emodin-piperazine cocrystal, which was characterized and evaluated by several in vitro methods and pharmacokinetic studies. In addition, because experiments have shown that emodin has a therapeutic effect on sepsis, we also evaluated the biological activity of emodin-piperazine in a sepsis model. The results demonstrate that co-crystallization significantly enhances emodin's solubility, permeability, and bioavailability. Pharmacodynamic studies indicate that the emodin-piperazine cocrystal improves sepsis symptoms and provides protective effects against liver and kidney damage associated with sepsis. By advancing co-crystallization as a viable development approach, this study offers renewed hope for natural products that have broad biological activities but are hindered by physicochemical limitations.
Funding: Financially supported by the National Natural Science Foundation of China (No. 12374405); the Provincial Science Foundation for Distinguished Young Scholars of Fujian (No. 2024J010024); the Natural Science Foundation of Fujian Province of China (No. 2023J011267); and the Major Research Projects for Young and Middle-aged Researchers of Fujian Provincial Health Commission (No. 2021ZQNZD010).
Abstract: Nasopharyngeal carcinoma (NPC) is a malignant tumor prevalent in southern China and Southeast Asia, where its early detection is crucial for improving patient prognosis and reducing mortality rates. However, existing screening methods suffer from limitations in accuracy and accessibility, hindering their application in large-scale population screening. In this work, a surface-enhanced Raman spectroscopy (SERS)-based method was established to explore the profiles of different stratified components in saliva from NPC patients and healthy subjects after fractionation processing. The findings indicate that all fractionated samples exhibit disease-associated molecular signaling differences, with the small-molecule fraction (molecular weight cut-off value of 10 kDa) demonstrating superior classification capability: a sensitivity of 90.5%, a specificity of 75.6%, and an area under the receiver operating characteristic (ROC) curve of 0.925 ± 0.031. The primary objective of this study was to qualitatively explore patterns in saliva composition across groups. The proposed SERS detection strategy for fractionated saliva offers novel insights for enhancing the sensitivity and reliability of noninvasive NPC screening, laying the foundation for translational application in large-scale clinical settings.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52479045, 52279042); the Key Research and Development Program in Guangxi (Grant No. AB23026021); and the Open Research Fund of Guangxi Key Laboratory of Water Engineering Materials and Structures, Guangxi Institute of Water Resources Research (Grant No. GXHRIWEMS-2022-07).
Abstract: Accurate land surface temperature (LST) assessment is crucial for comprehending and reducing the impacts of climate change and understanding land use evolution. This study presents an innovative method that uses ensemble models, advanced correlation analysis, and trend analysis to investigate the environmental influences on LST. Google Earth Engine (GEE) was used to process Landsat-7 and Landsat-8 datasets for the five big cities of Punjab, Pakistan, from 2001 to 2023. The results show significant urban warming trends and a strong correlation between environmental variables and LST. Three ensemble-based machine learning models, XGBoost, AdaBoost, and random forest (RF), were adopted to improve the accuracy of LST evaluation. Although XGBoost and AdaBoost attained modest levels of accuracy, with R^(2) values of 0.767 and 0.706, respectively, the RF model outperformed them, achieving an R^(2) of 0.796 and an RMSE of 0.476. Moreover, Pearson correlation analysis revealed a negative relationship between LST and the normalized difference latent heat index (NDLI) with r = -0.67, the normalized difference vegetation index (NDVI) with r = -0.6, and the modified normalized difference water index (MNDWI) with r = -0.57. In addition, wavelet analysis showed that vegetation and water offer long-term LST cooling, lasting up to 64 months, while built-up areas and bare soil contribute to short-term warming, lasting 4 to 8 months. Latent heat indicated variable cooling periods, surpassing 60 months in cities. These findings enhance the understanding of LST changes and the impact of climate change on the environment.
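For readers less familiar with the Pearson coefficient reported in this abstract, the following minimal sketch computes r from two paired series in plain Python. The NDVI and LST values are made-up illustration data, not measurements from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up paired samples: greener pixels (higher NDVI) with cooler surfaces.
ndvi = [0.10, 0.25, 0.40, 0.55, 0.70]
lst_celsius = [41.0, 38.5, 36.0, 33.0, 31.5]

r = pearson_r(ndvi, lst_celsius)  # negative: LST falls as NDVI rises
```

A negative r, as reported for NDLI, NDVI, and MNDWI, means LST tends to decrease as the index increases.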
Abstract: As urbanization continues to accelerate, the challenges associated with managing transportation in metropolitan areas become increasingly complex. The surge in population density contributes to traffic congestion, impacting travel experiences and posing safety risks. Smart urban transportation management emerges as a strategic solution, conceptualized here as a multidimensional big data problem. The success of this strategy hinges on the effective collection of information from diverse, extensive, and heterogeneous data sources, necessitating the implementation of full-stack Information and Communication Technology (ICT) solutions. The main idea of the work is to investigate the current technologies of Intelligent Transportation Systems (ITS) and enhance the safety of urban transportation systems. Machine learning models, trained on historical data, can predict traffic congestion, allowing for the implementation of preventive measures. Deep learning architectures, with their ability to handle complex data representations, further refine traffic predictions, contributing to more accurate and dynamic transportation management. The background of this research underscores the challenges posed by traffic congestion in metropolitan areas and emphasizes the need for advanced technological solutions. By integrating GPS and GIS technologies with machine learning algorithms, this work focuses on the development of intelligent transportation systems that not only address current challenges but also pave the way for future advancements in urban transportation management.
Funding: Project supported in part by the Beijing Natural Science Foundation (Grant No. 1232025), the Peng Huanwu Visiting Professor Program, and the Academy for Multidisciplinary Studies, Capital Normal University.
Abstract: Replicating the chaotic characteristics inherent in nonlinear dynamical systems via machine learning (ML) is a key challenge in this rapidly advancing interdisciplinary field. In this work, we explore the potential of variational quantum circuits (VQC) for learning the stochastic properties of classical nonlinear dynamical systems. Specifically, we focus on the one- and two-dimensional logistic maps, which, while simple, remain under-explored in the context of learning dynamical characteristics. Our findings reveal that, even for such simple dynamical systems, accurately replicating long-term characteristics is hindered by a pronounced sensitivity to overfitting. While increasing the parameter complexity of the ML model typically enhances short-term prediction accuracy, it also degrades the model's ability to replicate long-term characteristics, primarily due to the detrimental effects of overfitting on generalization power. By comparing the VQC with two widely recognized classical ML techniques, long short-term memory (LSTM) networks for time-series processing and reservoir computing, we demonstrate that VQC outperforms these methods in replicating long-term characteristics. Our results suggest that, for the ML of dynamics, it is necessary to develop more compact and efficient models (such as VQC) rather than more complicated and large-scale ones.
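The one-dimensional logistic map discussed in this abstract is the recurrence x_{n+1} = r x_n (1 - x_n). The sketch below iterates it in plain Python and compares two trajectories that start almost at the same point, which is one way to see why long-term behaviour is hard to replicate. The parameter r = 4.0, the initial values, and the step count are illustrative choices, not settings taken from the paper.

```python
def logistic_map(x0, r, steps):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# r = 4.0 puts the map in its fully chaotic regime (illustrative choice).
trajectory_a = logistic_map(0.2, 4.0, 50)
trajectory_b = logistic_map(0.2 + 1e-9, 4.0, 50)  # tiny perturbation of the start

# Pointwise gap between the two runs; it grows from ~1e-9 to order one.
gaps = [abs(a - b) for a, b in zip(trajectory_a, trajectory_b)]
```

Because nearby trajectories separate this quickly, pointwise long-horizon prediction is not a meaningful target. This is consistent with the abstract's focus on long-term statistical characteristics instead.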
Funding: Supported by the Advanced Materials-National Science and Technology Major Project (Grant No. 2025ZD0618401); the National Natural Science Foundation of China (Grant No. 12504285); the Natural Science Foundation of Jiangsu Province (Grant No. BK20250472); and an NFSG grant from BITS-Pilani, Dubai campus.
Abstract: The rapid advancement of machine-learning-based tight-binding Hamiltonian (MLTB) methods has opened new avenues for efficient and accurate electronic structure simulations, particularly in large-scale systems and long-time scenarios. This review begins with a concise overview of traditional tight-binding (TB) models, including both (semi-)empirical and first-principles approaches, establishing the foundation for understanding MLTB developments. We then present a systematic classification of existing MLTB methodologies, grouped into two major categories: direct prediction of TB Hamiltonian elements and inference of empirical parameters. A comparative analysis with other ML-based electronic structure models is also provided, highlighting the advances of MLTB approaches. Finally, we explore the emerging MLTB application ecosystem, showing how integrating MLTB models with a diverse suite of post-processing tools, from linear-scaling solvers to quantum transport frameworks and molecular dynamics interfaces, is essential for tackling complex scientific problems across different domains. The continued advancement of this integrated paradigm promises to accelerate materials discovery and open new frontiers in the predictive simulation of complex quantum phenomena.
Funding: Supported by the Natural Science Basic Research Program of Shaanxi, China (No. 2021JC-14).
Abstract: The water hammer problem is an important issue in the dynamics of liquid propulsion systems. This paper uses the Lattice Boltzmann Method (LBM) with an entropy limiter to study water hammer problems in propellant feedlines. The dynamic characteristics of valve-closing water hammer and filling water hammer are investigated with this method, and the sensitivity of filling water hammer is analyzed using a single-factor sensitivity analysis with 8 factors and 9 levels and a multi-factor sensitivity analysis with an L_(27)(3^(13)) orthogonal experiment based on the range method. The LBM results with the entropy limiter are in good agreement with those of the finite volume method, and the entropy limiter eliminates numerical oscillations when solving valve-closing water hammer problems and prevents numerical "blow-up" when solving filling water hammer problems. The dynamic characteristics of valve-closing water hammer are relatively simple, whereas many factors affect the filling water hammer, and the degree of their effects varies. Skewness analysis shows that the effects on the maximum water hammer pressure are relatively uniform, whereas those on the water hammer response time vary greatly.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFB3712401); the National Natural Science Foundation of China (No. 52274301); the Aeronautical Science Foundation of China (No. 2023Z0530S6005); and the Ningbo Yongjiang Talent-Introduction Programme (No. 2022A-023-C).
Abstract: The viscosity of refining slags plays a critical role in metallurgical processes. However, obtaining accurate viscosity data remains challenging due to the complexities of high-temperature experiments, so practice often relies on empirical models with limited predictive capabilities. This study focuses on the influence of optical basicity on viscosity in CaO-Al_(2)O_(3)-based refining slags, leveraging machine learning to address data scarcity and improve prediction accuracy. An automated framework for algorithm integration, parameter tuning, and evaluation ranking (Auto-APE) is employed to develop customized data-driven models for various slag systems, including CaO-Al_(2)O_(3)-SiO_(2), CaO-Al_(2)O_(3)-CaF_(2), CaO-Al_(2)O_(3)-SiO_(2)-MgO, and CaO-Al_(2)O_(3)-SiO_(2)-MgO-CaF_(2). By incorporating optical basicity as a key feature, the models achieve an average validation error of 8.0% to 15.1%, significantly outperforming traditional empirical models. Additionally, symbolic regression is introduced to rapidly construct domain-specific features, such as optical basicity-like descriptors, offering a potential breakthrough in performance prediction for small datasets. This work highlights the critical role of domain-specific knowledge in understanding and predicting viscosity, providing a robust machine learning-based approach for optimizing refining slag properties.
Funding: Supported by the National Key Research and Development Program of China (2024YFE0213000); the Postdoctoral Innovative Talents Support Program (BX20240232); the Natural Science Foundation of China for Young Scholars (72304031); and the Fundamental Research Funds for the Central Universities (FRF-TP-22-024A1).
Abstract: Lithium-ion batteries (LIBs) are widely deployed, from grid-scale storage to electric vehicles. LIBs remain stationary for most of their service life, during which calendar aging degrades capacity. Understanding the mechanisms of LIB calendar aging is crucial for extending battery lifespan. However, LIB calendar aging is influenced by multiple factors, including the battery material, its state, and the storage environment. Calendar aging experiments are also time-consuming, costly, and lack standardized testing conditions. This study employs a data-driven approach to establish a cross-scale database linking materials, side-reaction mechanisms, and calendar aging of LIBs. MELODI (Mechanism-informed, Explainable, Learning-based Optimization for Degradation Identification) is proposed to identify calendar aging mechanisms and quantify the effects of multi-scale factors. Results reveal that cathode material loss drives up to 91.42% of calendar aging degradation in high-nickel (Ni) batteries, while solid electrolyte interphase growth dominates in lithium iron phosphate (LFP) and low-Ni batteries, contributing up to 82.43% of degradation in LFP batteries and 99.10% of decay in low-Ni batteries. This study systematically quantifies calendar aging in commercial LIBs under varying materials, states of charge, and temperatures. These findings offer quantitative guidance for experimental design and battery use, with implications for emerging applications such as aerial robotics, vehicle-to-grid, and embodied intelligence systems.
Funding: Supported by the National Natural Science Foundation of China (grant numbers 52250357 and 52203003).
Abstract: The glass transition temperature (T_(g)) of styrene-butadiene rubber (SBR) is a key parameter determining its low-temperature flexibility and processing performance. Accurate prediction of T_(g) is crucial for material design and application optimisation. To address the limitations of traditional experimental measurements and theoretical models in terms of efficiency, cost, and accuracy, this study proposes a machine learning prediction framework that integrates a multi-model ensemble with Bayesian optimization, built on a multi-component feature dataset and an algorithm optimization strategy. Based on the constructed high-quality dataset containing 96 SBR samples, nine machine learning models were employed to predict the T_(g) of SBR, and their prediction performance was compared. Ultimately, a GPR-XGBoost mixed model was constructed through model ensembling, achieving high-precision prediction with R^(2) values greater than 0.9 on both the training and test sets. Further feature attribution and local effect analysis were conducted using feature analysis methods such as SHAP and ALE, revealing the nonlinear influence patterns of various components on T_(g) and providing a theoretical basis for SBR formulation design and T_(g) regulation. The machine learning prediction framework established in this study combines high-precision prediction with interpretability, significantly enhancing the prediction performance for the T_(g) of SBR. It offers an efficient tool for SBR molecular design and holds great potential for wider application.
Funding: The National Key Research and Development Program of China, No. 2023YFC3206601.
Abstract: Landslides pose a formidable natural hazard across the Qinghai-Tibet Plateau (QTP), endangering both ecosystems and human life. Identifying the driving factors behind landslides and accurately assessing susceptibility are key to mitigating disaster risk. This study integrated multi-source historical landslide data with 15 predictive factors and used several machine learning models, namely Random Forest (RF), Gradient Boosting Regression Trees (GBRT), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), to generate susceptibility maps. The Shapley additive explanation (SHAP) method was applied to quantify factor importance and explore their nonlinear effects. The results showed that: (1) CatBoost was the best-performing model (CA = 0.938, AUC = 0.980) in assessing landslide susceptibility, with altitude emerging as the most significant factor, followed by distance to roads and earthquake sites, precipitation, and slope; (2) the SHAP method revealed critical nonlinear thresholds, demonstrating that historical landslides were concentrated at mid-altitudes (1400-4000 m) and decreased markedly above 4000 m, with a parallel reduction in probability beyond 700 m from roads; and (3) landslide-prone areas, comprising 13% of the QTP, were concentrated in the southeastern and northeastern parts of the plateau. By integrating machine learning and SHAP analysis, this study revealed landslide hazard-prone areas and their driving factors, providing insights to support disaster management strategies and sustainable regional planning.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with data preprocessing that incorporates both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Funding: Supported by the Learning & Academic Research Institution for Master's, PhD students, and Postdocs (LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2023-00301974). This work was also supported by the Glocal University 30 Project fund of Gyeongsang National University in 2025.
Abstract: Lithium manganese silicate (Li-Mn-Si-O) cathodes are key components of lithium-ion batteries, and their physical and mechanical properties are strongly influenced by their underlying crystal structures. In this study, a range of machine learning (ML) algorithms were developed and compared to predict the crystal systems of Li-Mn-Si-O cathode materials using density functional theory (DFT) data obtained from the Materials Project database. The dataset comprised 211 compositions characterized by key descriptors, including formation energy, energy above the hull, bandgap, atomic site number, density, and unit cell volume. These features were utilized to classify the materials into monoclinic (0) and triclinic (1) crystal systems. A comprehensive comparison of various classification algorithms, including Decision Tree, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbor, Stochastic Gradient Descent, Gaussian Naive Bayes, Gaussian Process, and Artificial Neural Network (ANN), was conducted. Among these, the optimized ANN architecture (6-14-14-14-1) exhibited the highest predictive performance, achieving an accuracy of 95.3%, a Matthews correlation coefficient (MCC) of 0.894, and an F-score of 0.963, demonstrating excellent consistency with DFT-predicted crystal structures. Meanwhile, Random Forest and Gaussian Process models also exhibited reliable and consistent predictive capability, indicating their potential as complementary approaches, particularly when data are limited or computational efficiency is required. This comparative framework provides valuable insights into model selection for crystal system classification in complex cathode materials.
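The accuracy, MCC, and F-score figures quoted in this abstract are all functions of a binary confusion matrix. The sketch below computes them in plain Python, assuming the F-score is the F1 variant. The counts are made up for illustration and are not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, with triclinic (1) treated as the positive class.
tp, tn, fp, fn = 40, 45, 5, 10

acc = accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two crystal-system classes are imbalanced.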