The safe and reliable operation of lithium-ion batteries necessitates the accurate prediction of remaining useful life (RUL). However, this task is challenging due to the diverse ageing mechanisms, various operating conditions, and limited measured signals. Although data-driven methods are perceived as a promising solution, they ignore intrinsic battery physics, leading to compromised accuracy, low efficiency, and low interpretability. In response, this study integrates domain knowledge into deep learning to enhance RUL prediction performance. We demonstrate accurate RUL prediction using only a single charging curve. First, a generalisable physics-based model is developed to extract ageing-correlated parameters that can describe and explain battery degradation from battery charging data. These parameters inform a deep neural network (DNN) to predict RUL with high accuracy and efficiency. The trained model is validated on 3 types of batteries working under 7 conditions, considering fully charged and partially charged cases. Using data from one cycle only, the proposed method achieves a root mean squared error (RMSE) of 11.42 cycles and a mean absolute relative error (MARE) of 3.19% on average, which are over 45% and 44% lower, respectively, than two state-of-the-art data-driven methods. Besides its accuracy, the proposed method also outperforms existing methods in terms of efficiency, input burden, and robustness. The inherent relationship between the model parameters and the battery degradation mechanism is further revealed, substantiating the intrinsic superiority of the proposed method.
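As a minimal illustration of this kind of pipeline, the sketch below feeds physics-derived ageing parameters to a small DNN and reports RMSE and MARE. The feature names, synthetic data, and network size are assumptions for illustration, not the paper's actual parameter definitions or architecture:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Hypothetical ageing-correlated parameters extracted by a physics-based
# model from a single charging curve (names and mapping are illustrative).
n = 500
X = rng.uniform(0.0, 1.0, size=(n, 4))
rul = 2000 - 1500 * X[:, 0] - 300 * X[:, 1] ** 2 + rng.normal(0, 20, n)

scaler = StandardScaler().fit(X[:400])
dnn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
dnn.fit(scaler.transform(X[:400]), rul[:400])

pred = dnn.predict(scaler.transform(X[400:]))
rmse = np.sqrt(mean_squared_error(rul[400:], pred))
mare = np.mean(np.abs(pred - rul[400:]) / rul[400:]) * 100
print(f"RMSE = {rmse:.1f} cycles, MARE = {mare:.2f}%")
```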
The accurate prediction of the peak overpressure of explosion shockwaves is significant in fields such as explosion hazard assessment and structural protection, where explosion shockwaves serve as typical destructive elements. Aiming at the insufficient accuracy of existing physical models for predicting the peak overpressure of ground reflected waves, two physics-informed machine learning models are constructed. The results demonstrate that the machine learning models, which incorporate physical information by predicting the deviation between the physical model and actual values and by adding a physical loss term to the loss function, can accurately predict both the training and out-of-training datasets. Compared to existing physical models, the average relative error within the training domain is reduced from 17.459%-48.588% to 2%, and the proportion of cases with average relative error below 20% increases from 0%-59.4% to more than 99%. In addition, the average relative error outside the training-set range is reduced from 14.496%-29.389% to 5%, and the proportion of cases with average relative error below 20% increases from 0%-71.39% to more than 99%. The inclusion of a physical loss term enforcing monotonicity in the loss function effectively improves the extrapolation performance of machine learning. The findings of this study provide a valuable reference for explosion hazard assessment and anti-explosion structural design in various fields.
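Both ingredients described above, learning the deviation from a physical model and penalising non-monotonic predictions, can be sketched in a few lines of PyTorch. The baseline power law and the synthetic "measurements" below are placeholders, not the paper's actual physical model or data:

```python
import torch
import torch.nn as nn

# Placeholder baseline: a generic scaled-distance overpressure law standing
# in for the physical model referenced in the abstract (not its real form).
def physical_model(z):            # z = scaled distance, e.g. R / W**(1/3)
    return 808.0 * z ** -2.0      # illustrative power law, kPa

torch.manual_seed(0)
z = torch.linspace(1.0, 20.0, 200).unsqueeze(1)
p_true = physical_model(z) * (1.0 + 0.3 * torch.sin(z))   # synthetic data

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
mse = nn.MSELoss()

for _ in range(2000):
    opt.zero_grad()
    residual = net(z)                        # ML learns deviation from physics
    p_hat = physical_model(z) + residual
    data_loss = mse(p_hat, p_true)
    # Physics loss: peak overpressure should decrease monotonically with
    # scaled distance, so penalise any positive finite difference.
    mono = torch.relu(p_hat[1:] - p_hat[:-1]).mean()
    (data_loss + 10.0 * mono).backward()
    opt.step()
```

The monotonicity penalty is what drives the improved extrapolation: outside the training range the data loss is silent, but the physics term still constrains the shape of the prediction.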
Landslide susceptibility mapping is a crucial tool for disaster prevention and management. The performance of conventional data-driven models is greatly influenced by the quality of the sample data, and the random selection of negative samples leaves the assessment process lacking interpretability. To address this limitation and construct a high-quality negative-sample database, this study introduces a physics-informed machine learning approach, combining the random forest model with Scoops 3D, to optimize the negative-sample selection strategy and assess the landslide susceptibility of the study area. Scoops 3D is employed to determine the factor of safety using Bishop's simplified method. Instead of conventional random selection, negative samples are extracted from areas with a high factor of safety. Subsequently, the results of the conventional random forest model and the physics-informed data-driven model are analyzed and discussed, focusing on model performance and prediction uncertainty. In comparison to conventional methods, the physics-informed model, set with a safety-area threshold of 3, demonstrates a noteworthy improvement in the mean AUC value of 36.7%, coupled with reduced prediction uncertainty. Evidently, the choice of the safety-area threshold affects both prediction uncertainty and model performance.
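The core idea, drawing negative samples only from physically stable cells, is easy to prototype. In the sketch below the factor-of-safety values and the inventory labels are synthetic stand-ins for Scoops 3D output and a real landslide inventory:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical terrain cells: conditioning factors plus a factor of safety
# (FoS) that in the study would come from Scoops 3D / Bishop's method.
n = 2000
factors = rng.normal(size=(n, 6))
fos = 1.5 + np.exp(-factors[:, 0]) + rng.normal(0, 0.2, n)
landslide = fos < np.quantile(fos, 0.15)      # pseudo landslide inventory

# Physics-informed negative sampling: draw negatives only from cells with
# FoS >= 3 (the paper's threshold) instead of from all non-landslide cells.
pos = factors[landslide]
stable = factors[(~landslide) & (fos >= 3.0)]
neg = stable[rng.choice(len(stable), size=len(pos), replace=False)]

X = np.vstack([pos, neg])
y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
rf = RandomForestClassifier(n_estimators=300, random_state=1)
print("mean AUC:", cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean())
```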
Pressure swing adsorption (PSA) modeling remains a challenging task since the process exhibits strong dynamic and cyclic behavior. This study presents a systematic physics-informed machine learning method that integrates transfer learning and labeled data to construct a spatiotemporal model of the PSA process. To approximate the latent solutions of partial differential equations (PDEs) in the specific steps of pressurization, adsorption, heavy reflux, counter-current depressurization, and light reflux, the system's network representation is decomposed into five lightweight sub-networks. On this basis, we propose parameter-based transfer learning (TL) combined with domain decomposition to address the long-term integration of periodic PDEs and expedite network training. Moreover, to tackle challenges related to sharp adsorption fronts, our method allows a specified amount of labeled data at the boundaries and/or within the system to be included in the loss function. The results show that the proposed method closely matches the outcomes of the conventional numerical method, effectively simulating all steps and the cyclic behavior of the PSA process.
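A heavily simplified sketch of the ingredients named above: a physics-informed sub-network trained on a PDE residual plus labeled boundary data, with its parameters transferred to warm-start the next step's sub-network. A toy 1D advection equation stands in for the actual PSA transport equations, which are not reproduced here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_net():
    return nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

v = 1.0   # toy stand-in for one PSA step: c_t + v * c_z = 0 on (z, t)

def pde_residual(net, z, t):
    zt = torch.cat([z, t], dim=1).requires_grad_(True)
    c = net(zt)
    grads = torch.autograd.grad(c, zt, torch.ones_like(c), create_graph=True)[0]
    c_z, c_t = grads[:, :1], grads[:, 1:]
    return c_t + v * c_z

def train(net, steps=1000):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    z, t = torch.rand(256, 1), torch.rand(256, 1)       # collocation points
    zb, tb = torch.zeros(64, 1), torch.rand(64, 1)      # inlet boundary
    c_inlet = torch.ones(64, 1)                         # labeled boundary data
    for _ in range(steps):
        opt.zero_grad()
        loss = pde_residual(net, z, t).pow(2).mean() \
             + (net(torch.cat([zb, tb], 1)) - c_inlet).pow(2).mean()
        loss.backward()
        opt.step()
    return net

step1 = train(make_net())                   # e.g. pressurization sub-network
step2 = make_net()
step2.load_state_dict(step1.state_dict())   # parameter-based transfer
step2 = train(step2, steps=300)             # cheap fine-tune for the next step
```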
The polarity of solvents plays a critical role in various research applications, particularly in their solubilities. Polarity is conveniently characterized by the Kamlet-Taft parameters, that is, the hydrogen-bonding acidity (α), the basicity (β), and the polarizability (π*). Obtaining Kamlet-Taft parameters is very important for designer solvents, namely ionic liquids (ILs) and deep eutectic solvents (DESs). However, given the practically unlimited number of combinations of ionic pairs in ILs and hydrogen-bond donor/acceptor pairs in DESs, experimental determination of their Kamlet-Taft parameters is impractical. To address this, the present study developed two different machine learning (ML) algorithms to predict Kamlet-Taft parameters for designer solvents using quantum-chemically derived input features. The ML models showed accurate predictions, with high coefficient of determination (R²) and low root mean square error (RMSE) values. Further, in the context of present interest in the circular bioeconomy, the relationship between the basicities and acidities of designer solvents and their ability to dissolve lignin and carbon dioxide (CO₂) is discussed. Our method thus guides the design of effective solvents with optimal Kamlet-Taft parameter values for dissolving and converting biomass and CO₂ into valuable chemicals.
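The regression task itself is standard: quantum-chemical descriptors in, one Kamlet-Taft parameter out, scored by R² and RMSE. A minimal sketch with synthetic descriptors (the feature names and functional form are assumptions, not the paper's descriptors):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(2)
# Hypothetical quantum-chemical descriptors (e.g. ESP extrema, HOMO/LUMO);
# the target stands in for one Kamlet-Taft parameter such as beta.
X = rng.normal(size=(300, 5))
beta = 0.6 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.05, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, beta, random_state=2)
model = GradientBoostingRegressor(random_state=2).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2 = {r2_score(y_te, pred):.3f}, "
      f"RMSE = {np.sqrt(mean_squared_error(y_te, pred)):.3f}")
```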
Large language models (LLMs) have emerged as powerful tools for addressing a wide range of problems, including those in scientific computing, particularly in solving partial differential equations (PDEs). However, different models exhibit distinct strengths and preferences, resulting in varying levels of performance. In this paper, we compare the capabilities of the most advanced LLMs—DeepSeek, ChatGPT, and Claude—along with their reasoning-optimized versions in addressing computational challenges. Specifically, we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems. We designed all our experiments so that a nontrivial decision is required, e.g., defining the proper space of input functions for neural operator learning. Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems, with ChatGPT o3-mini-high generally offering the fastest reasoning speed.
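The cited example of a nontrivial decision, choosing the space of input functions for neural operator learning, can be made concrete with one common default: sampling inputs from a Gaussian random field. The kernel choice and length scale below are assumptions for illustration, not what any particular LLM in the study proposed:

```python
import numpy as np

# One common input-function space for neural operator learning: a Gaussian
# random field with an RBF covariance kernel (an illustrative default).
def sample_grf(x, n_samples, length_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * length_scale ** 2))
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))  # jitter for stability
    return (L @ rng.normal(size=(len(x), n_samples))).T

x = np.linspace(0, 1, 128)
fs = sample_grf(x, n_samples=1000)   # 1000 input functions on a 128-pt grid
print(fs.shape)                      # (1000, 128)
```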
Excellent detonation performance and low sensitivity are prerequisites for the deployment of energetic materials. Exploring the underlying factors that affect impact sensitivity and detonation performance, as well as how to obtain materials with desired properties, remains a long-term challenge. Machine learning, with its ability to solve complex tasks and perform robust data processing, can reveal the relationship between performance and descriptive indicators, potentially accelerating the development of energetic materials. Against this background, the impact sensitivity, detonation performance, and 28 physicochemical parameters of 222 energetic materials were compiled from density functional theory calculations and the published literature. Four machine learning algorithms were employed to predict various properties of energetic materials, including impact sensitivity, detonation velocity, detonation pressure, and Gurney energy. Analysis of Pearson coefficients and feature importance showed that the heat of explosion, oxygen balance, decomposition products, and HOMO energy level correlate strongly with the impact sensitivity of energetic materials, while oxygen balance, decomposition products, and density correlate strongly with detonation performance. Using the impact sensitivity of 2,4,6-trinitrotoluene and the detonation performance of 2,4,6-trinitrobenzene-1,3,5-triamine as benchmarks, analysis of feature-importance rankings and statistical data revealed the optimal ranges of key features balancing impact sensitivity and detonation performance: oxygen balance between -40% and -30%, density between 1.66 and 1.72 g/cm³, HOMO energy level between -6.34 and -6.31 eV, and lipophilicity between -1.0 and 0.1 or between 4.49 and 5.59. These findings not only offer important insights into the impact sensitivity and detonation performance of energetic materials, but also provide a theoretical guidance paradigm for the design and development of new energetic materials with optimal detonation performance and reduced sensitivity.
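The two screening tools named above, Pearson correlation and tree-based feature importance, combine naturally in a few lines. The descriptor table below is synthetic and the feature names merely echo the abstract; it is a sketch of the workflow, not the paper's dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
# Hypothetical descriptor table for 222 energetic materials.
df = pd.DataFrame(rng.normal(size=(222, 4)),
                  columns=["oxygen_balance", "density", "HOMO", "lipophilicity"])
df["detonation_velocity"] = (2.0 * df["density"] + 0.8 * df["oxygen_balance"]
                             + rng.normal(0, 0.3, 222))

# Step 1: Pearson screening of each descriptor against the target.
print(df.corr(method="pearson")["detonation_velocity"])

# Step 2: nonlinear importance ranking from a random forest.
rf = RandomForestRegressor(n_estimators=300, random_state=3)
rf.fit(df.iloc[:, :4], df["detonation_velocity"])
for name, imp in zip(df.columns[:4], rf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```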
The presence of aluminum (Al³⁺) and fluoride (F⁻) ions in the environment can be harmful to ecosystems and human health, highlighting the need for accurate and efficient monitoring. In this paper, an innovative approach is presented that leverages machine learning to enhance the accuracy and efficiency of fluorescence-based detection for the sequential quantitative analysis of Al³⁺ and F⁻ ions in aqueous solutions. The proposed method involves the synthesis of sulfur-functionalized carbon dots (C-dots) as fluorescence probes, whose fluorescence is enhanced upon interaction with Al³⁺ ions, achieving a detection limit of 4.2 nmol/L. Subsequently, in the presence of F⁻ ions, the fluorescence is quenched, with a detection limit of 47.6 nmol/L. The fingerprints of the fluorescence images are extracted using a cross-platform computer vision library in Python, followed by data preprocessing. The fingerprint data are then subjected to cluster analysis using the K-means model, and the average silhouette coefficient indicates excellent model performance. Finally, a regression analysis based on principal component analysis is employed to achieve more precise quantitative analysis of aluminum and fluoride ions. The results demonstrate that the developed model excels in accuracy and sensitivity. This model not only shows strong performance but also addresses the urgent need for effective environmental monitoring and risk assessment, making it a valuable tool for safeguarding ecosystems and public health.
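The analysis chain, image fingerprints to K-means with silhouette scoring to principal-component regression, can be sketched end to end. Here simulated intensity histograms replace the OpenCV fingerprint-extraction step, and the data are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
# Stand-in for fingerprints extracted from fluorescence images (the paper
# uses OpenCV); each row is a hypothetical 32-bin intensity histogram.
conc = rng.uniform(0, 50, 120)   # ion concentration, nmol/L (synthetic)
fingerprints = (np.outer(conc, np.linspace(1, 2, 32))
                + rng.normal(0, 2, (120, 32)))

# Cluster analysis with K-means, scored by the silhouette coefficient.
km = KMeans(n_clusters=3, n_init=10, random_state=4).fit(fingerprints)
print("silhouette:", silhouette_score(fingerprints, km.labels_))

# Principal-component regression: PCA for dimensionality reduction,
# then linear regression on the component scores.
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
pcr.fit(fingerprints, conc)
print("R2:", pcr.score(fingerprints, conc))
```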
To better understand the migration behavior of plastic fragments in the environment, rapid non-destructive methods for in-situ identification and characterization of plastic fragments are necessary. However, most studies have focused only on colored plastic fragments, ignoring colorless plastic fragments and the effects of different environmental media (backgrounds), thus underestimating their abundance. To address this issue, the present study used near-infrared spectroscopy to compare the identification of colored and colorless plastic fragments based on partial least squares-discriminant analysis (PLS-DA), extreme gradient boosting, support vector machine, and random forest classifiers. The effects of polymer color, type, thickness, and background on fragment classification were evaluated. PLS-DA delivered the best and most stable outcome, with higher robustness and a lower misclassification rate. All models frequently confused colorless plastic fragments with their background when the fragment thickness was less than 0.1 mm. A two-stage modeling method, which first distinguishes the plastic types and then identifies colorless plastic fragments that had been misclassified as background, was therefore proposed. The method achieved an accuracy higher than 99% across different backgrounds. In summary, this study developed a novel method for rapid and simultaneous identification of colored and colorless plastic fragments against complex environmental backgrounds.
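PLS-DA is simply partial least squares regression onto a one-hot class matrix, with prediction by argmax; the two-stage idea then routes "background" calls to a second model. A minimal sketch with simulated spectra (peak positions, class structure, and the stage-2 hand-off are all illustrative assumptions):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)
# Hypothetical NIR spectra: rows are samples, columns are wavelengths.
def spectra(center, n):
    wl = np.linspace(0, 1, 100)
    return np.exp(-(wl - center) ** 2 / 0.01) + rng.normal(0, 0.05, (n, 100))

X = np.vstack([spectra(0.3, 60), spectra(0.6, 60), spectra(0.5, 60)])
y = np.repeat([0, 1, 2], 60)        # e.g. PE, PP, background

# PLS-DA: regress a one-hot class matrix on the spectra, predict by argmax.
Y = np.eye(3)[y]
pls = PLSRegression(n_components=5).fit(X, Y)
stage1 = pls.predict(X).argmax(axis=1)

# Stage 2 of the two-stage scheme: everything called "background" would be
# re-examined by a second PLS-DA trained to separate thin colorless
# fragments from true background (omitted here for brevity).
suspect = X[stage1 == 2]
print("samples routed to stage 2:", len(suspect))
```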
The application of machine learning to pyrite discrimination establishes a robust foundation for reconstructing the ore-forming history of multi-stage deposits; however, published models face challenges related to limited, imbalanced datasets and oversampling. In this study, the dataset was expanded to approximately 500 samples per type: 508 sedimentary, 573 orogenic gold, 548 sedimentary exhalative (SEDEX), and 364 volcanogenic massive sulfide (VMS) pyrites. Random forest (RF) and support vector machine (SVM) methodologies were used to enhance the reliability of the classifier models. The RF classifier achieved an overall accuracy of 99.8%, and the SVM classifier 100%. Evaluated by five-fold cross-validation, the RF classifier reached 93.8% accuracy and the SVM classifier 94.9%. These results demonstrate the strong feasibility of pyrite classification, supported by a relatively large, balanced dataset and high accuracy rates. The classifier was then employed to reveal the genesis of the controversial Keketale Pb-Zn deposit in NW China, which has been variously interpreted as SEDEX, VMS, or a SEDEX-VMS transition. Petrographic investigations indicated that the deposit comprises early fine-grained layered pyrite (Py1) and late recrystallized pyrite (Py2). Majority voting classified Py1 as VMS type, with RF and SVM accuracies of 72.2% and 75%, respectively, and confirmed Py2 as orogenic type with 74.3% and 77.1% accuracy, respectively. These findings indicate that the Keketale deposit originated from a submarine VMS mineralization system, followed by late orogenic-type overprinting by metamorphism and deformation, consistent with the geological and geochemical observations. This study further emphasizes the advantages of machine learning (ML) methods in accurately and directly discriminating deposit types and reconstructing the formation history of multi-stage deposits.
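The classify-then-vote workflow is compact to express: train RF and SVM on a balanced four-class dataset, then pool both models' predictions over the spots of one pyrite generation and take the majority. The features and spot data below are synthetic stand-ins for trace-element geochemistry:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
# Hypothetical pyrite trace-element features (e.g. Co, Ni, As, Se, ...),
# four balanced classes standing in for sedimentary/orogenic/SEDEX/VMS.
X = rng.normal(size=(600, 8)) + np.repeat(np.arange(4), 150)[:, None] * 0.8
y = np.repeat([0, 1, 2, 3], 150)

rf = RandomForestClassifier(n_estimators=300, random_state=6)
svm = SVC(random_state=6)
for name, clf in [("RF", rf), ("SVM", svm)]:
    print(name, "5-fold CV:", cross_val_score(clf, X, y, cv=5).mean())

# Majority voting across analysis spots of one pyrite generation (e.g. Py1).
rf.fit(X, y)
svm.fit(X, y)
spots = rng.normal(size=(36, 8)) + 3 * 0.8   # unknown spots near class 3
votes = np.concatenate([rf.predict(spots), svm.predict(spots)])
print("voted deposit type:", np.bincount(votes).argmax())
```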
Liposomes serve as critical carriers for drugs and vaccines, with their biological effects influenced by their size. The microfluidic method, renowned for its precise control, reproducibility, and scalability, has been widely employed for liposome preparation. Although some studies have explored the factors affecting liposomal size in microfluidic processes, most focus on small liposomes and rely predominantly on experimental data analysis; the production of larger liposomes, which are equally significant, remains underexplored. In this work, we thoroughly investigate multiple variables influencing liposome size during microfluidic preparation and develop a machine learning (ML) model capable of accurately predicting liposomal size. Experimental validation was conducted using a staggered herringbone micromixer (SHM) chip. Our findings reveal that most of the investigated variables significantly influence liposomal size, often interrelating in complex ways. We evaluated the predictive performance of several widely used ML algorithms, including ensemble methods, through cross-validation (CV) for both liposome size and polydispersity index (PDI). A standalone dataset was experimentally validated to assess the accuracy of the ML predictions, with the results indicating that ensemble algorithms provided the most reliable predictions. Specifically, gradient boosting was selected for size prediction, while random forest was employed for PDI prediction. We successfully produced uniform large (600 nm) and small (100 nm) liposomes using the optimised experimental conditions derived from the ML models. In conclusion, this study presents a robust methodology that enables precise control over liposome size distribution, offering valuable insights for medicinal research applications.
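The model-selection step described above, cross-validating candidate regressors separately for size and PDI, looks roughly like the following. The process variables and response surfaces are invented for illustration; only the pairing of gradient boosting with size and random forest with PDI mirrors the abstract:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
# Hypothetical process variables: flow-rate ratio, total flow rate,
# lipid concentration, temperature (illustrative, not the paper's full set).
X = rng.uniform(0, 1, (400, 4))
size = 600 - 450 * X[:, 0] + 80 * X[:, 1] * X[:, 2] + rng.normal(0, 15, 400)
pdi = 0.1 + 0.2 * X[:, 3] + rng.normal(0, 0.02, 400)

# Model choice mirrors the abstract: gradient boosting for size, RF for PDI.
for target, model, name in [
    (size, GradientBoostingRegressor(random_state=7), "size"),
    (pdi, RandomForestRegressor(random_state=7), "PDI"),
]:
    r2 = cross_val_score(model, X, target, cv=5, scoring="r2").mean()
    print(f"{name}: CV R2 = {r2:.2f}")
```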
Arsenic (As) pollution in soils is a pervasive environmental issue. Biochar immobilization offers a promising solution for addressing soil As contamination. The efficiency of biochar in immobilizing As in soils primarily hinges on the characteristics of both the soil and the biochar. However, the influence of a specific property on As immobilization varies among studies, and the development and application of biochar-based arsenic passivation materials often rely on empirical knowledge. To enhance immobilization efficiency and reduce labor and time costs, a machine learning (ML) model was employed to predict As immobilization efficiency before biochar application. In this study, we collected a dataset comprising 182 data points on As immobilization efficiency from 17 publications to construct three ML models. The results demonstrated that the random forest (RF) model outperformed the gradient-boosted regression tree and support vector regression models in predictive performance. Relative importance analysis and partial dependence plots based on the RF model were used to identify the most crucial factors influencing As immobilization. The findings highlighted the significant roles of biochar application time and biochar pH in As immobilization efficiency in soils. Furthermore, the study revealed that Fe-modified biochar substantially improved As immobilization. These insights can facilitate targeted biochar property design and the optimization of biochar application conditions to enhance As immobilization efficiency.
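Both interpretation tools named above are available in scikit-learn. A sketch with a synthetic 182-row dataset (the feature set and effect sizes are assumptions, chosen only to echo the abstract's findings):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(8)
# Hypothetical features: biochar pH, application time (d), Fe-modified flag,
# soil pH; target is As immobilization efficiency (%).
X = np.column_stack([rng.uniform(5, 11, 182), rng.uniform(1, 180, 182),
                     rng.integers(0, 2, 182), rng.uniform(4, 9, 182)])
eff = 20 + 3 * X[:, 0] + 0.1 * X[:, 1] + 15 * X[:, 2] + rng.normal(0, 5, 182)

rf = RandomForestRegressor(n_estimators=500, random_state=8).fit(X, eff)
names = ["biochar_pH", "app_time", "Fe_modified", "soil_pH"]
print(dict(zip(names, rf.feature_importances_.round(2))))   # importance

# Partial dependence of efficiency on application time (feature index 1).
pd_result = partial_dependence(rf, X, features=[1])
print(pd_result["average"].shape)
```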
Open caissons are widely used in foundation engineering because of their load-bearing efficiency and adaptability to diverse soil conditions. However, accurately predicting their undrained bearing capacity in layered soils remains a complex challenge. This study presents a novel application of five ensemble machine learning (ML) algorithms, namely random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and categorical boosting (CatBoost), to predict the undrained bearing capacity factor (Nc) of circular open caissons embedded in two-layered clay, on the basis of results from finite element limit analysis (FELA). The input dataset consists of 1188 numerical simulations using the Tresca failure criterion, varying the geometrical and soil parameters. The FELA was performed via OptumG2 software with adaptive meshing techniques and verified against existing benchmark studies. The ML models were trained on 70% of the dataset and tested on the remaining 30%. Their performance was evaluated using six statistical metrics: coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE), index of scatter (IOS), RMSE-to-standard-deviation ratio (RSR), and variance explained factor (VAF). The results indicate that all the models achieved high accuracy, with R² values exceeding 97.6% and RMSE values below 0.02. Among them, AdaBoost and CatBoost consistently outperformed the other methods across both the training and testing datasets, demonstrating superior generalizability and robustness. The proposed ML framework offers an efficient, accurate, and data-driven alternative to traditional methods for estimating caisson capacity in stratified soils. This approach can reduce computational costs while improving reliability in the early stages of foundation design.
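A small helper computing the six metrics makes the model comparison reproducible. The RSR, VAF, and IOS formulas below follow their usual textbook definitions and are assumptions; the paper's exact definitions may differ:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

def report(y_true, y_pred):
    """Six statistics used to rank the ensemble models (RSR, VAF, and IOS
    follow common textbook definitions, assumed here)."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": rmse,
        "RSR": rmse / np.std(y_true),                  # RMSE / std. deviation
        "VAF": 100 * (1 - np.var(y_true - y_pred) / np.var(y_true)),
        "IOS": rmse / np.mean(y_true),                 # scatter index
    }

rng = np.random.default_rng(9)
nc_true = rng.uniform(6, 12, 356)        # synthetic Nc values (30% test split)
nc_pred = nc_true + rng.normal(0, 0.015 * nc_true)
print(report(nc_true, nc_pred))
```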
BACKGROUND To investigate the preoperative factors influencing textbook outcomes (TO) in intrahepatic cholangiocarcinoma (ICC) patients and to evaluate the feasibility of an interpretable machine learning model for preoperative prediction of TO, we developed a machine learning model for preoperative prediction of TO and used the SHapley Additive exPlanations (SHAP) technique to illustrate the prediction process. AIM To analyze the factors influencing textbook outcomes before surgery and to establish interpretable machine learning models for preoperative prediction. METHODS A total of 376 patients diagnosed with ICC were retrospectively collected from four major medical institutions in China, covering the period from 2011 to 2017. Logistic regression analysis was conducted to identify preoperative variables associated with achieving TO. Based on these variables, an eXtreme Gradient Boosting (XGBoost) machine learning prediction model was constructed using the XGBoost package. The SHAP algorithm (package: shapviz) was employed to visualize each variable's contribution to the model's predictions. Kaplan-Meier survival analysis was performed to compare the prognostic differences between the TO-achieving and non-TO-achieving groups. RESULTS Among the 376 patients, 287 were included in the training group and 89 in the validation group. Logistic regression identified the following preoperative variables influencing TO: Child-Pugh classification, Eastern Cooperative Oncology Group (ECOG) score, hepatitis B, and tumor size. The XGBoost prediction model demonstrated high accuracy in internal validation (AUC = 0.8825) and external validation (AUC = 0.8346). Survival analysis revealed that the disease-free survival rates for patients achieving TO at 1, 2, and 3 years were 64.2%, 56.8%, and 43.4%, respectively. CONCLUSION Child-Pugh classification, ECOG score, hepatitis B, and tumor size are preoperative predictors of TO. In both the training and validation groups, the machine learning model was effective in predicting TO before surgery, and the SHAP algorithm provided an intuitive visualization of the prediction process, enhancing its interpretability.
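The XGBoost-plus-SHAP pattern translates directly to Python (the study cites the R shapviz package for visualization; the Python shap package is used here as an equivalent). The four predictors mirror the abstract, but the data and the outcome rule are synthetic:

```python
import numpy as np
import xgboost as xgb
import shap   # pip install xgboost shap

rng = np.random.default_rng(10)
# Hypothetical preoperative variables echoing the abstract: Child-Pugh
# class (0/1), ECOG score, hepatitis B status, tumor size (cm).
X = np.column_stack([rng.integers(0, 2, 376), rng.integers(0, 3, 376),
                     rng.integers(0, 2, 376), rng.uniform(1, 10, 376)])
to = ((X[:, 0] == 0) & (X[:, 3] < 5) & (rng.random(376) > 0.2)).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3,
                          eval_metric="logloss")
model.fit(X[:287], to[:287])           # 287 training / 89 validation split

explainer = shap.TreeExplainer(model)  # per-variable contributions
shap_values = explainer.shap_values(X[287:])
print(np.abs(shap_values).mean(axis=0))   # mean |SHAP| per feature
```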
Integrating exhaled breath analysis into the diagnosis of cardiovascular diseases holds significant promise as a valuable tool for future clinical use, particularly for ischemic heart disease (IHD). However, current research on the volatilome (exhaled breath composition) in heart disease remains underexplored and lacks sufficient evidence to confirm its clinical validity. Key challenges hindering the application of breath analysis in diagnosing IHD include the scarcity of studies (only three published papers to date), substantial methodological bias in two of these studies, and the absence of standardized protocols for clinical implementation. Additionally, inconsistencies in methodologies—such as sample collection, analytical techniques, machine learning (ML) approaches, and result interpretation—vary widely across studies, further complicating their reproducibility and comparability. To address these gaps, there is an urgent need to establish unified guidelines that define best practices for breath sample collection, data analysis, ML integration, and biomarker annotation. Until these challenges are systematically resolved, the widespread adoption of exhaled breath analysis as a reliable diagnostic tool for IHD remains a distant goal rather than an imminent reality.
The application of machine learning in alloy design is increasingly widespread, yet traditional models still face challenges when dealing with limited datasets and complex nonlinear relationships. This work proposes an interpretable machine learning method based on data augmentation and reconstruction for discovering high-performance low-alloyed magnesium (Mg) alloys. The data augmentation technique expands the original dataset through Gaussian noise, while the data reconstruction method reorganizes and transforms the original data to extract more representative features, significantly improving the model's generalization ability and prediction accuracy, with a coefficient of determination (R²) of 95.9% for the ultimate tensile strength (UTS) model and 95.3% for the elongation-to-failure (EL) model. A correlation coefficient assisted screening (CCAS) method is proposed to filter low-alloyed target alloys. A new Mg-2.2Mn-0.4Zn-0.2Al-0.2Ca (MZAX2000, wt%) alloy is designed and extruded into bar at the given processing parameters, achieving room-temperature strength-ductility synergy with an excellent UTS of 395 MPa and a high EL of 17.9%. This is closely related to the hetero-structured character of the as-extruded MZAX2000 alloy, which consists of coarse grains (16%), fine grains (75%), and fiber regions (9%). This work thus offers new insights into optimizing alloy compositions and processing parameters to attain new strong and ductile low-alloyed Mg alloys.
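Gaussian-noise augmentation for a small alloy dataset is a one-liner, with one caveat worth encoding: jitter only the training split, so held-out compositions never leak into the expanded data. The compositions and property model below are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
# Small original dataset: alloy compositions (wt%) -> UTS (MPa); synthetic.
X = rng.uniform(0, 3, (60, 4))                 # e.g. Mn, Zn, Al, Ca contents
uts = 250 + 40 * X[:, 0] + 20 * X[:, 3] + rng.normal(0, 10, 60)

X_tr, X_te, y_tr, y_te = train_test_split(X, uts, random_state=11)

# Gaussian-noise augmentation: jitter copies of the *training* samples only,
# so the held-out compositions never leak into the expanded dataset.
X_aug = np.vstack([X_tr + rng.normal(0, 0.02, X_tr.shape) for _ in range(5)])
y_aug = np.tile(y_tr, 5)

model = GradientBoostingRegressor(random_state=11).fit(X_aug, y_aug)
print("holdout R2:", round(model.score(X_te, y_te), 3))
```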
Carbon emissions resulting from energy consumption have become a pressing issue for governments worldwide, and the accurate estimation of carbon emissions from satellite remote sensing data has become a crucial research problem. Previous studies relied on statistical regression models that failed to capture the complex nonlinear relationships between carbon emissions and the characteristic variables. In this study, we propose a machine learning algorithm for carbon emission estimation, a Bayesian-optimized XGBoost regression model, using multi-year energy carbon emission data and nighttime lights (NTL) remote sensing data from Shaanxi Province, China. Our results demonstrate that the XGBoost algorithm outperforms linear regression and four other machine learning models, with an R² of 0.906 and an RMSE of 5.687. We observe an annual increase in carbon emissions, with high-emission counties primarily concentrated in northern and central Shaanxi Province, displaying a shift from discrete, sporadic points to a contiguous, extended spatial distribution. Spatial autocorrelation clustering reveals predominantly high-high and low-low clustering patterns, with economically developed counties showing high-emission clustering and economically less developed counties showing low-emission clustering. Our findings show that NTL data combined with the XGBoost algorithm can estimate and predict carbon emissions more accurately, providing a complementary reference for using satellite remote sensing imagery in carbon emission monitoring and assessment. This research provides an important theoretical basis for formulating practical carbon emission reduction policies and contributes to techniques for accurate carbon emission estimation using remote sensing data.
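Bayesian hyperparameter optimization of XGBoost can be sketched with Optuna, whose default TPE sampler is one Bayesian-optimization method; this is a stand-in for whatever optimizer the study used, and the NTL-derived features are hypothetical:

```python
import numpy as np
import xgboost as xgb
import optuna   # pip install xgboost optuna
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(12)
# Hypothetical county-level features from nighttime-lights imagery
# (e.g. total DN, lit area); target is energy-based carbon emissions.
X = rng.uniform(0, 1, (300, 3))
co2 = 50 * X[:, 0] + 20 * X[:, 0] * X[:, 1] + rng.normal(0, 3, 300)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3,
                                             log=True),
    }
    model = xgb.XGBRegressor(**params)
    return cross_val_score(model, X, co2, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")   # TPE sampler by default
study.optimize(objective, n_trials=30)
print(study.best_params, round(study.best_value, 3))
```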
BACKGROUND Patients with early-stage hepatocellular carcinoma (HCC) generally have good survival rates following surgical resection; however, a subset of these patients experience recurrence within five years post-surgery. AIM To develop predictive models utilizing machine learning (ML) methods to detect early-stage patients at a high risk of mortality. METHODS Eight hundred and eight patients with HCC at Beijing Ditan Hospital were randomly allocated to training and validation cohorts in a 2:1 ratio. Prognostic models were generated using random survival forests (RSFs) and artificial neural networks (ANNs), and these ML models were compared with classic HCC scoring systems. A decision-tree model was established to validate the contribution of immune-inflammatory indicators to the long-term outlook of patients with early-stage HCC. RESULTS Immune-inflammatory markers, albumin-bilirubin scores, alpha-fetoprotein, tumor size, and the International Normalized Ratio were closely associated with 5-year survival rates. Among the predictive models, the ANN model generated from these indicators exhibited superior performance, with a 5-year area under the curve (AUC) of 0.85 (95%CI: 0.82-0.88); in the validation cohort, the 5-year AUC was 0.82 (95%CI: 0.74-0.85). According to the ANN model, patients were classified into high-risk and low-risk groups, with an overall survival hazard ratio of 7.98 (95%CI: 5.85-10.93, P < 0.0001) between the two groups. INTRODUCTION Hepatocellular carcinoma (HCC) is one of the six most prevalent cancers[1] and the third leading cause of cancer-related mortality[2]. China has some of the highest incidence and mortality rates for liver cancer, accounting for half of global cases[3,4]. The Barcelona Clinic Liver Cancer (BCLC) staging system is the most widely used framework for diagnosing and treating HCC[5]. The optimal candidates for surgical treatment are those with early-stage HCC, classified as BCLC stage 0 or A. Patients with early-stage liver cancer typically have a better prognosis after surgical resection, achieving a 5-year survival rate of 60%-70%[6]. However, the high postoperative recurrence rate of HCC remains a major obstacle to long-term efficacy. To improve the prognosis of patients with early-stage HCC, it is necessary to develop models that can identify those with poor prognoses, enabling stratified and personalized treatment and follow-up strategies. Chronic inflammation is linked to the development and advancement of tumors[7]. Recently, peripheral blood immune indicators, such as the neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), and lymphocyte-to-monocyte ratio (LMR), have garnered extensive attention and have been used to predict survival in various tumors and inflammation-related diseases[8-10]. However, the relationship between these combinations of immune markers and outcomes in patients with early-stage HCC requires further investigation. Machine learning (ML) algorithms are capable of handling large and complex datasets, generating more accurate and personalized predictions through training algorithms that manage nonlinear statistical relationships better than traditional analytical methods. Commonly used ML models include artificial neural networks (ANNs) and random survival forests (RSFs), which have shown satisfactory accuracy in prognostic predictions across various cancers and other diseases[11-13]. ANNs have performed well in identifying the progression from liver cirrhosis to HCC and in predicting overall survival (OS) in patients with HCC[14,15]. However, no studies have confirmed the ability of ML models to predict post-surgical survival in patients with early-stage HCC. Through ML, a better understanding of the risk factors for early-stage HCC prognosis can be achieved, aiding surgical decision-making, the identification of patients at high risk of mortality, and the selection of subsequent treatment strategies. In this study, we aimed to establish a 5-year prognostic model for patients with early-stage HCC after surgical resection, based on ML and systemic immune-inflammatory indicators. This model seeks to improve the early monitoring of high-risk patients and provide personalized treatment plans.
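A simplified sketch of the ANN arm of such a study: a small neural network on immune-inflammatory and clinical predictors for a binary 5-year outcome, scored by AUC on a held-out validation cohort, then used to split patients into risk groups. The predictor names echo the abstract, but the data, the binary simplification of survival, and the median-split stratification are all assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(13)
# Hypothetical predictors mirroring the abstract: NLR, PLR, LMR, ALBI,
# AFP, tumor size, INR; target is 5-year mortality (binary stand-in).
X = rng.normal(size=(808, 7))
dead5y = (0.8 * X[:, 0] - 0.6 * X[:, 2] + 0.5 * X[:, 5]
          + rng.normal(0, 1, 808)) > 0.5

tr, va = slice(0, 539), slice(539, 808)     # ~2:1 split as in the study
scaler = StandardScaler().fit(X[tr])
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                    random_state=13)
ann.fit(scaler.transform(X[tr]), dead5y[tr])

p = ann.predict_proba(scaler.transform(X[va]))[:, 1]
print("validation AUC:", round(roc_auc_score(dead5y[va], p), 2))
high_risk = p > np.median(p)                # simple two-group stratification
```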
With the rapid development of artificial intelligence, magnetocaloric materials, like other materials, are being developed with increased efficiency and enhanced performance. However, most studies do not take phase transitions into account, and as a result, the predictions are usually not accurate enough. In this context, we have established an explicable relationship between alloy compositions and phase transitions by feature imputation. A facile machine learning model is proposed to screen candidate NiMn-based Heusler alloys with the desired magnetic entropy change and magnetic transition temperature, with a high accuracy of R² ≈ 0.98. As expected, the measured properties of the prepared NiMn-based alloys, including phase transition type, magnetic entropy change, and transition temperature, are all in good agreement with the ML predictions. Besides being the first to demonstrate an explicable relationship between alloy compositions, phase transitions, and magnetocaloric properties, our proposed ML model is highly predictive and interpretable, providing a strong theoretical foundation for identifying high-performance magnetocaloric materials in the future.
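A phase-transition-aware screen can be structured as a classifier for the transition type plus regressors for the magnetocaloric targets, keeping only compositions where all three predictions are acceptable. Everything below (compositions, labels, thresholds) is synthetic and illustrative of the structure, not of the paper's model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(14)
# Hypothetical NiMn-based compositions: Ni, Mn, and third-element fractions.
X = rng.uniform(0, 1, (400, 3))
first_order = X[:, 1] > 0.5              # stand-in phase-transition label
dS = 5 + 20 * first_order + 5 * X[:, 0] + rng.normal(0, 1, 400)  # |dS|
Tt = 250 + 100 * X[:, 2] + rng.normal(0, 5, 400)                 # T_t, K

# Two-step screening: classify the transition type first, then regress the
# magnetocaloric targets; keep candidates only if all three are acceptable.
clf = RandomForestClassifier(random_state=14).fit(X[:300], first_order[:300])
reg_dS = RandomForestRegressor(random_state=14).fit(X[:300], dS[:300])
reg_Tt = RandomForestRegressor(random_state=14).fit(X[:300], Tt[:300])

cand = X[300:]
keep = (clf.predict(cand)
        & (reg_dS.predict(cand) > 20)
        & (reg_Tt.predict(cand) > 280) & (reg_Tt.predict(cand) < 320))
print("candidates passing screen:", keep.sum())
```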
Superconducting radio-frequency (SRF) cavities are the core components of SRF linear accelerators, making their stable operation critically important. However, operational experience from different accelerator laboratories has revealed that SRF faults are the leading cause of short machine-downtime trips. When a cavity fault occurs, system experts analyze the time-series data recorded by low-level RF systems and identify the fault type; this requires expertise and intuition, posing a major challenge for control-room operators. Here, we propose an expert-feature-based machine learning model for automating SRF cavity fault recognition. The main challenge in converting the "expert reasoning" process for SRF faults into a "model inference" process lies in feature extraction, owing to the multidimensional and complex time-series waveforms involved. Existing autoregression-based feature-extraction methods require the signal to be stable and autocorrelated, making it difficult to capture the abrupt features present in several SRF failure patterns. To address these issues, we introduce expertise into the classification model through reasonable feature engineering. We demonstrate the feasibility of this method using the SRF cavities of the China accelerator facility for superheavy elements (CAFE2). Although specific faults in SRF cavities may vary across different accelerators, similarities exist in the RF signals; this study therefore provides valuable guidance for fault analysis across the entire SRF community.
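The expert-feature idea reduces to hand-crafting a few waveform statistics that capture abrupt transients, then training an ordinary classifier on them. The feature set and the two synthetic fault morphologies below are illustrative assumptions, not CAFE2's actual signals or fault taxonomy:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(15)

def expert_features(w):
    """Hand-crafted features of one RF waveform, chosen to capture abrupt
    transients that autoregressive features tend to miss (illustrative)."""
    dw = np.diff(w)
    return [w.mean(), w.std(), w.min(),
            np.abs(dw).max(),              # sharpest jump (fault onset)
            np.argmin(w) / len(w)]         # relative time of the deepest dip

def waveform(kind):
    """Synthetic stand-in for a cavity gradient waveform of one fault type."""
    t = np.linspace(0, 1, 512)
    w = 1.0 + rng.normal(0, 0.01, t.size)
    if kind:
        w[t > 0.6] *= 0.2                  # abrupt quench-like collapse
    else:
        w -= 0.5 * t                       # slow drift-like decay
    return w

kinds = rng.integers(0, 2, 300)
X = np.array([expert_features(waveform(k)) for k in kinds])
rf = RandomForestClassifier(n_estimators=200, random_state=15)
print("CV accuracy:", cross_val_score(rf, X, kinds, cv=5).mean())
```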
基金the financial support from the National Natural Science Foundation of China(52207229)the financial support from the China Scholarship Council(202207550010)。
文摘The safe and reliable operation of lithium-ion batteries necessitates the accurate prediction of remaining useful life(RUL).However,this task is challenging due to the diverse ageing mechanisms,various operating conditions,and limited measured signals.Although data-driven methods are perceived as a promising solution,they ignore intrinsic battery physics,leading to compromised accuracy,low efficiency,and low interpretability.In response,this study integrates domain knowledge into deep learning to enhance the RUL prediction performance.We demonstrate accurate RUL prediction using only a single charging curve.First,a generalisable physics-based model is developed to extract ageing-correlated parameters that can describe and explain battery degradation from battery charging data.The parameters inform a deep neural network(DNN)to predict RUL with high accuracy and efficiency.The trained model is validated under 3 types of batteries working under 7 conditions,considering fully charged and partially charged cases.Using data from one cycle only,the proposed method achieves a root mean squared error(RMSE)of 11.42 cycles and a mean absolute relative error(MARE)of 3.19%on average,which are over45%and 44%lower compared to the two state-of-the-art data-driven methods,respectively.Besides its accuracy,the proposed method also outperforms existing methods in terms of efficiency,input burden,and robustness.The inherent relationship between the model parameters and the battery degradation mechanism is further revealed,substantiating the intrinsic superiority of the proposed method.
文摘The accurate prediction of peak overpressure of explosion shockwaves is significant in fields such as explosion hazard assessment and structural protection, where explosion shockwaves serve as typical destructive elements. Aiming at the problem of insufficient accuracy of the existing physical models for predicting the peak overpressure of ground reflected waves, two physics-informed machine learning models are constructed. The results demonstrate that the machine learning models, which incorporate physical information by predicting the deviation between the physical model and actual values and adding a physical loss term in the loss function, can accurately predict both the training and out-oftraining dataset. Compared to existing physical models, the average relative error in the predicted training domain is reduced from 17.459%-48.588% to 2%, and the proportion of average relative error less than 20% increased from 0% to 59.4% to more than 99%. In addition, the relative average error outside the prediction training set range is reduced from 14.496%-29.389% to 5%, and the proportion of relative average error less than 20% increased from 0% to 71.39% to more than 99%. The inclusion of a physical loss term enforcing monotonicity in the loss function effectively improves the extrapolation performance of machine learning. The findings of this study provide valuable reference for explosion hazard assessment and anti-explosion structural design in various fields.
基金Project(G2022165004L)supported by the High-end Foreign Expert Introduction Program,ChinaProject(2021XM3008)supported by the Special Foundation of Postdoctoral Support Program,Chongqing,China+1 种基金Project(2018-ZL-01)supported by the Sichuan Transportation Science and Technology Project,ChinaProject(HZ2021001)supported by the Chongqing Municipal Education Commission,China。
文摘Landslide susceptibility mapping is a crucial tool for disaster prevention and management.The performance of conventional data-driven model is greatly influenced by the quality of the samples data.The random selection of negative samples results in the lack of interpretability throughout the assessment process.To address this limitation and construct a high-quality negative samples database,this study introduces a physics-informed machine learning approach,combining the random forest model with Scoops 3D,to optimize the negative samples selection strategy and assess the landslide susceptibility of the study area.The Scoops 3D is employed to determine the factor of safety value leveraging Bishop’s simplified method.Instead of conventional random selection,negative samples are extracted from the areas with a high factor of safety value.Subsequently,the results of conventional random forest model and physics-informed data-driven model are analyzed and discussed,focusing on model performance and prediction uncertainty.In comparison to conventional methods,the physics-informed model,set with a safety area threshold of 3,demonstrates a noteworthy improvement in the mean AUC value by 36.7%,coupled with a reduced prediction uncertainty.It is evident that the determination of the safety area threshold exerts an impact on both prediction uncertainty and model performance.
基金supported by the National Natural Science Foundation of China(Nos.22078373 and 22078372).
文摘Pressure swing adsorption(PSA)modeling remains a challenging task since it exhibits strong dynamic and cyclic behavior.This study presents a systematic physics-informed machine learning method that integrates transfer learning and labeled data to construct a spatiotemporal model of the PSA process.To approximate the latent solutions of partial differential equations(PDEs)in the specific steps of pressurization,adsorption,heavy reflux,counter-current depressurization,and light reflux,the system's network representation is decomposed into five lightweight sub-networks.On this basis,we propose a parameter-based transfer learning(TL)combined with domain decomposition to address the long-term integration of periodic PDEs and expedite the network training process.Moreover,to tackle challenges related to sharp adsorption fronts,our method allows for the inclusion of a specified amount of labeled data at the boundaries and/or within the system in the loss function.The results show that the proposed method closely matches the outcomes achieved through the conventional numerical method,effectively simulating all steps and cyclic behavior within the PSA processes.
基金supported by the U.S.Department of Energy,Office of Science,Biological and Environmental Research Program under award#ERKP752,and the DOE Office of Science,Office of Basic Energy Sciences,Division of Chemical Sciences,Geosciences,and Biosciences(CSGB)(Award No.DE-SC0022214FWP 3ERKCG25)This manuscript has been authored by UT-Battelle,LLC,under contract DEAC05-00OR22725 with the US Department of Energy(DOE)。
文摘The polarity of solvents plays a critical role in various research applications,particularly in their solubilities.Polarity is conveniently characterized by the Kamlet-Taft parameters that is,the hydrogen bonding acidity(α),the basicity(β),and the polarizability(π^(*)).Obtaining Kamlet-Taft parameters is very important for designer solvents,namely ionic liquids(ILs)and deep eutectic solvents(DESs).However,given the unlimited theoretical number of combinations of ionic pairs in ILs and hydrogen-bond donor/acceptor pairs in DESs,experimental determination of their Kamlet-Taft parameters is impractical.To address this,the present study developed two different machine learning(ML)algorithms to predict Kamlet-Taft parameters for designer solvents using quantum chemically derived input features.The ML models developed in the present study showed accurate predictions with high determination coefficient(R^(2))and low root mean square error(RMSE)values.Further,in the context of present interest in the circular bioeconomy,the relationship between the basicities and acidities of designer solvents and their ability to dissolve lignin and carbon dioxide(CO_(2))is discussed.Our method thus guides the design of effective solvents with optimal Kamlet-Taft parameter values dissolving and converting biomass and CO_(2)into valuable chemicals.
基金supported by the ONR Vannevar Bush Faculty Fellowship(Grant No.N00014-22-1-2795).
文摘Large language models(LLMs)have emerged as powerful tools for addressing a wide range of problems,including those in scientific computing,particularly in solving partial differential equations(PDEs).However,different models exhibit distinct strengths and preferences,resulting in varying levels of performance.In this paper,we compare the capabilities of the most advanced LLMs—DeepSeek,ChatGPT,and Claude—along with their reasoning-optimized versions in addressing computational challenges.Specifically,we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems.We designed all our experiments so that a nontrivial decision is required,e.g,defining the proper space of input functions for neural operator learning.Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems,with ChatGPT o3-mini-high generally offering the fastest reasoning speed.
基金supported by the Fundamental Research Funds for the Central Universities(Grant No.2682024GF019)。
文摘Excellent detonation performances and low sensitivity are prerequisites for the deployment of energetic materials.Exploring the underlying factors that affect impact sensitivity and detonation performances as well as exploring how to obtain materials with desired properties remains a long-term challenge.Machine learning with its ability to solve complex tasks and perform robust data processing can reveal the relationship between performance and descriptive indicators,potentially accelerating the development process of energetic materials.In this background,impact sensitivity,detonation performances,and 28 physicochemical parameters for 222 energetic materials from density functional theory calculations and published literature were sorted out.Four machine learning algorithms were employed to predict various properties of energetic materials,including impact sensitivity,detonation velocity,detonation pressure,and Gurney energy.Analysis of Pearson coefficients and feature importance showed that the heat of explosion,oxygen balance,decomposition products,and HOMO energy levels have a strong correlation with the impact sensitivity of energetic materials.Oxygen balance,decomposition products,and density have a strong correlation with detonation performances.Utilizing impact sensitivity of 2,3,4-trinitrotoluene and the detonation performances of 2,4,6-trinitrobenzene-1,3,5-triamine as the benchmark,the analysis of feature importance rankings and statistical data revealed the optimal range of key features balancing impact sensitivity and detonation performances:oxygen balance values should be between-40%and-30%,density should range from 1.66 to 1.72 g/cm^(3),HOMO energy levels should be between-6.34 and-6.31 eV,and lipophilicity should be between-1.0 and 0.1,4.49 and 5.59.These findings not only offer important insights into the impact sensitivity and detonation performances of energetic materials,but also provide a theoretical guidance paradigm for the design and development of new energetic materials with optimal detonation performances and reduced sensitivity.
基金supported by the National Natural Science Foundation of China(No.U21A20290)Guangdong Basic and Applied Basic Research Foundation(No.2022A1515011656)+2 种基金the Projects of Talents Recruitment of GDUPT(No.2023rcyj1003)the 2022“Sail Plan”Project of Maoming Green Chemical Industry Research Institute(No.MMGCIRI2022YFJH-Y-024)Maoming Science and Technology Project(No.2023382).
文摘The presence of aluminum(Al^(3+))and fluoride(F^(−))ions in the environment can be harmful to ecosystems and human health,highlighting the need for accurate and efficient monitoring.In this paper,an innovative approach is presented that leverages the power of machine learning to enhance the accuracy and efficiency of fluorescence-based detection for sequential quantitative analysis of aluminum(Al^(3+))and fluoride(F^(−))ions in aqueous solutions.The proposed method involves the synthesis of sulfur-functionalized carbon dots(C-dots)as fluorescence probes,with fluorescence enhancement upon interaction with Al^(3+)ions,achieving a detection limit of 4.2 nmol/L.Subsequently,in the presence of F^(−)ions,fluorescence is quenched,with a detection limit of 47.6 nmol/L.The fingerprints of fluorescence images are extracted using a cross-platform computer vision library in Python,followed by data preprocessing.Subsequently,the fingerprint data is subjected to cluster analysis using the K-means model from machine learning,and the average Silhouette Coefficient indicates excellent model performance.Finally,a regression analysis based on the principal component analysis method is employed to achieve more precise quantitative analysis of aluminum and fluoride ions.The results demonstrate that the developed model excels in terms of accuracy and sensitivity.This groundbreaking model not only showcases exceptional performance but also addresses the urgent need for effective environmental monitoring and risk assessment,making it a valuable tool for safeguarding our ecosystems and public health.
基金supported by the National Natural Science Foundation of China(No.22276139)the Shanghai’s Municipal State-owned Assets Supervision and Administration Commission(No.2022028).
文摘To better understand the migration behavior of plastic fragments in the environment,development of rapid non-destructive methods for in-situ identification and characterization of plastic fragments is necessary.However,most of the studies had focused only on colored plastic fragments,ignoring colorless plastic fragments and the effects of different environmental media(backgrounds),thus underestimating their abundance.To address this issue,the present study used near-infrared spectroscopy to compare the identification of colored and colorless plastic fragments based on partial least squares-discriminant analysis(PLS-DA),extreme gradient boost,support vector machine and random forest classifier.The effects of polymer color,type,thickness,and background on the plastic fragments classification were evaluated.PLS-DA presented the best and most stable outcome,with higher robustness and lower misclassification rate.All models frequently misinterpreted colorless plastic fragments and its background when the fragment thickness was less than 0.1mm.A two-stage modeling method,which first distinguishes the plastic types and then identifies colorless plastic fragments that had been misclassified as background,was proposed.The method presented an accuracy higher than 99%in different backgrounds.In summary,this study developed a novel method for rapid and synchronous identification of colored and colorless plastic fragments under complex environmental backgrounds.
基金the National Key Research and Development Program of China(2021YFC2900300)the Natural Science Foundation of Guangdong Province(2024A1515030216)+2 种基金MOST Special Fund from State Key Laboratory of Geological Processes and Mineral Resources,China University of Geosciences(GPMR202437)the Guangdong Province Introduced of Innovative R&D Team(2021ZT09H399)the Third Xinjiang Scientific Expedition Program(2022xjkk1301).
文摘The application of machine learning for pyrite discrimination establishes a robust foundation for constructing the ore-forming history of multi-stage deposits;however,published models face challenges related to limited,imbalanced datasets and oversampling.In this study,the dataset was expanded to approximately 500 samples for each type,including 508 sedimentary,573 orogenic gold,548 sedimentary exhalative(SEDEX)deposits,and 364 volcanogenic massive sulfides(VMS)pyrites,utilizing random forest(RF)and support vector machine(SVM)methodologies to enhance the reliability of the classifier models.The RF classifier achieved an overall accuracy of 99.8%,and the SVM classifier attained an overall accuracy of 100%.The model was evaluated by a five-fold cross-validation approach with 93.8%accuracy for the RF and 94.9%for the SVM classifier.These results demonstrate the strong feasibility of pyrite classification,supported by a relatively large,balanced dataset and high accuracy rates.The classifier was employed to reveal the genesis of the controversial Keketale Pb-Zn deposit in NW China,which has been inconclusive among SEDEX,VMS,or a SEDEX-VMS transition.Petrographic investigations indicated that the deposit comprises early fine-grained layered pyrite(Py1)and late recrystallized pyrite(Py2).The majority voting classified Py1 as the VMS type,with an accuracy of RF and SVM being 72.2%and 75%,respectively,and confirmed Py2 as an orogenic type with 74.3% and 77.1%accuracy,respectively.The new findings indicated that the Keketale deposit originated from a submarine VMS mineralization system,followed by late orogenic-type overprinting of metamorphism and deformation,which is consistent with the geological and geochemical observations.This study further emphasizes the advantages of Machine learning(ML)methods in accurately and directly discriminating the deposit types and reconstructing the formation history of multi-stage deposits.
基金supported by the National Key Research and Development Plan of the Ministry of Science and Technology,China(Grant No.:2022YFE0125300)the National Natural Science Foundation of China(Grant No:81690262)+2 种基金the National Science and Technology Major Project,China(Grant No.:2017ZX09201004-021)the Open Project of National facility for Translational Medicine(Shanghai),China(Grant No.:TMSK-2021-104)Shanghai Jiao Tong University STAR Grant,China(Grant Nos.:YG2022ZD024 and YG2022QN111).
文摘Liposomes serve as critical carriers for drugs and vaccines,with their biological effects influenced by their size.The microfluidic method,renowned for its precise control,reproducibility,and scalability,has been widely employed for liposome preparation.Although some studies have explored factors affecting liposomal size in microfluidic processes,most focus on small-sized liposomes,predominantly through experimental data analysis.However,the production of larger liposomes,which are equally significant,remains underexplored.In this work,we thoroughly investigate multiple variables influencing liposome size during microfluidic preparation and develop a machine learning(ML)model capable of accurately predicting liposomal size.Experimental validation was conducted using a staggered herringbone micromixer(SHM)chip.Our findings reveal that most investigated variables significantly influence liposomal size,often interrelating in complex ways.We evaluated the predictive performance of several widely-used ML algorithms,including ensemble methods,through cross-validation(CV)for both lipo-some size and polydispersity index(PDI).A standalone dataset was experimentally validated to assess the accuracy of the ML predictions,with results indicating that ensemble algorithms provided the most reliable predictions.Specifically,gradient boosting was selected for size prediction,while random forest was employed for PDI prediction.We successfully produced uniform large(600 nm)and small(100 nm)liposomes using the optimised experimental conditions derived from the ML models.In conclusion,this study presents a robust methodology that enables precise control over liposome size distribution,of-fering valuable insights for medicinal research applications.
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFC1808701).
Abstract: Arsenic (As) pollution in soils is a pervasive environmental issue. Biochar immobilization offers a promising solution for addressing soil As contamination. The efficiency of biochar in immobilizing As in soils primarily hinges on the characteristics of both the soil and the biochar. However, the influence of a specific property on As immobilization varies among studies, and the development and application of biochar-based arsenic passivation materials often rely on empirical knowledge. To enhance immobilization efficiency and reduce labor and time costs, a machine learning (ML) model was employed to predict As immobilization efficiency before biochar application. In this study, we collected a dataset comprising 182 data points on As immobilization efficiency from 17 publications to construct three ML models. The results demonstrated that the random forest (RF) model outperformed the gradient boosting regression tree and support vector regression models in predictive performance. Relative importance analysis and partial dependence plots based on the RF model were conducted to identify the most crucial factors influencing As immobilization. These findings highlighted the significant roles of biochar application time and biochar pH in As immobilization efficiency in soils. Furthermore, the study revealed that Fe-modified biochar exhibited a substantial improvement in As immobilization. These insights can facilitate targeted biochar property design and optimization of biochar application conditions to enhance As immobilization efficiency.
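The importance-analysis step above might look like the following sketch, which fits a random forest and then computes permutation importances and a partial dependence curve. The feature names (application time, biochar pH, and so on) and the synthetic data are assumptions for illustration only; note also that the key names returned by partial_dependence can differ between scikit-learn versions.

```python
# Sketch: RF regression + relative importance + partial dependence.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance

rng = np.random.default_rng(1)
features = ["application_time", "biochar_pH", "soil_pH", "Fe_content"]
X = rng.uniform(size=(182, len(features)))           # 182 literature data points
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.standard_normal(182)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Relative importance via permutation: shuffle one column, measure score drop.
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")

# Partial dependence of predicted immobilization efficiency on biochar pH.
pd_result = partial_dependence(rf, X, features=[1], grid_resolution=20)
print(pd_result["average"][0][:5])  # first few points of the PD curve
```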
Abstract: Open caissons are widely used in foundation engineering because of their load-bearing efficiency and adaptability to diverse soil conditions. However, accurately predicting their undrained bearing capacity in layered soils remains a complex challenge. This study presents a novel application of five ensemble machine learning (ML) algorithms, namely random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and categorical boosting (CatBoost), to predict the undrained bearing capacity factor (Nc) of circular open caissons embedded in two-layered clay, on the basis of results from finite element limit analysis (FELA). The input dataset consists of 1188 numerical simulations using the Tresca failure criterion, varying geometrical and soil parameters. The FELA was performed via OptumG2 software with adaptive meshing techniques and verified against existing benchmark studies. The ML models were trained on 70% of the dataset and tested on the remaining 30%. Their performance was evaluated using six statistical metrics: coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE), index of scatter (IOS), RMSE-to-standard deviation ratio (RSR), and variance explained factor (VAF). The results indicate that all the models achieved high accuracy, with R² values exceeding 97.6% and RMSE values below 0.02. Among them, AdaBoost and CatBoost consistently outperformed the other methods across both the training and testing datasets, demonstrating superior generalizability and robustness. The proposed ML framework offers an efficient, accurate, and data-driven alternative to traditional methods for estimating caisson capacity in stratified soils. This approach can reduce computational costs while improving reliability in the early stages of foundation design.
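The evaluation protocol described above (70/30 split, several error metrics) could be sketched as follows; the synthetic Nc targets and the choice of two sklearn ensembles stand in for the paper's five-model comparison, and the feature columns are hypothetical.

```python
# Sketch: 70/30 split and regression metrics for an Nc-prediction surrogate.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(size=(1188, 4))  # e.g. embedment ratio, layer thickness, strength ratio...
y = 6 + 3 * X[:, 0] + 1.5 * X[:, 1] + 0.05 * rng.standard_normal(1188)  # synthetic Nc

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (AdaBoostRegressor(random_state=0), GradientBoostingRegressor(random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    rsr = rmse / y_te.std()  # RMSE-to-standard-deviation ratio
    print(type(model).__name__, f"R2={r2_score(y_te, pred):.4f}",
          f"MAE={mean_absolute_error(y_te, pred):.4f}", f"RSR={rsr:.4f}")
```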
Funding: Supported by the National Key Research and Development Program, No. 2022YFC2407304; the Major Research Project for Middle-Aged and Young Scientists of Fujian Provincial Health Commission, No. 2021ZQNZD013; the National Natural Science Foundation of China, No. 62275050; the Fujian Province Science and Technology Innovation Joint Fund Project, No. 2019Y9108; and the Major Science and Technology Projects of Fujian Province, No. 2021YZ036017.
Abstract: BACKGROUND To investigate the preoperative factors influencing textbook outcomes (TO) in intrahepatic cholangiocarcinoma (ICC) patients and evaluate the feasibility of an interpretable machine learning model for preoperative prediction of TO, we developed a machine learning model for preoperative prediction of TO and used the SHapley Additive exPlanations (SHAP) technique to illustrate the prediction process. AIM To analyze the factors influencing textbook outcomes before surgery and to establish interpretable machine learning models for preoperative prediction. METHODS A total of 376 patients diagnosed with ICC were retrospectively collected from four major medical institutions in China, covering the period from 2011 to 2017. Logistic regression analysis was conducted to identify preoperative variables associated with achieving TO. Based on these variables, an eXtreme Gradient Boosting (XGBoost) machine learning prediction model was constructed using the XGBoost package. The SHAP algorithm (package: shapviz) was employed to visualize each variable's contribution to the model's predictions. Kaplan-Meier survival analysis was performed to compare the prognostic differences between the TO-achieving and non-TO-achieving groups. RESULTS Among the 376 patients, 287 were included in the training group and 89 in the validation group. Logistic regression identified the following preoperative variables influencing TO: Child-Pugh classification, Eastern Cooperative Oncology Group (ECOG) score, hepatitis B, and tumor size. The XGBoost prediction model demonstrated high accuracy in internal validation (AUC = 0.8825) and external validation (AUC = 0.8346). Survival analysis revealed that the disease-free survival rates for patients achieving TO at 1, 2, and 3 years were 64.2%, 56.8%, and 43.4%, respectively. CONCLUSION Child-Pugh classification, ECOG score, hepatitis B, and tumor size are preoperative predictors of TO. In both the training and validation groups, the machine learning model was effective in predicting TO before surgery. The SHAP algorithm provided intuitive visualization of the machine learning prediction process, enhancing its interpretability.
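The study's pipeline (XGBoost plus SHAP attribution) maps naturally onto Python, although the authors used the R shapviz package; the sketch below is a Python analogue with the shap library, and the predictor columns and synthetic label are assumptions for illustration.

```python
# Sketch: XGBoost classifier for TO prediction + SHAP value attribution.
# Python analogue of the paper's XGBoost + shapviz (R) workflow.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(5)
# Hypothetical predictors: Child-Pugh (0/1), ECOG score, hepatitis B (0/1), tumor size (cm)
X = np.column_stack([
    rng.integers(0, 2, 376),
    rng.integers(0, 3, 376),
    rng.integers(0, 2, 376),
    rng.uniform(1, 10, 376),
])
y = (X[:, 3] < 5).astype(int)  # synthetic "achieved TO" label

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)   # per-variable contributions to each prediction
shap_values = explainer.shap_values(X)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```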
Funding: Supported by the government assignment, No. 1023022600020-6; the Ministry of Science and Higher Education of the Russian Federation within the framework of state support for the creation and development of the world-class research center "Digital Biodesign and Personalized Healthcare," No. 075-15-2022-304; and RSF grant No. 24-15-00549.
Abstract: Integrating exhaled breath analysis into the diagnosis of cardiovascular diseases holds significant promise as a valuable tool for future clinical use, particularly for ischemic heart disease (IHD). However, current research on the volatilome (exhaled breath composition) in heart disease remains underexplored and lacks sufficient evidence to confirm its clinical validity. Key challenges hindering the application of breath analysis in diagnosing IHD include the scarcity of studies (only three published papers to date), substantial methodological bias in two of these studies, and the absence of standardized protocols for clinical implementation. Additionally, inconsistencies in methodologies, such as sample collection, analytical techniques, machine learning (ML) approaches, and result interpretation, vary widely across studies, further complicating their reproducibility and comparability. To address these gaps, there is an urgent need to establish unified guidelines that define best practices for breath sample collection, data analysis, ML integration, and biomarker annotation. Until these challenges are systematically resolved, the widespread adoption of exhaled breath analysis as a reliable diagnostic tool for IHD remains a distant goal rather than an imminent reality.
Funding: Funded by the National Natural Science Foundation of China (No. 52204407); the Natural Science Foundation of Jiangsu Province (No. BK20220595); the China Postdoctoral Science Foundation (No. 2022M723689); and the Industrial Collaborative Innovation Project of Shanghai (No. XTCX-KJ-2022-2-11).
Abstract: The application of machine learning in alloy design is increasingly widespread, yet traditional models still face challenges when dealing with limited datasets and complex nonlinear relationships. This work proposes an interpretable machine learning method based on data augmentation and reconstruction for discovering high-performance low-alloyed magnesium (Mg) alloys. The data augmentation technique expands the original dataset by adding Gaussian noise. The data reconstruction method reorganizes and transforms the original data to extract more representative features, significantly improving the model's generalization ability and prediction accuracy, with a coefficient of determination (R²) of 95.9% for the ultimate tensile strength (UTS) model and 95.3% for the elongation-to-failure (EL) model. The correlation coefficient assisted screening (CCAS) method is proposed to filter low-alloyed target alloys. A new Mg-2.2Mn-0.4Zn-0.2Al-0.2Ca (MZAX2000, wt%) alloy was designed and extruded into bars at given processing parameters, achieving room-temperature strength-ductility synergy with an excellent UTS of 395 MPa and a high EL of 17.9%. This performance is closely related to the hetero-structured characteristic of the as-extruded MZAX2000 alloy, which consists of coarse grains (16%), fine grains (75%), and fiber regions (9%). This work therefore offers new insights into optimizing alloy compositions and processing parameters to obtain new high-strength, ductile low-alloyed Mg alloys.
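The Gaussian-noise augmentation step could be sketched as below; the noise scale, number of copies, and feature set are assumptions, and the reconstruction step is omitted for brevity.

```python
# Sketch: expand a small alloy dataset with Gaussian noise, then fit a regressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.uniform(size=(60, 5))    # small original dataset: composition/process features
y = 300 + 100 * X[:, 0] + 5 * rng.standard_normal(60)  # synthetic UTS (MPa)

def augment(X, y, copies=4, sigma=0.01, rng=rng):
    """Replicate each sample `copies` times with small Gaussian perturbations."""
    X_aug = np.vstack([X] + [X + rng.normal(0, sigma, X.shape) for _ in range(copies)])
    y_aug = np.concatenate([y] * (copies + 1))
    return X_aug, y_aug

# NB: in a real study, augment inside each training fold only, so that noisy
# copies of a sample never leak into the corresponding test fold.
X_aug, y_aug = augment(X, y)
model = GradientBoostingRegressor(random_state=0)
print("R2 with augmentation:",
      cross_val_score(model, X_aug, y_aug, cv=5, scoring="r2").mean())
```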
Funding: Supported by the Key Research and Development Program of Shaanxi Province, China (No. 2022ZDLSF07-05) and the Fundamental Research Funds for the Central Universities, CHD (No. 300102352901).
Abstract: Carbon emissions resulting from energy consumption have become a pressing issue for governments worldwide. Accurate estimation of carbon emissions from satellite remote sensing data has become a crucial research problem. Previous studies relied on statistical regression models that failed to capture the complex nonlinear relationships between carbon emissions and characteristic variables. In this study, we propose a machine learning algorithm for estimating carbon emissions, a Bayesian-optimized XGBoost regression model, using multi-year energy carbon emission data and nighttime lights (NTL) remote sensing data from Shaanxi Province, China. Our results demonstrate that the XGBoost algorithm outperforms linear regression and four other machine learning models, with an R² of 0.906 and an RMSE of 5.687. We observe an annual increase in carbon emissions, with high-emission counties primarily concentrated in northern and central Shaanxi Province, displaying a shift from discrete, sporadic points to a contiguous, extended spatial distribution. Spatial autocorrelation clustering reveals predominantly high-high and low-low clustering patterns, with economically developed counties showing high-emission clustering and economically less developed counties displaying low-emission clustering. Our findings show that the use of NTL data and the XGBoost algorithm can estimate and predict carbon emissions more accurately, providing a complementary reference for satellite remote sensing imagery in carbon emission monitoring and assessment. This research provides an important theoretical basis for formulating practical carbon emission reduction policies and contributes to the development of techniques for accurate carbon emission estimation using remote sensing data.
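Bayesian hyperparameter optimization of an XGBoost regressor might be sketched with the Optuna library as below; the search space, trial budget, and the NTL-derived features are all assumptions, since the abstract does not specify the optimizer settings.

```python
# Sketch: Bayesian-style hyperparameter search for XGBoost via Optuna (TPE sampler).
import numpy as np
import optuna
import xgboost as xgb
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.uniform(size=(500, 6))   # hypothetical NTL-derived county features
y = 20 * X[:, 0] + 10 * X[:, 1] + rng.standard_normal(500)  # synthetic emissions

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 9),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
    }
    model = xgb.XGBRegressor(**params)
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")  # maximize CV R2
study.optimize(objective, n_trials=30)
print("best R2:", study.best_value, "best params:", study.best_params)
```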
Funding: Supported by the High-Level Chinese Medicine Key Discipline Construction Project, No. zyyzdxk-2023005; the Capital Health Development Research Project, No. 2024-1-2173; and the National Natural Science Foundation of China, Nos. 82474426 and 82474419.
Abstract: BACKGROUND Patients with early-stage hepatocellular carcinoma (HCC) generally have good survival rates following surgical resection. However, a subset of these patients experience recurrence within five years post-surgery. AIM To develop predictive models utilizing machine learning (ML) methods to detect early-stage patients at a high risk of mortality. METHODS Eight hundred and eight patients with HCC at Beijing Ditan Hospital were randomly allocated to training and validation cohorts in a 2:1 ratio. Prognostic models were generated using random survival forests and artificial neural networks (ANNs). These ML models were compared with other classic HCC scoring systems. A decision-tree model was established to validate the contribution of immune-inflammatory indicators to the long-term outlook of patients with early-stage HCC. RESULTS Immune-inflammatory markers, albumin-bilirubin scores, alpha-fetoprotein, tumor size, and the International Normalized Ratio were closely associated with the 5-year survival rates. Among the various predictive models, the ANN model generated from these indicators through ML algorithms exhibited superior performance, with a 5-year area under the curve (AUC) of 0.85 (95%CI: 0.82-0.88). In the validation cohort, the 5-year AUC was 0.82 (95%CI: 0.74-0.85). According to the ANN model, patients were classified into high-risk and low-risk groups, with an overall survival hazard ratio of 7.98 (95%CI: 5.85-10.93, P < 0.0001) between the two cohorts. INTRODUCTION Hepatocellular carcinoma (HCC) is one of the six most prevalent cancers[1] and the third leading cause of cancer-related mortality[2]. China has some of the highest incidence and mortality rates for liver cancer, accounting for half of global cases[3,4]. The Barcelona Clinic Liver Cancer (BCLC) Staging System is the most widely used framework for diagnosing and treating HCC[5]. The optimal candidates for surgical treatment are those with early-stage HCC, classified as BCLC stage 0 or A. Patients with early-stage liver cancer typically have a better prognosis after surgical resection, achieving a 5-year survival rate of 60%-70%[6]. However, the high postoperative recurrence rate of HCC remains a major obstacle to long-term efficacy. To improve the prognosis of patients with early-stage HCC, it is necessary to develop models that can identify those with poor prognoses, enabling stratified and personalized treatment and follow-up strategies. Chronic inflammation is linked to the development and advancement of tumors[7]. Recently, peripheral blood immune indicators, such as the neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), and lymphocyte-to-monocyte ratio (LMR), have garnered extensive attention and have been used to predict survival in various tumors and inflammation-related diseases[8-10]. However, the relationship between these combinations of immune markers and the outcomes of patients with early-stage HCC requires further investigation. Machine learning (ML) algorithms are capable of handling large and complex datasets, generating more accurate and personalized predictions through unique training algorithms that better manage nonlinear statistical relationships than traditional analytical methods. Commonly used ML models include artificial neural networks (ANNs) and random survival forests (RSFs), which have shown satisfactory accuracy in prognostic predictions across various cancers and other diseases[11-13]. ANNs have performed well in identifying the progression from liver cirrhosis to HCC and predicting overall survival (OS) in patients with HCC[14,15]. However, no studies have confirmed the ability of ML models to predict post-surgical survival in patients with early-stage HCC. Through ML, a better understanding of the risk factors for early-stage HCC prognosis can be achieved. This aids in surgical decision-making, identifying patients at a high risk of mortality, and selecting subsequent treatment strategies. In this study, we aimed to establish a 5-year prognostic model for patients with early-stage HCC after surgical resection, based on ML and systemic immune-inflammatory indicators. This model seeks to improve the early monitoring of high-risk patients and provide personalized treatment plans.
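A minimal sketch of the kind of ANN risk model described above, assuming a binary 5-year mortality label and hypothetical immune-inflammatory features (NLR, PLR, LMR, ALBI score, AFP, tumor size, INR); this is not the authors' architecture or data.

```python
# Sketch: ANN classifier for 5-year survival risk from immune-inflammatory markers.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(13)
# Hypothetical columns: NLR, PLR, LMR, ALBI score, AFP, tumor size, INR
X = rng.lognormal(size=(808, 7))
y = (X[:, 0] > np.median(X[:, 0])).astype(int)  # synthetic 5-year mortality label

# 2:1 training/validation split, mirroring the study design
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=1 / 3, random_state=0)

ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                                  random_state=0))
ann.fit(X_tr, y_tr)
print("validation AUC:", roc_auc_score(y_va, ann.predict_proba(X_va)[:, 1]))
```

For the time-to-event setting proper, a random survival forest (for example, from the scikit-survival package) would model censored follow-up directly instead of a fixed 5-year binary label.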
Funding: Supported by the National Key R&D Program of China (No. 2022YFE0109500); the National Natural Science Foundation of China (Nos. 52071255, 52301250, 52171190 and 12304027); the Key R&D Project of Shaanxi Province (No. 2022GXLH-01-07); the Fundamental Research Funds for the Central Universities (China); and the World-Class Universities (Disciplines) and Characteristic Development Guidance Funds for the Central Universities.
Abstract: With the rapid development of artificial intelligence, magnetocaloric materials, like other materials, are being developed with increased efficiency and enhanced performance. However, most studies do not take phase transitions into account, and as a result, the predictions are usually not accurate enough. In this context, we have established an explicable relationship between alloy compositions and phase transitions by feature imputation. A facile machine learning model is proposed to screen candidate NiMn-based Heusler alloys with the desired magnetic entropy change and magnetic transition temperature, with a high accuracy of R² ≈ 0.98. As expected, the measured properties of the prepared NiMn-based alloys, including phase transition type, magnetic entropy change, and transition temperature, are all in good agreement with the ML predictions. Besides being the first to demonstrate an explicable relationship between alloy compositions, phase transitions, and magnetocaloric properties, our proposed ML model is highly predictive and interpretable, providing a strong theoretical foundation for identifying high-performance magnetocaloric materials in the future.
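The feature-imputation step might look like the sketch below, where missing phase-transition descriptors are imputed before fitting a property model; the descriptor columns, imputer choice, and synthetic target are assumptions, not the authors' pipeline.

```python
# Sketch: impute missing phase-transition features, then predict entropy change.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(17)
# Columns: composition fractions plus phase-transition descriptors with gaps
X = rng.uniform(size=(300, 5))
X[rng.random((300, 5)) < 0.2] = np.nan                       # ~20% missing entries
y = 10 * np.nan_to_num(X[:, 0]) + rng.standard_normal(300)   # synthetic entropy change

# Imputation inside the pipeline keeps train/test folds properly separated.
model = make_pipeline(KNNImputer(n_neighbors=5),
                      RandomForestRegressor(n_estimators=300, random_state=0))
print("CV R2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```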
Funding: Supported by the Studies of Intelligent LLRF Control Algorithms for Superconducting RF Cavities (No. E129851YR0); the National Natural Science Foundation of China (No. U22A20261); and Applications of Artificial Intelligence in the Stability Study of Superconducting Linear Accelerators (No. E429851YR0).
Abstract: Superconducting radio-frequency (SRF) cavities are the core components of SRF linear accelerators, making their stable operation critically important. However, operational experience from different accelerator laboratories has revealed that SRF faults are the leading cause of short machine downtime trips. When a cavity fault occurs, system experts analyze the time-series data recorded by low-level RF systems and identify the fault type. This, however, requires expertise and intuition, posing a major challenge for control-room operators. Here, we propose an expert-feature-based machine learning model for automating SRF cavity fault recognition. The main challenge in converting the "expert reasoning" process for SRF faults into a "model inference" process lies in feature extraction, owing to the multidimensional and complex time-series waveforms involved. Existing autoregression-based feature-extraction methods require the signal to be stable and autocorrelated, making it difficult to capture the abrupt features present in several SRF failure patterns. To address these issues, we introduce expertise into the classification model through reasonable feature engineering. We demonstrate the feasibility of this method using the SRF cavities of the China Accelerator Facility for Superheavy Elements (CAFE2). Although specific faults in SRF cavities may vary across accelerators, similarities exist in the RF signals. Therefore, this study provides valuable guidance for fault analysis across the entire SRF community.
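The expert-feature idea, hand-crafted statistics designed to capture abrupt changes that autoregressive features miss, could be sketched as follows; the specific features, waveform model, and fault labels are illustrative assumptions rather than the paper's feature set.

```python
# Sketch: hand-crafted "expert" features from RF waveforms feeding a fault classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(21)

def expert_features(waveform):
    """Simple statistics intended to capture abrupt, non-autocorrelated behavior."""
    diff = np.diff(waveform)
    return np.array([
        waveform.mean(), waveform.std(),
        np.abs(diff).max(),          # largest single-step jump (abrupt change)
        np.argmax(np.abs(diff)),     # where the jump occurs within the trace
        waveform[-1] - waveform[0],  # net drift over the pulse
    ])

# Synthetic stand-in: 400 cavity waveforms, 3 hypothetical fault classes
waveforms = rng.standard_normal((400, 1024)).cumsum(axis=1)
labels = rng.integers(0, 3, size=400)

X = np.stack([expert_features(w) for w in waveforms])
clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```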