Arsenic (As) pollution in soils is a pervasive environmental issue. Biochar immobilization offers a promising solution for addressing soil As contamination. The efficiency of biochar in immobilizing As in soils primarily hinges on the characteristics of both the soil and the biochar. However, the influence of a specific property on As immobilization varies among different studies, and the development and application of arsenic passivation materials based on biochar often rely on empirical knowledge. To enhance immobilization efficiency and reduce labor and time costs, a machine learning (ML) model was employed to predict As immobilization efficiency before biochar application. In this study, we collected a dataset comprising 182 data points on As immobilization efficiency from 17 publications to construct three ML models. The results demonstrated that the random forest (RF) model outperformed gradient boost regression tree and support vector regression models in predictive performance. Relative importance analysis and partial dependence plots based on the RF model were conducted to identify the most crucial factors influencing As immobilization. These findings highlighted the significant roles of biochar application time and biochar pH in As immobilization efficiency in soils. Furthermore, the study revealed that Fe-modified biochar exhibited a substantial improvement in As immobilization. These insights can facilitate targeted biochar property design and optimization of biochar application conditions to enhance As immobilization efficiency.
In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Nevertheless, Machine Learning (ML) techniques, which offer advanced regression tools to address complicated engineering issues, have been developed and widely explored. This study investigates selected ML techniques to evaluate their suitability for describing the hot deformation behavior of metallic materials. The ML-based regression methods of Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe hot flow stress curve datasets acquired experimentally for a medium-carbon steel. Although the GPR method has not been used for such a regression task before, the results showed that its performance is the most favorable and practically unrivaled; neither the ANN method nor the other studied ML techniques provide such precise results in the regression analysis.
Excellent detonation performances and low sensitivity are prerequisites for the deployment of energetic materials. Exploring the underlying factors that affect impact sensitivity and detonation performances, as well as how to obtain materials with desired properties, remains a long-term challenge. Machine learning, with its ability to solve complex tasks and perform robust data processing, can reveal the relationship between performance and descriptive indicators, potentially accelerating the development process of energetic materials. Against this background, impact sensitivity, detonation performances, and 28 physicochemical parameters for 222 energetic materials were compiled from density functional theory calculations and published literature. Four machine learning algorithms were employed to predict various properties of energetic materials, including impact sensitivity, detonation velocity, detonation pressure, and Gurney energy. Analysis of Pearson coefficients and feature importance showed that the heat of explosion, oxygen balance, decomposition products, and HOMO energy levels have a strong correlation with the impact sensitivity of energetic materials. Oxygen balance, decomposition products, and density have a strong correlation with detonation performances. Utilizing the impact sensitivity of 2,3,4-trinitrotoluene and the detonation performances of 2,4,6-trinitrobenzene-1,3,5-triamine as the benchmark, the analysis of feature importance rankings and statistical data revealed the optimal range of key features balancing impact sensitivity and detonation performances: oxygen balance values should be between -40% and -30%, density should range from 1.66 to 1.72 g/cm^3, HOMO energy levels should be between -6.34 and -6.31 eV, and lipophilicity should be between -1.0 and 0.1, 4.49 and 5.59. These findings not only offer important insights into the impact sensitivity and detonation performances of energetic materials, but also provide a theoretical guidance paradigm for the design and development of new energetic materials with optimal detonation performances and reduced sensitivity.
Finding materials with specific properties is a hot topic in materials science. Traditional materials design relies on empirical and trial-and-error methods, requiring extensive experiments and time, resulting in high costs. With the development of physics, statistics, computer science, and other fields, machine learning offers opportunities for systematically discovering new materials. Especially through machine learning-based inverse design, machine learning algorithms analyze the mapping relationships between materials and their properties to find materials with desired properties. This paper first outlines the basic concepts of materials inverse design and the challenges faced by machine learning-based approaches to materials inverse design. Then, three main inverse design methods (exploration-based, model-based, and optimization-based) are analyzed in the context of different application scenarios. Finally, the applications of inverse design methods in alloys, optical materials, and acoustic materials are elaborated on, and the prospects for materials inverse design are discussed. The authors hope to accelerate the discovery of new materials and provide new possibilities for advancing materials science and innovative design methods.
Geological analysis, despite being a long-established method for identifying adverse geology in tunnels, has significant limitations due to its reliance on empirical analysis. The quantitative aspects of geochemical anomalies associated with adverse geology provide a novel strategy for addressing these limitations. However, statistical methods for identifying geochemical anomalies are insufficient for tunnel engineering. In contrast, data mining techniques such as machine learning have demonstrated greater efficacy when applied to geological data. Herein, a method for identifying adverse geology using machine learning of geochemical anomalies is proposed. The method identified geochemical anomalies in a tunnel that were not identified by statistical methods. We employed robust factor analysis and self-organizing maps to reduce the dimensionality of the geochemical data and extract the anomaly elements combination (AEC). Using the AEC sample data, we successfully trained an isolation forest model to identify the multi-element anomalies. We then analyzed the adverse geological features based on the multi-element anomalies. This study, therefore, extends the traditional approach of geological analysis in tunnels and demonstrates that machine learning is an effective tool for intelligent geological analysis. Correspondingly, the research offers new insights regarding adverse geology and the prevention of hazards during the construction of tunnels and underground engineering projects.
Determination of the shear bond strength (SBS) at the interlayer of double-layer asphalt concrete is crucial in flexible pavement structures. The study used three Machine Learning (ML) models, including K-Nearest Neighbors (KNN), Extra Trees (ET), and Light Gradient Boosting Machine (LGBM), to predict SBS based on easily determinable input parameters. The Grid Search technique was employed for hyper-parameter tuning of the ML models, and cross-validation and learning curve analysis were used for training the models. The models were built on a database of 240 experimental results and three input variables: temperature, normal pressure, and tack coat rate. Model validation was performed using three statistical criteria: the coefficient of determination (R^2), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE). Additionally, SHAP (SHapley Additive exPlanations) analysis was used to validate the importance of the input variables in the prediction of the SBS. Results show that these models accurately predict SBS, with LGBM providing outstanding performance. SHAP analysis for LGBM indicates that temperature is the most influential factor on SBS. Consequently, the proposed ML models can quickly and accurately predict SBS between two layers of asphalt concrete, serving practical applications in flexible pavement structure design.
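The three validation criteria named in this abstract (R^2, RMSE, and MAE) can be illustrated with a minimal sketch. It is not the authors' code: the function name `regression_metrics` and any input values are assumptions made for illustration, and only the Python standard library is used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired measured and predicted values.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean square error: penalizes large residuals more strongly.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "rmse": rmse, "mae": mae}
```

Reporting all three together is a common choice because R^2 is scale-free while RMSE and MAE are expressed in the units of the predicted quantity (here, the units of SBS).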
As energy demands continue to rise in modern society, the development of high-performance lithium-ion batteries (LIBs) has become crucial. However, traditional research methods of material science face challenges such as lengthy timelines and complex processes. In recent years, the integration of machine learning (ML) in LIB materials, including electrolytes, solid-state electrolytes, and electrodes, has yielded remarkable achievements. This comprehensive review explores the latest applications of ML in predicting LIB material performance, covering the core principles and recent advancements in three key inverse material design strategies: high-throughput virtual screening, global optimization, and generative models. These strategies have played a pivotal role in fostering LIB material innovations. Meanwhile, the paper briefly discusses the challenges associated with applying ML to materials research and offers insights and directions for future research.
Background: Cotton is one of the most important commercial crops after food crops, especially in countries like India, where it’s grown extensively under rainfed conditions. Because of its usage in multiple industries, such as the textile, medicine, and automobile industries, it has great commercial importance. The crop’s performance is greatly influenced by prevailing weather dynamics. As the climate changes, assessing how weather changes affect crop performance is essential. Among the various techniques available, crop models are the most effective and widely used tools for predicting yields. Results: This study compares statistical and machine learning models to assess their ability to predict cotton yield across major producing districts of Karnataka, India, utilizing a long-term dataset spanning from 1990 to 2023 that includes yield and weather factors. The artificial neural networks (ANNs) performed best, with acceptable yield deviations within ±10% during both the vegetative stage (F1) and the mid stage (F2) for cotton. The model evaluation metrics, such as root mean square error (RMSE), normalized root mean square error (nRMSE), and modelling efficiency (EF), were also within the acceptance limits in most districts. Furthermore, the tested ANN model was used to assess the importance of the dominant weather factors influencing crop yield in each district. Specifically, morning relative humidity as an individual parameter and its interaction with maximum and minimum temperature had a major influence on cotton yield in most of the districts for which yield was predicted. These differences highlighted the differential interactions of weather factors in each district for cotton yield formation, and the individual response of each weather factor under different soils and management conditions over the major cotton growing districts of Karnataka. Conclusions: Compared with statistical models, machine learning models such as ANNs proved more efficient in forecasting cotton yield due to their ability to consider the interactive effects of weather factors on yield formation at different growth stages. This highlights the suitability of ANNs for yield forecasting in rainfed conditions and for studying the relative impacts of weather factors on yield. Thus, the study aims to provide valuable insights to support stakeholders in planning effective crop management strategies and formulating relevant policies.
Teacher–student relationships play a vital role in improving college students’ academic performance and the quality of higher education. However, empirical studies with substantial data-driven insights remain limited. To address this gap, this study collected 3278 questionnaires from seven universities across four provinces in China to analyze the key factors affecting college students’ academic performance. A machine learning framework, CQFOA-KELM, was developed by enhancing the Fruit Fly Optimization Algorithm (FOA) with Covariance Matrix Adaptation Evolution Strategy (CMAES) and Quadratic Approximation (QA). CQFOA significantly improved population diversity and was validated on the IEEE CEC2017 benchmark functions. The CQFOA-KELM model achieved an accuracy of 98.15% and a sensitivity of 98.53% in predicting college students’ academic performance. Additionally, it effectively identified the key factors influencing academic performance through the feature selection process.
Accuracy allocation is crucial in the accuracy design of machining tools. Current accuracy allocation methods primarily focus on positional deviation, with little consideration for tool direction deviation. To address this issue, we propose a geometric error cost sensitivity-based accuracy allocation method for five-axis machine tools. A geometric error model consisting of 41 error components is constructed based on homogeneous transformation matrices. Volumetric points with positional and tool direction deviations are randomly sampled to evaluate the accuracy of the machine tool. The sensitivity of each error component at these sampling points is analyzed using the Sobol method. To balance the needs of geometric precision and manufacturing cost, a geometric error cost sensitivity function is developed to estimate the required cost. By allocating error components affecting tool direction deviation first and the remaining components second, this allocation scheme ensures that both deviations meet the requirements. We also perform a numerical simulation of a BC-type (B-axis and C-axis type) five-axis machine tool to validate the method. The results show that the new allocation scheme reduces the total geometric error cost by 27.8% compared to a uniform allocation scheme, and yields the same positional and tool direction machining accuracies.
Lung cancer, the leading cause of cancer deaths worldwide and in China, has a 19.7% five-year survival rate due to terminal-stage diagnosis [1-3]. Although low-dose computed tomography (CT) screening can reduce mortality, high false positive rates can create economic and psychological burdens.
Solar cells made from perovskites have experienced rapid development as examples of third-generation solar cells in recent years. The traditional trial-and-error method is inefficient, and the search space is incredibly large. This makes developing advanced perovskite materials, as well as achieving high conversion efficiencies and stability in perovskite solar cells (PSCs), a challenging task. A growing number of data-driven machine learning (ML) applications are being developed in the materials science field, due to the availability of large databases and increased computing power. Machine learning offers many advantages for predicting the properties of potential perovskite materials and for providing additional knowledge on how these materials work, which can fast-track their progress. Thus, the purpose of this paper is to develop a conceptual model for improving the efficiency of a perovskite solar cell using machine learning techniques. The study uses design science as its research method. The developed model consists of six phases: data collection and preprocessing, feature selection and engineering, model training and evaluation, performance assessment, optimization and fine-tuning, and deployment and application. This model shows considerable promise for advancing the field of perovskite solar cells and provides a basis for developing more efficient and cost-effective solar energy technologies in the future.
Accurate and robust detection of wax appearance (wax being a medium- to high-molecular-weight component of crude oil) is crucial for the efficient operation of hydrocarbon transportation. The wax appearance temperature (WAT) is the lowest temperature at which the wax begins to form. When crude oil cools to its WAT, wax crystals precipitate, forming deposits on pipelines as the solubility limit is reached. Therefore, WAT is a crucial quality assurance parameter, especially when dealing with modern fuel oil blends. In this study, we use machine learning via MATLAB’s Bioinformatics Toolbox to predict the WAT of marine fuel samples by correlating near-infrared spectral data with laboratory-measured values. The dataset provided by Intertek PLC (a total quality assurance provider of inspection, testing, and certification services) includes industrial data that is imbalanced, with a higher proportion of high-WAT samples compared to low-WAT samples. The objective is to predict marine fuel oil blends with unusually high WAT values (>35 °C) without relying on time-consuming and irregular laboratory-based measurements. The results demonstrate that the developed model, based on the one-class support vector machine (OCSVM) algorithm, achieved a recall of 96%, accurately predicting 96% of fuel samples with WAT >35 °C. For standard binary classification, the recall was 85.7%. The trained OCSVM model is expected to facilitate rapid and well-informed decision-making for logistics and storage when choosing fuel oils.
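The recall figure quoted in this abstract is the fraction of truly high-WAT samples that the classifier flags as high-WAT. The sketch below shows that calculation under stated assumptions: the helper names `label_high_wat` and `recall` are hypothetical, and the 35 °C threshold is the only value taken from the abstract. It is not the authors' MATLAB workflow.

```python
def label_high_wat(wat_celsius, threshold=35.0):
    """Label each sample True when its measured WAT exceeds the threshold.

    The 35 degree C default follows the abstract; the function itself is
    a hypothetical helper for illustration.
    """
    return [w > threshold for w in wat_celsius]


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")

    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined without positive samples")
    return true_pos / (true_pos + false_neg)
```

Recall ignores false positives by construction, so on an imbalanced dataset it is usually read alongside a second measure such as precision.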
NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. The platform has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials across energy storage, conversion, and structural applications. Additionally, NJmat serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the workflow for materials informatics, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module leveraging language models like MatBERT and those based on Word2Vec to support materials prediction tasks. By utilizing clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
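The cosine similarity analysis mentioned in this abstract compares two embedding vectors by the angle between them. A minimal sketch follows, assuming plain Python lists stand in for the embeddings; the function name `cosine_similarity` is hypothetical and does not reflect NJmat's actual interface.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; NJmat's own implementation is
    not described in the abstract.
    """
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")

    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))

    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Because the measure depends only on direction, two embeddings that differ by a positive scale factor score as identical, which is why it is often preferred over Euclidean distance for word-vector comparisons.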
Objective: To assess the effectiveness of machine learning in automating the prediction of vestibular abnormalities after cochlear implantation (CI) in patients with sensorineural hearing loss (SNHL), with the goal of developing a practical model that can accurately predict long-term vestibular function outcomes and identify associated risk factors. Methods: Clinical data, including imaging, vestibular evoked myogenic potentials (VEMPs), and auditory information, were collected from patients with SNHL before and after CI. The decision tree algorithm was employed to address missing values and screen pre-CI clinical features. Six machine learning methods were subsequently utilized to predict the relationships between the extracted features and post-CI vestibular dysfunction. The best-performing method determined the feature importance ranking, and the ranked features were regarded as risk factors for predicting symptoms and VEMP results after CI. Results: Logistic regression models effectively predicted both post-CI vestibular dysfunction and abnormal cervical VEMP (cVEMP), with accuracies of 80% and 78%, respectively. The relative importance of the features, in descending order, was as follows: cVEMP latency, cVEMP amplitude, and residual hearing threshold. Moreover, the support vector machine (SVM) model attained an accuracy of 88% in predicting abnormal ocular VEMP (oVEMP) post-CI. For the SVM model, the feature importance ranking was as follows: oVEMP latency, oVEMP amplitude, and residual hearing threshold. Conclusions: This study successfully leverages machine learning techniques, specifically SVM and logistic regression models, to predict the impact of CI on vestibular function. These predictive models provide valuable insights for presurgical planning and decision-making in CI procedures. Moreover, the findings highlight the critical risk factors associated with vestibular dysfunction, offering a robust reference for guiding vestibular rehabilitation strategies.
The prediction of tool wear in CNC machine tools is a critical aspect of ensuring the efficient operation and longevity of manufacturing equipment. Tool wear significantly impacts machining accuracy, surface finish quality, and operational downtime, making its prediction essential for proactive maintenance strategies. This paper explores the integration of Digital Twin technology with tool wear prediction models to enhance the precision and reliability of wear forecasting in CNC machines. We review existing methodologies for tool wear prediction, including physics-based models, data-driven approaches, and hybrid models, with an emphasis on their strengths and limitations. Furthermore, the paper highlights the role of Digital Twin technology in creating real-time, virtual replicas of CNC machines that can dynamically monitor tool wear and provide actionable insights for optimization. By leveraging real-time data and advanced simulation techniques, Digital Twin-based prediction models offer significant improvements over traditional methods. The paper concludes by discussing future directions for integrating machine learning, deep learning, and real-time data analytics into the tool wear prediction process, ultimately contributing to the development of more intelligent and adaptive manufacturing systems.
To achieve carbon dioxide (CO2) storage through enhanced oil recovery, accurate forecasting of CO2 subsurface storage and cumulative oil production is essential. This study develops hybrid predictive models for the determination of CO2 storage mass and cumulative oil production in unconventional reservoirs. It does so with two multi-layer perceptron neural networks (MLPNN) and a least-squares support vector machine (LSSVM), hybridized with grey wolf optimization (GWO) and/or particle swarm optimization (PSO). Large, simulated datasets were divided into training (70%) and testing (30%) groups, with normalization applied to both groups. Mahalanobis distance identifies/eliminates outliers in the training subset only. A non-dominated sorting genetic algorithm (NSGA-II) combined with LSSVM selected seven influential features from the nine available input parameters: reservoir depth, porosity, permeability, thickness, bottom-hole pressure, area, CO2 injection rate, residual oil saturation to gas flooding, and residual oil saturation to water flooding. Predictive models were developed and tested, with performance evaluated with an overfitting index (OFI), scoring analysis, and partial dependence plots (PDP), during training and independent testing to enhance model focus and effectiveness. The LSSVM-GWO model generated the lowest root mean square error (RMSE) values (0.4052 MMT for CO2 storage and 9.7392 MMbbl for cumulative oil production) in the training group. That trained model also exhibited excellent generalization and minimal overfitting when applied to the testing group (RMSE of 0.6224 MMT for CO2 storage and 12.5143 MMbbl for cumulative oil production). PDP analysis revealed that the input features “area” and “porosity” had the most influence on the LSSVM-GWO model's prediction performance. This paper presents a new hybrid modeling approach that achieves accurate forecasting of CO2 subsurface storage and cumulative oil production. It also establishes a new standard for such forecasting, which can lead to the development of more effective and sustainable solutions for oil recovery.
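The 70%/30% split with normalization described in this abstract can be sketched as follows. The abstract does not say which normalization was used or how its parameters were obtained, so this example assumes min-max scaling with bounds fitted on the training group and then reused for the testing group. All function names are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and testing groups."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(train_rows):
    """Return one (min, max) pair per feature column, from training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, bounds):
    """Scale each feature with the given bounds; constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, bounds)
        ])
    return scaled
```

Fitting the bounds on the training group alone is an assumed design choice here: it keeps information about the testing group out of preprocessing, at the cost that scaled test values may fall slightly outside the 0 to 1 range.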
Metal alloy anode materials with high specific capacity and low voltage have recently gained significant attention due to their excellent electrochemical performance and their ability to suppress dendrite growth. However, experimental investigations of metal alloys can be time-consuming and expensive, often requiring extensive experimental design and effort. In this study, we developed a machine learning model based on the Crystal Graph Convolutional Neural Network (CGCNN) to screen alloy anode materials for seven battery systems, including lithium (Li), sodium (Na), potassium (K), zinc (Zn), magnesium (Mg), calcium (Ca), and aluminum (Al). We utilized data on tens of thousands of alloy materials from the Materials Project (MP) and Automatic FLOW for Materials Discovery (AFLOW) databases. Without any experimental voltage input, we identified over 30 alloy systems that have been experimentally validated with good precision. Additionally, we predicted over 100 alloy anodes with low potential and high specific capacity. We hope this work will spur further interest in employing advanced machine learning models for the design of battery materials.
Crystal structure prediction (CSP) is a foundational computational technique for determining the atomic arrangements of crystalline materials, especially under high-pressure conditions. While CSP plays a critical role in materials science, traditional approaches often encounter significant challenges related to computational efficiency and scalability, particularly when applied to complex systems. Recent advances in machine learning (ML) have shown tremendous promise in addressing these limitations, enabling the rapid and accurate prediction of crystal structures across a wide range of chemical compositions and external conditions. This review provides a concise overview of recent progress in ML-assisted CSP methodologies, with a particular focus on machine learning potentials and generative models. By critically analyzing these advances, we highlight the transformative impact of ML in accelerating materials discovery, enhancing computational efficiency, and broadening the applicability of CSP. Additionally, we discuss emerging opportunities and challenges in this rapidly evolving field.
BACKGROUND: Ki-67 is a routine test item in clinical pathology departments. However, its prognostic value requires further investigation, especially in the context of research using machine learning (ML), which remains relatively underdeveloped. AIM: To investigate the prognostic value of Ki-67 in cases of colorectal carcinoma (CRC) and explore the potential application of ML algorithms to predict the Ki-67 index. METHODS: Case data and pathological sections from two centers were systematically collected. To analyze the prognostic value of the Ki-67 index in CRC, multiple cutoff values were established. Meanwhile, by virtue of the histological features presented in the hematoxylin and eosin-stained CRC images, three mainstream ML algorithms, support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost), were employed to construct prediction models. Subsequently, the potential of these algorithms to classify and predict the Ki-67 index was explored. RESULTS: Non-parametric tests revealed that Ki-67 ≥40% correlated with a high histological grade (P = 0.017), deficient mismatch repair protein status associated with ≥50%-90% cutoffs (all P ≤ 0.028), and ≥80% linked to lymph node metastasis (P = 0.006). Kaplan-Meier analysis showed that Ki-67 ≥50% predicted higher survival (log-rank P = 0.0299, hazard ratio = 2.142), with no differences for other cutoffs. COX regression identified the Ki-67 positive rate as a significant predictor (P = 0.027, hazard ratio = 2.583), while other variables had no association. In algorithmic model predictions, the SVM, RF, and XGBoost models achieved training area under the curve (AUC) values of 0.851, 0.948, and 0.872, respectively, with corresponding test set AUC values of 0.795, 0.755, and 0.750, respectively. During external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively. CONCLUSION: In algorithmic model predictions, the SVM, RF, and XGBoost models achieved training AUC values of 0.851, 0.948, and 0.872, respectively, with corresponding test set AUC values of 0.795, 0.755, and 0.750, respectively. During external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively.
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFC1808701).
Abstract: Arsenic (As) pollution in soils is a pervasive environmental issue. Biochar immobilization offers a promising solution for addressing soil As contamination. The efficiency of biochar in immobilizing As in soils primarily hinges on the characteristics of both the soil and the biochar. However, the influence of a specific property on As immobilization varies among different studies, and the development and application of arsenic passivation materials based on biochar often rely on empirical knowledge. To enhance immobilization efficiency and reduce labor and time costs, a machine learning (ML) model was employed to predict As immobilization efficiency before biochar application. In this study, we collected a dataset comprising 182 data points on As immobilization efficiency from 17 publications to construct three ML models. The results demonstrated that the random forest (RF) model outperformed the gradient boosting regression tree and support vector regression models in predictive performance. Relative importance analysis and partial dependence plots based on the RF model were used to identify the most crucial factors influencing As immobilization. These findings highlighted the significant roles of biochar application time and biochar pH in As immobilization efficiency in soils. Furthermore, the study revealed that Fe-modified biochar exhibited a substantial improvement in As immobilization. These insights can facilitate targeted biochar property design and optimization of biochar application conditions to enhance As immobilization efficiency.
Funding: Supported by the SP2024/089 Project of the Faculty of Materials Science and Technology, VŠB-Technical University of Ostrava.
Abstract: In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Nevertheless, Machine Learning (ML) techniques, which offer advanced regression tools to address complicated engineering issues, have been developed and widely explored. This study investigates selected ML techniques to evaluate their suitability for describing the hot deformation behavior of metallic materials. The ML-based regression methods of Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe hot flow stress curve datasets acquired experimentally for a medium-carbon steel. Although the GPR method has not been used for such a regression task before, the results showed that its performance is the most favorable and practically unrivaled; neither the ANN method nor the other studied ML techniques provide such precise results in the regression analysis.
Funding: Supported by the Fundamental Research Funds for the Central Universities (Grant No. 2682024GF019).
Abstract: Excellent detonation performance and low sensitivity are prerequisites for the deployment of energetic materials. Exploring the underlying factors that affect impact sensitivity and detonation performance, as well as how to obtain materials with desired properties, remains a long-term challenge. Machine learning, with its ability to solve complex tasks and perform robust data processing, can reveal the relationship between performance and descriptive indicators, potentially accelerating the development of energetic materials. Against this background, impact sensitivity, detonation performance, and 28 physicochemical parameters for 222 energetic materials were compiled from density functional theory calculations and published literature. Four machine learning algorithms were employed to predict various properties of energetic materials, including impact sensitivity, detonation velocity, detonation pressure, and Gurney energy. Analysis of Pearson coefficients and feature importance showed that the heat of explosion, oxygen balance, decomposition products, and HOMO energy levels correlate strongly with the impact sensitivity of energetic materials. Oxygen balance, decomposition products, and density correlate strongly with detonation performance. Using the impact sensitivity of 2,3,4-trinitrotoluene and the detonation performance of 2,4,6-trinitrobenzene-1,3,5-triamine as benchmarks, the analysis of feature importance rankings and statistical data revealed the optimal ranges of key features balancing impact sensitivity and detonation performance: oxygen balance values should be between -40% and -30%, density should range from 1.66 to 1.72 g/cm³, HOMO energy levels should be between -6.34 and -6.31 eV, and lipophilicity should be between -1.0 and 0.1 or between 4.49 and 5.59. These findings not only offer important insights into the impact sensitivity and detonation performance of energetic materials, but also provide a theoretical guidance paradigm for the design and development of new energetic materials with optimal detonation performance and reduced sensitivity.
Funding: Funded by the National Natural Science Foundation of China (52061020), Major Science and Technology Projects in Yunnan Province (202302AG050009), and Yunnan Fundamental Research Projects (202301AV070003).
Abstract: Finding materials with specific properties is a hot topic in materials science. Traditional materials design relies on empirical and trial-and-error methods, requiring extensive experiments and time and resulting in high costs. With the development of physics, statistics, computer science, and other fields, machine learning offers opportunities for systematically discovering new materials. In machine learning-based inverse design in particular, algorithms analyze the mapping relationships between materials and their properties to find materials with desired properties. This paper first outlines the basic concepts of materials inverse design and the challenges faced by machine learning-based approaches to it. Then, three main inverse design methods (exploration-based, model-based, and optimization-based) are analyzed in the context of different application scenarios. Finally, the applications of inverse design methods in alloys, optical materials, and acoustic materials are elaborated on, and the prospects for materials inverse design are discussed. The authors hope to accelerate the discovery of new materials and provide new possibilities for advancing materials science and innovative design methods.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52279103, 52379103) and the Natural Science Foundation of Shandong Province (No. ZR2023YQ049).
Abstract: Geological analysis, despite being a long-established method for identifying adverse geology in tunnels, has significant limitations due to its reliance on empirical analysis. The quantitative aspects of geochemical anomalies associated with adverse geology provide a novel strategy for addressing these limitations. However, statistical methods for identifying geochemical anomalies are insufficient for tunnel engineering. In contrast, data mining techniques such as machine learning have demonstrated greater efficacy when applied to geological data. Herein, a method for identifying adverse geology using machine learning of geochemical anomalies is proposed. The method identified geochemical anomalies in a tunnel that were not identified by statistical methods. We employed robust factor analysis and self-organizing maps to reduce the dimensionality of the geochemical data and extract the anomaly element combination (AEC). Using the AEC sample data, we trained an isolation forest model that successfully identified the multi-element anomalies. We then analyzed the adverse geological features based on the multi-element anomalies. This study therefore extends the traditional approach of geological analysis in tunnels and demonstrates that machine learning is an effective tool for intelligent geological analysis. Correspondingly, the research offers new insights regarding adverse geology and the prevention of hazards during the construction of tunnels and underground engineering projects.
Funding: Supported by the University of Transport Technology under grant number DTTD2022-12.
Abstract: Determining the Shear Bond Strength (SBS) at the interlayer of double-layer asphalt concrete is crucial in flexible pavement structures. The study used three Machine Learning (ML) models, K-Nearest Neighbors (KNN), Extra Trees (ET), and Light Gradient Boosting Machine (LGBM), to predict SBS from easily determinable input parameters. The Grid Search technique was employed for hyper-parameter tuning of the ML models, and cross-validation and learning curve analysis were used in training the models. The models were built on a database of 240 experimental results and three input variables: temperature, normal pressure, and tack coat rate. Model validation was performed using three statistical criteria: the coefficient of determination (R²), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE). Additionally, SHAP (SHapley Additive exPlanations) analysis was used to validate the importance of the input variables in the prediction of SBS. Results show that these models accurately predict SBS, with LGBM providing outstanding performance. SHAP analysis for LGBM indicates that temperature is the most influential factor on SBS. Consequently, the proposed ML models can quickly and accurately predict SBS between two layers of asphalt concrete, serving practical applications in flexible pavement structure design.
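The three validation criteria named in this abstract can be computed directly from paired measured and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample numbers are illustrative assumptions and are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# Hypothetical shear bond strength values (MPa), for illustration only.
measured = [0.50, 0.80, 1.10, 1.40]
predicted = [0.55, 0.75, 1.15, 1.35]
r2, rmse, mae = regression_metrics(measured, predicted)
```

RMSE and MAE are in the units of the target, while R² is dimensionless, which is one reason studies of this kind tend to report all three together.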
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 22225801, W2441009, 22408228).
Abstract: As energy demands continue to rise in modern society, the development of high-performance lithium-ion batteries (LIBs) has become crucial. However, traditional research methods in materials science face challenges such as lengthy timelines and complex processes. In recent years, the integration of machine learning (ML) into research on LIB materials, including electrolytes, solid-state electrolytes, and electrodes, has yielded remarkable achievements. This comprehensive review explores the latest applications of ML in predicting LIB material performance, covering the core principles and recent advancements in three key inverse material design strategies: high-throughput virtual screening, global optimization, and generative models. These strategies have played a pivotal role in fostering LIB material innovations. The paper also briefly discusses the challenges associated with applying ML to materials research and offers insights and directions for future research.
Funding: Funded by the India Meteorological Department, New Delhi, India, under the Forecasting Agricultural output using Space, Agrometeorology and Land based observations (FASAL) project, fund number ASC/FASAL/KT-11/01/HQ-2010.
Abstract: Background: Cotton is one of the most important commercial crops after food crops, especially in countries like India, where it is grown extensively under rainfed conditions. Because of its use in multiple industries, such as the textile, medicine, and automobile industries, it has great commercial importance. The crop's performance is greatly influenced by prevailing weather dynamics. As the climate changes, assessing how weather changes affect crop performance is essential. Among the various techniques available, crop models are the most effective and widely used tools for predicting yields. Results: This study compares statistical and machine learning models to assess their ability to predict cotton yield across the major producing districts of Karnataka, India, using a long-term dataset spanning 1990 to 2023 that includes yield and weather factors. The artificial neural networks (ANNs) performed best, with acceptable yield deviations within ±10% during both the vegetative stage (F1) and the mid stage (F2) for cotton. The model evaluation metrics, namely root mean square error (RMSE), normalized root mean square error (nRMSE), and modelling efficiency (EF), were also within the acceptance limits in most districts. Furthermore, the tested ANN model was used to assess the importance of the dominant weather factors influencing crop yield in each district. Specifically, morning relative humidity as an individual parameter and its interaction with maximum and minimum temperature had a major influence on cotton yield in most of the districts for which yield was predicted. These differences highlighted the differential interactions of weather factors in cotton yield formation in each district, and the individual response of each weather factor under different soil and management conditions across the major cotton growing districts of Karnataka. Conclusions: Compared with statistical models, machine learning models such as ANNs proved more efficient in forecasting cotton yield because of their ability to consider the interactive effects of weather factors on yield formation at different growth stages. This highlights the suitability of ANNs for yield forecasting in rainfed conditions and for studying the relative impacts of weather factors on yield. Thus, the study aims to provide valuable insights to support stakeholders in planning effective crop management strategies and formulating relevant policies.
Abstract: Teacher-student relationships play a vital role in improving college students' academic performance and the quality of higher education. However, empirical studies with substantial data-driven insights remain limited. To address this gap, this study collected 3278 questionnaires from seven universities across four provinces in China to analyze the key factors affecting college students' academic performance. A machine learning framework, CQFOA-KELM, was developed by enhancing the Fruit Fly Optimization Algorithm (FOA) with the Covariance Matrix Adaptation Evolution Strategy (CMAES) and Quadratic Approximation (QA). CQFOA significantly improved population diversity and was validated on the IEEE CEC2017 benchmark functions. The CQFOA-KELM model achieved an accuracy of 98.15% and a sensitivity of 98.53% in predicting college students' academic performance. Additionally, it effectively identified the key factors influencing academic performance through the feature selection process.
Funding: Supported by the Key R&D Program of Zhejiang Province (Nos. 2023C01166 and 2024SJCZX0046), the Zhejiang Provincial Natural Science Foundation of China (Nos. LDT23E05013E05 and LD24E050009), and the Natural Science Foundation of Ningbo (No. 2021J150), China.
Abstract: Accuracy allocation is crucial in the accuracy design of machine tools. Current accuracy allocation methods primarily focus on positional deviation, with little consideration for tool direction deviation. To address this issue, we propose a geometric error cost sensitivity-based accuracy allocation method for five-axis machine tools. A geometric error model consisting of 41 error components is constructed based on homogeneous transformation matrices. Volumetric points with positional and tool direction deviations are randomly sampled to evaluate the accuracy of the machine tool. The sensitivity of each error component at these sampling points is analyzed using the Sobol method. To balance the needs of geometric precision and manufacturing cost, a geometric error cost sensitivity function is developed to estimate the required cost. By allocating the error components affecting tool direction deviation first and the remaining components second, this allocation scheme ensures that both deviations meet the requirements. We also perform a numerical simulation of a BC-type (B-axis and C-axis type) five-axis machine tool to validate the method. The results show that the new allocation scheme reduces the total geometric error cost by 27.8% compared to a uniform allocation scheme while yielding the same positional and tool direction machining accuracies.
Funding: Supported by the National Natural Science Foundation of China (grant numbers 82204127 and 72204172).
Abstract: Lung cancer, the leading cause of cancer deaths worldwide and in China, has a 19.7% five-year survival rate due to terminal-stage diagnosis [1-3]. Although low-dose computed tomography (CT) screening can reduce mortality, high false positive rates can create economic and psychological burdens.
Abstract: Perovskite solar cells (PSCs), an example of third-generation solar cells, have developed rapidly in recent years. The traditional trial-and-error method is inefficient, and the search space is extremely large. This makes developing advanced perovskite materials, and achieving high conversion efficiency and stability in PSCs, a challenging task. A growing number of data-driven machine learning (ML) applications are being developed in materials science, owing to the availability of large databases and increased computing power. ML can predict the properties of potential perovskite materials and provide additional knowledge of how these materials work, which helps fast-track their progress. The purpose of this paper is therefore to develop a conceptual model for improving the efficiency of a perovskite solar cell using ML techniques. The study uses design science as its research method. The developed model consists of six phases: data collection and preprocessing, feature selection and engineering, model training and evaluation, performance assessment, optimization and fine-tuning, and deployment and application. The model shows promise for advancing the field of perovskite solar cells and provides a basis for developing more efficient and cost-effective solar energy technologies in the future.
Funding: Newcastle University and EPSRC (Grant No. 2020/21 DTP: ref. EP/T517914/1).
Abstract: Accurate and robust detection of the appearance of wax (a medium- to high-molecular-weight component of crude oil) is crucial for the efficient operation of hydrocarbon transportation. The wax appearance temperature (WAT) is the temperature at which wax first begins to form as the oil cools. When crude oil cools to its WAT, wax crystals precipitate, forming deposits on pipelines as the solubility limit is reached. Therefore, WAT is a crucial quality assurance parameter, especially when dealing with modern fuel oil blends. In this study, we use machine learning via MATLAB's Bioinformatics Toolbox to predict the WAT of marine fuel samples by correlating near-infrared spectral data with laboratory-measured values. The dataset, provided by Intertek PLC (a total quality assurance provider of inspection, testing, and certification services), consists of industrial data that are imbalanced, with a higher proportion of high-WAT samples than low-WAT samples. The objective is to identify marine fuel oil blends with unusually high WAT values (>35 °C) without relying on time-consuming and irregular laboratory-based measurements. The results demonstrate that the developed model, based on the one-class support vector machine (OCSVM) algorithm, achieved a recall of 96%, that is, it correctly identified 96% of fuel samples with WAT >35 °C. For standard binary classification, the recall was 85.7%. The trained OCSVM model is expected to facilitate rapid and well-informed decision-making for logistics and storage when choosing fuel oils.
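Recall, the metric quoted in this abstract, is the fraction of truly high-WAT samples that the classifier flags. A minimal sketch follows; the label encoding (1 for WAT above 35 °C, 0 otherwise), the function name, and the toy labels are assumptions for illustration and are not taken from the Intertek dataset.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)

# Toy labels: 1 = high WAT (> 35 degC), 0 = low WAT. Illustrative only.
actual = [1, 1, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 1, 0]
high_wat_recall = recall(actual, predicted)  # 4 of the 5 high-WAT samples are caught
```

Recall ignores false alarms on low-WAT samples, so on an imbalanced dataset it is usually read alongside precision or a confusion matrix.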
Funding: Supported by the Jiangsu Provincial Science and Technology Project Basic Research Program (Natural Science Foundation of Jiangsu Province) (No. BK20211283).
Abstract: NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. It has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials across energy storage, conversion, and structural applications. Additionally, NJmat serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the workflow for materials informatics, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module leveraging language models such as MatBERT and those based on Word2Vec to support materials prediction tasks. By utilizing clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
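The cosine similarity analysis mentioned in this abstract compares two embedding vectors by the angle between them. The sketch below is a generic plain-Python illustration; it does not use NJmat's own API, and the function name and the three-dimensional "embeddings" are hypothetical placeholders for real Word2Vec or MatBERT vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for three material terms (illustrative values only).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
```

Because the measure depends only on direction, `emb_a` and `emb_b` score as maximally similar even though their magnitudes differ.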
Funding: Supported by a grant from the Beijing Hospitals Authority Youth Programme (grant: QML20230204), a grant from the National Natural Science Foundation of China (No. 82471179), and a grant from the National Key Research and Development Plan (grant: 2022YFC2402705).
Abstract: Objective: To assess the effectiveness of machine learning in automating the prediction of vestibular abnormalities after cochlear implantation (CI) in patients with sensorineural hearing loss (SNHL), with the goal of developing a practical model that can accurately predict long-term vestibular function outcomes and identify associated risk factors. Methods: Clinical data, including imaging, vestibular evoked myogenic potentials (VEMPs), and auditory information, were collected from patients with SNHL before and after CI. The decision tree algorithm was employed to address missing values and screen pre-CI clinical features. Six machine learning methods were subsequently used to predict the relationships between the extracted features and post-CI vestibular dysfunction. The best-performing method determined the ranking of feature importance, and the top-ranked features were regarded as risk factors for predicting symptoms and VEMP results after CI. Results: Logistic regression models effectively predicted both post-CI vestibular dysfunction and abnormal cervical VEMP (cVEMP), with accuracies of 80% and 78%, respectively. The relative importance of the features, in descending order, was as follows: cVEMP latency, cVEMP amplitude, and residual hearing threshold. Moreover, the support vector machine (SVM) model attained an accuracy of 88% in predicting abnormal ocular VEMP (oVEMP) post-CI. For the SVM model, the feature importance ranking was as follows: oVEMP latency, oVEMP amplitude, and residual hearing threshold. Conclusions: This study successfully leverages machine learning techniques, specifically SVM and logistic regression models, to predict the impact of CI on vestibular function. These predictive models provide valuable insights for presurgical planning and decision-making in CI procedures. Moreover, the findings highlight the critical risk factors associated with vestibular dysfunction, offering a robust reference for guiding vestibular rehabilitation strategies.
Abstract: The prediction of tool wear in CNC machine tools is a critical aspect of ensuring the efficient operation and longevity of manufacturing equipment. Tool wear significantly impacts machining accuracy, surface finish quality, and operational downtime, making its prediction essential for proactive maintenance strategies. This paper explores the integration of Digital Twin technology with tool wear prediction models to enhance the precision and reliability of wear forecasting in CNC machines. We review existing methodologies for tool wear prediction, including physics-based models, data-driven approaches, and hybrid models, with an emphasis on their strengths and limitations. Furthermore, the paper highlights the role of Digital Twin technology in creating real-time virtual replicas of CNC machines that can dynamically monitor tool wear and provide actionable insights for optimization. By leveraging real-time data and advanced simulation techniques, Digital Twin-based prediction models offer significant improvements over traditional methods. The paper concludes by discussing future directions for integrating machine learning, deep learning, and real-time data analytics into the tool wear prediction process, ultimately contributing to the development of more intelligent and adaptive manufacturing systems.
Abstract: To achieve carbon dioxide (CO₂) storage through enhanced oil recovery, accurate forecasting of CO₂ subsurface storage and cumulative oil production is essential. This study develops hybrid predictive models for determining CO₂ storage mass and cumulative oil production in unconventional reservoirs. It does so with two multi-layer perceptron neural networks (MLPNN) and a least-squares support vector machine (LSSVM), hybridized with grey wolf optimization (GWO) and/or particle swarm optimization (PSO). Large simulated datasets were divided into training (70%) and testing (30%) groups, with normalization applied to both groups. Mahalanobis distance was used to identify and eliminate outliers in the training subset only. A non-dominated sorting genetic algorithm (NSGA-II) combined with LSSVM selected seven influential features from the nine available input parameters: reservoir depth, porosity, permeability, thickness, bottom-hole pressure, area, CO₂ injection rate, residual oil saturation to gas flooding, and residual oil saturation to water flooding. Predictive models were developed and tested, with performance evaluated using an overfitting index (OFI), scoring analysis, and partial dependence plots (PDP) during training and independent testing to enhance model focus and effectiveness. The LSSVM-GWO model generated the lowest root mean square error (RMSE) values (0.4052 MMT for CO₂ storage and 9.7392 MMbbl for cumulative oil production) in the training group. That trained model also exhibited excellent generalization and minimal overfitting when applied to the testing group (RMSE of 0.6224 MMT for CO₂ storage and 12.5143 MMbbl for cumulative oil production). PDP analysis revealed that the input features "area" and "porosity" had the most influence on the LSSVM-GWO model's prediction performance. This paper presents a new hybrid modeling approach that achieves accurate forecasting of CO₂ subsurface storage and cumulative oil production. It also establishes a new standard for such forecasting, which can lead to the development of more effective and sustainable solutions for oil recovery.
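The preprocessing order described in this abstract (split first, then screen outliers in the training subset only) can be sketched as follows. This is a minimal two-feature illustration in plain Python; the function names, the distance threshold of 2.0, and the toy data are assumptions, since the abstract does not state the study's own threshold or implementation.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, expanded for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

def drop_outliers(train, threshold):
    """Keep training points whose distance is at or below the threshold."""
    return [p for p, d in zip(train, mahalanobis_2d(train)) if d <= threshold]

# Toy (porosity, permeability) pairs; (0.50, 0.2) is a deliberate outlier.
data = [(0.10, 1.0), (0.12, 1.2), (0.11, 1.1), (0.13, 1.3), (0.09, 0.9),
        (0.14, 1.5), (0.50, 0.2), (0.12, 1.1), (0.10, 1.2), (0.11, 1.0)]
split = len(data) * 7 // 10            # 70% training, 30% testing
train, test = data[:split], data[split:]
clean_train = drop_outliers(train, threshold=2.0)  # test set is left untouched
```

Screening only the training subset keeps the test set representative of unseen data, so reported test errors are not flattered by removing hard cases.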
Funding: Supported by the National Key R&D Program of China (2023YFB2504000, YH), a start-up grant from Zhejiang University, the Fundamental Research Funds for the Central Universities (2021FZZX001, 226-2024-00005), and the Special Support Plan for High Level Talents in Zhejiang Province (2023R5231).
Abstract: Metal alloy anode materials with high specific capacity and low voltage have recently gained significant attention due to their excellent electrochemical performance and their ability to suppress dendrite growth. However, experimental investigations of metal alloys can be time-consuming and expensive, often requiring extensive experimental design and effort. In this study, we developed a machine learning model based on the Crystal Graph Convolutional Neural Network (CGCNN) to screen alloy anode materials for seven battery systems: lithium (Li), sodium (Na), potassium (K), zinc (Zn), magnesium (Mg), calcium (Ca), and aluminum (Al). We used data on tens of thousands of alloy materials from the Materials Project (MP) and Automatic FLOW for Materials Discovery (AFLOW) databases. Without any experimental voltage input, we identified with good precision over 30 alloy systems that have been experimentally validated. Additionally, we predicted over 100 alloy anodes with low potential and high specific capacity. We hope this work will spur further interest in employing advanced machine learning models for the design of battery materials.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFA1402304), the National Natural Science Foundation of China (Grant Nos. 12034009, 12374005, 52288102, 52090024, and T2225013), the Fundamental Research Funds for the Central Universities, and the Program for JLU Science and Technology Innovative Research Team.
Abstract: Crystal structure prediction (CSP) is a foundational computational technique for determining the atomic arrangements of crystalline materials, especially under high-pressure conditions. While CSP plays a critical role in materials science, traditional approaches often encounter significant challenges related to computational efficiency and scalability, particularly when applied to complex systems. Recent advances in machine learning (ML) have shown tremendous promise in addressing these limitations, enabling the rapid and accurate prediction of crystal structures across a wide range of chemical compositions and external conditions. This review provides a concise overview of recent progress in ML-assisted CSP methodologies, with a particular focus on machine learning potentials and generative models. By critically analyzing these advances, we highlight the transformative impact of ML in accelerating materials discovery, enhancing computational efficiency, and broadening the applicability of CSP. Additionally, we discuss emerging opportunities and challenges in this rapidly evolving field.
Funding: Supported by the Guangxi Zhuang Autonomous Region Health Commission Scientific Research Project, No. Z20210442.
Abstract: BACKGROUND Ki-67 is a routine test item in clinical pathology departments. However, its prognostic value requires further investigation, especially in the context of research using machine learning (ML), which remains relatively underdeveloped. AIM To investigate the prognostic value of Ki-67 in cases of colorectal carcinoma (CRC) and explore the potential application of ML algorithms to predict the Ki-67 index. METHODS Case data and pathological sections from two centers were systematically collected. To analyze the prognostic value of the Ki-67 index in CRC, multiple cutoff values were established. Meanwhile, based on the histological features presented in the hematoxylin and eosin-stained CRC images, three mainstream ML algorithms, support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost), were employed to construct prediction models. Subsequently, the potential of these algorithms to classify and predict the Ki-67 index was explored. RESULTS Non-parametric tests revealed that Ki-67 ≥40% correlated with a high histological grade (P=0.017), deficient mismatch repair protein status was associated with the ≥50%-90% cutoffs (all P≤0.028), and Ki-67 ≥80% was linked to lymph node metastasis (P=0.006). Kaplan-Meier analysis showed that Ki-67 ≥50% predicted higher survival (log-rank P=0.0299, hazard ratio=2.142), with no differences for other cutoffs. Cox regression identified the Ki-67 positive rate as a significant predictor (P=0.027, hazard ratio=2.583), while other variables had no association. In algorithmic model predictions, the SVM, RF, and XGBoost models achieved training area under the curve (AUC) values of 0.851, 0.948, and 0.872, respectively, with corresponding test set AUC values of 0.795, 0.755, and 0.750, respectively. During external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively. CONCLUSION In algorithmic model predictions, the SVM, RF, and XGBoost models achieved training AUC values of 0.851, 0.948, and 0.872, respectively, with corresponding test set AUC values of 0.795, 0.755, and 0.750, respectively. During external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively.