Polymer electrolyte membrane fuel cells (PEMFCs) are considered a promising alternative to internal combustion engines in the automotive sector. Their commercialization is hindered mainly by the cost and effectiveness of the platinum (Pt) they use. The cathode catalyst layer (CL) is a core component of PEMFCs, and its composition considerably affects both the cell performance (V_cell) and the PEMFC fabrication and production cost (C_stack). In this study, a data-driven multi-objective optimization analysis is conducted to evaluate the effects of various cathode CL compositions on V_cell and C_stack. Four essential cathode CL parameters, i.e., platinum loading (L_Pt), weight ratio of ionomer to carbon (wt_I/C), weight ratio of Pt to carbon (wt_Pt/C), and porosity of the cathode CL (ε_cCL), are considered as the design variables. The simulation results of a three-dimensional, multi-scale, two-phase comprehensive PEMFC model are used to train and test two well-known surrogates: multi-layer perceptron (MLP) and response surface analysis (RSA). Their accuracies are verified using root mean square error and adjusted R^2. The MLP, which outperforms RSA in prediction capability, is then linked to the multi-objective non-dominated sorting genetic algorithm II (NSGA-II). Compared to a typical PEMFC stack, the optimization results show that the single-cell voltage V_cell is improved by 28 mV for the same stack price, and the stack cost evaluated through the U.S. Department of Energy cost model is reduced by $5.86/kW for the same stack performance.
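The core idea behind non-dominated sorting, as used in the abstract above, can be sketched in a few lines. This is only a minimal illustration of extracting the first Pareto front, not the authors' implementation: the function name `pareto_front` and the candidate values are hypothetical, and both objectives are assumed to be minimized (voltage is negated so that a higher V_cell counts as better).

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated rows; every column is minimized."""
    objs = np.asarray(objectives, dtype=float)
    keep = []
    for i, p in enumerate(objs):
        # q dominates p if q is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(objs <= p, axis=1)
        strictly_better = np.any(objs < p, axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep

# Hypothetical candidates (illustrative numbers, not results from the study):
# column 0 is -V_cell in volts, column 1 is C_stack in $/kW.
candidates = np.array([
    [-0.65, 80.0],
    [-0.68, 85.0],
    [-0.64, 82.0],  # worse than row 0 in both objectives
    [-0.68, 90.0],  # same voltage as row 1 but more expensive
])
front = pareto_front(candidates)
print(front)
```

A full genetic algorithm would repeat this ranking on each generation and add crossover, mutation, and crowding-distance selection; the sketch covers only the dominance test.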
Wetting deformation in earth-rockfill dams is a critical factor influencing dam safety. Although numerous mathematical models have been developed to describe this phenomenon, most of them rely on empirical formulations and lack prior knowledge of model parameters, which is essential for Bayesian parameter inversion to enhance accuracy and reduce uncertainty. This study introduces a data-driven approach to establishing prior knowledge of earth-rockfill dams. Driving factors are used to determine the potential range of model parameters, and settlement changes within this range are calculated. The results are iteratively compared with actual monitoring data until the calculated range encompasses the observed data, thereby providing prior knowledge of the model parameters. The proposed method is applied to the right-bank earth-rockfill dam of Danjiangkou. With a Gibbs sample size of 30,000, the proposed method effectively calibrates the prior knowledge of the wetting model parameters, achieving a root mean square error (RMSE) of 5.18 mm for the settlement predictions. By comparison, non-informative priors with sample sizes of 30,000 and 50,000 yield significantly larger RMSE values of 11.97 mm and 16.07 mm, respectively. Furthermore, the inversion computation time of the proposed method is 902 s for 30,000 samples, notably shorter than the 1026 s and 1558 s required for non-informative priors with 30,000 and 50,000 samples, respectively. These results show that the proposed method improves both prediction accuracy and computational efficiency, enabling optimal parameter identification with reduced computational effort. This approach provides a robust and efficient framework for advancing dam safety assessments.
We propose an integrated method of data-driven and mechanism models for well logging formation evaluation, focusing on predicting reservoir parameters such as porosity and water saturation. Accurately interpreting these parameters is crucial for effectively exploring and developing oil and gas. However, with the increasing complexity of geological conditions in this industry, there is a growing demand for improved accuracy in reservoir parameter prediction, leading to higher costs associated with manual interpretation. Conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters and suffer from low interpretation efficiency, strong subjectivity, and suitability only for ideal conditions. Applying artificial intelligence to logging data interpretation offers a new solution to these problems and is expected to improve both accuracy and efficiency. Given large, high-quality datasets, data-driven models can reveal relationships of arbitrary complexity. Nevertheless, constructing sufficiently large logging datasets with reliable labels remains challenging, making it difficult to apply data-driven models effectively in logging data interpretation. Furthermore, data-driven models often act as “black boxes” that neither explain their predictions nor ensure compliance with basic physical constraints. This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models. Prior knowledge of logging data interpretation is embedded into the machine learning model through the network structure, loss function, and optimization algorithm. We employ the Physically Informed Auto-Encoder (PIAE) to predict porosity and water saturation; it can be trained without labeled reservoir parameters using self-supervised learning techniques. This approach achieves automated interpretation and facilitates generalization across diverse datasets.
Landslide susceptibility mapping (LSM) plays a crucial role in assessing geological risks. Current LSM techniques struggle to achieve accurate results because of uncertainties in regional-scale geotechnical parameters. To explore rainfall-induced LSM, this study proposes a hybrid model that combines a physically-based probabilistic model (PPM) with a convolutional neural network (CNN). The PPM captures the spatial distribution of landslides by incorporating the probability of failure (POF) derived from the slope stability mechanism under rainfall conditions, and thereby characterizes the variation of POF caused by parameter uncertainties. The CNN is used as a binary classifier to capture the spatial and channel correlation between landslide conditioning factors and the probability of landslide occurrence. An OpenCV image enhancement technique is used to extract non-landslide points based on the POF of landslides. The proposed model accounts for physical mechanics when selecting non-landslide samples, filtering out samples that do not adhere to physical principles and reducing the risk of overfitting. For the landslide case of the Niangniangba area of Gansu Province, China, the proposed PPM-CNN hybrid model achieves a higher prediction accuracy, with an area under the curve (AUC) of 0.85, compared with the individual CNN model (AUC = 0.61) and the PPM (AUC = 0.74). The model can also consider the statistical correlation and non-normal probability distributions of model parameters. These results offer practical guidance for future research on rainfall-induced LSM at the regional scale.
Constitutive models of shape memory alloys (SMAs) play an important role in facilitating the widespread application of these alloys in various engineering fields. However, to accurately describe the deformation behaviors of SMAs, existing constitutive models employ concepts from classical plasticity and involve a series of complex mathematical equations. Such complexity makes the models inconvenient to construct, implement, and apply. To overcome these shortcomings, a data-driven constitutive model of SMAs is developed in this work based on an artificial neural network (ANN). In the proposed model, the components of the strain tensor in principal space, the ambient temperature, and the maximum equivalent strain in the deformation history from the initial state to the current loading state are chosen as the input features, and the components of the stress tensor in principal space are set as the output. The proposed ANN-based constitutive model is implemented in the finite element program ABAQUS by deriving its consistent tangent modulus and writing a user-defined material subroutine. The stress-strain responses of SMA material under various loading paths and at different ambient temperatures, generated from an existing constitutive model (numerical experiments), are used to train the ANN model. To validate the capability of the proposed model, the predicted stress-strain responses of SMA material and the global and local responses of two typical SMA structures are compared with the corresponding numerical experiments. This work demonstrates good potential for obtaining the constitutive model of SMAs from data alone, avoiding the need for extensive prior knowledge when constructing constitutive models.
This paper focuses on the numerical solution of a tumor growth model under a data-driven approach. Based on the inherent laws of the data and reasonable assumptions, an ordinary differential equation model for tumor growth is established. Nonlinear fitting is employed to obtain the optimal parameter estimates of the mathematical model, and the numerical solution is carried out in Matlab. The simulation results agree well with the clinical data, which verifies the rationality and feasibility of the model.
With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models can capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. Subsequently, the related studies are categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared to mechanistic modelling, thus improving the efficiency of process optimization or monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to solve these problems. Firstly, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which clusters the data into perturbed and steady-state intervals for steady-state data extraction. Secondly, the maximal information coefficient (MIC) is employed to calculate the nonlinear correlation between variables in order to remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight update strategy, yielding a new ensemble learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
The permanent magnet synchronous motor (PMSM) is widely used in alternating current servo systems because it provides high efficiency, high power density, and a wide speed regulation range. Servo systems are placing ever higher demands on its control performance. The model predictive control (MPC) algorithm is emerging as a potential high-performance motor control algorithm owing to its ability to handle multiple-input, multiple-output variables and imposed constraints. When MPC is used in PMSM control, nonlinear disturbances caused by changes in electromagnetic parameters or by load disturbances may lead to a mismatch between the nominal model and the controlled object, which causes prediction error and thus affects the dynamic stability of the control system. This paper proposes a data-driven MPC strategy in which historical data in an appropriate range are used to eliminate the impact of parameter mismatch and further improve the control performance. The stability of the proposed algorithm is proved, and simulation demonstrates its feasibility. Its superiority over the classical MPC strategy is also verified.
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels from a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC’s generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
To ensure the safe operation of batteries, accurately obtaining key internal state parameters is essential. However, traditional parameter measurement methods require either opening the battery or long-term measurements, both of which are impractical. Fixed values are therefore commonly used for these parameters in electrochemical models, which imposes significant limitations. To overcome these limitations, this paper proposes a deep neural network (DNN)-based data-driven evaluation method to determine model parameters. By coupling an improved one-dimensional isothermal pseudo-two-dimensional (P2D) model with a DNN, this study identifies concentration-dependent parameters through detailed discharge curve analysis. The results show that the data-driven method can effectively obtain the trend of concentration-dependent parameters from the charge and discharge curves, and that the method can be extended to different battery systems at different discharge rates and in aging applications. This work is expected to provide new parameter selection insights for data-driven battery prediction and monitoring models.
A data-driven model of multiple variable cutting (M-VCUT) level set-based substructure is proposed for the topology optimization of lattice structures. The M-VCUT level set method is used to represent substructures, enriching their diversity of configuration while ensuring connectivity. To construct the data-driven model of substructure, a database is prepared by sampling the space of substructures spanned by several substructure prototypes. Then, for each substructure in this database, the stiffness matrix is condensed so that its degrees of freedom are reduced. Thereafter, the data-driven model of substructures is constructed through interpolation with compactly supported radial basis functions (CS-RBF). The inputs of the data-driven model are the design variables of topology optimization, and the outputs are the condensed stiffness matrix and volume of substructures. During the optimization, this data-driven model is used, thus avoiding repeated static condensation that would require much computation time. Several numerical examples are provided to verify the proposed method.
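Interpolation with compactly supported radial basis functions, as mentioned in the abstract above, can be illustrated with a small one-dimensional sketch. The abstract does not say which CS-RBF is used, so the Wendland C2 function here is an assumption, as are the function names, the support radius, and the scalar stand-in response (the paper's outputs are a condensed stiffness matrix and a volume).

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 kernel: (1 - r)^4 (4r + 1) for r < 1, zero otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def fit_csrbf(x, y, support):
    """Solve A w = y, with A[i, j] = phi(|x_i - x_j| / support)."""
    dist = np.abs(x[:, None] - x[None, :])
    return np.linalg.solve(wendland_c2(dist / support), y)

def eval_csrbf(xq, x, weights, support):
    """Evaluate the interpolant at query points xq."""
    dist = np.abs(np.atleast_1d(xq)[:, None] - x[None, :])
    return wendland_c2(dist / support) @ weights

# Hypothetical samples: one design variable and a scalar stand-in response.
x_samples = np.linspace(0.0, 1.0, 6)
y_samples = x_samples ** 2
support = 0.5  # assumed support radius; only nearby samples interact

weights = fit_csrbf(x_samples, y_samples, support)
print(eval_csrbf(x_samples, x_samples, weights, support))
```

Because the kernel vanishes beyond the support radius, the interpolation matrix is sparse for large databases, which is the usual motivation for choosing a compactly supported basis.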
Against the backdrop of the national innovation strategy and the digital transformation of education, the traditional “extensive” training model for innovation and entrepreneurship talents struggles to meet the personalized development needs of students, making a shift toward precision and intelligence urgent. This study constructs a data-centered, four-dimensional integrated framework, “Goal-Data-Intervention-Evaluation”, and proposes a data-driven training model for innovation and entrepreneurship talents in universities. By collecting multi-source data such as learning behaviors, competency assessments, and practical projects, the model conducts in-depth analysis of students’ individual characteristics and development potential, enabling precise decision-making in goal setting, teaching intervention, and practical guidance. Based on data analysis, a supportive system for personalized teaching and practical activities is established. Combined with process-oriented and summative evaluations, a closed-loop feedback mechanism is formed to improve training effectiveness. This model provides a theoretical framework and practical path for the scientific, personalized, and intelligent development of innovation and entrepreneurship education in universities.
The Underwater Acoustic (UWA) channel is bandwidth-constrained and experiences doubly selective fading. It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing (OFDM) communications using a finite number of pilots. Deep Learning (DL) approaches have been very successful in wireless OFDM communications, but whether they work underwater remains an open question. For the first time, this paper compares two categories of DL-based UWA OFDM receivers: the Data-Driven (DD) method, which operates as an end-to-end black box, and the Model-Driven (MD) method, also known as the model-based data-driven method, which combines DL with expert OFDM receiver knowledge. The encoder-decoder framework and a Convolutional Neural Network (CNN) structure are employed to build the DD receiver, whereas an unfolding-based Minimum Mean Square Error (MMSE) structure is adopted for the MD receiver. We analyze the characteristics of the different receivers by Monte Carlo simulations under diverse communication conditions and propose a strategy for selecting a proper receiver under different communication scenarios. Field trials in a pool and at sea are also conducted to verify the feasibility and advantages of the DL receivers. DL receivers are observed to outperform conventional receivers in terms of bit error rate.
In the rapidly evolving technological landscape, state-owned enterprises (SOEs) encounter significant challenges in sustaining their competitiveness through efficient R&D management. Integrated Product Development (IPD), with its emphasis on cross-functional teamwork, concurrent engineering, and data-driven decision-making, has been widely recognized for enhancing R&D efficiency and product quality. However, the unique characteristics of SOEs pose challenges to the effective implementation of IPD. The advancement of big data and artificial intelligence technologies offers new opportunities for optimizing IPD R&D management through data-driven decision-making models. This paper constructs and validates a data-driven decision-making model tailored to the IPD R&D management of SOEs. By integrating data mining, machine learning, and other advanced analytical techniques, the model serves as a scientific and efficient decision-making tool. It aids SOEs in optimizing R&D resource allocation, shortening product development cycles, reducing R&D costs, and improving product quality and innovation. Moreover, this study contributes to a deeper theoretical understanding of the value of data-driven decision-making in the context of IPD.
The impacts of lateral boundary conditions (LBCs) provided by numerical models and data-driven networks on convective-scale ensemble forecasts are investigated in this study. Four experiments are conducted on the Hangzhou RDP (19th Hangzhou Asian Games Research Development Project on Convective-scale Ensemble Prediction and Application) testbed, with the LBCs respectively sourced from National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) forecasts with 33 vertical levels (Exp_GFS), Pangu forecasts with 13 vertical levels (Exp_Pangu), Fuxi forecasts with 13 vertical levels (Exp_Fuxi), and NCEP GFS forecasts with the vertical levels reduced to 13, the same as in Exp_Pangu and Exp_Fuxi (Exp_GFSRDV). In general, Exp_Pangu performs comparably to Exp_GFS, while Exp_Fuxi is slightly inferior to Exp_Pangu, possibly because of its less accurate large-scale predictions. This demonstrates that data-driven networks can efficiently provide LBCs for convective-scale ensemble forecasts. Moreover, Exp_GFSRDV has the worst convective-scale forecasts among the four experiments, which indicates that LBCs from data-driven networks could be improved by increasing the number of vertical levels of the networks. However, the ensemble spread of the four experiments barely increases with lead time. Each experiment therefore has insufficient ensemble spread to represent realistic forecast uncertainties, an issue that will be investigated in a future study.
Data-driven research on recycled aggregate concrete (RAC) has long lacked a unified standard test dataset, which hinders accurate model evaluation and trust in predictive outcomes. This paper reviews critical parameters influencing mechanical properties in 35 RAC studies, compiles four datasets encompassing these parameters, and summarizes the performance and key findings of 77 published data-driven models. Baseline capability tests are conducted on the nine most used models. The paper also outlines advanced methodological frameworks for future RAC research, examining the principles and challenges of physics-informed neural networks (PINNs) and generative adversarial networks (GANs), and employs SHAP and PDP tools to interpret model behaviour and enhance transparency. Findings indicate a clear trend toward integrated systems, hybrid models, and advanced optimization strategies, with integrated tree-based models showing superior performance across various prediction tasks. Based on this comprehensive review, we offer recommendations on how AI can be effectively oriented in future RAC studies to support practical deployment and build confidence in data-driven approaches.
Accurate prediction of strip width is a key factor in the quality of hot rolling manufacture. Firstly, based on the strip width formation mechanism model of the strip rolling process, an improved width mechanism calculation model is developed, with its process parameters optimized via the particle swarm optimization algorithm. Subsequently, a hybrid strip width prediction model is proposed that combines the respective advantages of the improved mechanism model and a data-driven model. To meet the requirement of a positive error in strip width prediction, an adaptive width error compensation algorithm is proposed. Finally, comparative simulation experiments are designed on an actual rolling dataset after data cleaning and feature engineering. The experimental results show that the proposed hybrid prediction model has superior precision and robustness compared with the improved mechanism model and eight other common data-driven models, and that it satisfies the needs of practical applications. Moreover, the hybrid model realizes the complementary advantages of the mechanism model and the data-driven model, alleviating both the difficulty of improving the accuracy of the mechanism model and the poor interpretability of the data-driven model, which has significant practical implications for research on strip width control.
Advancements in dynamic modeling methods for robotic manipulators are critical to the effective implementation of model-based control. Traditional approaches rely on rigorous first-principles-based dynamic modeling and precise parameter identification, while this paper explores an alternative through data-driven model reconstruction. To tackle the curse of dimensionality in the model reconstruction of a multi-degree-of-freedom serial robotic manipulator, a relative activation indicator is proposed. Based on this indicator, the k-means clustering algorithm is used to classify the data under different working conditions. Subsequently, we leverage fundamental prior knowledge to find the dynamical characteristics of each cluster and reconstruct the dynamic model in a stepwise manner using sparse identification of nonlinear dynamics (SINDy). For the library generation of SINDy, a double-feature-set strategy for serial manipulators with common joint types is proposed. Simulation results show that the stepwise model reconstruction approach not only reduces the size of the library of candidate functions but also decreases the impact of data noise on the reconstruction results. Finally, controllers based on the reconstructed models are deployed on the experimental platform, and the experimental results demonstrate the improvement in trajectory tracking performance and the potential of the proposed method in engineering applications.
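The sparse regression step at the heart of SINDy, named in the abstract above, can be sketched with sequentially thresholded least squares on a toy system. This is a generic illustration, not the paper's stepwise or double-feature-set procedure: the function name `stlsq`, the threshold, the three-term candidate library, and the scalar test system dx/dt = -2x are all assumptions, and the derivative is supplied exactly rather than estimated from noisy data.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the candidate functions that survived thresholding.
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi

# Synthetic trajectory of the assumed toy system dx/dt = -2x.
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x  # exact derivative; real data would need numerical differentiation

# Candidate library: columns are the functions 1, x, x^2.
theta = np.column_stack([np.ones_like(t), x, x ** 2])
xi = stlsq(theta, dxdt)
print(xi)
```

The recovered coefficient vector is nonzero only for the `x` column, which is how the method selects a small set of active terms from a larger library; a smaller library, as the abstract argues, makes this selection less sensitive to noise.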
This study focuses on empirical modeling of the strength characteristics of urban soils contaminated with heavy metals using machine learning tools, and on their subsequent stabilization with ordinary Portland cement (OPC). For dataset collection, an extensive experimental program was designed to estimate the unconfined compressive strength (Qu) of heavy metal-contaminated soils collected from a wide range of land use patterns, i.e., residential, industrial, and roadside soils. A robust comparison of the predictive performance of four data-driven models is presented: extreme learning machines (ELMs), gene expression programming (GEP), random forests (RFs), and multiple linear regression (MLR). A comprehensive experimental database was established and partitioned into 80% for training and 20% for testing the developed models. Inputs included varying levels of the heavy metals Cd, Cu, Cr, Pb, and Zn, along with OPC. The results revealed that the GEP model outperformed its counterparts, explaining approximately 96% of the variability in both the training (R^2 = 0.964) and testing (R^2 = 0.961) phases and achieving the lowest RMSE and MAE values. ELM performed commendably but was slightly less accurate than GEP, whereas MLR had the lowest performance metrics. GEP also provided the benefit of a traceable mathematical equation, enhancing its applicability not just as a predictive tool but also as an explanatory one. Despite its insights, the study is limited by its focus on a specific set of heavy metals and urban soil samples from a particular region, which may affect the generalizability of the findings to different contamination profiles or environmental conditions. The study recommends GEP for predicting Qu in heavy metal-contaminated soils and suggests further research to adapt these models to different environmental conditions.
Funding: Supported by the Technology Innovation Program of the Korea Evaluation Institute of Industrial Technology (KEIT) under the Ministry of Trade, Industry and Energy (MOTIE) of the Republic of Korea (20012121) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2022M3J7A106294).
Abstract: Polymer electrolyte membrane fuel cells (PEMFCs) are considered a promising alternative to internal combustion engines in the automotive sector. Their commercialization is hindered mainly by the cost and effectiveness of the platinum (Pt) they use. The cathode catalyst layer (CL) is a core component of PEMFCs, and its composition considerably affects both the cell performance (V_cell) and the PEMFC fabrication and production costs (C_stack). In this study, a data-driven multi-objective optimization analysis is conducted to evaluate the effects of various cathode CL compositions on V_cell and C_stack. Four essential cathode CL parameters, i.e., platinum loading (L_Pt), weight ratio of ionomer to carbon (wt_I/C), weight ratio of Pt to carbon (wt_Pt/C), and porosity of the cathode CL (ε_cCL), are considered as the design variables. The simulation results of a three-dimensional, multi-scale, two-phase comprehensive PEMFC model are used to train and test two widely used surrogate models: multi-layer perceptron (MLP) and response surface analysis (RSA). Their accuracies are verified using root mean square error and adjusted R^2. The MLP, which outperforms RSA in prediction capability, is then linked to the multi-objective non-dominated sorting genetic algorithm II (NSGA-II). Compared with a typical PEMFC stack, the optimization results show that the single-cell voltage V_cell is improved by 28 mV for the same stack price, and that the stack cost, evaluated through the U.S. Department of Energy cost model, is reduced by $5.86/kW for the same stack performance.
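The non-dominated sorting at the core of a two-objective search like the one in this abstract can be illustrated with a minimal sketch. It only filters a fixed list of candidate designs down to its Pareto front (maximize cell voltage, minimize stack cost); it is not the NSGA-II algorithm itself and not the authors' implementation. The function names and the candidate values are hypothetical and chosen purely for illustration.

```python
# Each design is a tuple (v_cell, c_stack): voltage in V, cost in $/kW.
# The numbers below are made up for illustration; they are not from the study.

def dominates(a, b):
    """Return True if design a is no worse than b in both objectives
    (higher voltage, lower cost) and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


candidates = [(0.62, 80.0), (0.65, 85.0), (0.60, 82.0), (0.66, 95.0), (0.64, 90.0)]
front = pareto_front(candidates)
print(front)
```

In a surrogate-assisted setting, the tuples would come from surrogate predictions for each candidate catalyst-layer composition rather than from a hard-coded list.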
Funding: Supported by the National Key R&D Program of China (Grant No. 2023YFC3209504), the Natural Science Foundation of Wuhan (Grant No. 2024040801020271), and the Fundamental Research Funds for Central Public Welfare Research Institutes (Grant No. CKSF2025718/YT).
Abstract: Wetting deformation in earth-rockfill dams is a critical factor influencing dam safety. Although numerous mathematical models have been developed to describe this phenomenon, most of them rely on empirical formulations and lack prior knowledge of model parameters, which is essential for Bayesian parameter inversion to enhance accuracy and reduce uncertainty. This study introduces a data-driven approach to establishing prior knowledge for earth-rockfill dams. Driving factors are used to determine the potential range of model parameters, and settlement changes within this range are calculated. The results are iteratively compared with actual monitoring data until the calculated range encompasses the observed data, thereby providing prior knowledge of the model parameters. The proposed method is applied to the right-bank earth-rockfill dam of Danjiangkou. With a Gibbs sample size of 30,000, the proposed method effectively calibrates the prior knowledge of the wetting model parameters, achieving a root mean square error (RMSE) of 5.18 mm for the settlement predictions. By comparison, non-informative priors with sample sizes of 30,000 and 50,000 result in significantly larger RMSE values of 11.97 mm and 16.07 mm, respectively. Furthermore, the inversion computation time of the proposed method is 902 s for 30,000 samples, notably shorter than the 1026 s and 1558 s required for non-informative priors with 30,000 and 50,000 samples, respectively. These findings show that the proposed method improves both predictive accuracy and computational efficiency, enabling optimal parameter identification with reduced computational effort. The approach provides a robust and efficient framework for advancing dam safety assessments.
Funding: Supported by the National Key Research and Development Program (2019YFA0708301), the National Natural Science Foundation of China (51974337), the Strategic Cooperation Projects of CNPC and CUPB (ZLZX2020-03), the Science and Technology Innovation Fund of CNPC (2021DQ02-0403), and the Open Fund of Petroleum Exploration and Development Research Institute of CNPC (2022-KFKT-09).
Abstract: We propose a method that integrates data-driven and mechanism models for well logging formation evaluation, focusing on the prediction of reservoir parameters such as porosity and water saturation. Accurately interpreting these parameters is crucial for the effective exploration and development of oil and gas. However, as geological conditions in this industry grow more complex, the demand for more accurate reservoir parameter prediction is increasing, which raises the cost of manual interpretation. Conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters, and they suffer from low interpretation efficiency, strong subjectivity, and suitability only for ideal conditions. The application of artificial intelligence to logging data interpretation offers a new solution to these problems and is expected to improve both the accuracy and the efficiency of interpretation. If large, high-quality datasets exist, data-driven models can reveal relationships of arbitrary complexity. Nevertheless, constructing sufficiently large logging datasets with reliable labels remains challenging, making it difficult to apply data-driven models effectively in logging data interpretation. Furthermore, data-driven models often act as "black boxes" that neither explain their predictions nor ensure compliance with basic physical constraints. This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models. Prior knowledge of logging data interpretation is embedded into the machine learning model through the network structure, loss function, and optimization algorithm. We employ the Physically Informed Auto-Encoder (PIAE) to predict porosity and water saturation; it can be trained without labeled reservoir parameters using self-supervised learning techniques. This approach achieves automated interpretation and facilitates generalization across diverse datasets.
Funding: This work received funding support from the National Natural Science Foundation of China (Grant Nos. U22A20594, 52079045). Hong-Zhi Cui acknowledges the financial support of the China Scholarship Council (Grant No. CSC: 202206710014) for his research at Universitat Politecnica de Catalunya, Barcelona.
Abstract: Landslide susceptibility mapping (LSM) plays a crucial role in assessing geological risks. Current LSM techniques struggle to achieve accurate results because of uncertainties associated with regional-scale geotechnical parameters. To explore rainfall-induced LSM, this study proposes a hybrid model that combines a physically-based probabilistic model (PPM) with a convolutional neural network (CNN). The PPM captures the spatial distribution of landslides by incorporating the probability of failure (POF), which accounts for the slope stability mechanism under rainfall conditions and characterizes the variation of POF caused by parameter uncertainties. The CNN is used as a binary classifier to capture the spatial and channel correlation between landslide conditioning factors and the probability of landslide occurrence. An OpenCV image enhancement technique is used to extract non-landslide points based on the POF of landslides. The proposed model considers physical mechanics when selecting non-landslide samples, filtering out samples that do not adhere to physical principles and reducing the risk of overfitting. For the landslide case of the Niangniangba area of Gansu Province, China, the proposed PPM-CNN hybrid model achieves a higher prediction accuracy, with an area under the curve (AUC) of 0.85, than the individual CNN model (AUC = 0.61) and the PPM (AUC = 0.74). The model can also consider the statistical correlation and non-normal probability distributions of model parameters. These results offer practical guidance for future research on rainfall-induced LSM at the regional scale.
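As a reminder of what the AUC figures quoted in this abstract measure, the following minimal sketch computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive (landslide) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = landslide cell, 0 = non-landslide cell.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
result = auc(labels, scores)
print(round(result, 3))
```

The pairwise form is quadratic in the number of samples; it is used here only because it makes the definition explicit.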
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant No. 12322203).
Abstract: Constitutive models of shape memory alloys (SMAs) play an important role in facilitating the widespread application of these alloys in various engineering fields. However, to describe the deformation behaviors of SMAs accurately, existing constitutive models employ concepts from classical plasticity and involve a series of complex mathematical equations. Such complexity makes the models inconvenient to construct, implement, and apply. To overcome these shortcomings, a data-driven constitutive model of SMAs based on an artificial neural network (ANN) is developed in this work. In the proposed model, the components of the strain tensor in principal space, the ambient temperature, and the maximum equivalent strain in the deformation history from the initial state to the current loading state are chosen as the input features, and the components of the stress tensor in principal space are set as the output. The proposed ANN-based constitutive model is implemented in the finite element program ABAQUS by deriving its consistent tangent modulus and writing a user-defined material subroutine. The stress-strain responses of SMA material under various loading paths and at different ambient temperatures, generated from an existing constitutive model (numerical experiments), are used to train the ANN model. To validate the proposed model, the predicted stress-strain responses of SMA material and the global and local responses of two typical SMA structures are compared with the corresponding numerical experiments. This work demonstrates the potential to obtain constitutive models of SMAs purely from data, avoiding the extensive prior knowledge otherwise needed to construct them.
Funding: National Natural Science Foundation of China (Project No.: 12371428); Projects of the Provincial College Students' Innovation and Training Program in 2024 (Project Nos.: S202413023106, S202413023110).
Abstract: This paper focuses on the numerical solution of a tumor growth model under a data-driven approach. Based on the inherent laws of the data and reasonable assumptions, an ordinary differential equation model for tumor growth is established. Nonlinear fitting is employed to obtain the optimal parameter estimates of the mathematical model, and the numerical solution is computed in MATLAB. The simulation results agree well with the clinical data, which verifies the rationality and feasibility of the model.
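The workflow described in this abstract (pose an ODE, fit its parameters, solve it numerically) can be sketched as follows. The abstract does not state which ODE the authors use, so the logistic law dV/dt = r V (1 - V/K) below is an assumption, as are all names and values. The data are synthetic, and a coarse grid search stands in for a proper nonlinear least-squares routine.

```python
import math


def logistic_rhs(v, r, k):
    """Right-hand side of the assumed logistic growth law dV/dt = r*V*(1 - V/K)."""
    return r * v * (1.0 - v / k)


def rk4_solve(v0, r, k, t_end, steps):
    """Integrate the ODE from t = 0 to t_end with the classical fourth-order Runge-Kutta scheme."""
    h = t_end / steps
    v = v0
    trajectory = [v]
    for _ in range(steps):
        k1 = logistic_rhs(v, r, k)
        k2 = logistic_rhs(v + 0.5 * h * k1, r, k)
        k3 = logistic_rhs(v + 0.5 * h * k2, r, k)
        k4 = logistic_rhs(v + h * k3, r, k)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append(v)
    return trajectory


def logistic_exact(t, v0, r, k):
    """Closed-form solution of the logistic equation, used to check the solver."""
    return k / (1.0 + (k / v0 - 1.0) * math.exp(-r * t))


def fit_growth_rate(times, volumes, v0, k, r_grid):
    """Pick the growth rate on r_grid that minimizes the sum of squared errors."""
    def sse(r):
        return sum((logistic_exact(t, v0, r, k) - v) ** 2 for t, v in zip(times, volumes))
    return min(r_grid, key=sse)


# Synthetic, noise-free "measurements" generated with a known rate (illustration only).
V0, K, TRUE_R = 1.0, 100.0, 0.3
times = [0.0, 5.0, 10.0, 15.0, 20.0]
volumes = [logistic_exact(t, V0, TRUE_R, K) for t in times]

r_grid = [i / 100 for i in range(1, 101)]
r_hat = fit_growth_rate(times, volumes, V0, K, r_grid)
trajectory = rk4_solve(V0, r_hat, K, t_end=20.0, steps=200)
print(r_hat, trajectory[-1])
```

With real clinical data the fit would also estimate V0 and K and would use an iterative optimizer; the grid search here only keeps the sketch dependency-free.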
Funding: Supported by the National Natural Science Foundation of China under Grant No. 52131102.
Abstract: With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models can capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater prediction efficiency than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper provides a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. The related studies are then categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail in the review.
Funding: Supported by the National Key Research and Development Program of China (2023YFB3307801), the National Natural Science Foundation of China (62394343, 62373155, 62073142), the Major Science and Technology Project of Xinjiang (No. 2022A01006-4), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, the Fundamental Research Funds for the Central Universities, the Science Foundation of China University of Petroleum, Beijing (No. 2462024YJRC011), and the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2024B70).
Abstract: Distillation is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared with mechanistic modelling, thus improving the efficiency of process optimization and monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertain perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to address these problems. First, data segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which clusters the data into perturbed and steady-state intervals for steady-state data extraction. Second, the maximal information coefficient (MIC) is employed to calculate the nonlinear correlation between variables in order to remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight update strategy, yielding a new ensemble learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
Abstract: Permanent magnet synchronous motors (PMSMs) are widely used in alternating current servo systems because they provide high efficiency, high power density, and a wide speed regulation range. Servo systems are placing increasingly high demands on control performance. The model predictive control (MPC) algorithm is emerging as a potential high-performance motor control algorithm owing to its capability of handling multiple-input and multiple-output variables and imposed constraints. When MPC is used for PMSM control, nonlinear disturbances caused by changes in electromagnetic parameters or by load disturbances may lead to a mismatch between the nominal model and the controlled object, which causes prediction errors and thus affects the dynamic stability of the control system. This paper proposes a data-driven MPC strategy in which historical data within an appropriate range are used to eliminate the impact of parameter mismatch and further improve the control performance. The stability of the proposed algorithm is proved, and simulations demonstrate its feasibility. Its superiority over the classical MPC strategy is also verified.
Funding: Funded by grants from the National Key Research and Development Program of China (Grant Nos.: 2022YFE0205600 and 2022YFC3400504), the National Natural Science Foundation of China (Grant Nos.: 82373792 and 82273857), the Fundamental Research Funds for the Central Universities, China, and the East China Normal University Medicine and Health Joint Fund, China (Grant No.: 2022JKXYD07001).
Abstract: Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels in a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
Funding: Supported by the National Natural Science Foundation of China (22478239, 22208208) and the Science and Technology Commission of Shanghai Municipality (19DZ2271100).
Abstract: To ensure the safe operation of batteries, accurately obtaining key internal state parameters is essential. However, traditional parameter measurement methods require either opening the battery or long-term measurements, both of which are impractical. Fixed values are therefore commonly used for these parameters in electrochemical models, which imposes significant limitations. To overcome these limitations, this paper proposes a data-driven evaluation method based on a deep neural network (DNN) to determine model parameters. By coupling an improved one-dimensional isothermal pseudo-two-dimensional (P2D) model with the DNN, this study identifies concentration-dependent parameters through detailed discharge curve analysis. The results show that the data-driven method can effectively obtain the trend of concentration-dependent parameters from the charge and discharge curves, and that the method can be extended to different battery systems at different discharge rates and in aging applications. This work is expected to provide new insights into parameter selection for data-driven battery prediction and monitoring models.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12272144).
Abstract: A data-driven model of multiple variable cutting (M-VCUT) level set-based substructure is proposed for the topology optimization of lattice structures. The M-VCUT level set method is used to represent substructures, enriching their diversity of configuration while ensuring connectivity. To construct the data-driven model of the substructure, a database is prepared by sampling the space of substructures spanned by several substructure prototypes. Then, for each substructure in this database, the stiffness matrix is condensed so that its degrees of freedom are reduced. Thereafter, the data-driven model of substructures is constructed through interpolation with the compactly supported radial basis function (CS-RBF). The inputs of the data-driven model are the design variables of topology optimization, and the outputs are the condensed stiffness matrix and the volume of the substructures. This data-driven model is used during the optimization, thus avoiding repeated static condensation that would require much computation time. Several numerical examples are provided to verify the proposed method.
Funding: Special Fund for Teacher Development Research Program of University of Shanghai for Science and Technology (Project No.: CFTD2025YB28).
Abstract: Against the backdrop of the national innovation strategy and the digital transformation of education, the traditional "extensive" training model for innovation and entrepreneurship talents struggles to meet the personalized development needs of students, making a shift toward precision and intelligence urgent. This study constructs a four-dimensional, data-centered integrated framework, "Goal-Data-Intervention-Evaluation", and proposes a data-driven training model for innovation and entrepreneurship talents in universities. By collecting multi-source data such as learning behaviors, competency assessments, and practical projects, the model analyzes students' individual characteristics and development potential in depth, enabling precise decision-making in goal setting, teaching intervention, and practical guidance. Based on data analysis, a support system for personalized teaching and practical activities is established. Combined with process-oriented and summative evaluations, a closed-loop feedback mechanism is formed to improve training effectiveness. This model provides a theoretical framework and a practical path for the scientific, personalized, and intelligent development of innovation and entrepreneurship education in universities.
Funding: Funded in part by the National Natural Science Foundation of China under Grants 62401167 and 62192712, in part by the Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources, P.R. China under Grant MESTA-2023-B001, and in part by the Stable Supporting Fund of National Key Laboratory of Underwater Acoustic Technology under Grant JCKYS2022604SSJS007.
Abstract: The Underwater Acoustic (UWA) channel is bandwidth-constrained and experiences doubly selective fading. It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing (OFDM) communications using a finite number of pilots. Deep Learning (DL) approaches have been very successful in wireless OFDM communications, but whether they work underwater remains an open question. For the first time, this paper compares two categories of DL-based UWA OFDM receivers: the Data-Driven (DD) method, which performs as an end-to-end black box, and the Model-Driven (MD) method, also known as the model-based data-driven method, which combines DL with expert OFDM receiver knowledge. The encoder-decoder framework and a Convolutional Neural Network (CNN) structure are employed to build the DD receiver, whereas an unfolding-based Minimum Mean Square Error (MMSE) structure is adopted for the MD receiver. We analyze the characteristics of the different receivers through Monte Carlo simulations under diverse communication conditions and propose a strategy for selecting a suitable receiver for different communication scenarios. Field trials in a pool and at sea are also conducted to verify the feasibility and advantages of the DL receivers. The DL receivers are observed to outperform conventional receivers in terms of bit error rate.
Abstract: In the rapidly evolving technological landscape, state-owned enterprises (SOEs) encounter significant challenges in sustaining their competitiveness through efficient R&D management. Integrated Product Development (IPD), with its emphasis on cross-functional teamwork, concurrent engineering, and data-driven decision-making, has been widely recognized for enhancing R&D efficiency and product quality. However, the unique characteristics of SOEs pose challenges to the effective implementation of IPD. The advancement of big data and artificial intelligence technologies offers new opportunities for optimizing IPD R&D management through data-driven decision-making models. This paper constructs and validates a data-driven decision-making model tailored to the IPD R&D management of SOEs. By integrating data mining, machine learning, and other advanced analytical techniques, the model serves as a scientific and efficient decision-making tool. It aids SOEs in optimizing R&D resource allocation, shortening product development cycles, reducing R&D costs, and improving product quality and innovation. Moreover, this study contributes to a deeper theoretical understanding of the value of data-driven decision-making in the context of IPD.
Funding: Supported by the Strategic Research and Consulting Project of the Chinese Academy of Engineering [grant number 2024-XBZD-14], the National Natural Science Foundation of China [grant numbers 42192553 and 41922036], and the Fundamental Research Funds for the Central Universities–Cemac "GeoX" Interdisciplinary Program [grant number 020714380207].
Abstract: This study investigates the impacts of lateral boundary conditions (LBCs) provided by numerical models and data-driven networks on convective-scale ensemble forecasts. Four experiments are conducted on the Hangzhou RDP (19th Hangzhou Asian Games Research Development Project on Convective-scale Ensemble Prediction and Application) testbed, with the LBCs sourced respectively from National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) forecasts with 33 vertical levels (Exp_GFS), Pangu forecasts with 13 vertical levels (Exp_Pangu), Fuxi forecasts with 13 vertical levels (Exp_Fuxi), and NCEP GFS forecasts with the vertical levels reduced to 13, the same as in Exp_Pangu and Exp_Fuxi (Exp_GFSRDV). In general, Exp_Pangu performs comparably to Exp_GFS, while Exp_Fuxi performs slightly worse than Exp_Pangu, possibly because of its less accurate large-scale predictions. The results therefore demonstrate that data-driven networks can efficiently provide LBCs for convective-scale ensemble forecasts. Moreover, Exp_GFSRDV has the worst convective-scale forecasts among the four experiments, which indicates that LBCs from data-driven networks could be improved by increasing the number of vertical levels in the networks. However, the ensemble spread of the four experiments barely increases with lead time. Each experiment thus has insufficient ensemble spread to represent realistic forecast uncertainties, which will be investigated in a future study.
Abstract: Data-driven research on recycled aggregate concrete (RAC) has long faced the challenge of lacking a unified standard testing dataset, which hinders accurate model evaluation and trust in predictive outcomes. This paper reviews critical parameters influencing mechanical properties in 35 RAC studies, compiles four datasets encompassing these parameters, and summarizes the performance and key findings of 77 published data-driven models. Baseline capability tests are conducted on the nine most used models. The paper also outlines advanced methodological frameworks for future RAC research, examining the principles and challenges of physics-informed neural networks (PINNs) and generative adversarial networks (GANs), and employs SHAP and PDP tools to interpret model behaviour and enhance transparency. Findings indicate a clear trend toward integrated systems, hybrid models, and advanced optimization strategies, with integrated tree-based models showing superior performance across various prediction tasks. Based on this comprehensive review, we offer recommendations on how AI can be effectively oriented in future RAC studies to support practical deployment and build confidence in data-driven approaches.
Funding: Supported by the National Natural Science Foundation of China (No. 62273234), the Key Research and Development Program of Shaanxi (Program No. 2022GY-306), and the Technology Innovation Leading Program of Shaanxi (Program No. 2022QFY01-16).
Abstract: Accurate prediction of strip width is a key factor in the quality of hot rolling manufacture. First, based on the strip width formation mechanism model of the strip rolling process, an improved width mechanism calculation model is established, with process parameters optimized via the particle swarm optimization algorithm. Subsequently, a hybrid strip width prediction model is proposed that combines the respective advantages of the improved mechanism model and a data-driven model. Given the requirement for positive error in strip width prediction, an adaptive width error compensation algorithm is proposed. Finally, comparative simulation experiments are designed on an actual rolling dataset after data cleaning and feature engineering. The experimental results show that the proposed hybrid prediction model is more precise and robust than the improved mechanism model and eight other common data-driven models, and that it satisfies the needs of practical applications. Moreover, the hybrid model realizes the complementary advantages of the mechanism model and the data-driven model, alleviating both the difficulty of improving the accuracy of the mechanism model and the poor interpretability of the data-driven model, which has significant practical implications for research on strip width control.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12072237, 12472022, 12372022, 12372065, and U2441202) and the Fundamental Research Funds for the Central Universities (Grant No. 22120220590).
Abstract: Advancements in dynamic modeling methods for robotic manipulators are critical to the effective implementation of model-based control. Traditional approaches rely on rigorous first-principles dynamic modeling and precise parameter identification, whereas this paper explores an alternative through data-driven model reconstruction. To tackle the curse of dimensionality in the model reconstruction of a multi-degree-of-freedom serial robotic manipulator, a relative activation indicator is proposed. Based on this indicator, the k-means clustering algorithm is used to classify the data under different working conditions. Subsequently, we leverage fundamental prior knowledge to find the dynamical characteristics of each cluster and reconstruct the dynamic model in a stepwise manner using sparse identification of nonlinear dynamics (SINDy). For the library generation of SINDy, a double-feature-set strategy for serial manipulators with common joint types is proposed. Simulation results show that the stepwise model reconstruction approach not only reduces the size of the library of candidate functions but also decreases the impact of data noise on the reconstruction results. Finally, controllers based on the reconstructed models are deployed on an experimental platform, and the experimental results demonstrate the improvement in trajectory tracking performance and the potential of the proposed method in engineering applications.
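The sparse-regression idea behind SINDy can be illustrated on a toy problem. The sketch below applies sequentially thresholded least squares, a common SINDy solver, to recover dx/dt = -2x from a three-term candidate library [1, x, x^2]. It is not the stepwise, clustered manipulator reconstruction or the double-feature-set library described in the abstract; the system, the library, the threshold, and all function names are assumptions made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def least_squares(theta, y, active):
    """Least-squares fit restricted to the active library columns (normal equations)."""
    ata = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    aty = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in active]
    return solve_linear(ata, aty)


def stlsq(theta, y, threshold, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit on the rest."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(iterations):
        if not active:
            break
        solution = least_squares(theta, y, active)
        coeffs = [0.0] * n_terms
        for idx, val in zip(active, solution):
            coeffs[idx] = val if abs(val) >= threshold else 0.0
        new_active = [i for i in active if coeffs[i] != 0.0]
        if new_active == active:
            break
        active = new_active
    return coeffs


# Noise-free samples of x(t) = exp(-2t), whose derivative is exactly -2x.
ts = [0.05 * i for i in range(40)]
xs = [math.exp(-2.0 * t) for t in ts]
dxs = [-2.0 * x for x in xs]

library = [[1.0, x, x * x] for x in xs]   # candidate terms: 1, x, x^2
coeffs = stlsq(library, dxs, threshold=0.1)
print(coeffs)
```

With measured data the derivatives would have to be estimated numerically, which is where the noise sensitivity mentioned in the abstract enters; a smaller candidate library makes the regression better conditioned.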
Funding: Funded by the Natural Science Foundation of China (Grant No. 52090084) and partially supported by the Sand Hazards and Opportunities for Resilience, Energy, and Sustainability (SHORES) Center, funded by Tamkeen under the NYUAD Research Institute Award CG013.
Abstract: This study focuses on empirical modeling of the strength characteristics of urban soils contaminated with heavy metals using machine learning tools, and on their subsequent stabilization with ordinary Portland cement (OPC). For dataset collection, an extensive experimental program was designed to estimate the unconfined compressive strength (Qu) of heavy metal-contaminated soils collected from a wide range of land use patterns, i.e., residential, industrial, and roadside soils. The predictive performances of four data-driven models are then compared: extreme learning machines (ELMs), gene expression programming (GEP), random forests (RFs), and multiple linear regression (MLR). A comprehensive experimental database was established and partitioned into 80% for training and 20% for testing the developed models. Inputs included varying levels of heavy metals such as Cd, Cu, Cr, Pb, and Zn, along with OPC. The results revealed that the GEP model outperformed its counterparts, explaining approximately 96% of the variability in both the training (R^2 = 0.964) and testing (R^2 = 0.961) phases and achieving the lowest RMSE and MAE values. ELM performed commendably but was slightly less accurate than GEP, whereas MLR had the lowest performance metrics. GEP also provided the benefit of a traceable mathematical equation, enhancing its applicability not just as a predictive tool but also as an explanatory one. The study is limited by its focus on a specific set of heavy metals and on urban soil samples from a particular region, which may affect the generalizability of the findings to different contamination profiles or environmental conditions. The study recommends GEP for predicting Qu in heavy metal-contaminated soils and suggests further research to adapt these models to different environmental conditions.
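The three metrics used in this abstract to rank the models (R^2, RMSE, and MAE) can be computed as in the minimal sketch below. The measured and predicted values are hypothetical and are not taken from the study's database.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical strengths (e.g. in MPa): measured vs. model-predicted.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(measured, predicted)
print(round(r2, 3), round(rmse, 3), round(mae, 3))
```

Reporting these metrics separately for the 80% training split and the 20% testing split, as the abstract does, is what allows overfitting to be detected.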