We propose an integrated method of data-driven and mechanism models for well logging formation evaluation,explicitly focusing on predicting reservoir parameters,such as porosity and water saturation.Accurately interpr...We propose an integrated method of data-driven and mechanism models for well logging formation evaluation,explicitly focusing on predicting reservoir parameters,such as porosity and water saturation.Accurately interpreting these parameters is crucial for effectively exploring and developing oil and gas.However,with the increasing complexity of geological conditions in this industry,there is a growing demand for improved accuracy in reservoir parameter prediction,leading to higher costs associated with manual interpretation.The conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters,which suffer from low interpretation efficiency,intense subjectivity,and suitability for ideal conditions.The application of artificial intelligence in the interpretation of logging data provides a new solution to the problems existing in traditional methods.It is expected to improve the accuracy and efficiency of the interpretation.If large and high-quality datasets exist,data-driven models can reveal relationships of arbitrary complexity.Nevertheless,constructing sufficiently large logging datasets with reliable labels remains challenging,making it difficult to apply data-driven models effectively in logging data interpretation.Furthermore,data-driven models often act as“black boxes”without explaining their predictions or ensuring compliance with primary physical constraints.This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models.Prior knowledge of logging data interpretation is embedded into machine learning regarding network structure,loss function,and optimization algorithm.We employ the Physically Informed Auto-Encoder(PIAE)to predict porosity and water saturation,which can be trained without labeled reservoir parameters using self-supervised learning techniques.This approach effectively achieves automated interpretation and facilitates generalization across diverse datasets.展开更多
The Underwater Acoustic(UWA)channel is bandwidth-constrained and experiences doubly selective fading.It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing(OFDM)communica...The Underwater Acoustic(UWA)channel is bandwidth-constrained and experiences doubly selective fading.It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing(OFDM)communications using a finite number of pilots.On the other hand,Deep Learning(DL)approaches have been very successful in wireless OFDM communications.However,whether they will work underwater is still a mystery.For the first time,this paper compares two categories of DL-based UWA OFDM receivers:the DataDriven(DD)method,which performs as an end-to-end black box,and the Model-Driven(MD)method,also known as the model-based data-driven method,which combines DL and expert OFDM receiver knowledge.The encoder-decoder framework and Convolutional Neural Network(CNN)structure are employed to establish the DD receiver.On the other hand,an unfolding-based Minimum Mean Square Error(MMSE)structure is adopted for the MD receiver.We analyze the characteristics of different receivers by Monte Carlo simulations under diverse communications conditions and propose a strategy for selecting a proper receiver under different communication scenarios.Field trials in the pool and sea are also conducted to verify the feasibility and advantages of the DL receivers.It is observed that DL receivers perform better than conventional receivers in terms of bit error rate.展开更多
A data-driven model ofmultiple variable cutting(M-VCUT)level set-based substructure is proposed for the topology optimization of lattice structures.TheM-VCUTlevel setmethod is used to represent substructures,enriching...A data-driven model ofmultiple variable cutting(M-VCUT)level set-based substructure is proposed for the topology optimization of lattice structures.TheM-VCUTlevel setmethod is used to represent substructures,enriching their diversity of configuration while ensuring connectivity.To construct the data-driven model of substructure,a database is prepared by sampling the space of substructures spanned by several substructure prototypes.Then,for each substructure in this database,the stiffness matrix is condensed so that its degrees of freedomare reduced.Thereafter,the data-drivenmodel of substructures is constructed through interpolationwith compactly supported radial basis function(CS-RBF).The inputs of the data-driven model are the design variables of topology optimization,and the outputs are the condensed stiffness matrix and volume of substructures.During the optimization,this data-driven model is used,thus avoiding repeated static condensation that would requiremuch computation time.Several numerical examples are provided to verify the proposed method.展开更多
When assessing seismic liquefaction potential with data-driven models,addressing the uncertainties of establishing models,interpreting cone penetration tests(CPT)data and decision threshold is crucial for avoiding bia...When assessing seismic liquefaction potential with data-driven models,addressing the uncertainties of establishing models,interpreting cone penetration tests(CPT)data and decision threshold is crucial for avoiding biased data selection,ameliorating overconfident models,and being flexible to varying practical objectives,especially when the training and testing data are not identically distributed.A workflow characterized by leveraging Bayesian methodology was proposed to address these issues.Employing a Multi-Layer Perceptron(MLP)as the foundational model,this approach was benchmarked against empirical methods and advanced algorithms for its efficacy in simplicity,accuracy,and resistance to overfitting.The analysis revealed that,while MLP models optimized via maximum a posteriori algorithm suffices for straightforward scenarios,Bayesian neural networks showed great potential for preventing overfitting.Additionally,integrating decision thresholds through various evaluative principles offers insights for challenging decisions.Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data,employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics.Overall,the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation,showing improved robustness against overfitting and greater versatility in addressing practical challenges.This research contributes to the seismic liquefaction assessment field by providing a structured,adaptable methodology for accurate and reliable analysis.展开更多
In the rapidly evolving technological landscape,state-owned enterprises(SOEs)encounter significant challenges in sustaining their competitiveness through efficient R&D management.Integrated Product Development(IPD...In the rapidly evolving technological landscape,state-owned enterprises(SOEs)encounter significant challenges in sustaining their competitiveness through efficient R&D management.Integrated Product Development(IPD),with its emphasis on cross-functional teamwork,concurrent engineering,and data-driven decision-making,has been widely recognized for enhancing R&D efficiency and product quality.However,the unique characteristics of SOEs pose challenges to the effective implementation of IPD.The advancement of big data and artificial intelligence technologies offers new opportunities for optimizing IPD R&D management through data-driven decision-making models.This paper constructs and validates a data-driven decision-making model tailored to the IPD R&D management of SOEs.By integrating data mining,machine learning,and other advanced analytical techniques,the model serves as a scientific and efficient decision-making tool.It aids SOEs in optimizing R&D resource allocation,shortening product development cycles,reducing R&D costs,and improving product quality and innovation.Moreover,this study contributes to a deeper theoretical understanding of the value of data-driven decision-making in the context of IPD.展开更多
Based on the educational evaluation reform,this study explores the construction of an evidence-based value-added evaluation system based on data-driven,aiming to solve the limitations of traditional evaluation methods...Based on the educational evaluation reform,this study explores the construction of an evidence-based value-added evaluation system based on data-driven,aiming to solve the limitations of traditional evaluation methods.The research adopts the method of combining theoretical analysis and practical application,and designs the evidence-based value-added evaluation framework,which includes the core elements of a multi-source heterogeneous data acquisition and processing system,a value-added evaluation agent based on a large model,and an evaluation implementation and application mechanism.Through empirical research verification,the evaluation system has remarkable effects in improving learning participation,promoting ability development,and supporting teaching decision-making,and provides a theoretical reference and practical path for educational evaluation reform in the new era.The research shows that the evidence-based value-added evaluation system based on data-driven can reflect students’actual progress more fairly and objectively by accurately measuring the difference in starting point and development range of students,and provide strong support for the realization of high-quality education development.展开更多
With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications,an increasing number of studies have embraced data-driven approaches for modeling wind turbin...With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications,an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes.These models leverage the ability to capture complex,high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models.As a result,data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output.This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches.It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature.Subsequently,the related studies are categorized into four key areas:wind turbine power prediction,data-driven analytic wake models,wake field reconstruction,and the incorporation of explicit physical constraints.The accuracy of data-driven models is influenced by two primary factors:the quality of the training data and the performance of the model itself.Accordingly,both data accuracy and model structure are discussed in detail within the review.展开更多
The outstanding comprehensive mechanical properties of newly developed hybrid lattice structures make them useful in engineering applications for bearing multiple mechanical loads.Additive-manufacturing technologies m...The outstanding comprehensive mechanical properties of newly developed hybrid lattice structures make them useful in engineering applications for bearing multiple mechanical loads.Additive-manufacturing technologies make it possible to fabricate these highly spatially programmable structures and greatly enhance the freedom in their design.However,traditional analytical methods do not sufficiently reflect the actual vibration-damping mechanism of lattice structures and are limited by their high computational cost.In this study,a hybrid lattice structure consisting of various cells was designed based on quasi-static and vibration experiments.Subsequently,a novel parametric design method based on a data-driven approach was developed for hybrid lattices with engineered properties.The response surface method was adopted to define the sensitive optimization target.A prediction model for the lattice geometric parameters and vibration properties was established using a backpropagation neural network.Then,it was integrated into the genetic algorithm to create the optimal hybrid lattice with varying geometric features and the required wide-band vibration-damping characteristics.Validation experiments were conducted,demonstrating that the optimized hybrid lattice can achieve the target properties.In addition,the data-driven parametric design method can reduce computation time and be widely applied to complex structural designs when analytical and empirical solutions are unavailable.展开更多
NJmat is a user-friendly,data-driven machine learning interface designed for materials design and analysis.The platform integrates advanced computational techniques,including natural language processing(NLP),large lan...NJmat is a user-friendly,data-driven machine learning interface designed for materials design and analysis.The platform integrates advanced computational techniques,including natural language processing(NLP),large language models(LLM),machine learning potentials(MLP),and graph neural networks(GNN),to facili-tate materials discovery.The platform has been applied in diverse materials research areas,including perovskite surface design,catalyst discovery,battery materials screening,structural alloy design,and molecular informatics.By automating feature selection,predictive modeling,and result interpretation,NJmat accelerates the development of high-performance materials across energy storage,conversion,and structural applications.Additionally,NJmat serves as an educational tool,allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise.Through automated feature extraction,genetic algorithms,and interpretable machine learning models,NJmat simplifies the workflow for materials informatics,bridging the gap between AI and experimental materials research.The latest version(available at https://figshare.com/articles/software/NJmatML/24607893(accessed on 01 January 2025))enhances its functionality by incorporating NJmatNLP,a module leveraging language models like MatBERT and those based on Word2Vec to support materials prediction tasks.By utilizing clustering and cosine similarity analysis with UMAP visualization,NJmat enables intuitive exploration of materials datasets.While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries,it can also assist in optimizing processing conditions when relevant parameters are included in the training data.By providing an accessible,integrated environment for machine learning-driven materials discovery,NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.展开更多
Hydraulic fracturing technology has achieved remarkable results in improving the production of tight gas reservoirs,but its effectiveness is under the joint action of multiple factors of complexity.Traditional analysi...Hydraulic fracturing technology has achieved remarkable results in improving the production of tight gas reservoirs,but its effectiveness is under the joint action of multiple factors of complexity.Traditional analysis methods have limitations in dealing with these complex and interrelated factors,and it is difficult to fully reveal the actual contribution of each factor to the production.Machine learning-based methods explore the complex mapping relationships between large amounts of data to provide datadriven insights into the key factors driving production.In this study,a data-driven PCA-RF-VIM(Principal Component Analysis-Random Forest-Variable Importance Measures)approach of analyzing the importance of features is proposed to identify the key factors driving post-fracturing production.Four types of parameters,including log parameters,geological and reservoir physical parameters,hydraulic fracturing design parameters,and reservoir stimulation parameters,were inputted into the PCA-RF-VIM model.The model was trained using 6-fold cross-validation and grid search,and the relative importance ranking of each factor was finally obtained.In order to verify the validity of the PCA-RF-VIM model,a consolidation model that uses three other independent data-driven methods(Pearson correlation coefficient,RF feature significance analysis method,and XGboost feature significance analysis method)are applied to compare with the PCA-RF-VIM model.A comparison the two models shows that they contain almost the same parameters in the top ten,with only minor differences in one parameter.In combination with the reservoir characteristics,the reasonableness of the PCA-RF-VIM model is verified,and the importance ranking of the parameters by this method is more consistent with the reservoir characteristics of the study area.Ultimately,the ten parameters are selected as the controlling factors that have the potential to influence post-fracturing gas production,as the combined importance of these top ten parameters is 91.95%on driving natural gas production.Analyzing and obtaining these ten controlling factors provides engineers with a new insight into the reservoir selection for fracturing stimulation and fracturing parameter optimization to improve fracturing efficiency and productivity.展开更多
Critical for metering and protection in electric railway traction power supply systems(TPSSs),the measurement performance of voltage transformers(VTs)must be timely and reliably monitored.This paper outlines a three-s...Critical for metering and protection in electric railway traction power supply systems(TPSSs),the measurement performance of voltage transformers(VTs)must be timely and reliably monitored.This paper outlines a three-step,RMS data only method for evaluating VTs in TPSSs.First,a kernel principal component analysis approach is used to diagnose the VT exhibiting significant measurement deviations over time,mitigating the influence of stochastic fluctuations in traction loads.Second,a back propagation neural network is employed to continuously estimate the measurement deviations of the targeted VT.Third,a trend analysis method is developed to assess the evolution of the measurement performance of VTs.Case studies conducted on field data from an operational TPSS demonstrate the effectiveness of the proposed method in detecting VTs with measurement deviations exceeding 1%relative to their original accuracy levels.Additionally,the method accurately tracks deviation trends,enabling the identification of potential early-stage faults in VTs and helping prevent significant economic losses in TPSS operations.展开更多
This paper focuses on the numerical solution of a tumor growth model under a data-driven approach.Based on the inherent laws of the data and reasonable assumptions,an ordinary differential equation model for tumor gro...This paper focuses on the numerical solution of a tumor growth model under a data-driven approach.Based on the inherent laws of the data and reasonable assumptions,an ordinary differential equation model for tumor growth is established.Nonlinear fitting is employed to obtain the optimal parameter estimation of the mathematical model,and the numerical solution is carried out using the Matlab software.By comparing the clinical data with the simulation results,a good agreement is achieved,which verifies the rationality and feasibility of the model.展开更多
The impacts of lateral boundary conditions(LBCs)provided by numerical models and data-driven networks on convective-scale ensemble forecasts are investigated in this study.Four experiments are conducted on the Hangzho...The impacts of lateral boundary conditions(LBCs)provided by numerical models and data-driven networks on convective-scale ensemble forecasts are investigated in this study.Four experiments are conducted on the Hangzhou RDP(19th Hangzhou Asian Games Research Development Project on Convective-scale Ensemble Prediction and Application)testbed,with the LBCs respectively sourced from National Centers for Environmental Prediction(NCEP)Global Forecast System(GFS)forecasts with 33 vertical levels(Exp_GFS),Pangu forecasts with 13 vertical levels(Exp_Pangu),Fuxi forecasts with 13 vertical levels(Exp_Fuxi),and NCEP GFS forecasts with the vertical levels reduced to 13(the same as those of Exp_Pangu and Exp_Fuxi)(Exp_GFSRDV).In general,Exp_Pangu performs comparably to Exp_GFS,while Exp_Fuxi shows slightly inferior performance compared to Exp_Pangu,possibly due to its less accurate large-scale predictions.Therefore,the ability of using data-driven networks to efficiently provide LBCs for convective-scale ensemble forecasts has been demonstrated.Moreover,Exp_GFSRDV has the worst convective-scale forecasts among the four experiments,which indicates the potential improvement of using data-driven networks for LBCs by increasing the vertical levels of the networks.However,the ensemble spread of the four experiments barely increases with lead time.Thus,each experiment has insufficient ensemble spread to present realistic forecast uncertainties,which will be investigated in a future study.展开更多
The distillation process is an important chemical process,and the application of data-driven modelling approach has the potential to reduce model complexity compared to mechanistic modelling,thus improving the efficie...The distillation process is an important chemical process,and the application of data-driven modelling approach has the potential to reduce model complexity compared to mechanistic modelling,thus improving the efficiency of process optimization or monitoring studies.However,the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals,which brings challenges to accurate data-driven modelling of distillation processes.This paper proposes a systematic data-driven modelling framework to solve these problems.Firstly,data segment variance was introduced into the K-means algorithm to form K-means data interval(KMDI)clustering in order to cluster the data into perturbed and steady state intervals for steady-state data extraction.Secondly,maximal information coefficient(MIC)was employed to calculate the nonlinear correlation between variables for removing redundant features.Finally,extreme gradient boosting(XGBoost)was integrated as the basic learner into adaptive boosting(AdaBoost)with the error threshold(ET)set to improve weights update strategy to construct the new integrated learning algorithm,XGBoost-AdaBoost-ET.The superiority of the proposed framework is verified by applying this data-driven modelling framework to a real industrial process of propylene distillation.展开更多
This study presents a data-driven approach to predict tailplane aerodynamics in icing conditions,supporting the ice-tolerant design of aircraft horizontal stabilizers.The core of this work is a low-cost predictive mod...This study presents a data-driven approach to predict tailplane aerodynamics in icing conditions,supporting the ice-tolerant design of aircraft horizontal stabilizers.The core of this work is a low-cost predictive model for analyzing icing effects on swept tailplanes.The method relies on a multi-fidelity data gathering campaign,enabling seamless integration into multidisciplinary aircraft design workflows.A dataset of iced airfoil shapes was generated using 2D inviscid methods across various flight conditions.High-fidelity CFD simulations were conducted on both clean and iced geometries,forming a multidimensional aerodynamic database.This 2D database feeds a nonlinear vortex lattice method to estimate 3D aerodynamic characteristics,following a'quasi-3D'approach.The resulting reduced-order model delivers fast aerodynamic performance estimates of iced tailplanes.To demonstrate its effectiveness,optimal ice-tolerant tailplane designs were selected from a range of feasible shapes based on a reference transport aircraft.The analysis validates the model's reliability,accuracy,and limitations concerning 3D ice shapes and aerodynamic characteristics.Most notably,the model offers near-zero computational cost compared to high-fidelity simulations,making it a valuable tool for efficient aircraft design.展开更多
Permanent magnet synchronous motor(PMSM)is widely used in alternating current servo systems as it provides high eficiency,high power density,and a wide speed regulation range.The servo system is placing higher demands...Permanent magnet synchronous motor(PMSM)is widely used in alternating current servo systems as it provides high eficiency,high power density,and a wide speed regulation range.The servo system is placing higher demands on its control performance.The model predictive control(MPC)algorithm is emerging as a potential high-performance motor control algorithm due to its capability of handling multiple-input and multipleoutput variables and imposed constraints.For the MPC used in the PMSM control process,there is a nonlinear disturbance caused by the change of electromagnetic parameters or load disturbance that may lead to a mismatch between the nominal model and the controlled object,which causes the prediction error and thus affects the dynamic stability of the control system.This paper proposes a data-driven MPC strategy in which the historical data in an appropriate range are utilized to eliminate the impact of parameter mismatch and further improve the control performance.The stability of the proposed algorithm is proved as the simulation demonstrates the feasibility.Compared with the classical MPC strategy,the superiority of the algorithm has also been verified.展开更多
With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair compar...With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair comparison, and performance rationalization of these methods are lacking, due to the scattered existing studies. To address these issues, we develop 20 capacity estimation methods from three perspectives:charging sequence construction, input forms, and ML models. 22,582 charging curves are generated from 44 cells with different battery chemistry and operating conditions to validate the performance. Through comprehensive and unbiased comparison, the long short-term memory(LSTM) based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error(MAPE) for capacity estimation using LSTM is 0.61%, with a maximum error of only 3.94%. Even with the addition of 3 m V voltage noise or the extension of sampling intervals to 60 s, the average MAPE remains below 2%. Furthermore, the charging sequences are provided with physical explanations related to battery degradation to enhance confidence in their application. Recommendations for using other competitive methods are also presented. This work provides valuable insights and guidance for estimating battery capacity based on partial charging curves.展开更多
Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment,while neglecting and wasting the valuable distributed physical information on the surface.To make full ...Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment,while neglecting and wasting the valuable distributed physical information on the surface.To make full use of both integrated and distributed loads,a modeling paradigm,called the heterogeneous data-driven aerodynamic modeling,is presented.The essential concept is to incorporate the physical information of distributed loads as additional constraints within the end-to-end aerodynamic modeling.Towards heterogenous data,a novel and easily applicable physical feature embedding modeling framework is designed.This framework extracts lowdimensional physical features from pressure distribution and then effectively enhances the modeling of the integrated loads via feature embedding.The proposed framework can be coupled with multiple feature extraction methods,and the well-performed generalization capabilities over different airfoils are verified through a transonic case.Compared with traditional direct modeling,the proposed framework can reduce testing errors by almost 50%.Given the same prediction accuracy,it can save more than half of the training samples.Furthermore,the visualization analysis has revealed a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads,which shows the interpretability and credibility of the superior performance offered by the proposed deep learning framework.展开更多
The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was p...The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.展开更多
For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to in...For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to investigate solutions using the Ptype learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation.Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information.To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates lowmemory footprints and offers flexibility in learning gain design.The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least square sense. Numerical simulations are provided to validate the theoretical findings.展开更多
基金supported by National Key Research and Development Program (2019YFA0708301)National Natural Science Foundation of China (51974337)+2 种基金the Strategic Cooperation Projects of CNPC and CUPB (ZLZX2020-03)Science and Technology Innovation Fund of CNPC (2021DQ02-0403)Open Fund of Petroleum Exploration and Development Research Institute of CNPC (2022-KFKT-09)
文摘We propose an integrated method of data-driven and mechanism models for well logging formation evaluation,explicitly focusing on predicting reservoir parameters,such as porosity and water saturation.Accurately interpreting these parameters is crucial for effectively exploring and developing oil and gas.However,with the increasing complexity of geological conditions in this industry,there is a growing demand for improved accuracy in reservoir parameter prediction,leading to higher costs associated with manual interpretation.The conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters,which suffer from low interpretation efficiency,intense subjectivity,and suitability for ideal conditions.The application of artificial intelligence in the interpretation of logging data provides a new solution to the problems existing in traditional methods.It is expected to improve the accuracy and efficiency of the interpretation.If large and high-quality datasets exist,data-driven models can reveal relationships of arbitrary complexity.Nevertheless,constructing sufficiently large logging datasets with reliable labels remains challenging,making it difficult to apply data-driven models effectively in logging data interpretation.Furthermore,data-driven models often act as“black boxes”without explaining their predictions or ensuring compliance with primary physical constraints.This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models.Prior knowledge of logging data interpretation is embedded into machine learning regarding network structure,loss function,and optimization algorithm.We employ the Physically Informed Auto-Encoder(PIAE)to predict porosity and water saturation,which can be trained without labeled reservoir parameters using self-supervised learning techniques.This approach effectively achieves automated interpretation and facilitates generalization across diverse datasets.
基金funded in part by the National Natural Science Foundation of China under Grant 62401167 and 62192712in part by the Key Laboratory of Marine Environmental Survey Technology and Application,Ministry of Natural Resources,P.R.China under Grant MESTA-2023-B001in part by the Stable Supporting Fund of National Key Laboratory of Underwater Acoustic Technology under Grant JCKYS2022604SSJS007.
文摘The Underwater Acoustic(UWA)channel is bandwidth-constrained and experiences doubly selective fading.It is challenging to acquire perfect channel knowledge for Orthogonal Frequency Division Multiplexing(OFDM)communications using a finite number of pilots.On the other hand,Deep Learning(DL)approaches have been very successful in wireless OFDM communications.However,whether they will work underwater is still a mystery.For the first time,this paper compares two categories of DL-based UWA OFDM receivers:the DataDriven(DD)method,which performs as an end-to-end black box,and the Model-Driven(MD)method,also known as the model-based data-driven method,which combines DL and expert OFDM receiver knowledge.The encoder-decoder framework and Convolutional Neural Network(CNN)structure are employed to establish the DD receiver.On the other hand,an unfolding-based Minimum Mean Square Error(MMSE)structure is adopted for the MD receiver.We analyze the characteristics of different receivers by Monte Carlo simulations under diverse communications conditions and propose a strategy for selecting a proper receiver under different communication scenarios.Field trials in the pool and sea are also conducted to verify the feasibility and advantages of the DL receivers.It is observed that DL receivers perform better than conventional receivers in terms of bit error rate.
基金supported by the National Natural Science Foundation of China(Grant No.12272144).
文摘A data-driven model ofmultiple variable cutting(M-VCUT)level set-based substructure is proposed for the topology optimization of lattice structures.TheM-VCUTlevel setmethod is used to represent substructures,enriching their diversity of configuration while ensuring connectivity.To construct the data-driven model of substructure,a database is prepared by sampling the space of substructures spanned by several substructure prototypes.Then,for each substructure in this database,the stiffness matrix is condensed so that its degrees of freedomare reduced.Thereafter,the data-drivenmodel of substructures is constructed through interpolationwith compactly supported radial basis function(CS-RBF).The inputs of the data-driven model are the design variables of topology optimization,and the outputs are the condensed stiffness matrix and volume of substructures.During the optimization,this data-driven model is used,thus avoiding repeated static condensation that would requiremuch computation time.Several numerical examples are provided to verify the proposed method.
文摘When assessing seismic liquefaction potential with data-driven models,addressing the uncertainties of establishing models,interpreting cone penetration tests(CPT)data and decision threshold is crucial for avoiding biased data selection,ameliorating overconfident models,and being flexible to varying practical objectives,especially when the training and testing data are not identically distributed.A workflow characterized by leveraging Bayesian methodology was proposed to address these issues.Employing a Multi-Layer Perceptron(MLP)as the foundational model,this approach was benchmarked against empirical methods and advanced algorithms for its efficacy in simplicity,accuracy,and resistance to overfitting.The analysis revealed that,while MLP models optimized via maximum a posteriori algorithm suffices for straightforward scenarios,Bayesian neural networks showed great potential for preventing overfitting.Additionally,integrating decision thresholds through various evaluative principles offers insights for challenging decisions.Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data,employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics.Overall,the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation,showing improved robustness against overfitting and greater versatility in addressing practical challenges.This research contributes to the seismic liquefaction assessment field by providing a structured,adaptable methodology for accurate and reliable analysis.
文摘In the rapidly evolving technological landscape,state-owned enterprises(SOEs)encounter significant challenges in sustaining their competitiveness through efficient R&D management.Integrated Product Development(IPD),with its emphasis on cross-functional teamwork,concurrent engineering,and data-driven decision-making,has been widely recognized for enhancing R&D efficiency and product quality.However,the unique characteristics of SOEs pose challenges to the effective implementation of IPD.The advancement of big data and artificial intelligence technologies offers new opportunities for optimizing IPD R&D management through data-driven decision-making models.This paper constructs and validates a data-driven decision-making model tailored to the IPD R&D management of SOEs.By integrating data mining,machine learning,and other advanced analytical techniques,the model serves as a scientific and efficient decision-making tool.It aids SOEs in optimizing R&D resource allocation,shortening product development cycles,reducing R&D costs,and improving product quality and innovation.Moreover,this study contributes to a deeper theoretical understanding of the value of data-driven decision-making in the context of IPD.
基金This paper is the research result of“Research on Innovation of Evidence-Based Teaching Paradigm in Vocational Education under the Background of New Quality Productivity”(2024JXQ176)the Shandong Province Artificial Intelligence Education Research Project(SDDJ202501035),which explores the application of artificial intelligence big models in student value-added evaluation from an evidence-based perspective。
文摘Based on the educational evaluation reform,this study explores the construction of an evidence-based value-added evaluation system based on data-driven,aiming to solve the limitations of traditional evaluation methods.The research adopts the method of combining theoretical analysis and practical application,and designs the evidence-based value-added evaluation framework,which includes the core elements of a multi-source heterogeneous data acquisition and processing system,a value-added evaluation agent based on a large model,and an evaluation implementation and application mechanism.Through empirical research verification,the evaluation system has remarkable effects in improving learning participation,promoting ability development,and supporting teaching decision-making,and provides a theoretical reference and practical path for educational evaluation reform in the new era.The research shows that the evidence-based value-added evaluation system based on data-driven can reflect students’actual progress more fairly and objectively by accurately measuring the difference in starting point and development range of students,and provide strong support for the realization of high-quality education development.
基金Supported by the National Natural Science Foundation of China under Grant No.52131102.
文摘With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications,an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes.These models leverage the ability to capture complex,high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models.As a result,data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output.This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches.It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature.Subsequently,the related studies are categorized into four key areas:wind turbine power prediction,data-driven analytic wake models,wake field reconstruction,and the incorporation of explicit physical constraints.The accuracy of data-driven models is influenced by two primary factors:the quality of the training data and the performance of the model itself.Accordingly,both data accuracy and model structure are discussed in detail within the review.
基金supported by National Natural Science Foundation of China(Grant No.52375380)National Key R&D Program of China(Grant No.2022YFB3402200)the Key Project of National Natural Science Foundation of China(Grant No.12032018).
文摘The outstanding comprehensive mechanical properties of newly developed hybrid lattice structures make them useful in engineering applications for bearing multiple mechanical loads.Additive-manufacturing technologies make it possible to fabricate these highly spatially programmable structures and greatly enhance the freedom in their design.However,traditional analytical methods do not sufficiently reflect the actual vibration-damping mechanism of lattice structures and are limited by their high computational cost.In this study,a hybrid lattice structure consisting of various cells was designed based on quasi-static and vibration experiments.Subsequently,a novel parametric design method based on a data-driven approach was developed for hybrid lattices with engineered properties.The response surface method was adopted to define the sensitive optimization target.A prediction model for the lattice geometric parameters and vibration properties was established using a backpropagation neural network.Then,it was integrated into the genetic algorithm to create the optimal hybrid lattice with varying geometric features and the required wide-band vibration-damping characteristics.Validation experiments were conducted,demonstrating that the optimized hybrid lattice can achieve the target properties.In addition,the data-driven parametric design method can reduce computation time and be widely applied to complex structural designs when analytical and empirical solutions are unavailable.
基金supported by the Jiangsu Provincial Science and Technology Project Basic Research Program(Natural Science Foundation of Jiangsu Province)(No.BK20211283).
文摘NJmat is a user-friendly,data-driven machine learning interface designed for materials design and analysis.The platform integrates advanced computational techniques,including natural language processing(NLP),large language models(LLM),machine learning potentials(MLP),and graph neural networks(GNN),to facili-tate materials discovery.The platform has been applied in diverse materials research areas,including perovskite surface design,catalyst discovery,battery materials screening,structural alloy design,and molecular informatics.By automating feature selection,predictive modeling,and result interpretation,NJmat accelerates the development of high-performance materials across energy storage,conversion,and structural applications.Additionally,NJmat serves as an educational tool,allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise.Through automated feature extraction,genetic algorithms,and interpretable machine learning models,NJmat simplifies the workflow for materials informatics,bridging the gap between AI and experimental materials research.The latest version(available at https://figshare.com/articles/software/NJmatML/24607893(accessed on 01 January 2025))enhances its functionality by incorporating NJmatNLP,a module leveraging language models like MatBERT and those based on Word2Vec to support materials prediction tasks.By utilizing clustering and cosine similarity analysis with UMAP visualization,NJmat enables intuitive exploration of materials datasets.While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries,it can also assist in optimizing processing conditions when relevant parameters are included in the training data.By providing an accessible,integrated environment for machine learning-driven materials discovery,NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
基金funded by the Key Research and Development Program of Shaanxi,China(No.2024GX-YBXM-503)the National Natural Science Foundation of China(No.51974254)。
文摘Hydraulic fracturing technology has achieved remarkable results in improving the production of tight gas reservoirs,but its effectiveness is under the joint action of multiple factors of complexity.Traditional analysis methods have limitations in dealing with these complex and interrelated factors,and it is difficult to fully reveal the actual contribution of each factor to the production.Machine learning-based methods explore the complex mapping relationships between large amounts of data to provide datadriven insights into the key factors driving production.In this study,a data-driven PCA-RF-VIM(Principal Component Analysis-Random Forest-Variable Importance Measures)approach of analyzing the importance of features is proposed to identify the key factors driving post-fracturing production.Four types of parameters,including log parameters,geological and reservoir physical parameters,hydraulic fracturing design parameters,and reservoir stimulation parameters,were inputted into the PCA-RF-VIM model.The model was trained using 6-fold cross-validation and grid search,and the relative importance ranking of each factor was finally obtained.In order to verify the validity of the PCA-RF-VIM model,a consolidation model that uses three other independent data-driven methods(Pearson correlation coefficient,RF feature significance analysis method,and XGboost feature significance analysis method)are applied to compare with the PCA-RF-VIM model.A comparison the two models shows that they contain almost the same parameters in the top ten,with only minor differences in one parameter.In combination with the reservoir characteristics,the reasonableness of the PCA-RF-VIM model is verified,and the importance ranking of the parameters by this method is more consistent with the reservoir characteristics of the study area.Ultimately,the ten parameters are selected as the controlling factors that have the potential to influence post-fracturing gas production,as the combined importance of these top ten parameters is 91.95%on driving natural gas production.Analyzing and obtaining these ten controlling factors provides engineers with a new insight into the reservoir selection for fracturing stimulation and fracturing parameter optimization to improve fracturing efficiency and productivity.
基金supported by the National Natural Science Foundation of China(No.52107125)Applied Basic Research Project of Sichuan Province(No.2022NSFSC0250)Chengdu Guojia Electrical Engineering Co.,Ltd.(No.KYL202312-0043).
文摘Critical for metering and protection in electric railway traction power supply systems(TPSSs),the measurement performance of voltage transformers(VTs)must be timely and reliably monitored.This paper outlines a three-step,RMS data only method for evaluating VTs in TPSSs.First,a kernel principal component analysis approach is used to diagnose the VT exhibiting significant measurement deviations over time,mitigating the influence of stochastic fluctuations in traction loads.Second,a back propagation neural network is employed to continuously estimate the measurement deviations of the targeted VT.Third,a trend analysis method is developed to assess the evolution of the measurement performance of VTs.Case studies conducted on field data from an operational TPSS demonstrate the effectiveness of the proposed method in detecting VTs with measurement deviations exceeding 1%relative to their original accuracy levels.Additionally,the method accurately tracks deviation trends,enabling the identification of potential early-stage faults in VTs and helping prevent significant economic losses in TPSS operations.
基金National Natural Science Foundation of China(Project No.:12371428)Projects of the Provincial College Students’Innovation and Training Program in 2024(Project No.:S202413023106,S202413023110)。
文摘This paper focuses on the numerical solution of a tumor growth model under a data-driven approach.Based on the inherent laws of the data and reasonable assumptions,an ordinary differential equation model for tumor growth is established.Nonlinear fitting is employed to obtain the optimal parameter estimation of the mathematical model,and the numerical solution is carried out using the Matlab software.By comparing the clinical data with the simulation results,a good agreement is achieved,which verifies the rationality and feasibility of the model.
基金supported by the Strategic Research and Consulting Project of the Chinese Academy of Engineering[grant number 2024-XBZD-14]the National Natural Science Foundation of China[grant numbers 42192553 and 41922036]the Fundamental Research Funds for the Central Universities–Cemac“GeoX”Interdisciplinary Program[grant number 020714380207]。
文摘The impacts of lateral boundary conditions(LBCs)provided by numerical models and data-driven networks on convective-scale ensemble forecasts are investigated in this study.Four experiments are conducted on the Hangzhou RDP(19th Hangzhou Asian Games Research Development Project on Convective-scale Ensemble Prediction and Application)testbed,with the LBCs respectively sourced from National Centers for Environmental Prediction(NCEP)Global Forecast System(GFS)forecasts with 33 vertical levels(Exp_GFS),Pangu forecasts with 13 vertical levels(Exp_Pangu),Fuxi forecasts with 13 vertical levels(Exp_Fuxi),and NCEP GFS forecasts with the vertical levels reduced to 13(the same as those of Exp_Pangu and Exp_Fuxi)(Exp_GFSRDV).In general,Exp_Pangu performs comparably to Exp_GFS,while Exp_Fuxi shows slightly inferior performance compared to Exp_Pangu,possibly due to its less accurate large-scale predictions.Therefore,the ability of using data-driven networks to efficiently provide LBCs for convective-scale ensemble forecasts has been demonstrated.Moreover,Exp_GFSRDV has the worst convective-scale forecasts among the four experiments,which indicates the potential improvement of using data-driven networks for LBCs by increasing the vertical levels of the networks.However,the ensemble spread of the four experiments barely increases with lead time.Thus,each experiment has insufficient ensemble spread to present realistic forecast uncertainties,which will be investigated in a future study.
基金supported by the National Key Research and Development Program of China(2023YFB3307801)the National Natural Science Foundation of China(62394343,62373155,62073142)+3 种基金Major Science and Technology Project of Xinjiang(No.2022A01006-4)the Programme of Introducing Talents of Discipline to Universities(the 111 Project)under Grant B17017the Fundamental Research Funds for the Central Universities,Science Foundation of China University of Petroleum,Beijing(No.2462024YJRC011)the Open Research Project of the State Key Laboratory of Industrial Control Technology,China(Grant No.ICT2024B70).
文摘The distillation process is an important chemical process,and the application of data-driven modelling approach has the potential to reduce model complexity compared to mechanistic modelling,thus improving the efficiency of process optimization or monitoring studies.However,the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals,which brings challenges to accurate data-driven modelling of distillation processes.This paper proposes a systematic data-driven modelling framework to solve these problems.Firstly,data segment variance was introduced into the K-means algorithm to form K-means data interval(KMDI)clustering in order to cluster the data into perturbed and steady state intervals for steady-state data extraction.Secondly,maximal information coefficient(MIC)was employed to calculate the nonlinear correlation between variables for removing redundant features.Finally,extreme gradient boosting(XGBoost)was integrated as the basic learner into adaptive boosting(AdaBoost)with the error threshold(ET)set to improve weights update strategy to construct the new integrated learning algorithm,XGBoost-AdaBoost-ET.The superiority of the proposed framework is verified by applying this data-driven modelling framework to a real industrial process of propylene distillation.
基金funding from the Department of Industrial Engineering,University of Naples FedericoⅡ,Italy。
文摘This study presents a data-driven approach to predict tailplane aerodynamics in icing conditions,supporting the ice-tolerant design of aircraft horizontal stabilizers.The core of this work is a low-cost predictive model for analyzing icing effects on swept tailplanes.The method relies on a multi-fidelity data gathering campaign,enabling seamless integration into multidisciplinary aircraft design workflows.A dataset of iced airfoil shapes was generated using 2D inviscid methods across various flight conditions.High-fidelity CFD simulations were conducted on both clean and iced geometries,forming a multidimensional aerodynamic database.This 2D database feeds a nonlinear vortex lattice method to estimate 3D aerodynamic characteristics,following a'quasi-3D'approach.The resulting reduced-order model delivers fast aerodynamic performance estimates of iced tailplanes.To demonstrate its effectiveness,optimal ice-tolerant tailplane designs were selected from a range of feasible shapes based on a reference transport aircraft.The analysis validates the model's reliability,accuracy,and limitations concerning 3D ice shapes and aerodynamic characteristics.Most notably,the model offers near-zero computational cost compared to high-fidelity simulations,making it a valuable tool for efficient aircraft design.
文摘Permanent magnet synchronous motor(PMSM)is widely used in alternating current servo systems as it provides high eficiency,high power density,and a wide speed regulation range.The servo system is placing higher demands on its control performance.The model predictive control(MPC)algorithm is emerging as a potential high-performance motor control algorithm due to its capability of handling multiple-input and multipleoutput variables and imposed constraints.For the MPC used in the PMSM control process,there is a nonlinear disturbance caused by the change of electromagnetic parameters or load disturbance that may lead to a mismatch between the nominal model and the controlled object,which causes the prediction error and thus affects the dynamic stability of the control system.This paper proposes a data-driven MPC strategy in which the historical data in an appropriate range are utilized to eliminate the impact of parameter mismatch and further improve the control performance.The stability of the proposed algorithm is proved as the simulation demonstrates the feasibility.Compared with the classical MPC strategy,the superiority of the algorithm has also been verified.
基金supported by the National Natural Science Foundation of China (52075420)the National Key Research and Development Program of China (2020YFB1708400)。
文摘With its generality and practicality, the combination of partial charging curves and machine learning(ML) for battery capacity estimation has attracted widespread attention. However, a clear classification,fair comparison, and performance rationalization of these methods are lacking, due to the scattered existing studies. To address these issues, we develop 20 capacity estimation methods from three perspectives:charging sequence construction, input forms, and ML models. 22,582 charging curves are generated from 44 cells with different battery chemistry and operating conditions to validate the performance. Through comprehensive and unbiased comparison, the long short-term memory(LSTM) based neural network exhibits the best accuracy and robustness. Across all 6503 tested samples, the mean absolute percentage error(MAPE) for capacity estimation using LSTM is 0.61%, with a maximum error of only 3.94%. Even with the addition of 3 m V voltage noise or the extension of sampling intervals to 60 s, the average MAPE remains below 2%. Furthermore, the charging sequences are provided with physical explanations related to battery degradation to enhance confidence in their application. Recommendations for using other competitive methods are also presented. This work provides valuable insights and guidance for estimating battery capacity based on partial charging curves.
基金supported by the National Natural Science Foundation of China(Nos.92152301,12072282)。
文摘Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment,while neglecting and wasting the valuable distributed physical information on the surface.To make full use of both integrated and distributed loads,a modeling paradigm,called the heterogeneous data-driven aerodynamic modeling,is presented.The essential concept is to incorporate the physical information of distributed loads as additional constraints within the end-to-end aerodynamic modeling.Towards heterogenous data,a novel and easily applicable physical feature embedding modeling framework is designed.This framework extracts lowdimensional physical features from pressure distribution and then effectively enhances the modeling of the integrated loads via feature embedding.The proposed framework can be coupled with multiple feature extraction methods,and the well-performed generalization capabilities over different airfoils are verified through a transonic case.Compared with traditional direct modeling,the proposed framework can reduce testing errors by almost 50%.Given the same prediction accuracy,it can save more than half of the training samples.Furthermore,the visualization analysis has revealed a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads,which shows the interpretability and credibility of the superior performance offered by the proposed deep learning framework.
基金financially supported by the National Key Research and Development Program of China(2022YFB3706800,2020YFB1710100)the National Natural Science Foundation of China(51821001,52090042,52074183)。
文摘The complex sand-casting process combined with the interactions between process parameters makes it difficult to control the casting quality,resulting in a high scrap rate.A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency,which includes the random forest(RF)classification model,the feature importance analysis,and the process parameters optimization with Monte Carlo simulation.The collected data includes four types of defects and corresponding process parameters were used to construct the RF model.Classification results show a recall rate above 90% for all categories.The Gini Index was used to assess the importance of the process parameters in the formation of various defects in the RF model.Finally,the classification model was applied to different production conditions for quality prediction.In the case of process parameters optimization for gas porosity defects,this model serves as an experimental process in the Monte Carlo method to estimate a better temperature distribution.The prediction model,when applied to the factory,greatly improved the efficiency of defect detection.Results show that the scrap rate decreased from 10.16% to 6.68%.
基金supported by the National Natural Science Foundation of China (62173333, 12271522)Beijing Natural Science Foundation (Z210002)the Research Fund of Renmin University of China (2021030187)。
文摘For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation for the reference trajectory becomes the objective. This study aims to investigate solutions using the Ptype learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation.Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information.To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates lowmemory footprints and offers flexibility in learning gain design.The input sequence is shown to converge towards the desired input, resulting in an output that is closest to the given reference in the least square sense. Numerical simulations are provided to validate the theoretical findings.