Artificial intelligence (AI) has emerged as a transformative technology for accelerating drug discovery and development in natural medicines research. Natural medicines, characterized by complex chemical compositions and multifaceted pharmacological mechanisms, are widely used to treat diverse diseases. However, their research and development face significant challenges, including component complexity, extraction difficulties, and efficacy validation. AI technology, particularly deep learning (DL) and machine learning (ML) approaches, enables efficient analysis of extensive datasets, facilitating drug screening, component analysis, and the elucidation of pharmacological mechanisms. AI shows considerable potential in virtual screening, compound optimization, and synthetic pathway design, thereby improving the bioavailability and safety profiles of natural medicines. Nevertheless, current applications face limitations in data quality, model interpretability, and ethical considerations. As AI technologies continue to evolve, natural medicines research and development will become more efficient and precise, advancing both personalized medicine and contemporary drug development.
The bandgap is a key parameter for understanding and designing the properties of hybrid perovskite materials and for developing photovoltaic devices. Traditional bandgap determination methods, such as ultraviolet-visible spectroscopy and first-principles calculations, are time- and power-consuming, and they struggle to capture the mechanisms of bandgap change in hybrid perovskite materials across a wide, largely unexplored space. In the present work, an artificial intelligence ensemble comprising two classifiers (with F1 scores of 0.9125 and 0.925) and a regressor (with a mean squared error of 0.0014 eV) is constructed to achieve high-precision bandgap prediction. A perovskite bandgap dataset is then established through high-throughput bandgap prediction by the ensemble. Based on this self-built dataset, partial dependence analysis (PDA) is developed to interpret the mechanisms that influence the bandgap. Meanwhile, an interpretable mathematical model with an R² of 0.8417 is generated using genetic programming symbolic regression (GPSR). The constructed PDA maps agree well with Shapley Additive exPlanations (SHAP), the GPSR model, and experimental verification. Through PDA, we reveal the boundary effect, the bowing effect, and their evolution trends with key descriptors.
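Partial dependence analysis of the kind described above averages a model's predictions over the dataset while one descriptor is held fixed at each grid value. The sketch below is a minimal illustration under stated assumptions: `toy_bandgap` and the descriptor names `x_halide` and `tolerance` are hypothetical stand-ins, not the authors' ensemble or dataset.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

def partial_dependence(model: Callable[[Row], float], data: Sequence[Row],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """One-way partial dependence: for each grid value, overwrite `feature`
    in every row and average the model's predictions over all rows."""
    pd_values = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in data]
        pd_values.append(mean(preds))
    return pd_values

# Hypothetical toy model with a quadratic (bowing-like) term in x_halide.
def toy_bandgap(row: Row) -> float:
    x = row["x_halide"]
    return 1.6 + 0.7 * x - 0.3 * x * (1 - x) + 0.1 * row["tolerance"]

data = [{"x_halide": 0.2, "tolerance": 0.9},
        {"x_halide": 0.5, "tolerance": 1.0},
        {"x_halide": 0.8, "tolerance": 0.8}]
grid = [0.0, 0.5, 1.0]
print(partial_dependence(toy_bandgap, data, "x_halide", grid))
```

For this toy model the printed curve is close to [1.69, 1.965, 2.39]; the sag below the straight line between the endpoints is the kind of bowing a PDA map would expose.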
This study introduces a comprehensive, automated framework that leverages data-driven methodologies to address various challenges in shale gas development and production. Specifically, it uses Automated Machine Learning (AutoML) to construct an ensemble model that predicts the estimated ultimate recovery (EUR) of shale gas wells. To demystify the “black-box” nature of the ensemble model, KernelSHAP, a kernel-based approach for computing Shapley values, is used to elucidate the factors that influence shale gas production at both global and local scales. Furthermore, the bi-objective optimization algorithm NSGA-II is incorporated to optimize hydraulic fracturing designs for production enhancement and cost control. The framework addresses three critical limitations often encountered in applying machine learning (ML) to shale gas production: the difficulty of achieving sufficient model accuracy with limited samples, the multidisciplinary expertise required to develop robust ML models, and the need for interpretability in “black-box” models. Validation with field data from the Fuling shale gas field in the Sichuan Basin substantiates the framework's effectiveness in enhancing the precision and applicability of data-driven techniques. The test accuracy of the ensemble ML model reached 83%, compared with a maximum of 72% for single ML models. The contribution of each geological and engineering factor to overall production was quantitatively evaluated. Fracturing design optimization raised EUR by 7% to 34% under different production and cost trade-off scenarios. The results enable domain experts with minimal data science expertise to conduct more precise and objective data-driven analyses and optimizations of shale gas production.
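NSGA-II ranks candidate designs by Pareto dominance before applying crowding distance and genetic operators. The sketch below shows only the first-front extraction step for two objectives (maximize EUR, minimize cost); the candidate designs and their values are hypothetical, in arbitrary units, and crowding distance, selection, crossover, and mutation are deliberately omitted.

```python
from typing import List, Tuple

Design = Tuple[str, float, float]  # (name, EUR, cost): EUR is maximized, cost minimized

def dominates(a: Design, b: Design) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(designs: List[Design]) -> List[Design]:
    """Designs not dominated by any other design (the first non-dominated front)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other is not d)]

# Hypothetical fracturing designs: (name, EUR, cost) in arbitrary units.
candidates = [("A", 1.00, 30.0), ("B", 1.20, 35.0), ("C", 0.95, 32.0),
              ("D", 1.30, 45.0), ("E", 1.10, 40.0)]
print([name for name, _, _ in pareto_front(candidates)])  # → ['A', 'B', 'D']
```

Designs C and E drop out because A and B, respectively, give at least as much EUR for less cost; the surviving front is the set of trade-off scenarios a decision maker would choose among.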
Calorific value is one of the most important properties of coal. Machine learning (ML) can be used to predict calorific value and thereby reduce experimental costs. China is one of the world’s largest coal-producing countries, and coal occupies an important position in its national energy structure. However, ML models built on a large database covering all regions of China are still missing. Drawing on extensive coal gasification practice at East China University of Science and Technology, we built ML models with a large database covering all regions of China. An AutoML model was proposed and achieved a minimum MSE of 1.021. The SHAP method was used to increase model interpretability, and model validity was confirmed with literature data and additional in-house experiments. Model adaptability was discussed on the basis of databases from China and the USA, showing that geography-specific ML models are essential. This study integrated a large coal database and the AutoML method for accurate calorific value prediction and could offer key tools for the Chinese coal industry.
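SHAP attributions are Shapley values of a model's prediction. With only a few features they can be computed exactly by enumerating feature coalitions, which is the quantity that KernelSHAP approximates for larger problems. The sketch below assumes a hypothetical linear calorific-value model, `toy_calorific`, and replaces "absent" features with baseline values (an interventional simplification); it is not the AutoML model from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

def shapley_values(f: Callable[[Dict[str, float]], float],
                   x: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names: List[str] = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical proximate-analysis model; coefficients are illustrative only.
def toy_calorific(p: Dict[str, float]) -> float:
    return 35.0 - 0.35 * p["ash"] - 0.30 * p["moisture"] + 0.05 * p["volatile"]

x = {"ash": 20.0, "moisture": 8.0, "volatile": 30.0}
baseline = {"ash": 15.0, "moisture": 5.0, "volatile": 25.0}
print(shapley_values(toy_calorific, x, baseline))
```

For a linear model under this baseline convention each value reduces to w_i (x_i - baseline_i), here about -1.75, -0.9, and 0.25, and the values sum to f(x) - f(baseline), a convenient sanity check.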
Machinery condition monitoring benefits equipment maintenance and has received much attention from academia and industry. Machine learning, especially deep learning, has become popular for machinery condition monitoring because it can make full use of available data and computational power. Since wrong fault alarms in machine condition monitoring might cause significant accidents, interpretable machine learning models that integrate signal processing knowledge to enhance model trustworthiness are gradually becoming a research hotspot. A spectrum-based, interpretable optimized weights method was previously proposed to indicate fault and fundamental frequencies when the analyzed data contain only one healthy type and one fault type. Because multiple fault types are commonly encountered in practice, this work explores the interpretable optimized weights method for multiclass fault scenarios. To this end, a new multiclass optimized weights spectrum (OWS) is proposed and studied theoretically and numerically. The multiclass OWS is found to capture the characteristic components associated with different conditions and to clearly indicate the specific fault characteristic frequencies (FCFs) corresponding to each fault condition. This work provides new insights into spectrum-based fault classification models, and the new multiclass OWS shows great potential for practical applications.
Given growing concern over global warming and the critical role of carbon dioxide (CO₂) in this phenomenon, CO₂-induced alterations in coal strength have attracted significant attention because of their implications for carbon sequestration. Numerous experiments have shown that CO₂ interaction time (T), saturation pressure (P), and other parameters significantly affect coal strength. However, accurately evaluating CO₂-induced alterations in coal strength remains difficult, so establishing accurate and efficient prediction models is particularly important. This study explored the application of advanced machine learning (ML) algorithms and Gene Expression Programming (GEP) to predict CO₂-induced alterations in coal strength. Six models were developed: three metaheuristic-optimized XGBoost models (GWO-XGBoost, SSA-XGBoost, PO-XGBoost) and three GEP models (GEP-1, GEP-2, GEP-3). Comprehensive evaluations using multiple metrics revealed that all models achieved high predictive accuracy, with the SSA-XGBoost model performing best: coefficient of determination (R²) = 0.99396, root mean square error (RMSE) = 0.62102, mean absolute error (MAE) = 0.36164, mean absolute percentage error (MAPE) = 4.8101%, and residual predictive deviation (RPD) = 13.4741. Model interpretability analyses using SHAP (Shapley Additive exPlanations), ICE (Individual Conditional Expectation), and PDP (Partial Dependence Plot) techniques highlighted the dominant role of fixed carbon content (FC) and significant interactions between FC and CO₂ saturation pressure (P). The results demonstrate that the proposed models effectively address the challenges of predicting CO₂-induced strength changes, providing valuable insights for geological storage safety and environmental applications.
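The evaluation metrics listed above have standard definitions, and the sketch below computes all five for paired observed and predicted values. Conventions differ between papers (for example, sample versus population standard deviation in RPD), so the population form used here is an assumption, and the numbers are made up.

```python
from math import sqrt
from typing import Dict, Sequence

def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R², RMSE, MAE, MAPE (%), and RPD for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = sqrt(ss_res / n)
    sd = sqrt(ss_tot / n)  # population standard deviation of the observations
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAE": sum(abs(r) for r in residuals) / n,
        # MAPE requires every observed value to be non-zero
        "MAPE_%": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
        "RPD": sd / rmse,
    }

observed = [10.0, 12.0, 15.0, 9.0, 14.0]   # hypothetical strength values
predicted = [10.5, 11.5, 14.0, 9.5, 14.5]
print(regression_metrics(observed, predicted))
```

Under the population convention, RPD equals 1/sqrt(1 - R²), so the two metrics carry overlapping information; for this toy data R² = 1 - 2/26 and RPD = sqrt(13).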
Formation pore pressure is the foundation of well planning and bears directly on the safety and efficiency of drilling operations in oil and gas development. However, the traditional method for predicting formation pore pressure applies post-drilling measurements from nearby wells to the target well, which may not accurately reflect the formation pore pressure of the target well. In this paper, a novel method is proposed for predicting formation pore pressure ahead of the drill bit by embedding petrophysical theory into machine learning, using seismic and logging-while-drilling (LWD) data. Gated recurrent unit (GRU) and long short-term memory (LSTM) models were developed and validated using data from three wells in the Bohai Oilfield, and Shapley additive explanations (SHAP) were used to visualize and interpret the models, providing insights into the relative importance and impact of the input features. The results show that, among the eight models trained in this study, the prediction errors of almost all models converge to within 0.05 g/cm³, with the largest root mean square error (RMSE) being 0.03072 and the smallest being 0.008964. Moreover, continuously updating the model as training data accumulate during drilling can further improve accuracy. Compared with other approaches, the proposed method depicts formation pore pressure accurately and precisely, while SHAP analysis guides effective model refinement and feature engineering strategies. This work underscores the potential of integrating advanced machine learning techniques with domain-specific knowledge to improve predictive accuracy in petroleum engineering applications.
The balancing market in the energy sector plays a critical role in physically and financially balancing supply and demand. Modeling balancing market dynamics can provide valuable insights and prognoses for power grid stability and secure energy supply. While complex machine learning models can achieve high accuracy, their “black-box” nature severely limits their interpretability. In this paper, we explore the trade-off between model accuracy and interpretability for the energy balancing market. In particular, we forecast the manual frequency restoration reserve (mFRR) activation price in the balancing market using real market data from different energy price zones. We explore the interpretability of mFRR forecasting with two models: extreme gradient boosting (XGBoost) and the explainable boosting machine (EBM). We also integrate the two models and benchmark all models against a naive baseline model. Our results show that EBM provides forecasting accuracy comparable to XGBoost while offering a considerable level of interpretability. Our analysis also underscores the challenge of accurately predicting the mFRR price when the activation price deviates significantly from the spot price. Importantly, EBM's interpretability features reveal non-linear mFRR price drivers and regional market dynamics. Our study demonstrates that EBM is a viable and valuable interpretable alternative to complex black-box AI models for balancing market forecasting.
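An EBM is a generalized additive model: the prediction is an intercept plus one learned shape function per feature (plus selected pairwise terms), which is what makes it readable. The sketch below is a simplified illustration in that spirit, not the InterpretML implementation: it boosts one lookup table per pre-binned integer feature in round-robin order and omits interaction terms and bagging. The features `zone` and `hour_block` and the price data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

Row = Dict[str, int]

def fit_additive(X: Sequence[Row], y: Sequence[float], rounds: int = 200,
                 lr: float = 0.1) -> Tuple[float, Dict[str, Dict[int, float]], Callable[[Row], float]]:
    """Round-robin boosting of one lookup table (shape function) per feature,
    in the spirit of an EBM but without interaction terms or bagging.
    Features are assumed to be pre-binned integers."""
    features = list(X[0])
    intercept = sum(y) / len(y)
    shapes: Dict[str, Dict[int, float]] = {f: defaultdict(float) for f in features}

    def predict(row: Row) -> float:
        return intercept + sum(shapes[f][row[f]] for f in features)

    for _ in range(rounds):
        for f in features:  # update one feature's shape function at a time
            resid_sum: Dict[int, float] = defaultdict(float)
            count: Dict[int, int] = defaultdict(int)
            for row, target in zip(X, y):
                resid_sum[row[f]] += target - predict(row)
                count[row[f]] += 1
            for b in resid_sum:  # shrink the per-bin mean residual into the table
                shapes[f][b] += lr * resid_sum[b] / count[b]
    return intercept, shapes, predict

# Hypothetical data: price = 50 + 10 * zone + hour-block effect (purely additive).
X = [{"zone": z, "hour_block": h} for z in (0, 1) for h in (0, 1, 2)] * 2
hour_effect = {0: 0.0, 1: 5.0, 2: -5.0}
y = [50 + 10 * r["zone"] + hour_effect[r["hour_block"]] for r in X]
intercept, shapes, predict = fit_additive(X, y)
print(round(shapes["zone"][1] - shapes["zone"][0], 3))  # → 10.0
```

Each table in `shapes` can be plotted directly as that feature's contribution to the forecast, which is the kind of per-driver insight the abstract attributes to EBM.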
Assessing seismic soil liquefaction is a complex, nonlinear problem affected by diverse uncertain factors. The Bayesian belief network (BBN) is an effective tool that provides a suitable framework for handling such uncertainties and cause-effect relationships. This study uses a hybrid approach to develop a BBN model from cone penetration test (CPT) case history records to evaluate seismic soil liquefaction potential. In this hybrid approach, a naive model is first developed solely with the interpretive structural modeling (ISM) technique using domain knowledge (DK). Subsequently, useful information from the naive model is embedded as DK in the K2 algorithm to develop a BBN-K2 and DK model. The results of the BBN models are compared and validated against available artificial neural network (ANN) and C4.5 decision tree (DT) models, and the BBN model developed by the hybrid approach shows consistent and promising results for liquefaction potential assessment. It provides a viable tool for geotechnical engineers to assess site conditions susceptible to seismic soil liquefaction. This study also presents a sensitivity analysis of the hybrid BBN model and the most probable explanation of liquefied sites, in order to identify the most likely scenario of the liquefaction phenomenon.
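The K2 algorithm builds a BBN by greedily adding parents to each node (following a given node ordering) as long as the Cooper-Herskovits score improves. The sketch below computes only that score in log form, using `lgamma` to evaluate factorials; the greedy search, the node ordering, and the paper's DK constraints are omitted. The records and the variables `pga` and `liquefied` are hypothetical.

```python
from collections import Counter
from math import lgamma
from typing import Dict, Sequence

Record = Dict[str, str]

def log_k2_score(data: Sequence[Record], child: str, parents: Sequence[str]) -> float:
    """Log Cooper-Herskovits (K2) score of `child` given a parent set:
    sum over parent configurations j of
    log((r-1)! / (N_j + r - 1)!) + sum_k log(N_jk!).
    The child's state count r is taken from the values observed in `data`."""
    r = len({rec[child] for rec in data})
    n_j = Counter(tuple(rec[p] for p in parents) for rec in data)
    n_jk = Counter((tuple(rec[p] for p in parents), rec[child]) for rec in data)
    score = 0.0
    for count_j in n_j.values():
        score += lgamma(r) - lgamma(count_j + r)   # log (r-1)! - log (N_j + r - 1)!
    for count_jk in n_jk.values():
        score += lgamma(count_jk + 1)              # log N_jk!
    return score

# Hypothetical discretized case records: strong shaking tends to coincide with liquefaction.
records = ([{"pga": "high", "liquefied": "yes"}] * 8 + [{"pga": "high", "liquefied": "no"}] * 2
           + [{"pga": "low", "liquefied": "no"}] * 8 + [{"pga": "low", "liquefied": "yes"}] * 2)
no_parent = log_k2_score(records, "liquefied", [])
with_pga = log_k2_score(records, "liquefied", ["pga"])
print(with_pga > no_parent)  # → True
```

A K2 search would keep `pga` as a parent of `liquefied` here because it raises the score; with real CPT records the candidate parents and their order would come from the ISM-based domain knowledge.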
The evolution of pore structure in shales is affected by both the thermal evolution of organic matter (OM) and inorganic diagenesis, resulting in a wide variety of pore structures. This paper examines the OM distribution in lacustrine shales and its influence on pore structure, and describes the process of porosity development. The principal findings are: (i) Three distribution patterns of OM in lacustrine shales are distinguished: laminated continuous distribution, clumped distribution, and stellate scattered distribution. The differences in total organic carbon (TOC) content, free hydrocarbon content (S₁), and OM porosity among these patterns are discussed. (ii) Porosity is negatively correlated with TOC and plagioclase content and positively correlated with quartz, dolomite, and clay mineral content. (iii) Pore evolution in lacustrine shales follows a sequence of decreasing, increasing, and again decreasing porosity, followed by a continuous increase until a relatively stable condition is reached. (iv) A new model for evaluating porosity in lacustrine shales is proposed. Using this model, the organic and inorganic porosities of shales in the Permian Lucaogou Formation are calculated to be 2.5% to 5% and 1% to 6.3%, respectively, which agree closely with measured data. These findings may provide a scientific basis and technical support for sweet-spot identification in lacustrine shales in China.
Characterized by self-monitoring and agile adaptation to fast-changing dynamics in complex production environments, smart manufacturing as envisioned under Industry 4.0 aims to improve the throughput and reliability of production beyond the state of the art. While the widespread application of deep learning (DL) has opened up new opportunities to accomplish this goal, data quality and model interpretability continue to be roadblocks to the broad acceptance of DL in real-world applications. This has motivated research on two fronts: data curation, which aims to provide quality data as input for meaningful DL-based analysis, and model interpretation, which aims to reveal the physical reasoning underlying DL model outputs and promote user trust. This paper summarizes several key data curation techniques, in which breakthroughs in data denoising, outlier detection, imputation, balancing, and semantic annotation have proven effective for extracting information from noisy, incomplete, insufficient, and/or unannotated data. Also highlighted are model interpretation methods that address the “black-box” nature of DL and move toward model transparency.
A data augmentation technique is applied in the current work to a training dataset of 610 bulk metallic glasses (BMGs), randomly selected from 762 collected samples. An ensemble machine learning (ML) model is developed on the augmented training dataset and tested on the remaining 152 samples. The results show that the ML model predicts the maximum diameter D_max of BMGs more accurately than all previously reported ML models. In addition, the model yields the following glass-forming ability (GFA) rules: an average atomic radius between 140 pm and 165 pm, a value of TT/(T-T)(T-T) higher than 2.5, a mixing entropy higher than 10 J/K/mol, and a mixing enthalpy between -32 kJ/mol and -26 kJ/mol. The ML model is interpretable, thereby deepening the understanding of GFA.
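Two of the reported GFA rules can be checked directly from composition. The sketch below assumes the ideal configurational form of the mixing entropy, -R Σ c_i ln c_i, and a composition-weighted mean radius; the radii are approximate metallic radii for illustration only, the compositions are arbitrary examples, and the temperature-ratio and mixing-enthalpy rules are not evaluated.

```python
from math import log
from typing import Dict

R_GAS = 8.314  # gas constant, J/(K*mol)

def mixing_entropy(fractions: Dict[str, float]) -> float:
    """Ideal configurational mixing entropy, -R * sum(c_i * ln c_i), in J/K/mol."""
    return -R_GAS * sum(c * log(c) for c in fractions.values() if c > 0)

def average_radius(fractions: Dict[str, float], radii_pm: Dict[str, float]) -> float:
    """Composition-weighted mean atomic radius in pm."""
    return sum(c * radii_pm[el] for el, c in fractions.items())

def passes_two_rules(fractions: Dict[str, float], radii_pm: Dict[str, float]) -> bool:
    """Checks only the radius window (140-165 pm) and the entropy threshold (> 10 J/K/mol);
    the temperature-ratio and mixing-enthalpy rules are not evaluated here."""
    r_avg = average_radius(fractions, radii_pm)
    return 140.0 <= r_avg <= 165.0 and mixing_entropy(fractions) > 10.0

radii = {"Zr": 160.0, "Cu": 128.0, "Al": 143.0, "Ni": 125.0}  # illustrative values only
quaternary = {"Zr": 0.40, "Cu": 0.30, "Al": 0.15, "Ni": 0.15}
ternary = {"Zr": 0.50, "Cu": 0.40, "Al": 0.10}
print(round(mixing_entropy(quaternary), 2), passes_two_rules(quaternary, radii))  # → 10.78 True
print(round(mixing_entropy(ternary), 2), passes_two_rules(ternary, radii))        # → 7.84 False
```

The ternary example sits inside the radius window but fails the entropy threshold, showing that the rules act jointly rather than individually.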
For the ecological restoration and reconstruction of degraded areas, correctly understanding the degradation factors of the ecosystem in arid-hot valleys is an important prerequisite. Factors including vegetation degradation, land degradation, arid climate, policy failure, forest fire, rapid population growth, excessive deforestation, overgrazing, steep slope reclamation, economic poverty, engineering construction, lithology, slope, low cultural level, geological hazards, biological disasters, and soil properties were selected to study the Yuanmou arid-hot valleys. Based on the interpretative structural model (ISM), it was found that the degradation factors of the Yuanmou arid-hot valleys do not lie at a single level but form a multilevel hierarchical system with internal relations, indicating that the degradation mode of the arid-hot valleys is "straight (appearance)-penetrating-background". This research offers important guidance for the restoration and reconstruction of arid-hot valley ecosystems.
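ISM, as used above, turns pairwise "factor i influences factor j" judgments into a hierarchy in two steps: compute the reachability matrix (the transitive closure of the influence matrix) and then partition factors into levels. The sketch below follows the standard procedure; the four factors come from the list above, but the links between them are hypothetical and are not the study's findings.

```python
from typing import List

Matrix = List[List[int]]

def reachability_matrix(adj: Matrix) -> Matrix:
    """Transitive closure (Warshall's algorithm) of a binary influence matrix,
    with the diagonal set to 1 as is customary in ISM."""
    n = len(adj)
    reach = [[1 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach

def level_partition(reach: Matrix) -> List[List[int]]:
    """Repeatedly extract the factors whose reachability set (within the remaining
    factors) is contained in their antecedent set. Level 1 holds the factors
    closest to the outcome; the last level holds the root causes."""
    remaining = set(range(len(reach)))
    levels: List[List[int]] = []
    while remaining:
        level = [i for i in remaining
                 if {j for j in remaining if reach[i][j]}
                 <= {j for j in remaining if reach[j][i]}]
        levels.append(sorted(level))
        remaining -= set(level)
    return levels

# Hypothetical links among four of the listed factors (adj[i][j] = 1 means i drives j).
factors = ["steep slope reclamation", "vegetation degradation",
           "land degradation", "rapid population growth"]
adj = [[0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 0, 0, 0]]
levels = level_partition(reachability_matrix(adj))
print([[factors[i] for i in level] for level in levels])
```

With these links the surface level is land degradation and the deepest level is rapid population growth, mirroring the appearance-to-background layering that ISM is used to expose.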
This study was conducted to enable prompt classification of malware, which is becoming increasingly sophisticated. To do this, we analyzed the important features of malware and the relative importance of the selected features according to a learning model, in order to assess how those important features were identified. First, analysis features were extracted using Cuckoo Sandbox, an open-source malware analysis tool, and divided into five categories based on the extracted information. The 804 extracted features were reduced by 70% by keeping only those most suitable for malware classification, using recursive feature elimination, a learning-model-based feature selection method. Next, these important features were analyzed, and the contribution of each was assessed with a Random Forest classifier. The results showed that system call features dominated. Finally, the malware type could be accurately identified using only 36 to 76 features for each of the four malware types with the most available samples: Trojan, Adware, Downloader, and Backdoor.
To improve the interpretation of production log data for gas-water elongated bubble (EB) flow in horizontal wells, a multiphase flow simulation device was set up to conduct a series of measurement experiments with air and tap water as test media, using a real production logging tool (PLT) string at different deviations and in different mixed flow states. By characterizing gas-water EB flow and its mechanisms in transparent experimental boreholes during production logging, and by analyzing production log response characteristics and experimental production logging flow pattern maps, a flow pattern identification method based on log responses and a drift-flux model for gas-water EB flow were proposed. The model, built on experimental EB flow data, reveals the physical mechanisms of gas-water EB flow during measurement. Its coefficients are specific to the experimental conditions and the PLT string used in our experiments, and they also reflect the interference of the PLT string with the original downhole flow patterns. Because our simulated flow experiments and PLT string are representative, the model coefficients can be applied directly as empirical values of logging interpretation model parameters to real production logging data when the measurement conditions and PLT strings are similar.
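Drift-flux models of the kind mentioned above are commonly written in the Zuber-Findlay form u_g = j_g / α = C0 j + v_d, where j = j_g + j_w is the mixture superficial velocity, C0 is the distribution parameter, and v_d is the drift velocity. The sketch below assumes that form with hypothetical coefficient values; the paper's fitted coefficients, which are tied to its PLT string and conditions, are not reproduced here.

```python
from typing import Tuple

def gas_holdup(j_g: float, j_w: float, c0: float, v_d: float) -> float:
    """Drift-flux relation in Zuber-Findlay form: u_g = j_g / alpha = C0 * j + v_d,
    where j = j_g + j_w is the mixture superficial velocity (m/s)."""
    return j_g / (c0 * (j_g + j_w) + v_d)

def split_flow(alpha: float, j: float, c0: float, v_d: float) -> Tuple[float, float]:
    """Invert the same relation: from a measured gas holdup and mixture superficial
    velocity, recover the gas and water superficial velocities (j_g, j_w)."""
    j_g = alpha * (c0 * j + v_d)
    return j_g, j - j_g

C0, V_D = 1.2, 0.1  # hypothetical distribution parameter and drift velocity (m/s)
alpha = gas_holdup(0.3, 0.5, C0, V_D)
print(round(alpha, 4), split_flow(alpha, 0.8, C0, V_D))
```

The inversion is how such a model serves log interpretation: once holdup and total flow are measured, the empirical C0 and v_d split the total into phase flow rates.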
Because of gravity segregation, the distributions of local velocity and local phase holdup along the radial direction of the pipe are complicated, and the fluid velocity field changes along the direction of gravity in horizontal wells. Measuring the mixture flow rate and water holdup is therefore difficult, resulting in poor interpretation accuracy of production logging output profiles. In this paper, dynamic simulation logging experiments of oil-water two-phase flow were conducted in horizontal oil-water two-phase flow simulation wells using the Multiple Array Production Suite, which comprises a capacitance array tool (CAT) and a spinner array tool (SAT), and the response characteristics of SAT and CAT under different flow rates and water cuts were studied. Based on the response characteristics of CAT in different water holdup ranges, interpolation imaging across the wellbore section determines the water holdup distribution. The oil-water two-phase velocity field in the flow section was then reconstructed from this holdup distribution and the SAT logging values, combined with the rheological equation of viscous fluid, and a method for calculating the oil and water partial phase flow rates in the flow section was proposed. When applied to the experimental data, the new approach gives results that are basically consistent with the experimental data: the calculated total flow rate and water holdup agree with the values set in the experiment, suggesting that the method is highly accurate.
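Once holdup and velocity are known across the flow section, the water flow rate is the area integral of holdup times local velocity. The sketch below is a deliberately simplified version that assumes both quantities vary only with height and divides the pipe into equal-height horizontal strips whose areas come from the circular-segment formula; it does not reproduce the paper's imaging or rheology-based reconstruction, and the pipe radius and profiles are hypothetical.

```python
from math import acos, sqrt
from typing import Sequence

def segment_area(h: float, radius: float) -> float:
    """Cross-sectional area below height h, measured from the pipe bottom (0 <= h <= 2R)."""
    h = min(max(h, 0.0), 2.0 * radius)
    d = radius - h  # signed distance from the chord to the pipe centre
    return radius ** 2 * acos(d / radius) - d * sqrt(max(radius ** 2 - d ** 2, 0.0))

def water_flow_rate(holdups: Sequence[float], velocities: Sequence[float],
                    radius: float) -> float:
    """Sum of water holdup x local velocity x strip area over equal-height
    horizontal strips, listed from the pipe bottom to the top (SI units)."""
    n = len(holdups)
    dh = 2.0 * radius / n
    q_w = 0.0
    for i, (y_w, v) in enumerate(zip(holdups, velocities)):
        strip_area = segment_area((i + 1) * dh, radius) - segment_area(i * dh, radius)
        q_w += y_w * v * strip_area
    return q_w

# Hypothetical stratified profile in a 0.0625 m radius pipe: water concentrated at the bottom.
radius = 0.0625
holdups = [1.0, 0.9, 0.4, 0.0]
velocities = [0.20, 0.25, 0.30, 0.35]
print(water_flow_rate(holdups, velocities, radius))
```

The oil rate follows the same way with (1 - holdup) in place of holdup, and a sanity check is that uniform unit holdup and velocity return the full pipe area πR².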
Alarm floods are one of the main problems in industrial process alarm systems. Alarm root-cause analysis and alarm prioritization help reduce alarm floods. This paper proposes a systematic rationalization method for multivariate correlated alarms that performs root-cause analysis and alarm prioritization. An information-fusion-based interpretive structural model is constructed from data-driven partial correlation coefficient calculations and process knowledge modification. This hierarchical multilayer model helps identify abnormality propagation paths and analyze root causes. A revised Likert scale method is adopted to determine alarm priority and reduce blindness in alarm handling. As a case study, the Tennessee Eastman process is used to show the effectiveness and validity of the proposed approach. A comparison of alarm system performance shows that the proposed rationalization methodology can reduce alarm floods to some extent and improve performance.
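Partial correlation separates direct links from links explained by a common driver, which is why it suits building the data-driven adjacency matrix described above. The sketch below uses the first-order formula (one controlling variable); the paper may control for more variables, and the three signals are hypothetical.

```python
from math import sqrt
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical process signals: z drives both x and y; x and y are not directly linked.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [1, -1, -1, 1, -1, 1, 1, -1])]
y = [zi + e for zi, e in zip(z, [1, 1, -1, -1, -1, -1, 1, 1])]
print(round(pearson(x, y), 3), round(abs(partial_corr(x, y, z)), 3))  # → 0.84 0.0
```

Although x and y are strongly correlated, their partial correlation given z vanishes, so an adjacency matrix thresholded on |partial correlation| would link each of them to z but not to each other; process knowledge can then add or remove edges.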
Nonlinear-characteristic fault detection and diagnosis based on higher-order statistics (HOS) is an effective data-driven method, but its computational cost is high for large-scale process control systems. An HOS-ISM fault diagnosis framework combining the interpretative structural model (ISM) and HOS is proposed: (1) the adjacency matrix is determined by partial correlation coefficients; (2) the modified adjacency matrix is defined by a directed graph using prior knowledge from the process piping and instrumentation diagram; (3) the interpretative structure of the large-scale process control system is built with the ISM method; and (4) the non-Gaussianity index, nonlinearity index, and total nonlinearity index are calculated dynamically on the basis of the interpretative structure, effectively eliminating the uncertainty of the nonlinear characteristic diagnostic method with a reasonable sampling period and data window. The proposed HOS-ISM fault diagnosis framework is verified on the Tennessee Eastman process and shows improvement for highly nonlinear characteristics in selected fault cases.
With the rapid development of the Internet, network security and data privacy are increasingly valued. Although classical Network Intrusion Detection Systems (NIDS) based on Deep Learning (DL) models can provide good detection accuracy, collecting samples for centralized training brings a high risk of data privacy leakage. Furthermore, training supervised deep learning models requires a large number of labeled samples, which are usually cumbersome to obtain. The “black-box” problem also makes the DL models of NIDS untrustworthy. In this paper, we propose a trusted Federated Learning (FL) Traffic IDS method called FL-TIDS to address these problems. In FL-TIDS, we design an unsupervised intrusion detection model based on autoencoders (AE) that alleviates the reliance on labeled samples. At the same time, we use FL for model training to protect data privacy. In addition, we design an improved SHAP interpretability method based on the chi-square test to analyze the trained model. We conducted several experiments to evaluate FL-TIDS. First, we experimentally determined the structure and number of neurons of the unsupervised AE model. Second, we evaluated the proposed method on the UNSW-NB15 and CICIDS2017 datasets. The experimental results show that the unsupervised AE model outperforms seven other intrusion detection models in precision, recall, and F1-score. Then, federated learning is used to train the intrusion detection model, and the results indicate that this model is more accurate than the locally trained model. Finally, we use the improved chi-square-based SHAP method to analyze explainability. The results show that the features the model relies on for identification are consistent with the attack characteristics, indicating that the model is reliable.
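The abstract does not state how client models are aggregated. A common choice is federated averaging (FedAvg), sketched below under that assumption: each client trains locally and sends only its parameter vector, and the server averages the vectors weighted by local sample counts. The parameter values and client sizes are hypothetical.

```python
from typing import List, Sequence

def fed_avg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors, weighting each
    client by its number of local training samples. Only parameters, not raw
    traffic records, are sent to the server."""
    total = sum(client_sizes)
    global_params = [0.0] * len(client_params[0])
    for params, n in zip(client_params, client_sizes):
        for k, value in enumerate(params):
            global_params[k] += (n / total) * value
    return global_params

# Hypothetical flattened autoencoder weights from two clients.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # → [2.5, 3.5]
```

In a full training loop the server would broadcast the averaged parameters back to the clients for the next local round.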
A systematic analysis of the hierarchical relationships among the factors affecting sustainable supply chain implementation in water diversion projects has theoretical value and practical significance for the sustainable development of large-scale water diversion projects. Through a review of relevant literature, books, web pages, and materials, and discussions with relevant experts and scholars, a total of 23 factors influencing sustainable supply chain implementation in water diversion projects were identified. ISM (Interpretative Structural Modeling Method) was then used to analyze the causal relationships among the factors, yielding a multilevel hierarchical structure model. The results showed that: 1) The surface-level influencing factors of sustainable supply chain implementation in water diversion projects mainly included 8 factors, such as water-saving awareness and water-saving intensity in the diversion area, water quality, water pollution and other disasters, and effective incentive mechanisms; these surface-level factors were directly related to sustainable supply chain implementation in water diversion projects. 2) The indirect influencing factors included 12 factors, such as the water quality and quantity guarantee rate of the supply chain, the government's enforcement of laws and regulations, water distribution, ecological compensation, and compensation mechanisms for residents in the water source area. Indirect influencing factors can act directly on the direct influencing factors, and intervening in the factors that humans can control is one of the important ways to improve the sustainable operation of water diversion projects. 3) The fundamental influencing factors included three factors: resettlement policy, government financial support, and sound laws and regulations. These deep-level factors exert multichannel influence and are controllable, and intervening in them is the main means of improving the sustainable operation of water diversion projects.
Funding: Supported by the National Key Research and Development Program of China (No. 2020YFE0202200), the National Natural Science Foundation of China (Nos. 81903538, 82322073, 92253303), the Innovation Team and Talents Cultivation Program of National Administration of Traditional Chinese Medicine (No. ZYYCXTD-D-202004), and the Science and Technology Commission of Shanghai Municipality (Nos. 22ZR1474200, 24JS2830200).
Abstract: Artificial intelligence (AI) has emerged as a transformative technology for accelerating drug discovery and development in natural medicines research. Natural medicines have complex chemical compositions and multifaceted pharmacological mechanisms, and they are widely used to treat diverse diseases. However, their research and development face significant challenges, including compositional complexity, difficult extraction, and efficacy validation. AI, particularly deep learning (DL) and machine learning (ML), enables efficient analysis of large datasets, facilitating drug screening, component analysis, and the elucidation of pharmacological mechanisms. AI shows considerable potential in virtual screening, compound optimization, and synthetic pathway design, which can help improve the bioavailability and safety profiles of natural medicines. Nevertheless, current applications are limited by data quality, model interpretability, and ethical considerations. As AI technologies continue to evolve, natural medicines research and development is expected to become more efficient and precise, advancing both personalized medicine and contemporary drug development.
Funding: Supported by the National Research Foundation of Korea (NRF) funded by the Korean government (MSIT) (Grant numbers: RS-2025-02316700 and RS-2025-00522430) and the China Scholarship Council Program.
Abstract: The bandgap is a key parameter for understanding and designing the properties of hybrid perovskite materials and for developing photovoltaic devices. Traditional bandgap determination methods, such as ultraviolet-visible spectroscopy and first-principles calculations, consume considerable time and power. They also struggle to capture the mechanisms of bandgap change in hybrid perovskite materials across a wide, unexplored materials space. In the present work, we construct an artificial intelligence ensemble for high-precision bandgap prediction. It comprises two classifiers (F1 scores of 0.9125 and 0.925) and a regressor (mean squared error of 0.0014 eV). A bandgap perovskite dataset is then established through high-throughput bandgap prediction with the ensemble. Based on this self-built dataset, partial dependence analysis (PDA) is developed to interpret the mechanisms that influence the bandgap. Meanwhile, genetic programming symbolic regression (GPSR) generates an interpretable mathematical model with an R² of 0.8417. The constructed PDA maps agree well with Shapley Additive exPlanations (SHAP), the GPSR model, and experimental verification. Through PDA, we reveal the boundary effect, the bowing effect, and how both evolve with key descriptors.
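Partial dependence, the idea behind the PDA used above, can be sketched in a few lines. This is a minimal one-dimensional version. It assumes a model that maps a list of descriptor values to a bandgap. The toy surrogate `toy_bandgap` and its coefficients are hypothetical and are not the ensemble from this work.

```python
from statistics import mean


def partial_dependence(predict, X, feature, grid):
    """One-dimensional partial dependence of `predict` on descriptor `feature`.

    For each grid value, the descriptor is fixed at that value in every sample
    of the background dataset X. The model outputs are then averaged, which
    marginalizes over the remaining descriptors.
    """
    curve = []
    for value in grid:
        outputs = []
        for row in X:
            x = list(row)       # copy so the background data are not mutated
            x[feature] = value  # force the descriptor to the grid value
            outputs.append(predict(x))
        curve.append(mean(outputs))
    return curve


def toy_bandgap(x):
    # Hypothetical surrogate: the quadratic term in x[0] mimics a bowing-like curve.
    return 1.5 + 0.8 * x[0] - 0.6 * x[0] ** 2 + 0.2 * x[1]


X = [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
curve = partial_dependence(toy_bandgap, X, feature=0, grid=[0.0, 0.5, 1.0])
print(curve)
```

The curve peaks inside the grid rather than at an end point. This is the kind of non-monotonic shape that a bowing effect produces in a PDA map. A two-dimensional map repeats the same averaging over a grid of value pairs.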
Funding: Funded by the National Natural Science Foundation of China (42050104).
Abstract: This study introduces a comprehensive, automated framework that uses data-driven methodologies to address various challenges in shale gas development and production. Specifically, it uses Automated Machine Learning (AutoML) to build an ensemble model that predicts the estimated ultimate recovery (EUR) of shale gas wells. To demystify the "black-box" nature of the ensemble model, we use KernelSHAP, a kernel-based approach for computing Shapley values. It identifies the factors that influence shale gas production at both global and local scales. Furthermore, the framework incorporates NSGA-II, a bi-objective optimization algorithm, to optimize hydraulic fracturing designs that boost production while controlling cost. The framework addresses three critical limitations often encountered in applying machine learning (ML) to shale gas production:

- the difficulty of achieving sufficient model accuracy with limited samples;
- the multidisciplinary expertise required to develop robust ML models;
- the need for interpretability in "black-box" models.

Validation with field data from the Fuling shale gas field in the Sichuan Basin shows that the framework improves the precision and applicability of data-driven techniques. The ensemble ML model reached a test accuracy of 83%, compared with a maximum of 72% for single ML models. The contribution of each geological and engineering factor to overall production was quantitatively evaluated. Fracturing design optimization raised EUR by 7% to 34% under different production and cost trade-off scenarios. The results let domain experts with minimal data science expertise conduct more precise and objective data-driven analyses and optimizations of shale gas production.
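The core of NSGA-II's bi-objective search is Pareto dominance. The sketch below extracts only the first non-dominated front for two hypothetical objectives: maximize EUR and minimize cost. It omits NSGA-II's crowding distance, selection, crossover, and mutation. The design values are invented for illustration and are in arbitrary units.

```python
def dominates(a, b):
    """True if design a dominates design b.

    Each design is a tuple (eur, cost): EUR is maximized, cost is minimized.
    a dominates b when it is no worse in both objectives and strictly better in one.
    """
    eur_a, cost_a = a
    eur_b, cost_b = b
    no_worse = eur_a >= eur_b and cost_a <= cost_b
    strictly_better = eur_a > eur_b or cost_a < cost_b
    return no_worse and strictly_better


def pareto_front(designs):
    """Designs not dominated by any other design (NSGA-II's first front)."""
    # A design never dominates itself, so no self-exclusion is needed.
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical fracturing designs as (EUR, cost) pairs in arbitrary units.
designs = [(1.00, 10.0), (1.20, 12.0), (1.10, 13.0), (0.90, 9.0), (1.20, 11.5)]
front = pareto_front(designs)
print(front)
```

Each design on the front is a different production/cost trade-off. NSGA-II keeps such fronts and spreads the population along them, so an engineer can pick a scenario instead of receiving a single optimum.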
Funding: Shanghai Yangfan Program (22YF1410300, Yunfei Gao); National Natural Science Foundation of China (22208104, Yunfei Gao); Shanghai Chenguang Program (21CGA35, Yunfei Gao); National Key Research and Development Program of China (2022YFA1504701 and 2022YFB4101900, Yunfei Gao).
Abstract: Calorific value is one of the most important properties of coal. Machine learning (ML) can predict it and thereby reduce experimental costs. China is one of the world's largest coal-producing countries, and coal occupies an important position in its national energy structure. However, ML models built on a large database covering all regions of China are still lacking. Drawing on extensive coal gasification practice at East China University of Science and Technology, we built ML models on such a database. An AutoML model was proposed and achieved a minimum MSE of 1.021. The SHAP method was used to improve model interpretability, and model validity was confirmed with literature data and additional in-house experiments. Model adaptability was examined using the Chinese and US databases, which showed that geography-specific ML models are essential. This study combines a large coal database with the AutoML method for accurate calorific value prediction and could offer key tools for the Chinese coal industry.
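SHAP values are Shapley values of feature contributions. For a handful of features, they can be computed exactly by enumerating coalitions; KernelSHAP-style methods approximate this same quantity. This is a minimal sketch. It assumes that features outside a coalition are replaced by baseline values, which is one common convention. `toy_model` is a hypothetical stand-in for the calorific-value model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at sample x.

    Features outside a coalition take their baseline value. The cost grows as
    2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The interaction term is split equally between features 0 and 2. The values sum to f(x) minus f(baseline), the efficiency property that makes Shapley attributions additive.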
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 523B2043 and 52475112.
Abstract: Machinery condition monitoring benefits equipment maintenance and has received much attention from academia and industry. Machine learning, especially deep learning, has become popular for this task because it can make full use of available data and computational power. Wrong fault alarms in machine condition monitoring might cause significant accidents. For this reason, interpretable machine learning models that integrate signal processing knowledge to enhance model trustworthiness are becoming a research hotspot. A spectrum-based, interpretable optimized weights method was previously proposed to indicate fault and fundamental frequencies. That method applies only when the analyzed data contain one healthy type and one fault type. Since multiple fault types are commonly encountered in practice, this work extends the interpretable optimized weights method to multiclass fault scenarios. A new multiclass optimized weights spectrum (OWS) is proposed and studied theoretically and numerically. The multiclass OWS captures the characteristic components associated with different conditions. It also clearly indicates the specific fault characteristic frequencies (FCFs) of each fault condition. This work provides new insights into spectrum-based fault classification models, and the multiclass OWS shows great potential for practical applications.
Funding: Partially supported by the National Natural Science Foundation of China (42177164, 52474121) and the Outstanding Youth Project of Hunan Provincial Department of Education (23B0008).
Abstract: Global warming is a growing concern, and carbon dioxide (CO₂) plays a critical role in it. CO₂-induced alterations in coal strength have therefore attracted significant attention because of their implications for carbon sequestration. Numerous experiments have shown that parameters such as CO₂ interaction time (T) and saturation pressure (P) significantly affect coal strength. However, accurately evaluating these alterations remains difficult, so accurate and efficient prediction models are particularly important. This study applied advanced machine learning (ML) algorithms and Gene Expression Programming (GEP) to predict CO₂-induced alterations in coal strength. Six models were developed: three metaheuristic-optimized XGBoost models (GWO-XGBoost, SSA-XGBoost, PO-XGBoost) and three GEP models (GEP-1, GEP-2, GEP-3). Comprehensive evaluations using multiple metrics showed that all models achieved high predictive accuracy. The SSA-XGBoost model performed best:

- coefficient of determination R² = 0.99396;
- root mean square error RMSE = 0.62102;
- mean absolute error MAE = 0.36164;
- mean absolute percentage error MAPE = 4.8101%;
- residual predictive deviation RPD = 13.4741.

Interpretability analyses used SHAP (Shapley Additive exPlanations), ICE (Individual Conditional Expectation), and PDP (Partial Dependence Plot). They highlighted the dominant role of fixed carbon content (FC) and significant interactions between FC and CO₂ saturation pressure (P). The results demonstrate that the proposed models effectively predict CO₂-induced strength changes, providing valuable insights for geological storage safety and environmental applications.
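The five metrics quoted for SSA-XGBoost can be computed as follows. This sketch assumes that RPD is the population standard deviation of the observed values divided by RMSE. Some authors instead use the sample standard deviation or the standard error of prediction, so values may differ slightly from the paper's.

```python
from math import sqrt
from statistics import mean, pstdev


def regression_metrics(y_true, y_pred):
    """R2, RMSE, MAE, MAPE (%) and RPD for paired observed/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    # MAPE assumes no observed value is zero.
    mape = 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rpd = pstdev(y_true) / rmse  # assumption: population standard deviation
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape, "RPD": rpd}


# Hypothetical observed and predicted strength values.
m = regression_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
print(m)
```

With these conventions, R² = 1 - 1/RPD², so the two metrics carry the same information.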
Funding: Supported by the National Natural Science Foundation of China (Grant numbers: 52174012, 52394250, 52394255, 52234002, U22B20126, 51804322).
Abstract: Formation pore pressure is the foundation of well planning and governs the safety and efficiency of drilling operations in oil and gas development. However, the traditional prediction method applies post-drilling measurements from nearby wells to the target well. These measurements may not accurately reflect the target well's formation pore pressure. In this paper, we propose a novel method for predicting formation pore pressure ahead of the drill bit. It embeds petrophysical theory into machine learning based on seismic and logging-while-drilling (LWD) data. Gated recurrent unit (GRU) and long short-term memory (LSTM) models were developed and validated using data from three wells in the Bohai Oilfield. Shapley additive explanations (SHAP) were used to visualize and interpret the models, giving insight into the relative importance and impact of the input features. Among the eight models trained in this study, the prediction errors of almost all converge to 0.05 g/cm³. The largest root mean square error (RMSE) is 0.03072 and the smallest is 0.008964. Moreover, continuously updating the model with the growing training data during drilling can further improve accuracy. Compared with other approaches, this method depicts formation pore pressure accurately and precisely, and SHAP analysis guides effective model refinement and feature engineering. This work underscores the potential of integrating advanced machine learning techniques with domain-specific knowledge to enhance predictive accuracy in petroleum engineering.
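A GRU processes the depth-ordered seismic and LWD features one step at a time, carrying a hidden state forward. The following is a minimal pure-Python sketch of one common formulation of the GRU update, with a single hidden unit. Note the following assumptions:

- The parameter names, values, and input features are hypothetical.
- Libraries differ slightly in where the biases sit.
- A trained model would add a linear output layer that maps the hidden state to pore pressure.

```python
from math import exp, tanh


def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))


def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))


def gru_step(x, h, p):
    """One update of a GRU with a single hidden unit.

    x: input features at this depth step (e.g. normalized seismic/LWD attributes)
    h: previous hidden state (a float, since the hidden size is 1)
    p: parameters; W_* act on x, U_* on h, b_* are biases
    """
    r = sigmoid(dot(p["W_r"], x) + p["U_r"] * h + p["b_r"])     # reset gate
    z = sigmoid(dot(p["W_z"], x) + p["U_z"] * h + p["b_z"])     # update gate
    n = tanh(dot(p["W_n"], x) + r * (p["U_n"] * h) + p["b_n"])  # candidate state
    return (1.0 - z) * n + z * h                                 # blend old and new


def run_gru(sequence, p, h0=0.0):
    """Feed a depth-ordered sequence of feature vectors through the cell."""
    h = h0
    for x in sequence:
        h = gru_step(x, h, p)
    return h


# Hypothetical parameters and two normalized input features per depth step.
params = {
    "W_r": [0.4, -0.2], "U_r": 0.3, "b_r": 0.0,
    "W_z": [-0.1, 0.5], "U_z": 0.2, "b_z": 0.1,
    "W_n": [0.7, 0.3], "U_n": -0.4, "b_n": 0.0,
}
sequence = [[0.2, 0.5], [0.3, 0.4], [0.5, 0.6]]
h = run_gru(sequence, params)
print(h)
```

Because the update gate blends the old state with a tanh-bounded candidate, the hidden state stays in (-1, 1). This gate lets the network retain information from shallower depths when predicting ahead of the bit.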
Funding: PriTEM project, funded by UiO:Energy Convergence Environments.
Abstract: The balancing market in the energy sector plays a critical role in physically and financially balancing supply and demand. Modeling its dynamics can provide valuable insights and prognoses for power grid stability and secure energy supply. Complex machine learning models can achieve high accuracy, but their "black-box" nature severely limits model interpretability. In this paper, we explore the trade-off between model accuracy and interpretability for the energy balancing market. In particular, we forecast the manual frequency restoration reserve (mFRR) activation price using real market data from different energy price zones. We explore the interpretability of mFRR forecasting with two models: extreme gradient boosting (XGBoost) and the explainable boosting machine (EBM). We also integrate the two models and benchmark all models against a naive baseline model. Our results show that EBM forecasts as accurately as XGBoost while offering a considerable level of interpretability. Our analysis also shows that the mFRR price is hard to predict when the activation price deviates significantly from the spot price. Importantly, EBM's interpretability features reveal non-linear mFRR price drivers and regional market dynamics. Our study demonstrates that EBM is a viable and valuable interpretable alternative to complex black-box AI models for balancing market forecasting.
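Benchmarking against a naive baseline usually means reporting how much a model improves on a trivial forecast. The abstract does not define the naive model. This sketch therefore assumes a persistence forecast, in which each interval's price is predicted by the previous one. The prices and model forecasts are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(prices, horizon=1):
    """Naive forecast: each price is predicted by the value `horizon` steps earlier.

    Returns (forecasts, targets) aligned element by element.
    """
    return prices[:-horizon], prices[horizon:]


def skill_score(model_mae, naive_mae):
    """Relative improvement over the naive baseline (1 = perfect, 0 = no gain)."""
    return 1.0 - model_mae / naive_mae


# Hypothetical hourly mFRR activation prices (EUR/MWh) and model forecasts.
prices = [50.0, 52.0, 55.0, 53.0, 60.0]
naive, targets = persistence_forecast(prices)
model = [51.0, 54.0, 54.0, 57.0]  # forecasts for the same four target hours
print(mae(targets, naive), mae(targets, model),
      skill_score(mae(targets, model), mae(targets, naive)))
```

A negative skill score would mean the model does worse than simply repeating the last price. Persistence is a demanding baseline for prices that rarely jump, which is why large deviations from the spot price are where models must earn their keep.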
Funding: Projects (2016YFE0200100, 2018YFC1505300-5.3) supported by the National Key Research & Development Plan of China; Project (51639002) supported by the Key Program of the National Natural Science Foundation of China.
Abstract: Assessing seismic soil liquefaction is a complex, non-linear problem affected by diverse sources of uncertainty. The Bayesian belief network (BBN) is an effective tool that provides a suitable framework for handling such uncertainties and cause–effect relationships. This study uses a hybrid approach to develop a BBN model from cone penetration test (CPT) case history records to evaluate seismic soil liquefaction potential. In this hybrid approach, a naive model is first developed using only the interpretive structural modeling (ISM) technique, based on domain knowledge (DK). Useful information from the naive model is then embedded as DK in the K2 algorithm to develop a BBN-K2 and DK model. The BBN models are compared and validated against available artificial neural network (ANN) and C4.5 decision tree (DT) models. The BBN model developed by the hybrid approach gives comparable and promising results for liquefaction potential assessment. It therefore offers geotechnical engineers a viable tool for assessing site conditions susceptible to seismic soil liquefaction. This study also presents a sensitivity analysis of the hybrid BBN model and the most probable explanation of liquefied sites, which identifies the most likely scenario of liquefaction.
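The K2 algorithm scores candidate parent sets for each node with the Cooper-Herskovits (K2) metric. It then greedily adds parents in a fixed node ordering, which is where the ISM-derived domain knowledge can enter. Below is a minimal sketch of the log score for one node. It assumes discrete states coded 0 to r-1. The records and variable names are hypothetical.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 (Cooper-Herskovits) score of `child` given a candidate parent set.

    data: list of dicts mapping variable name -> state in 0..r-1
    arity: dict mapping variable name -> number of states r
    Uses log(n!) = lgamma(n + 1).
    """
    r = arity[child]
    counts = Counter()   # N_ijk: (parent configuration, child state)
    totals = Counter()   # N_ij: parent configuration
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
        totals[config] += 1
    score = 0.0
    for config, n_ij in totals.items():
        # log[(r - 1)! / (N_ij + r - 1)!]
        score += lgamma(r) - lgamma(n_ij + r)
        for k in range(r):
            score += lgamma(counts[(config, k)] + 1)  # log N_ijk!
    return score


# Hypothetical records: S = 1 flags loose sand, L = 1 marks a liquefied site.
records = [{"S": 0, "L": 0}] * 4 + [{"S": 1, "L": 1}] * 4
arity = {"S": 2, "L": 2}
with_parent = k2_log_score(records, "L", ["S"], arity)
no_parent = k2_log_score(records, "L", [], arity)
print(with_parent, no_parent)
```

Here, adding S as a parent of L raises the score, so a greedy K2 pass would keep that arc. With real CPT data, the gain would be weighed against all other candidates allowed by the node ordering.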
Funding: Sponsored by the National Natural Science Foundation of China (42072187, 42090025) and the CNPC Key Project of Science and Technology (2021DQ0405).
Abstract: The evolution of pore structure in shales is affected both by the thermal evolution of organic matter (OM) and by inorganic diagenesis, resulting in a wide variety of pore structures. This paper examines the OM distribution in lacustrine shales and its influence on pore structure, and describes the process of porosity development. The principal findings are:

(i) Three distribution patterns of OM in lacustrine shales are distinguished: laminated continuous distribution, clumped distribution, and stellate scattered distribution. The differences in total organic carbon (TOC) content, free hydrocarbon content (S₁), and OM porosity among these patterns are discussed.
(ii) Porosity is negatively correlated with TOC and plagioclase content and positively correlated with quartz, dolomite, and clay mineral content.
(iii) Pore evolution in lacustrine shales follows a sequence of decreasing, increasing, and decreasing porosity. Porosity then increases continuously until a relatively stable condition is reached.
(iv) A new model for evaluating porosity in lacustrine shales is proposed. Using this model, the organic and inorganic porosities of shales in the Permian Lucaogou Formation are calculated to be 2.5% to 5% and 1% to 6.3%, respectively, which agree closely with measured data.

These findings may provide a scientific basis and technical support for sweet-spot identification in lacustrine shales in China.
Abstract: Smart manufacturing, as envisioned under Industry 4.0, is characterized by self-monitoring and agile adaptation to fast-changing dynamics in complex production environments. It aims to improve the throughput and reliability of production beyond the state of the art. The widespread application of deep learning (DL) has opened up new opportunities to accomplish this goal. However, data quality and model interpretability remain roadblocks to the widespread acceptance of DL in real-world applications. This has motivated research on two fronts. Data curation aims to provide quality data as input for meaningful DL-based analysis. Model interpretation aims to reveal the physical reasoning underlying DL model outputs and to promote user trust. This paper summarizes several key data curation techniques. Breakthroughs in data denoising, outlier detection, imputation, balancing, and semantic annotation have proven effective for extracting information from noisy, incomplete, insufficient, and/or unannotated data. The paper also highlights model interpretation methods that address the "black-box" nature of DL to improve model transparency.
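Outlier detection and imputation, two of the curation steps listed above, can be combined in a simple robust pass. This sketch uses the median/MAD modified z-score with the conventional 3.5 threshold, followed by median imputation. It is one common baseline, not one of the specific techniques surveyed in the paper. The sensor readings are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Flag outliers with the modified z-score 0.6745 * |v - median| / MAD.

    `None` marks a missing reading and is never flagged.
    """
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    flags = []
    for v in values:
        if v is None or mad == 0:
            flags.append(False)
        else:
            flags.append(0.6745 * abs(v - med) / mad > threshold)
    return flags


def impute_median(values, flags):
    """Replace missing readings and flagged outliers with the median of the clean ones."""
    clean = [v for v, bad in zip(values, flags) if v is not None and not bad]
    fill = median(clean)
    return [fill if (v is None or bad) else v for v, bad in zip(values, flags)]


# Hypothetical sensor trace with one dropout (None) and one spike (55.0).
readings = [10.1, 10.3, None, 9.9, 55.0, 10.2]
flags = flag_outliers(readings)
filled = impute_median(readings, flags)
print(flags, filled)
```

The median and MAD are used instead of the mean and standard deviation because a single large spike would inflate the standard deviation enough to hide itself.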
Funding: Supported by the National Key R&D Program of China (No. 2018YFB0704404), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110798), the National Natural Science Foundation of China (Grant No. 91860115), and the Stable Supporting Fund of Shenzhen (GXWD20201230155427003-20200728114835006).
Abstract: In the current work, a data augmentation technique is applied to a training dataset of 610 bulk metallic glasses (BMGs), randomly selected from 762 collected entries. An ensemble machine learning (ML) model is developed on the augmented training dataset and tested on the remaining 152 entries. The results show that the model predicts the maximal diameter D_max of BMGs more accurately than all previously reported ML models. In addition, the model yields the following glass-forming ability (GFA) rules:

- an average atomic radius between 140 pm and 165 pm;
- a value of TT/(T-T)(T-T) higher than 2.5;
- an entropy of mixing higher than 10 J/K/mol;
- an enthalpy of mixing between -32 kJ/mol and -26 kJ/mol.

The ML model is interpretable, thereby deepening the understanding of GFA.
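The entropy and radius windows in these rules are straightforward to check for a candidate composition. This is a minimal sketch that assumes the ideal configurational entropy of mixing. The radii and mixing enthalpy in the example are hypothetical inputs. The characteristic-temperature criterion is omitted because its exact form cannot be recovered from this abstract.

```python
from math import log

R = 8.314  # molar gas constant, J/(K*mol)


def mixing_entropy(fractions):
    """Ideal configurational entropy of mixing, -R * sum(c_i * ln c_i), in J/(K*mol)."""
    return -R * sum(c * log(c) for c in fractions if c > 0)


def average_radius(fractions, radii_pm):
    """Composition-weighted average atomic radius in pm."""
    return sum(c * r for c, r in zip(fractions, radii_pm))


def meets_gfa_rules(fractions, radii_pm, mixing_enthalpy_kj):
    """Check the radius, entropy and enthalpy windows quoted in the abstract.

    The temperature-based criterion is not included. The mixing enthalpy
    (kJ/mol) must be supplied by the caller, e.g. from a Miedema-type estimate.
    """
    return (140.0 <= average_radius(fractions, radii_pm) <= 165.0
            and mixing_entropy(fractions) > 10.0
            and -32.0 <= mixing_enthalpy_kj <= -26.0)


# Hypothetical equiatomic four-component alloy with invented radii (pm).
quaternary = [0.25] * 4
print(mixing_entropy(quaternary), meets_gfa_rules(quaternary, [160.0, 140.0, 150.0, 130.0], -29.0))
```

An equiatomic ternary alloy reaches only R ln 3, about 9.1 J/(K·mol), so the entropy rule effectively favors alloys with more components or more even compositions.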
Funding: Supported by the National Basic Research Program of China (973 Program) (2007CB407206) and the National Key Technologies Research and Development Program in the Eleventh Five-Year Plan of China (2006BAC01A11).
Abstract: Correctly understanding the degradation factors of the ecosystem in arid-hot valleys is an important prerequisite for the ecological restoration and reconstruction of degraded areas. The Yuanmou arid-hot valleys were studied using factors including vegetation degradation, land degradation, arid climate, policy failure, forest fire, rapid population growth, excessive deforestation, overgrazing, steep-slope reclamation, economic poverty, engineering construction, lithology, slope, low cultural level, geological hazards, biological disasters, soil properties, and others. Using the interpretative structural model (ISM), we found that these degradation factors are not at the same level. Instead, they form a multilevel hierarchical system with internal relations. This indicates that the degradation mode of the arid-hot valleys is "straight (appearance)-penetrating-background". These findings provide important guidance for restoring and reconstructing the arid-hot valley ecosystem.
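ISM turns pairwise influence judgments into levels in two steps. First, a transitive closure gives the reachability matrix. Then, factors whose reachability set is contained in their antecedent set are peeled off level by level. A minimal sketch follows. The four-factor influence chain in the example is hypothetical and is not the paper's matrix.

```python
def reachability(adjacency):
    """Transitive closure (Warshall) of a 0/1 influence matrix, with self-reachability.

    adjacency[i][j] = 1 means factor i directly influences factor j.
    """
    n = len(adjacency)
    reach = [[bool(adjacency[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def level_partition(reach):
    """ISM level partitioning, top (most direct) level first.

    A factor joins the current level when its reachability set, restricted to
    the remaining factors, is contained in its antecedent set.
    """
    remaining = set(range(len(reach)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach_set = {j for j in remaining if reach[i][j]}
            antecedent = {j for j in remaining if reach[j][i]}
            if reach_set <= antecedent:
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels


# Hypothetical chain: 0 population growth -> 1 overgrazing
#                     -> 2 vegetation degradation -> 3 land degradation
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
levels = level_partition(reachability(adjacency))
print(levels)
```

The first level extracted holds the most direct (surface) factors and the last holds the root factors. Mutually reachable factors in a feedback loop land on the same level.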
Funding: Supported by the Research Program through the National Research Foundation of Korea, NRF-2018R1D1A1B07050864.
Abstract: This study was conducted to enable prompt classification of malware, which is becoming increasingly sophisticated. To this end, we analyzed the important features of malware and how a learning model ranked their relative importance. First, analysis features were extracted using Cuckoo Sandbox, an open-source malware analysis tool. They were then divided into five categories based on the extracted information. Next, recursive feature elimination, a learning-model-based feature selection method, kept only the features most suitable for malware classification. This reduced the 804 extracted features by 70%. The remaining important features were analyzed, and the Random Forest classifier was used to assess the contribution of each. The results showed that most of the selected features were System call features. Four malware types had the most analysis samples available: Trojan, Adware, Downloader, and Backdoor. Each of these types could be accurately identified using only 36 to 76 features.
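Recursive feature elimination refits a model, drops the least important features, and repeats until the target count remains. This sketch takes a caller-supplied `importance_fn`, which stands in for, e.g., Random Forest importances refit on the current subset. The feature names and scores are hypothetical. In practice, scikit-learn's `sklearn.feature_selection.RFE` wraps the same loop around any estimator that exposes `coef_` or `feature_importances_`.

```python
def rfe(feature_names, importance_fn, n_keep, step=1):
    """Recursive feature elimination.

    importance_fn(features) -> dict mapping each given feature to an importance,
    computed by refitting a model on exactly that subset.
    """
    selected = list(feature_names)
    while len(selected) > n_keep:
        scores = importance_fn(selected)               # refit on the current subset
        n_drop = min(step, len(selected) - n_keep)     # never overshoot n_keep
        ranked = sorted(selected, key=lambda f: scores[f])  # least important first
        dropped = set(ranked[:n_drop])
        selected = [f for f in selected if f not in dropped]
    return selected


# Hypothetical importances; a real importance_fn would retrain each round.
scores = {"syscall_count": 0.40, "file_writes": 0.10, "registry_edits": 0.30, "dns_queries": 0.20}
selected = rfe(list(scores), lambda fs: {f: scores[f] for f in fs}, n_keep=2)
print(selected)
```

Refitting after each elimination matters when features are correlated. Once one of two redundant features is removed, the survivor's importance can rise, which a single one-shot ranking would miss.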
Abstract: This work aims to improve the interpretation of production log data for gas-water elongated bubble (EB) flow in horizontal wells. A multiphase flow simulation device was set up, and a series of measurement experiments was conducted with air and tap water as test media. The flows were measured with a real production logging tool (PLT) string at different deviations and in different mixed-flow states. Gas-water EB flow was observed in transparent experimental boreholes during production logging, which revealed its characteristics and mechanisms. These observations were combined with an analysis of production log responses and experimental production logging flow pattern maps. On this basis, we propose a flow pattern identification method based on log responses, together with a drift-flux model for gas-water EB flow. Built on experimental EB flow data, the model reveals the physical mechanisms of gas-water EB flow during measurement. Its coefficients are specific to the experimental conditions and the PLT string used in our experiments. They also reflect how the PLT string disturbs the original downhole flow patterns. Because the simulated flow experiments and PLT string are representative, the model coefficients can serve directly as empirical parameters of the logging interpretation model. They apply to real production logging data when the measurement circumstances and PLT strings are similar.
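A drift-flux model relates the in-situ gas velocity to the mixture velocity through a distribution coefficient C0 and a drift velocity. This sketch uses the standard form v_g = C0 (v_sg + v_sw) + v_d, with gas holdup y_g = v_sg / v_g. The coefficient values are hypothetical placeholders, not the fitted values from these experiments.

```python
def gas_holdup(v_sg, v_sw, c0, v_drift):
    """Gas holdup from superficial gas/water velocities (m/s) via the drift-flux relation."""
    v_m = v_sg + v_sw             # mixture (total superficial) velocity
    v_g = c0 * v_m + v_drift      # in-situ gas velocity
    return v_sg / v_g


def superficial_velocities(y_g, v_m, c0, v_drift):
    """Invert the model: split a measured mixture velocity into gas and water parts.

    y_g would come from a holdup tool, v_m from spinner-derived mixture velocity.
    """
    v_sg = y_g * (c0 * v_m + v_drift)
    return v_sg, v_m - v_sg


# Hypothetical coefficients, not the values fitted in this study.
C0, V_DRIFT = 1.2, 0.2
y = gas_holdup(0.3, 0.5, C0, V_DRIFT)
print(y, superficial_velocities(y, 0.8, C0, V_DRIFT))
```

The inverse form is what an interpretation workflow uses. Logging tools measure holdup and mixture velocity, and the drift-flux coefficients convert them into per-phase flow rates.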
Funding: Supported by the National Natural Science Foundation of China (41474115, 42174155) and the Open Fund of the Key Laboratory of Exploration Technologies for Oil and Gas Resources (Yangtze University), Ministry of Education of China (No. K2018-02).
Abstract: In horizontal wells, gravity differentiation makes the distributions of local velocity and local phase holdup along the pipe radius complicated, and the fluid velocity field changes along the gravity direction. Measuring the mixture flow rate and water holdup is therefore difficult, which leads to poor interpretation accuracy of production logging output profiles. In this paper, oil–water two-phase flow dynamic simulation logging experiments were conducted in horizontal oil–water two-phase flow simulation wells. The experiments used the Multiple Array Production Suite, which comprises a capacitance array tool (CAT) and a spinner array tool (SAT). The response characteristics of the SAT and CAT under different flow rates and water cuts were then studied. Based on the CAT responses in different water holdup ranges, interpolation imaging over the wellbore cross-section determined the water holdup distribution. The oil–water velocity field in the flow section was then reconstructed from this holdup distribution and the SAT readings, combined with the rheological equation of viscous fluid. A method for calculating the oil and water partial-phase flow rates in the flow section was also proposed. When applied to the experimental data, the method gives results basically consistent with the experimental data. The calculated total flow rate and water holdup agree with the values set in the experiment, indicating that the method is highly accurate.
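Once water holdup and mixture velocity are known over the flow section, integrating over the cross-section gives the partial-phase flow rates. This is a minimal sketch under the following assumptions:

- The circular pipe is divided into horizontal slices of equal height.
- There is no oil-water slip within a slice.
- The pipe radius, holdup profile, and velocities are hypothetical.

```python
from math import acos, pi, sqrt


def segment_area(r, h):
    """Area of a circular segment of height h (0 <= h <= 2r) in a pipe of radius r."""
    h = min(max(h, 0.0), 2.0 * r)                  # guard against round-off at the wall
    cos_arg = max(-1.0, min(1.0, (r - h) / r))
    return r * r * acos(cos_arg) - (r - h) * sqrt(max(0.0, 2.0 * r * h - h * h))


def slice_areas(r, n):
    """Areas of n horizontal slices of equal height, listed bottom to top."""
    edges = [2.0 * r * i / n for i in range(n + 1)]
    return [segment_area(r, edges[i + 1]) - segment_area(r, edges[i]) for i in range(n)]


def phase_flow_rates(r, holdups, velocities):
    """Integrate water and oil flow over horizontal slices (bottom to top).

    holdups: water holdup per slice (e.g. interpolated from array capacitance sensors)
    velocities: mixture velocity per slice in m/s (e.g. reconstructed from spinners)
    Returns (q_water, q_oil) in m^3/s when r is in metres.
    """
    areas = slice_areas(r, len(holdups))
    q_w = sum(a * y * v for a, y, v in zip(areas, holdups, velocities))
    q_o = sum(a * (1.0 - y) * v for a, y, v in zip(areas, holdups, velocities))
    return q_w, q_o


# Hypothetical stratified profile: water at the bottom, faster oil near the top.
r = 0.062
q_w, q_o = phase_flow_rates(r, [1.0, 0.9, 0.6, 0.2, 0.0], [0.3, 0.5, 0.7, 0.9, 1.0])
print(q_w, q_o)
```

Slicing horizontally matches the gravity-segregated structure of the flow. Slices near the pipe wall are thin and carry little area, so errors there matter less than errors near the center.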
Funding: Supported by the National Natural Science Foundation of China (61473026, 61104131) and the Fundamental Research Funds for the Central Universities (JD1413).
Abstract: Alarm floods are one of the main problems in industrial process alarm systems, and alarm root-cause analysis and alarm prioritization help reduce them. This paper proposes a systematic rationalization method for multivariate correlated alarms that performs root-cause analysis and alarm prioritization. An information-fusion-based interpretive structural model is constructed from data-driven partial correlation coefficients, refined with process knowledge. This hierarchical multi-layer model helps identify abnormality propagation paths and analyze root causes. A revised Likert scale method is adopted to determine alarm priority and to reduce blind alarm handling. In a case study, the Tennessee Eastman process is used to show the effectiveness and validity of the proposed approach. A comparison of alarm system performance shows that the rationalization method reduces alarm floods to some extent and improves system performance.
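The partial correlation coefficient measures the association between two process variables after removing the linear effect of a third. This helps separate direct links from common-cause correlation before the ISM adjacency matrix is built. Below is a minimal sketch of the first-order coefficient. The data are constructed so that z drives both x and y.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


# Constructed example: x and y each equal z plus noise that is uncorrelated with z
# and with each other.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.0, 1.0, 2.0, 5.0, 5.0, 6.0]
y = [2.0, 1.0, 4.0, 3.0, 3.0, 8.0]
print(pearson(x, y), partial_correlation(x, y, z))
```

x and y are clearly correlated, yet their partial correlation given z is zero. A threshold on |partial correlation| (the threshold being a design choice) would therefore not link them directly.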
Funding: Supported by the National Natural Science Foundation of China (61374166), the Doctoral Fund of the Ministry of Education of China (20120010110010), and the Natural Science Fund of Ningbo (2012A610001).
Abstract: Fault detection and diagnosis of nonlinear characteristics based on higher-order statistics (HOS) is an effective data-driven method, but it is computationally costly for large-scale process control systems. This paper proposes an HOS-ISM fault diagnosis framework that combines the interpretative structural model (ISM) with HOS:

(1) the adjacency matrix is determined from partial correlation coefficients;
(2) the adjacency matrix is modified using a directed graph built from prior knowledge of the process piping and instrument diagram;
(3) the interpretative structure of the large-scale process control system is built with the ISM method;
(4) the non-Gaussianity index, nonlinearity index, and total nonlinearity index are calculated dynamically on the basis of this structure, with a reasonable sampling period and data window, which effectively eliminates the uncertainty of the nonlinear characteristic diagnosis method.

The proposed framework is verified on the Tennessee Eastman process and shows improvement for selected fault cases with highly nonlinear characteristics.
Funding: Supported by the National Natural Science Foundation of China under Grant 61972208, the National Natural Science Foundation (General Program) of China under Grant 61972211, the National Key Research and Development Project of China under Grant 2020YFB1804700, the Future Network Innovation Research and Application Projects under Grant No. 2021FNA02006, and the 2021 Jiangsu Postgraduate Research Innovation Plan under Grant No. KYCX210794.
Abstract: With the rapid development of the Internet, network security and data privacy are increasingly valued. Classical Network Intrusion Detection Systems (NIDS) based on Deep Learning (DL) models can provide good detection accuracy. However, collecting samples for centralized training creates a high risk of data privacy leakage. Furthermore, training supervised deep learning models requires a large number of labeled samples, which are usually cumbersome to obtain. The "black-box" problem also makes DL-based NIDS models untrustworthy. To address these problems, we propose FL-TIDS, a trusted Federated Learning (FL) traffic intrusion detection method. FL-TIDS uses an unsupervised, autoencoder (AE)-based intrusion detection model, which reduces the reliance on labeled samples. At the same time, we use FL for model training to protect data privacy. In addition, we design an improved SHAP interpretability method based on the chi-square test to analyze the trained model. We conducted several experiments to evaluate FL-TIDS. First, we experimentally determined the structure and number of neurons of the unsupervised AE model. Second, we evaluated the proposed method on the UNSW-NB15 and CICIDS2017 datasets. The unsupervised AE model outperformed seven other intrusion detection models in precision, recall, and F1-score. Then, federated learning was used to train the intrusion detection model, and the resulting model was more accurate than a locally trained model. Finally, we analyzed explainability with the improved chi-square-based SHAP method. The features the model relies on for identification are consistent with the attack characteristics, indicating that the model is reliable.
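Federated training keeps traffic samples on each client and shares only model parameters with the server. The abstract does not name the aggregation rule. This sketch therefore assumes FedAvg-style averaging, weighted by each client's sample count, with parameters flattened into plain lists. The numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client parameter vectors, weighting each client by its sample count.

    client_weights: list of equal-length parameter lists, one per client
    client_sizes: number of local training samples per client
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical round: two clients with 100 and 300 local traffic records.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_params)
```

In a full round, the server would broadcast `global_params` back to the clients for the next local training pass. Raw network flows never leave the clients, which is the privacy benefit the abstract relies on.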
Abstract: A systematic analysis of the hierarchical relationships among the factors that affect sustainable supply chain implementation in water diversion projects has theoretical value and practical significance for the sustainable development of large-scale water diversion projects. A review of relevant literature, books, web pages, and other materials, together with discussions with experts and scholars, identified 23 factors influencing sustainable supply chain implementation in water diversion projects. Interpretative Structural Modeling (ISM) was then used to analyze the causal relationships among these factors, yielding a multi-level hierarchical structure model. The results showed the following:

1) The surface-level (direct) influencing factors mainly included 8 factors, such as water-saving awareness and water-saving intensity in the diversion area, water quality, water pollution and other disasters, and effective incentive mechanisms. These factors are directly related to sustainable supply chain implementation in water diversion projects.
2) The indirect influencing factors included 12 factors, such as the water quality and quantity guarantee rate of the supply chain, the government's enforcement of laws and regulations, water distribution, ecological compensation, and compensation mechanisms for residents in the water source area. Indirect influencing factors can act directly on the direct influencing factors. Intervening in the factors that humans can control is an important way to improve the sustainable operation of water diversion projects.
3) The fundamental (deep) influencing factors included three factors: resettlement policy, government financial support, and sound laws and regulations. These deep factors exert influence through multiple channels and are controllable, so intervening in them is the main means of improving the sustainable operation of water diversion projects.