BACKGROUND: This study aims to develop and validate a machine learning-based in-hospital mortality predictive model for acute aortic syndrome (AAS) in the emergency department (ED) and to derive a simplified version suitable for rapid clinical application. METHODS: In this multi-center retrospective cohort study, AAS patient data from three hospitals were analyzed. The modeling cohort included data from the First Affiliated Hospital of Zhengzhou University and the People's Hospital of Xinjiang Uygur Autonomous Region, with Peking University Third Hospital data serving as the external test set. Four machine learning algorithms were used to develop predictive models based on 34 early-accessible clinical variables: logistic regression (LR), multilayer perceptron (MLP), Gaussian naive Bayes (GNB), and random forest (RF). A simplified model was then derived from five key variables (Stanford type, pericardial effusion, asymmetric peripheral arterial pulsation, decreased bowel sounds, and dyspnea) selected via Least Absolute Shrinkage and Selection Operator (LASSO) regression to improve ED applicability. RESULTS: A total of 929 patients were included in the modeling cohort, and 210 were included in the external test set. The four machine learning models based on 34 clinical variables achieved internal and external validation AUCs of 0.85-0.90 and 0.73-0.85, respectively. The simplified model incorporating the five key variables demonstrated internal and external validation AUCs of 0.71-0.86 and 0.75-0.78, respectively. Both models showed robust calibration and predictive stability across datasets. CONCLUSION: Both the full and simplified machine learning models demonstrated sound predictive performance and generalizability.
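Since the five-variable model comes from LASSO's coefficient shrinkage, a toy sketch of the mechanism may help: coordinate descent with soft-thresholding drives the coefficients of uninformative predictors exactly to zero, which is why LASSO doubles as a variable selector. All data and the penalty value below are synthetic illustrations, not values from the study.

```python
import random

def lasso_cd(X, y, lam, n_sweeps=200):
    """Toy coordinate-descent LASSO (objective: 0.5*RSS + lam*||w||_1)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # residual with feature j's own contribution removed
            r = [y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft-thresholding: weak predictors are set exactly to zero
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w

# Synthetic data: only the first two of four predictors matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
y = [3 * row[0] - 2 * row[1] for row in X]
w = lasso_cd(X, y, lam=5.0)   # the two noise features shrink to exactly 0
```

The surviving coefficients are slightly shrunk toward zero (the price of the L1 penalty), which is the trade-off that buys the sparse, ED-friendly model.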
Accurate prediction of flood events is important for flood control and risk management. Machine learning techniques have contributed greatly to advances in flood prediction, and existing studies mainly focused on predicting flood regime variables using single or hybrid machine learning techniques. However, class-based flood predictions have rarely been investigated, although they can aid in quickly diagnosing comprehensive flood characteristics and proposing targeted management strategies. This study proposed an approach for predicting flood regime metrics and event classes that couples machine learning algorithms with clustering-deduced membership degrees. Five algorithms were adopted for this exploration. Results showed that the class membership degrees accurately determined event classes, with class hit rates up to 100% against the four classes clustered from nine regime metrics. The nonlinear algorithms (Random Forest and least squares Support Vector Machine) outperformed the linear techniques (Multiple Linear Regression and Stepwise Regression) in predicting flood regime metrics. The proposed approach predicted flood event classes well, with average class hit rates of 66.0%-85.4% and 47.2%-76.0% in the calibration and validation periods, respectively, particularly for the slow and late flood events. The predictive capability of the proposed approach for flood regime metrics and classes was considerably stronger than that of the hydrological modeling approach.
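The clustering-deduced membership idea can be illustrated with a minimal sketch: an event's membership in each class falls off with its distance to the cluster centroid (inverse-squared-distance weighting, a fuzzy-c-means-style choice with fuzzifier m = 2), and the class with the largest membership wins. Centroids and the event vector are invented numbers, not values from the study.

```python
def memberships(x, centroids):
    """Inverse-squared-distance membership degrees (sum to 1)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    inv = [1.0 / max(d, 1e-12) for d in d2]   # guard exact centroid hits
    s = sum(inv)
    return [v / s for v in inv]

centroids = [[0.2, 1.0], [0.8, 4.0]]   # e.g. "slow" vs "flashy" class centers
event = [0.3, 1.4]                     # a new event's regime-metric vector
u = memberships(event, centroids)
cls = max(range(len(u)), key=u.__getitem__)   # predicted event class
```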
The key problem of the adaptive mixture background model is that the parameters can adaptively change according to the input data. To address this problem, a new method is proposed. First, the recursive equations are inferred based on the maximum likelihood rule. Second, the forgetting factor and learning rate factor are redefined, and more general formulations are obtained by analyzing their practical functions. Last, the convergence of the proposed algorithm is proved: according to stochastic approximation theory, the estimation converges to a local maximum of the data likelihood function. Experiments show that the proposed learning algorithm outperforms earlier ones in both convergence rate and accuracy.
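For intuition about recursive parameter updates with a learning rate, here is a generic online update of a single Gaussian component in the spirit of adaptive mixture background models. These are the standard recursive forms, not the paper's exact derivation; the learning rate and data are made up.

```python
def update(mu, var, x, rho):
    """One recursive update of a Gaussian's mean/variance with rate rho."""
    d = x - mu
    mu = mu + rho * d                 # mean tracks the incoming pixel value
    var = var + rho * (d * d - var)   # variance tracks the squared deviation
    return mu, var

mu, var = 0.0, 1.0
for _ in range(300):                  # feed a constant "background" value
    mu, var = update(mu, var, 5.0, 0.05)
```

With a constant input the mean converges to the input and the variance decays toward zero, which is the steady-state behavior a background model should exhibit.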
Accurate and reliable photovoltaic (PV) modeling is crucial for the performance evaluation, control, and optimization of PV systems. However, existing methods for PV parameter identification often suffer from limitations in accuracy and efficiency. To address these challenges, we propose an adaptive multi-learning cooperation search algorithm (AMLCSA) for efficient identification of unknown parameters in PV models. AMLCSA is a novel algorithm inspired by teamwork behaviors in modern enterprises. It enhances the original cooperation search algorithm in two key aspects: (i) an adaptive multi-learning strategy that dynamically adjusts search ranges using adaptive weights, allowing better individuals to focus on local exploitation while guiding poorer individuals toward global exploration; and (ii) a chaotic grouping reflection strategy that introduces chaotic sequences to enhance population diversity and improve search performance. The effectiveness of AMLCSA is demonstrated on single-diode, double-diode, and three PV-module models. Simulation results show that AMLCSA offers significant advantages in convergence, accuracy, and stability compared with existing state-of-the-art algorithms.
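The chaotic-sequence ingredient of the grouping reflection strategy can be sketched with a logistic map at r = 4, a common generator of diverse, non-repeating values in (0, 1) used to seed or perturb candidate solutions in metaheuristics. How AMLCSA actually injects these values is simplified away here; this only shows the sequence itself.

```python
def logistic_map(x0, n, r=4.0):
    """Generate n chaotic values in [0, 1] from the logistic map."""
    seq, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq

seq = logistic_map(0.37, 100)   # x0 must avoid fixed points like 0, 0.5, 0.75
```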
The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared with mechanistic modelling, thus improving the efficiency of process optimization and monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to solve these problems. First, data segment variance was introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which clusters the data into perturbed and steady-state intervals for steady-state data extraction. Second, the maximal information coefficient (MIC) was employed to calculate the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) was integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight-update strategy, yielding a new ensemble learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
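The segment-variance feature that drives KMDI clustering can be shown in miniature: the variance of fixed-length data segments separates steady-state from perturbed intervals. Window length and the series are made up, and the paper clusters these variances with K-means rather than eyeballing a threshold.

```python
def segment_variance(series, width):
    """Population variance of consecutive non-overlapping segments."""
    out = []
    for i in range(0, len(series) - width + 1, width):
        seg = series[i:i + width]
        m = sum(seg) / width
        out.append(sum((v - m) ** 2 for v in seg) / width)
    return out

steady = [10.0, 10.1, 9.9, 10.0]     # quiet, steady-state stretch
perturbed = [8.0, 12.0, 9.0, 13.0]   # disturbed stretch
seg_vars = segment_variance(steady + perturbed, 4)
```

The two segments produce variances that differ by orders of magnitude, which is exactly the separation the clustering step exploits.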
BACKGROUND: Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies. AIM: To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency. METHODS: This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist assessed DCI using the visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated based on discrimination, calibration, and decision curve analysis (DCA), and results were visualized using nomograms. RESULTS: A total of 712 patients (53.8% male; mean age 54.5±12.9 years) were included. Logistic regression analysis identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm: OR = 1.895, 95% CI: 1.065-3.350; AC ≥ 92 cm: OR = 1.271, 95% CI: 0.730-2.188), and anxiety (OR = 1.071, 95% CI: 1.044-1.100) as predictive factors for DCI, validated by the LASSO and RF methods. Model performance revealed training/validation sensitivities of 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities of 0.602/0.511, 0.510/0.562, and 0.977/0.526; and corresponding areas under the receiver operating characteristic curves (AUCs) of 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820), respectively. DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability. CONCLUSION: The RF-based model exhibited superior predictive accuracy for DCI compared with the multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.
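The reported odds ratios are exponentiated logistic-regression coefficients, so readers can move between the two forms directly. The sketch below reuses the constipation OR from the abstract; the standard error is an invented value chosen only so the resulting interval roughly matches the reported CI, not a number from the study.

```python
import math

beta = math.log(2.254)   # coefficient implied by the reported OR = 2.254
se = 0.284               # hypothetical standard error (illustration only)

# 95% CI on the OR scale: exp(beta +/- 1.96 * SE)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```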
Model parameter estimation is a pivotal issue for runoff modeling in ungauged catchments. The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization, which is the most widely used approach. Runoff modeling was studied in 38 catchments located in the Yellow–Huai–Hai River Basin (YHHRB). The values of the Nash–Sutcliffe efficiency coefficient (NSE), coefficient of determination (R2), and percent bias (PBIAS) indicated the acceptable performance of the soil and water assessment tool (SWAT) model in the YHHRB. Nine descriptors belonging to the categories of climate, soil, vegetation, and topography were used to express the catchment characteristics related to the hydrological processes. The quantitative relationships between the parameters of the SWAT model and the catchment descriptors were analyzed by six regression-based models, including linear regression (LR) equations, support vector regression (SVR), random forest (RF), k-nearest neighbor (kNN), decision tree (DT), and radial basis function (RBF). Each of the 38 catchments was assumed to be an ungauged catchment in turn. Then, the parameters in each target catchment were estimated by the constructed regression models based on the remaining 37 donor catchments. Furthermore, the similarity-based regionalization scheme was used for comparison with the regression-based approach. The results indicated that runoff was modeled with the highest accuracy by the SVR-based scheme in ungauged catchments. Compared with the traditional LR-based approach, the accuracy of runoff modeling in ungauged catchments was improved by the machine learning algorithms because of their outstanding capability to deal with nonlinear relationships. The performances of the different approaches were similar in humid regions, while the advantages of the machine learning techniques were more evident in arid regions. When the study area contained nested catchments, the best result was obtained with the similarity-based parameter regionalization scheme because of the high catchment density and short spatial distances. These new findings could improve flood forecasting and water resources planning in regions that lack observed data.
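The similarity-based scheme used for comparison can be sketched minimally: the ungauged catchment borrows the parameter set of the donor with the smallest Euclidean distance in (normalized) descriptor space. The descriptor vectors and the parameter value below are invented; CN2 (the SCS curve number) is mentioned only as an example of a commonly regionalized SWAT parameter.

```python
def nearest_donor(target, donors):
    """donors: list of (descriptor_vector, parameter_set) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(donors, key=lambda d: dist(target, d[0]))[1]

donors = [
    ([0.9, 0.2, 0.5], {"CN2": 65.0}),   # e.g. humid, flat donor catchment
    ([0.1, 0.8, 0.3], {"CN2": 82.0}),   # e.g. arid, steep donor catchment
]
params = nearest_donor([0.15, 0.75, 0.35], donors)   # arid-like target
```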
Background: Diabetic nephropathy (DN) is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide. Diagnostic biomarkers may allow early diagnosis and treatment of DN, reducing its prevalence and delaying its development. Kidney biopsy is the gold standard for diagnosing DN; however, its invasive character is its primary limitation. Machine learning approaches provide a non-invasive and specific criterion for diagnosing DN, although traditional machine learning algorithms need to be improved to enhance diagnostic performance. Methods: We applied high-throughput RNA sequencing to obtain genes related to DN tubular tissues and normal tubular tissues of mice. Machine learning algorithms (random forest, LASSO logistic regression, and principal component analysis) were then used to identify key genes (CES1G, CYP4A14, NDUFA4, ABCC4, ACE). A genetic algorithm-optimized backpropagation neural network (GA-BPNN) was then used to improve the DN diagnostic model. Results: The AUC value of the GA-BPNN model was 0.83 in the training dataset and 0.81 in the validation dataset, while the AUC values of the SVM model in the training dataset and external validation dataset were 0.756 and 0.650, respectively. Thus, the GA-BPNN outperformed the traditional SVM model. This diagnostic model may support personalized diagnosis and treatment of patients with DN. Immunohistochemical staining further confirmed that the tissue and cell expression of NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 (NDUFA4L2) in tubular tissue of DN mice was decreased. Conclusion: The GA-BPNN model is more accurate than the traditional SVM model and may provide an effective tool for diagnosing DN.
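Since the model comparison above rests on AUC values, a compact reminder of what AUC computes may help: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half. Labels and scores below are toy values, not data from the study.

```python
def auc(labels, scores):
    """Rank-statistic AUC: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ranked right
```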
Edge Machine Learning (EdgeML) and Tiny Machine Learning (TinyML) are fast-growing fields that bring machine learning to resource-constrained devices, allowing real-time data processing and decision-making at the network's edge. However, the complexity of model conversion techniques, diverse inference mechanisms, and varied learning strategies make designing and deploying these models challenging. Additionally, deploying TinyML models on resource-constrained hardware with specific software frameworks has broadened EdgeML's applications across various sectors. These factors underscore the necessity for a comprehensive literature review, as current reviews do not systematically encompass the most recent findings on these topics. Consequently, this review provides a comprehensive overview of state-of-the-art techniques in model conversion, inference mechanisms, and learning strategies within EdgeML, and in deploying these models on resource-constrained edge devices using TinyML. It identifies 90 research articles published between 2018 and 2025, categorizing them into two main areas: (1) model conversion, inference, and learning strategies in EdgeML and (2) deploying TinyML models on resource-constrained hardware using specific software frameworks. In the first category, the synthesis compares and critically reviews various model conversion techniques, inference mechanisms, and learning strategies. In the second category, it identifies and elaborates on the major development boards, software frameworks, sensors, and algorithms used in applications across six major sectors. As a result, this article provides valuable insights for researchers, practitioners, and developers, assisting them in choosing suitable model conversion techniques, inference mechanisms, learning strategies, hardware development boards, software frameworks, sensors, and algorithms tailored to their specific needs and applications across various sectors.
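A central model-conversion step in TinyML pipelines is quantizing float weights to int8. Below is a minimal symmetric per-tensor quantization sketch (scale = max|w| / 127); real frameworks such as TFLite add zero-points, per-channel scales, and calibration, all omitted here, and the weight values are invented.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of a float weight list."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.27, 0.05, 0.9]          # toy float weights
q, scale = quantize_int8(w)           # 8-bit integers + one float scale
w_hat = dequantize(q, scale)          # reconstruction for error checking
```

The round-trip error is bounded by half the scale, which is the accuracy-for-memory trade that makes 4x smaller models possible on microcontrollers.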
Antarctic krill (Euphausia superba), widely distributed around Antarctica, is a key species supporting the biodiversity of the Southern Ocean ecosystem. The Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR) has thus managed the krill fishery in a precautionary manner. Currently, CCAMLR is working to develop a refined krill fishery management approach based on more solid science, which requires accurate predictions of krill distribution. To address this need, this study investigated the effects of algorithm choice and spatial resolution on the performance of Antarctic krill distribution modelling. We integrated acoustic data from four surveys conducted in the waters adjacent to the Antarctic Peninsula with 11 environmental variables characterizing krill prey conditions, water mass properties, and seafloor topography. These data were processed at four spatial resolutions (5, 10, 15, and 20 km) to fit distribution models using four algorithms: Random Forests (RF), Generalized Additive Models (GAM), Extreme Gradient Boosting (XGBoost), and Artificial Neural Networks (ANN). Model performance was assessed and compared in terms of goodness-of-fit and predictive accuracy. The results showed that RF achieved the highest predictive performance at most resolutions, whereas GAM performed best at the coarsest resolution (20 km). XGBoost closely followed RF in accuracy and demonstrated robustness, as evidenced by highly consistent partial dependence curves across resolutions. In contrast, ANN exhibited limitations with smaller sample sizes, resulting in comparatively poorer predictive performance. The analysis revealed a trade-off whereby reducing spatial resolution improved model fit and mitigated zero-inflation at the expense of fine-scale information and overall predictive accuracy. Ensemble models integrating RF, GAM, and XGBoost are proposed as a potential balanced solution to improve predictive stability, offering a more robust scientific basis for the refinement of krill management.
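The resolution effect studied here comes down to aggregating point samples onto grids of different cell sizes. A toy sketch: coarser cells pool more samples per cell, which reduces zero-inflation but discards fine-scale pattern. The coordinates (treated as km) and density values below are invented.

```python
from collections import defaultdict

def grid_mean(points, cell_km):
    """Average point values within square grid cells of side cell_km."""
    cells = defaultdict(list)
    for x, y, v in points:
        cells[(int(x // cell_km), int(y // cell_km))].append(v)
    return {k: sum(vs) / len(vs) for k, vs in cells.items()}

# (x_km, y_km, krill_density): two zeros, two positive detections
points = [(2, 3, 0.0), (4, 1, 8.0), (12, 2, 4.0), (18, 14, 0.0)]
fine = grid_mean(points, 5)      # 5-km cells: more cells, one all-zero cell
coarse = grid_mean(points, 20)   # 20-km cell: everything pooled, no zeros
```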
This study introduces a novel approach to addressing the challenges of high-dimensional variables and strong nonlinearity in reservoir production and layer configuration optimization. For the first time, relational machine learning models are applied in reservoir development optimization. Traditional regression-based models often struggle in complex scenarios, but the proposed relational and regression-based composite differential evolution (RRCODE) method combines a Gaussian naive Bayes relational model with a radial basis function network regression model. This integration effectively captures complex relationships in the optimization process, improving both accuracy and convergence speed. Experimental tests on a multi-layer multi-channel reservoir model, the Egg reservoir model, and a real-field reservoir model (the S reservoir) demonstrate that RRCODE significantly reduces water injection and production volumes while increasing economic returns and cumulative oil recovery. Moreover, the surrogate models employed in RRCODE are lightweight, with low computational overhead. These results highlight RRCODE's superior performance in the integrated optimization of reservoir production and layer configurations, offering more efficient and economically viable solutions for oilfield development.
Floods and storm surges pose significant threats to coastal regions worldwide, demanding timely and accurate early warning systems (EWS) for disaster preparedness. Traditional numerical and statistical methods often fall short in capturing complex, nonlinear, real-time environmental dynamics. In recent years, machine learning (ML) and deep learning (DL) techniques have emerged as promising alternatives for enhancing the accuracy, speed, and scalability of EWS. This review critically evaluates the evolution of ML models, such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, in coastal flood prediction, highlighting their architectures, data requirements, performance metrics, and implementation challenges. A unique contribution of this work is its synthesis of real-time deployment challenges, including latency, edge-cloud tradeoffs, and policy-level integration, areas often overlooked in prior literature. Furthermore, the review presents a comparative framework of model performance across different geographic and hydrologic settings, offering actionable insights for researchers and practitioners. Limitations of current AI-driven models, such as interpretability, data scarcity, and generalization across regions, are discussed in detail. Finally, the paper outlines future research directions, including hybrid modelling, transfer learning, explainable AI, and policy-aware alert systems. By bridging technical performance and operational feasibility, this review aims to guide the development of next-generation intelligent EWS for resilient and adaptive coastal management.
With the rapid development of the economy, air pollution caused by industrial expansion has caused serious harm to human health and social development. Establishing an effective air pollution concentration prediction system is therefore of great scientific and practical significance for accurate and reliable predictions. This paper proposes a combined point-interval prediction system for pollutant concentration that leverages neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The golden jackal optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines training results from multiple models to obtain final point predictions, while quantile regression and kernel density estimation methods provide interval predictions on the test set. Experimental results demonstrate that the combined model achieves a high goodness of fit, with a coefficient of determination (R²) of 99.3%, and a maximum improvement in mean absolute percentage error (MAPE) over the benchmark models of 12.6%. This suggests that the proposed integrated learning system can provide more accurate deterministic predictions as well as reliable uncertainty analysis compared with traditional models, offering a practical reference for air quality early warning.
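Fuzzy information granulation can be illustrated with a minimal sketch: each window of the series is compressed into a triangular granule (low, core, high), here taken as (min, median, max) of the window. The window size and concentration values are made up, and the paper's granulation details may differ.

```python
def granulate(series, width):
    """Compress non-overlapping windows into (min, median, max) granules."""
    granules = []
    for i in range(0, len(series) - width + 1, width):
        seg = sorted(series[i:i + width])
        granules.append((seg[0], seg[len(seg) // 2], seg[-1]))
    return granules

# Toy pollutant-concentration series: calm, episode, calm again
pm25 = [35, 38, 33, 90, 84, 95, 40, 37, 41]
g = granulate(pm25, 3)   # three granules, one per window
```

Each granule summarizes a window's level and spread in three numbers, which is the compressed feature representation fed onward for prediction.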
Domain Generation Algorithms (DGAs) continue to pose a significant threat in modern malware infrastructures by enabling resilient and evasive communication with Command and Control (C&C) servers. Traditional detection methods, rooted in statistical heuristics, feature engineering, and shallow machine learning, struggle to adapt to the increasing sophistication, linguistic mimicry, and adversarial variability of DGA variants. The emergence of Large Language Models (LLMs) marks a transformative shift in this landscape. Leveraging deep contextual understanding, semantic generalization, and few-shot learning capabilities, LLMs such as BERT, GPT, and T5 have shown promising results in detecting both character-based and dictionary-based DGAs, including previously unseen (zero-day) variants. This paper provides a comprehensive and critical review of LLM-driven DGA detection, introducing a structured taxonomy of LLM architectures, evaluating the linguistic and behavioral properties of benchmark datasets, and comparing recent detection frameworks across accuracy, latency, robustness, and multilingual performance. We also highlight key limitations, including challenges in adversarial resilience, model interpretability, deployment scalability, and privacy risks. To address these gaps, we present a forward-looking research roadmap encompassing adversarial training, model compression, cross-lingual benchmarking, and real-time integration with SIEM/SOAR platforms. This survey aims to serve as a foundational resource for advancing the development of scalable, explainable, and operationally viable LLM-based DGA detection systems.
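One of the classic statistical heuristics this survey contrasts with LLMs is the Shannon entropy of the domain string, which tends to run higher for character-based algorithmically generated names than for human-chosen ones. The example domains are illustrative only, and entropy alone famously fails on dictionary-based DGAs.

```python
import math

def entropy(s):
    """Shannon entropy (bits/char) of a string's character distribution."""
    counts = {c: s.count(c) for c in set(s)}
    n = len(s)
    return -sum(k / n * math.log2(k / n) for k in counts.values())

benign = entropy("google")       # repeated letters -> lower entropy
dga_like = entropy("xq7v9z2k")   # 8 distinct chars -> maximal entropy (3 bits)
```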
Accurate determination of rock mass parameters is essential for ensuring the accuracy of numerical simulations. Displacement back-analysis is the most widely used method; however, the reliability of current approaches remains unsatisfactory. Therefore, this paper proposes a multistage rock mass parameter back-analysis method that considers the construction process and displacement losses, implemented through the coupling of numerical simulation, automated machine learning (AutoML), and multi-objective optimization algorithms (MOOAs). First, a parametric modeling platform for mechanized twin tunnels is developed, generating a dataset through extensive numerical simulations. Next, the AutoML method is utilized to establish a surrogate model linking rock parameters and displacements. The tunnel construction process is divided into multiple stages, transforming the rock mass parameter back-analysis into a multi-objective optimization problem, to which multi-objective optimization algorithms are applied to obtain the rock mass parameters. The proposed method is validated in a mechanized twin tunnel project, and its accuracy and effectiveness are demonstrated. Compared with traditional single-stage back-analysis methods, the proposed model decreases the average absolute percentage error from 12.73% to 4.34%, significantly improving the accuracy of the back-analysis. Moreover, although the accuracy of the back-analysis increases significantly with the number of construction stages considered, the back-analysis time remains acceptable. This study provides an efficient and accurate method for displacement back-analysis, paving the way for precise parameter determination in numerical simulations.
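The 12.73% to 4.34% comparison is stated as an average absolute percentage error, so a minimal sketch of that metric may help readers reproduce such figures. The displacement values below are invented, not from the project.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

observed = [10.0, 8.0, 5.0]     # e.g. measured displacements (mm), invented
simulated = [11.0, 7.6, 5.25]   # surrogate/back-analysis predictions
err = mape(observed, simulated)
```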
This paper derives the properties of an updated neural network model that is exploited to identify an unknown nonlinear system via the standard gradient learning algorithm. The convergence of this algorithm for online training of three-layer neural networks in a stochastic environment is studied. A special case is considered in which the unknown nonlinearity can be approximated exactly by some neural network with a nonlinear activation function in its output layer. To analyze the asymptotic behavior of the learning processes, the so-called Lyapunov-like approach is utilized. As the Lyapunov function, the expected value of the squared approximation error, depending on the network parameters, is chosen. Within this approach, sufficient conditions guaranteeing the convergence of the learning algorithm with probability 1 are derived. Simulation results are presented to support the theoretical analysis.
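A toy illustration of the Lyapunov-style argument: under online gradient learning, the squared parameter error acts as the decreasing "energy". A single linear neuron stands in for the three-layer network analyzed in the paper, and the system, learning rate, and noise-free setup are all simplifications.

```python
import random

random.seed(1)
w_true, w, eta = 2.0, 0.0, 0.05   # unknown system, initial guess, step size
errors = []
for _ in range(200):
    x = random.gauss(0.0, 1.0)    # stochastic input
    y = w_true * x                # unknown system's (noise-free) output
    e = y - w * x
    w += eta * e * x              # standard gradient (LMS-style) update
    errors.append((w_true - w) ** 2)   # Lyapunov candidate after the update
```

With this step size the error sequence decays toward zero, mirroring the probability-1 convergence the paper establishes under its sufficient conditions.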
To curb the worsening tropospheric ozone (O₃) pollution problem in China, rapid and accurate identification of O₃-precursor sensitivity (OPS) is a crucial prerequisite for formulating effective contingency O₃ pollution control strategies. However, currently widely used methods, such as statistical models and numerical models, exhibit inherent limitations in identifying OPS in a timely and accurate manner. In this study, we developed a novel approach to identify OPS based on an eXtreme Gradient Boosting model, the Shapley additive explanation (SHAP) algorithm, and a volatile organic compound (VOC) photochemical decay adjustment, using meteorology and speciated pollutant monitoring data as the input. By comparing the difference in SHAP values between the base scenario and precursor-reduction scenarios for nitrogen oxides (NOₓ) and VOCs, OPS was divided into NOₓ-limited, VOCs-limited, and transition regimes. Using the long-lasting O₃ pollution episode in the autumn of 2022 in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) as an example, we demonstrated large spatiotemporal heterogeneities of OPS over the GBA, which generally shifted from NOₓ-limited to VOCs-limited from September to October and were more inclined to be VOCs-limited in the central areas and NOₓ-limited in the peripheral areas. This study developed an innovative OPS identification method by comparing the difference in SHAP values before and after precursor emission reduction. Our method enables accurate identification of OPS on a time scale of seconds, thereby providing a state-of-the-art tool for the rapid guidance of spatially specific O₃ control strategies.
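The decision step of the SHAP-difference approach can be sketched as a simple rule: compare how much the model's attributed O₃ contribution drops under an NOx-reduction versus a VOC-reduction scenario, and label the regime by whichever reduction helps more (or "transition" when they are comparable). The SHAP deltas and tolerance below are invented numbers, not model output.

```python
def classify_ops(d_nox, d_voc, tol=0.1):
    """d_*: drop in SHAP-attributed O3 under each precursor-reduction
    scenario (base minus reduced). Larger drop = more effective lever."""
    if abs(d_nox - d_voc) <= tol:
        return "transition"
    return "NOx-limited" if d_nox > d_voc else "VOCs-limited"

regime = classify_ops(d_nox=2.1, d_voc=6.8)   # VOC cuts help far more here
```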
Real-time and reliable measurements of effluent quality are essential to improve operating efficiency and reduce energy consumption in the wastewater treatment process. Owing to the low accuracy and unstable performance of traditional effluent quality measurements, we propose a selective ensemble extreme learning machine modeling method to enhance effluent quality predictions. The extreme learning machine algorithm is inserted into a selective ensemble frame as the component model, since it runs much faster and provides better generalization performance than other popular learning algorithms. Ensembles of extreme learning machine models overcome the variation across different trials of simulations for a single model. Selective ensembling based on a genetic algorithm is used to further exclude bad components from all the available ensembles, in order to reduce computational complexity and improve generalization performance. The proposed method is verified with data from an industrial wastewater treatment plant located in Shenyang, China. Experimental results show that the proposed method has stronger generalization and higher accuracy than partial least squares, neural network partial least squares, single extreme learning machine, and ensemble extreme learning machine models.
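The selective-ensemble idea can be shown in miniature: from a pool of component models, keep the subset whose averaged prediction minimizes validation error. Exhaustive bitmask search over subsets stands in here for the paper's genetic algorithm (feasible only for tiny pools), and the component predictions are invented.

```python
from itertools import combinations

def select_ensemble(preds, target):
    """Return the subset of component models minimizing squared error
    of the subset-averaged prediction on a validation target."""
    best, best_err = None, float("inf")
    ids = range(len(preds))
    for r in range(1, len(preds) + 1):
        for subset in combinations(ids, r):
            avg = [sum(preds[i][t] for i in subset) / r
                   for t in range(len(target))]
            err = sum((a - b) ** 2 for a, b in zip(avg, target))
            if err < best_err:
                best, best_err = subset, err
    return best, best_err

preds = [[1.0, 2.0, 3.0],   # accurate component
         [1.2, 1.8, 3.1],   # accurate component
         [5.0, 0.0, 9.0]]   # poor component the selection should drop
target = [1.0, 2.0, 3.0]    # validation measurements
subset, err = select_ensemble(preds, target)
```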
Funding: Supported by the special fund of the National Clinical Key Specialty Construction Program [(2022)301-2305].
Abstract: BACKGROUND: This study aims to develop and validate a machine learning-based in-hospital mortality predictive model for acute aortic syndrome (AAS) in the emergency department (ED) and to derive a simplified version suitable for rapid clinical application. METHODS: In this multi-center retrospective cohort study, AAS patient data from three hospitals were analyzed. The modeling cohort included data from the First Affiliated Hospital of Zhengzhou University and the People's Hospital of Xinjiang Uygur Autonomous Region, with Peking University Third Hospital data serving as the external test set. Four machine learning algorithms, namely logistic regression (LR), multilayer perceptron (MLP), Gaussian naive Bayes (GNB), and random forest (RF), were used to develop predictive models based on 34 early-accessible clinical variables. A simplified model was then derived from five key variables (Stanford type, pericardial effusion, asymmetric peripheral arterial pulsation, decreased bowel sounds, and dyspnea) selected via Least Absolute Shrinkage and Selection Operator (LASSO) regression to improve ED applicability. RESULTS: A total of 929 patients were included in the modeling cohort, and 210 were included in the external test set. The four machine learning models based on 34 clinical variables achieved internal and external validation AUCs of 0.85-0.90 and 0.73-0.85, respectively. The simplified model incorporating five key variables demonstrated internal and external validation AUCs of 0.71-0.86 and 0.75-0.78, respectively. Both models showed robust calibration and predictive stability across datasets. CONCLUSION: Both models, built with machine learning tools, demonstrated sound predictive performance and generalizability to external data.
Funding: National Key Research and Development Program of China, No. 2023YFC3006704; National Natural Science Foundation of China, No. 42171047; CAS-CSIRO Partnership Joint Project of 2024, No. 177GJHZ2023097MI.
Abstract: Accurate prediction of flood events is important for flood control and risk management. Machine learning techniques have contributed greatly to advances in flood prediction, and existing studies have mainly focused on predicting flood resource variables using single or hybrid machine learning techniques. However, class-based flood predictions, which can aid in quickly diagnosing comprehensive flood characteristics and proposing targeted management strategies, have rarely been investigated. This study proposed an approach for predicting flood regime metrics and event classes by coupling machine learning algorithms with clustering-deduced membership degrees. Five algorithms were adopted for this exploration. Results showed that the class membership degrees accurately determined event classes, with class hit rates of up to 100% against the four classes clustered from nine regime metrics. The nonlinear algorithms (Multiple Linear Regression, Random Forest, and least squares-Support Vector Machine) outperformed the linear techniques (Multiple Linear Regression and Stepwise Regression) in predicting flood regime metrics. The proposed approach predicted flood event classes well, with average class hit rates of 66.0%-85.4% and 47.2%-76.0% in the calibration and validation periods, respectively, particularly for slow and late flood events. The predictive capability of the proposed approach for flood regime metrics and classes was considerably stronger than that of the hydrological modeling approach.
Funding: The Doctorate Foundation of the Engineering College, Air Force Engineering University.
Abstract: The key problem of the adaptive mixture background model is that its parameters must adaptively change according to the input data. To address this problem, a new method is proposed. First, the recursive equations are inferred based on the maximum likelihood rule. Second, the forgetting factor and learning rate factor are redefined, and more general formulations are obtained by analyzing their practical functions. Last, the convergence of the proposed algorithm is proved: according to stochastic approximation theory, the estimation converges to a local maximum of the data likelihood function. Experiments show that the proposed learning algorithm outperforms earlier ones in both convergence rate and accuracy.
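The recursive update at the core of such a model can be sketched roughly as follows. This is a minimal, hedged illustration in the classic Stauffer-Grimson style; the learning rate, matching threshold, and single-component simplification are illustrative assumptions, not the paper's redefined factors.

```python
import math

# Online update of one Gaussian component of an adaptive mixture
# background model. `alpha` plays the forgetting-factor role for the
# weight and the learning-rate role for the matched component.
def update_component(mean, var, weight, x, alpha=0.05, match_thresh=2.5):
    """Update one Gaussian component with a new pixel value x."""
    matched = abs(x - mean) <= match_thresh * math.sqrt(var)
    # Weight decays unless the component matched the new observation.
    weight = (1 - alpha) * weight + alpha * (1.0 if matched else 0.0)
    if matched:
        rho = alpha  # learning-rate factor for the matched component
        mean = (1 - rho) * mean + rho * x
        var = (1 - rho) * var + rho * (x - mean) ** 2
    return mean, var, weight

# Feed a stream of background-like samples; the mean should track them
# and the weight should grow.
mean, var, w = 100.0, 25.0, 0.5
for x in [101, 99, 100, 102, 98, 100]:
    mean, var, w = update_component(mean, var, w, x)
print(round(mean, 1), w > 0.5)
```

In a full mixture model, the same update runs per component, with unmatched components only decaying in weight.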
基金supported by the National Natural Science Foundation of China(Grant Nos.62303197,62273214)the Natural Science Foundation of Shandong Province(ZR2024MFO18).
Abstract: Accurate and reliable photovoltaic (PV) modeling is crucial for the performance evaluation, control, and optimization of PV systems. However, existing methods for PV parameter identification often suffer from limitations in accuracy and efficiency. To address these challenges, we propose an adaptive multi-learning cooperation search algorithm (AMLCSA) for efficient identification of unknown parameters in PV models. AMLCSA is a novel algorithm inspired by teamwork behaviors in modern enterprises. It enhances the original cooperation search algorithm in two key aspects: (i) an adaptive multi-learning strategy that dynamically adjusts search ranges using adaptive weights, allowing better individuals to focus on local exploitation while guiding poorer individuals toward global exploration; and (ii) a chaotic grouping reflection strategy that introduces chaotic sequences to enhance population diversity and improve search performance. The effectiveness of AMLCSA is demonstrated on single-diode, double-diode, and three PV-module models. Simulation results show that AMLCSA offers significant advantages in convergence, accuracy, and stability compared with existing state-of-the-art algorithms.
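For context, the single-diode model whose parameters such algorithms identify can be sketched as below. All parameter values (photocurrent, saturation current, ideality-adjusted thermal voltage, series and shunt resistance) are illustrative assumptions, and the bisection solve is just one simple way to evaluate the implicit current equation.

```python
import math

# Standard single-diode PV model:
#   I = Iph - I0*(exp((V + I*Rs)/(a*Vt)) - 1) - (V + I*Rs)/Rsh
# The equation is implicit in I, so we solve it by bisection.
def diode_current(v, iph=7.6, i0=1e-7, a_vt=0.0334, rs=0.02, rsh=50.0):
    def residual(i):
        vd = v + i * rs
        return iph - i0 * (math.exp(vd / a_vt) - 1) - vd / rsh - i
    lo, hi = 0.0, iph + 1.0  # residual is positive at lo, negative at hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

i_sc = diode_current(0.0)   # short-circuit current, close to iph
i_mid = diode_current(0.5)  # current on the knee of the I-V curve
print(round(i_sc, 2), round(i_mid, 2))
```

A parameter-identification run then searches over (iph, i0, a_vt, rs, rsh) to minimize the error between such model currents and measured I-V data.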
Funding: Supported by the National Key Research and Development Program of China (2023YFB3307801); the National Natural Science Foundation of China (62394343, 62373155, 62073142); the Major Science and Technology Project of Xinjiang (No. 2022A01006-4); the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017; the Fundamental Research Funds for the Central Universities, Science Foundation of China University of Petroleum, Beijing (No. 2462024YJRC011); and the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2024B70).
Abstract: The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared with mechanistic modelling, thus improving the efficiency of process optimization and monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to solve these problems. First, data segment variance was introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which clusters the data into perturbed and steady-state intervals for steady-state data extraction. Second, the maximal information coefficient (MIC) was employed to calculate the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) was integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight-update strategy, yielding the new ensemble learning algorithm XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
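The variance-based interval idea can be illustrated with a toy sketch. The window length and the simple two-group variance split below are assumptions standing in for the paper's variance-augmented K-means (KMDI).

```python
# Toy analogue of KMDI: use segment variance to separate steady-state
# intervals (low variance) from perturbed ones (high variance).
def segment_variances(series, window=4):
    out = []
    for start in range(0, len(series) - window + 1, window):
        seg = series[start:start + window]
        mu = sum(seg) / window
        out.append(sum((x - mu) ** 2 for x in seg) / window)
    return out

series = [10.0, 10.1, 9.9, 10.0,   # steady
          10.2, 14.0, 6.0, 11.5,   # perturbed
          10.1, 10.0, 10.2, 9.9]   # steady
variances = segment_variances(series)
# 1-D two-means-style split: threshold halfway between extreme variances.
thresh = (min(variances) + max(variances)) / 2
labels = ["steady" if v < thresh else "perturbed" for v in variances]
print(labels)
```

Segments labeled "steady" would then be kept for steady-state model fitting.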
Trial registration: Chinese Clinical Trial Registry (No. ChiCTR2000040109); approved by the Hospital Ethics Committee (No. 20210130017).
Abstract: BACKGROUND: Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies. AIM: To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency. METHODS: This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist assessed DCI using a visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated based on discrimination, calibration, and decision curve analysis (DCA), and results were visualized using nomograms. RESULTS: A total of 712 patients (53.8% male; mean age 54.5 ± 12.9 years) were included. Logistic regression analysis identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm, OR = 1.895, 95% CI: 1.065-3.350; AC ≥ 92 cm, OR = 1.271, 95% CI: 0.730-2.188), and anxiety (OR = 1.071, 95% CI: 1.044-1.100) as predictive factors for DCI, validated by the LASSO and RF methods. Model performance revealed training/validation sensitivities of 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities of 0.602/0.511, 0.510/0.562, and 0.977/0.526; and corresponding areas under the receiver operating characteristic curve (AUCs) of 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820), respectively. DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability. CONCLUSION: The RF-based model exhibited superior predictive accuracy for DCI compared with the multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.
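To make the regression output concrete, here is a hedged sketch of how reported odds ratios combine into a predicted probability in a multivariable logistic model. The intercept is invented for illustration; only the odds ratios echo the values reported above.

```python
import math

# Log-odds are additive: each predictor contributes log(OR) per unit.
# ORs echo the reported estimates (constipation 2.254, AC 77.5-91.9 cm
# 1.895, anxiety 1.071 per scale point); the intercept is hypothetical.
def dci_probability(constipation, ac_mid, anxiety_score, intercept=-2.0):
    logit = intercept
    logit += math.log(2.254) * constipation   # 1 if constipated
    logit += math.log(1.895) * ac_mid         # 1 if AC in 77.5-91.9 cm
    logit += math.log(1.071) * anxiety_score  # anxiety scale points
    return 1 / (1 + math.exp(-logit))

low = dci_probability(0, 0, 0)    # baseline patient
high = dci_probability(1, 1, 10)  # all risk factors present
print(round(low, 3), round(high, 3))
```

A nomogram is essentially a graphical lookup table for this same sum of log-odds contributions.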
Funding: Funded by the National Key Research and Development Program of China (2017YFA0605002, 2017YFA0605004, and 2016YFA0601501); the National Natural Science Foundation of China (41961124007, 51779145, and 41830863); and the "Six Top Talents" project of Jiangsu Province (RJFW-031).
Abstract: Model parameter estimation is a pivotal issue for runoff modeling in ungauged catchments. The nonlinear relationship between model parameters and catchment descriptors is a major obstacle for parameter regionalization, the most widely used approach. Runoff modeling was studied in 38 catchments located in the Yellow-Huai-Hai River Basin (YHHRB). The values of the Nash-Sutcliffe efficiency coefficient (NSE), coefficient of determination (R²), and percent bias (PBIAS) indicated acceptable performance of the Soil and Water Assessment Tool (SWAT) model in the YHHRB. Nine descriptors belonging to the categories of climate, soil, vegetation, and topography were used to express the catchment characteristics related to hydrological processes. The quantitative relationships between the SWAT model parameters and the catchment descriptors were analyzed by six regression-based models: linear regression (LR) equations, support vector regression (SVR), random forest (RF), k-nearest neighbor (kNN), decision tree (DT), and radial basis function (RBF). Each of the 38 catchments was assumed in turn to be an ungauged catchment, and its parameters were estimated by regression models constructed from the remaining 37 donor catchments. Furthermore, a similarity-based regionalization scheme was used for comparison with the regression-based approach. The results indicated that runoff was modeled most accurately by the SVR-based scheme in ungauged catchments. Compared with the traditional LR-based approach, the machine learning algorithms improved the accuracy of runoff modeling in ungauged catchments because of their outstanding capability to handle nonlinear relationships. The performances of the different approaches were similar in humid regions, while the advantages of the machine learning techniques were more evident in arid regions. When the study area contained nested catchments, the best result was obtained with the similarity-based parameter regionalization scheme because of the high catchment density and short spatial distances. These findings could improve flood forecasting and water resources planning in regions that lack observed data.
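The goodness-of-fit metrics named above follow standard definitions, which can be written down directly (the observed/simulated series here are toy numbers for illustration):

```python
# Standard hydrological model evaluation metrics.
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 means no better
    than predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den

def pbias(obs, sim):
    """Percent bias: positive values indicate underestimation."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

obs = [3.0, 5.0, 8.0, 6.0, 4.0]
sim = [2.8, 5.2, 7.5, 6.3, 4.1]
print(round(nse(obs, sim), 3), round(pbias(obs, sim), 2))
```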
Funding: The National Natural Science Foundation of China (Grant No. 81970631 to W.L.).
Abstract: Background: Diabetic nephropathy (DN) is the most common complication of type 2 diabetes mellitus and the main cause of end-stage renal disease worldwide. Diagnostic biomarkers may allow early diagnosis and treatment of DN, reducing its prevalence and delaying its development. Kidney biopsy is the gold standard for diagnosing DN; however, its invasive character is its primary limitation. Machine learning provides a non-invasive and specific criterion for diagnosing DN, although traditional machine learning algorithms need to be improved to enhance diagnostic performance. Methods: We applied high-throughput RNA sequencing to obtain the genes related to DN tubular tissues and normal tubular tissues of mice. Machine learning algorithms (random forest, LASSO logistic regression, and principal component analysis) were then used to identify key genes (CES1G, CYP4A14, NDUFA4, ABCC4, ACE). A genetic algorithm-optimized backpropagation neural network (GA-BPNN) was then used to improve the DN diagnostic model. Results: The AUC of the GA-BPNN model was 0.83 in the training dataset and 0.81 in the validation dataset, while the AUCs of the SVM model were 0.756 in the training dataset and 0.650 in the external validation dataset. Thus, the GA-BPNN outperformed the traditional SVM model. This diagnostic model may enable personalized diagnosis and treatment of patients with DN. Immunohistochemical staining further confirmed that tissue and cell expression of NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 (NDUFA4L2) in tubular tissue was decreased in DN mice. Conclusion: The GA-BPNN model is more accurate than the traditional SVM model and may provide an effective tool for diagnosing DN.
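The genetic-algorithm step of a GA-BPNN can be sketched in miniature. The sketch below minimizes a toy quadratic instead of a network's validation loss; the population size, truncation selection, one-point crossover, and Gaussian mutation settings are all illustrative assumptions.

```python
import random

random.seed(0)

def fitness(genome):
    return sum(x * x for x in genome)  # toy objective; lower is better

# Minimal GA: in a GA-BPNN the genome would encode network weights and
# `fitness` would be the network's training/validation error.
def evolve(pop_size=30, genome_len=4, generations=60):
    pop = [[random.uniform(-5, 5) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]        # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)
            child = a[:cut] + b[cut:]         # one-point crossover
            i = random.randrange(genome_len)  # Gaussian point mutation
            child[i] += random.gauss(0, 0.3)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print(round(fitness(best), 4))
```

The GA-found genome then seeds the backpropagation network, which fine-tunes from a good starting point instead of a random one.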
Abstract: Edge Machine Learning (EdgeML) and Tiny Machine Learning (TinyML) are fast-growing fields that bring machine learning to resource-constrained devices, allowing real-time data processing and decision-making at the network's edge. However, the complexity of model conversion techniques, diverse inference mechanisms, and varied learning strategies make designing and deploying these models challenging. Additionally, deploying TinyML models on resource-constrained hardware with specific software frameworks has broadened EdgeML's applications across various sectors. These factors underscore the necessity for a comprehensive literature review, as current reviews do not systematically encompass the most recent findings on these topics. Consequently, this review provides a comprehensive overview of state-of-the-art techniques in model conversion, inference mechanisms, and learning strategies within EdgeML, and in deploying these models on resource-constrained edge devices using TinyML. It identifies 90 research articles published between 2018 and 2025, categorizing them into two main areas: (1) model conversion, inference, and learning strategies in EdgeML and (2) deploying TinyML models on resource-constrained hardware using specific software frameworks. In the first category, the synthesis compares and critically reviews various model conversion techniques, inference mechanisms, and learning strategies. In the second category, it identifies and elaborates on the major development boards, software frameworks, sensors, and algorithms used in applications across six major sectors. As a result, this article provides valuable insights for researchers, practitioners, and developers, assisting them in choosing suitable model conversion techniques, inference mechanisms, learning strategies, hardware development boards, software frameworks, sensors, and algorithms tailored to their specific needs and applications.
Funding: Funded by the National Key R&D Program of China (Grant No. 2022YFC2807504); the Marine S&T Fund of Shandong Province for Qingdao Marine Science and Technology Center (Grant No. 2022QNLM030002-1); and the Central Public-interest Scientific Institution Basal Research Fund (Grant No. 2023TD02).
Abstract: Antarctic krill (Euphausia superba), widely distributed around Antarctica, is a key species supporting the biodiversity of the Southern Ocean ecosystem. The Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR) has thus managed the krill fishery in a precautionary way. Currently, CCAMLR is working to develop a refined krill fishery management approach based on more solid science, which requires accurate predictions of krill distribution. To address this need, this study investigated the effects of algorithm choice and spatial resolution on the performance of Antarctic krill distribution modelling. We integrated acoustic data from four surveys conducted in the waters adjacent to the Antarctic Peninsula with 11 environmental variables characterizing krill prey conditions, water mass properties, and seafloor topography. These data were processed at four spatial resolutions (5, 10, 15, and 20 km) to fit distribution models using four algorithms: Random Forests (RF), Generalized Additive Models (GAM), Extreme Gradient Boosting (XGBoost), and Artificial Neural Networks (ANN). Model performance was assessed and compared in terms of goodness-of-fit and predictive accuracy. The results showed that RF achieved the highest predictive performance at most resolutions, whereas GAM performed best at the coarsest resolution (20 km). XGBoost closely followed RF in accuracy and demonstrated robustness, as evidenced by highly consistent partial dependence curves across resolutions. In contrast, ANN exhibited limitations with smaller sample sizes, resulting in comparatively poorer predictive performance. The analysis revealed a trade-off: reducing spatial resolution improved model fit and mitigated zero-inflation at the expense of fine-scale information and overall predictive accuracy. Ensemble models integrating RF, GAM, and XGBoost are proposed as a potential balanced solution to improve predictive stability, offering a more robust scientific basis for the refinement of krill management.
Funding: Supported by the National Natural Science Foundation of China under Grants 52325402, 52274057, and 52074340; the National Key R&D Program of China under Grant 2023YFB4104200; the Major Scientific and Technological Projects of CNOOC under Grant CCL2022RCPS0397RSN; the 111 Project under Grant B08028; and the China Scholarship Council under Grant 202306450108.
Abstract: This study introduces a novel approach to addressing the challenges of high-dimensional variables and strong nonlinearity in reservoir production and layer configuration optimization. For the first time, relational machine learning models are applied in reservoir development optimization. Traditional regression-based models often struggle in complex scenarios, but the proposed relational and regression-based composite differential evolution (RRCODE) method combines a Gaussian naive Bayes relational model with a radial basis function network regression model. This integration effectively captures complex relationships in the optimization process, improving both accuracy and convergence speed. Experimental tests on a multi-layer multi-channel reservoir model, the Egg reservoir model, and a real-field reservoir model (the S reservoir) demonstrate that RRCODE significantly reduces water injection and production volumes while increasing economic returns and cumulative oil recovery. Moreover, the surrogate models employed in RRCODE are lightweight, with low computational overhead. These results highlight RRCODE's superior performance in the integrated optimization of reservoir production and layer configurations, offering more efficient and economically viable solutions for oilfield development.
Abstract: Floods and storm surges pose significant threats to coastal regions worldwide, demanding timely and accurate early warning systems (EWS) for disaster preparedness. Traditional numerical and statistical methods often fall short in capturing complex, nonlinear, and real-time environmental dynamics. In recent years, machine learning (ML) and deep learning (DL) techniques have emerged as promising alternatives for enhancing the accuracy, speed, and scalability of EWS. This review critically evaluates the evolution of ML models, such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), in coastal flood prediction, highlighting their architectures, data requirements, performance metrics, and implementation challenges. A unique contribution of this work is its synthesis of real-time deployment challenges, including latency, edge-cloud tradeoffs, and policy-level integration, areas often overlooked in prior literature. Furthermore, the review presents a comparative framework of model performance across different geographic and hydrologic settings, offering actionable insights for researchers and practitioners. Limitations of current AI-driven models, such as interpretability, data scarcity, and generalization across regions, are discussed in detail. Finally, the paper outlines future research directions, including hybrid modelling, transfer learning, explainable AI, and policy-aware alert systems. By bridging technical performance and operational feasibility, this review aims to guide the development of next-generation intelligent EWS for resilient and adaptive coastal management.
Funding: Supported by the General Scientific Research Funding of the Science and Technology Development Fund (FDCT) in Macao (No. 0150/2022/A) and the Faculty Research Grants of Macao University of Science and Technology (No. FRG-22-074-FIE).
Abstract: With the rapid development of the economy, air pollution caused by industrial expansion has seriously harmed human health and social development. Establishing an effective air pollution concentration prediction system is therefore of great scientific and practical significance for accurate and reliable predictions. This paper proposes a combined point-interval prediction system for pollutant concentration by leveraging neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The golden jackal optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines the training results from multiple models to obtain the final point predictions, while quantile regression and kernel density estimation are used for interval predictions on the test set. Experimental results demonstrate that the combined model achieves a high goodness-of-fit, with a coefficient of determination (R²) of 99.3%, and a maximum difference in mean absolute percentage error (MAPE) from the benchmark models of 12.6%. This suggests that the proposed integrated learning system can provide more accurate deterministic predictions as well as reliable uncertainty analysis compared with traditional models, offering a practical reference for air quality early warning.
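The fuzzy information granulation step can be illustrated with a minimal sketch: each window of the raw series is compressed into a triangular fuzzy particle. The window size and the min/mean/max granule below are illustrative choices, not the paper's exact construction.

```python
# Each window of the raw series becomes one triangular fuzzy particle
# (low, core, high), i.e. (min, mean, max) of the window. The model then
# works on these particles instead of raw points.
def granulate(series, window=3):
    particles = []
    for start in range(0, len(series) - window + 1, window):
        seg = series[start:start + window]
        particles.append((min(seg), sum(seg) / window, max(seg)))
    return particles

pm25 = [35, 38, 33, 50, 58, 47, 41, 44, 39]  # toy concentration series
particles = granulate(pm25)
print(particles)
```

The particle's low/high bounds also give a natural starting point for interval (rather than point) predictions.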
Funding: The Deanship of Scientific Research at King Khalid University, through the Large Group Project under grant number GRP.2/663/46.
Abstract: Domain Generation Algorithms (DGAs) continue to pose a significant threat in modern malware infrastructures by enabling resilient and evasive communication with Command and Control (C&C) servers. Traditional detection methods, rooted in statistical heuristics, feature engineering, and shallow machine learning, struggle to adapt to the increasing sophistication, linguistic mimicry, and adversarial variability of DGA variants. The emergence of Large Language Models (LLMs) marks a transformative shift in this landscape. Leveraging deep contextual understanding, semantic generalization, and few-shot learning capabilities, LLMs such as BERT, GPT, and T5 have shown promising results in detecting both character-based and dictionary-based DGAs, including previously unseen (zero-day) variants. This paper provides a comprehensive and critical review of LLM-driven DGA detection, introducing a structured taxonomy of LLM architectures, evaluating the linguistic and behavioral properties of benchmark datasets, and comparing recent detection frameworks on accuracy, latency, robustness, and multilingual performance. We also highlight key limitations, including challenges in adversarial resilience, model interpretability, deployment scalability, and privacy risks. To address these gaps, we present a forward-looking research roadmap encompassing adversarial training, model compression, cross-lingual benchmarking, and real-time integration with SIEM/SOAR platforms. This survey aims to serve as a foundational resource for advancing the development of scalable, explainable, and operationally viable LLM-based DGA detection systems.
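One of the statistical heuristics such surveys treat as a baseline is the Shannon entropy of the domain label, which separates random character-based DGA names from dictionary words. The 3.5-bit cutoff below is an illustrative threshold, not a recommended operating point, and the flagged example is an invented string, not a real DGA output.

```python
import math
from collections import Counter

# Shannon entropy (bits per character) of the leftmost domain label.
# Random character-based DGA names score high; natural words score low.
def label_entropy(domain):
    label = domain.split(".")[0].lower()
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

for name in ["google.com", "xj4k9q2zv7wq1t.net"]:
    score = label_entropy(name)
    print(name, round(score, 2), "suspicious" if score > 3.5 else "benign")
```

Dictionary-based DGAs defeat exactly this heuristic, which is the motivation the survey gives for semantically-aware LLM detectors.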
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52090081 and 52079068) and the State Key Laboratory of Hydroscience and Hydraulic Engineering (Grant No. 2021-KY-04).
Abstract: Accurate determination of rock mass parameters is essential for ensuring the accuracy of numerical simulations. Displacement back-analysis is the most widely used method; however, the reliability of current approaches remains unsatisfactory. Therefore, this paper proposes a multistage rock mass parameter back-analysis method that considers the construction process and displacement losses, implemented through the coupling of numerical simulation, automated machine learning (AutoML), and multi-objective optimization algorithms (MOOAs). First, a parametric modeling platform for mechanized twin tunnels is developed, generating a dataset through extensive numerical simulations. Next, the AutoML method is utilized to establish a surrogate model linking rock parameters and displacements. The tunnel construction process is divided into multiple stages, transforming the rock mass parameter back-analysis into a multi-objective optimization problem, for which multi-objective optimization algorithms are introduced to obtain the rock mass parameters. The proposed method is validated in a mechanized twin tunnel project, demonstrating its accuracy and effectiveness. Compared with traditional single-stage back-analysis methods, the proposed model decreases the average absolute percentage error from 12.73% to 4.34%, significantly improving the accuracy of the back-analysis. Moreover, although the accuracy of the back-analysis increases significantly with the number of construction stages considered, the back-analysis time remains acceptable. This study provides an efficient and accurate displacement back-analysis method, paving the way for precise parameter determination in numerical simulations.
Abstract: This paper derives the properties of an updated neural network model that is exploited to identify an unknown nonlinear system via the standard gradient learning algorithm. The convergence of this algorithm for the online training of three-layer neural networks in a stochastic environment is studied. A special case is considered in which the unknown nonlinearity can be approximated exactly by some neural network with a nonlinear activation function in its output layer. To analyze the asymptotic behavior of the learning process, the so-called Lyapunov-like approach is utilized. As the Lyapunov function, the expected value of the squared approximation error, depending on the network parameters, is chosen. Within this approach, sufficient conditions guaranteeing the convergence of the learning algorithm with probability 1 are derived. Simulation results are presented to support the theoretical analysis.
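The setting can be sketched concretely: a tiny three-layer network with a nonlinear output activation, trained online by the standard gradient algorithm on samples drawn from a stochastic environment. The architecture, step size, and target nonlinearity below are illustrative assumptions, not the paper's simulation setup.

```python
import math, random

random.seed(1)
# A 1-2-1 network with tanh activations in both hidden and output layers.
w1 = [random.uniform(-1, 1) for _ in range(2)]  # input -> hidden weights
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]  # hidden -> output weights
b2 = 0.0
lr = 0.1
target = lambda x: math.tanh(1.5 * x)  # the "unknown" nonlinearity

def forward(x):
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(2)]
    return math.tanh(sum(w2[i] * h[i] for i in range(2)) + b2), h

def sq_err():
    xs = [i / 10 - 1 for i in range(21)]
    return sum((forward(x)[0] - target(x)) ** 2 for x in xs) / len(xs)

before = sq_err()
for _ in range(5000):
    x = random.uniform(-1, 1)        # stochastic environment: random input
    y, h = forward(x)
    e = y - target(x)
    d_out = e * (1 - y * y)          # tanh'(z) = 1 - tanh(z)^2
    for i in range(2):
        d_hid = d_out * w2[i] * (1 - h[i] * h[i])
        w2[i] -= lr * d_out * h[i]   # online gradient step per sample
        w1[i] -= lr * d_hid * x
        b1[i] -= lr * d_hid
    b2 -= lr * d_out
print(before > sq_err())
```

The paper's Lyapunov function is exactly the expectation of this squared error over the input distribution; the simulation only verifies that it decreases along the learning trajectory.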
Funding: Supported by the Key-Area Research and Development Program of Guangdong Province (No. 2020B1111360003); the National Natural Science Foundation of China (Nos. 42465008 and 42105164); a Yunnan Science and Technology Department Project (No. 202501AT070239); a Yunnan Science and Technology Department Youth Project (No. 202401AU070202); and the Xianyang Rapid Response Decision Support Project for Ozone (No. YZ2024-ZB019).
Abstract: To curb the worsening tropospheric ozone (O3) pollution problem in China, rapid and accurate identification of O3-precursor sensitivity (OPS) is a crucial prerequisite for formulating effective contingency O3 pollution control strategies. However, currently widely used methods, such as statistical models and numerical models, exhibit inherent limitations in identifying OPS in a timely and accurate manner. In this study, we developed a novel approach to identify OPS based on an eXtreme Gradient Boosting model, the Shapley additive explanation (SHAP) algorithm, and volatile organic compound (VOC) photochemical decay adjustment, using meteorology and speciated pollutant monitoring data as the input. By comparing the difference in SHAP values between a base scenario and a precursor reduction scenario for nitrogen oxides (NOx) and VOCs, OPS was divided into NOx-limited, VOCs-limited, and transition regimes. Using the long-lasting O3 pollution episode in the autumn of 2022 in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) as an example, we demonstrated large spatiotemporal heterogeneities of OPS over the GBA, which generally shifted from NOx-limited to VOCs-limited from September to October and was more inclined to be VOCs-limited in the central areas and NOx-limited in the peripheral areas. This study developed an innovative OPS identification method by comparing the difference in SHAP values before and after precursor emission reduction. Our method enables the accurate identification of OPS on a time scale of seconds, thereby providing a state-of-the-art tool for the rapid guidance of spatially specific O3 control strategies.
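The regime-classification rule at the heart of the method can be sketched as a comparison of SHAP differences between scenarios. The tolerance and toy values below are illustrative assumptions, not fitted quantities from the study.

```python
# d_shap_* = SHAP(base) - SHAP(reduced scenario): the modeled O3 drop
# attributable to cutting that precursor. Positive means the cut helps.
def classify_ops(d_shap_nox, d_shap_voc, tol=0.5):
    """Label the O3-precursor sensitivity regime by which precursor
    reduction lowers O3 more (within a transition tolerance)."""
    if d_shap_nox - d_shap_voc > tol:
        return "NOx-limited"
    if d_shap_voc - d_shap_nox > tol:
        return "VOCs-limited"
    return "transition"

print(classify_ops(6.0, 1.0))   # cutting NOx lowers O3 much more
print(classify_ops(-2.0, 5.0))  # cutting NOx raises O3 (titration)
print(classify_ops(3.0, 2.8))   # comparable responses: transition
```

Because the SHAP values come from an already-trained gradient-boosting model, this comparison is what makes second-scale OPS identification possible.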
Funding: Supported by the National Natural Science Foundation of China (Nos. 61203102 and 60874057) and the Postdoctoral Science Foundation of China (No. 20100471464).
Abstract: Real-time and reliable measurement of effluent quality is essential for improving operating efficiency and reducing energy consumption in the wastewater treatment process. Because of the low accuracy and unstable performance of traditional effluent quality measurements, we propose a selective ensemble extreme learning machine modeling method to enhance effluent quality predictions. The extreme learning machine algorithm is inserted into a selective ensemble frame as the component model, since it runs much faster and provides better generalization performance than other popular learning algorithms. Ensemble extreme learning machine models overcome the variation across different trials of simulations for a single model. Selective ensembling based on a genetic algorithm is used to further exclude bad components from the available ensemble, reducing computational complexity and improving generalization performance. The proposed method is verified with data from an industrial wastewater treatment plant located in Shenyang, China. Experimental results show that the proposed method has stronger generalization and higher accuracy than partial least squares, neural network partial least squares, a single extreme learning machine, and an ensemble extreme learning machine model.
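The selective-ensemble idea can be illustrated by brute force: among candidate component models, keep the subset whose averaged predictions minimize validation error. A genetic algorithm performs the same search when the component pool is too large to enumerate; the toy predictors below are assumptions for illustration only.

```python
from itertools import combinations

# Validation targets and candidate component-model predictions.
truth = [1.0, 2.0, 3.0, 4.0]
components = {
    "m1": [1.1, 2.1, 2.9, 4.2],  # good component
    "m2": [0.9, 1.8, 3.2, 3.9],  # good component
    "m3": [2.5, 0.5, 4.5, 2.0],  # bad component: should be excluded
}

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

# Enumerate every non-empty subset and average its members' predictions.
best_subset, best_err = None, float("inf")
for r in range(1, len(components) + 1):
    for subset in combinations(components, r):
        avg = [sum(components[m][i] for m in subset) / len(subset)
               for i in range(len(truth))]
        if mse(avg) < best_err:
            best_subset, best_err = set(subset), mse(avg)
print(best_subset, round(best_err, 4))
```

With n components there are 2^n - 1 subsets, which is why the paper encodes the subset as a GA bit-string instead of enumerating it.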