In clinical research,subgroup analysis can help identify patient groups that respond better or worse to specific treatments,improve therapeutic effect and safety,and is of great significance in precision medicine.This...In clinical research,subgroup analysis can help identify patient groups that respond better or worse to specific treatments,improve therapeutic effect and safety,and is of great significance in precision medicine.This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers.We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold,and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables.Quantile regression is used to better characterize the global distribution of the response variable and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers.The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations.Finally,we demonstrate the application of this method by analyzing data from the PA.3 trial,further illustrating the practicality of the method proposed in this paper.展开更多
As the core component of inertial navigation systems, fiber optic gyroscope (FOG), with technical advantages such as low power consumption, long lifespan, fast startup speed, and flexible structural design, are widely...As the core component of inertial navigation systems, fiber optic gyroscope (FOG), with technical advantages such as low power consumption, long lifespan, fast startup speed, and flexible structural design, are widely used in aerospace, unmanned driving, and other fields. However, due to the temper-ature sensitivity of optical devices, the influence of environmen-tal temperature causes errors in FOG, thereby greatly limiting their output accuracy. This work researches on machine-learn-ing based temperature error compensation techniques for FOG. Specifically, it focuses on compensating for the bias errors gen-erated in the fiber ring due to the Shupe effect. This work pro-poses a composite model based on k-means clustering, sup-port vector regression, and particle swarm optimization algo-rithms. And it significantly reduced redundancy within the sam-ples by adopting the interval sequence sample. Moreover, met-rics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance, are selected to evaluate the model’s performance and compensation effective-ness. This work effectively enhances the consistency between data and models across different temperature ranges and tem-perature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared to the existing methods utiliz-ing a single machine learning model, the proposed method increases the bias stability of the compensated FOG from 57.11% to 71.98%, and enhances the suppression of rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of FOG after compensation, providing theoretical guid-ance and technical references for sensors error compensation work in other fields.展开更多
The results of mass appraisal in many countries are used as a basis for calculating the amount of real estate tax,therefore,regardless of the methods used to calculate it,the resulting value should be as close as poss...The results of mass appraisal in many countries are used as a basis for calculating the amount of real estate tax,therefore,regardless of the methods used to calculate it,the resulting value should be as close as possible to the market value of the real estate to maintain a balance of interests between the state and the rights holders.In practice,this condition is not always met,since,firstly,the quality of market data is often very low,and secondly,some markets are characterized by low activity,which is expressed in a deficit of information on asking prices.The aim of the work is ecological valuation of land use:how regression-based mass appraisal can inform ecological conservation,land degradation,and sustainable land management.Four multiple regression models were constructed for AI generated map of land plots for recreational use in St.Petersburg(Russia)with different volumes of market information(32,30,20 and 15 units of market information with four price-forming factors).During the analysis of the quality of the models,it was revealed that the best result is shown by the model built on the maximum sample size,then the model based on 15 analogs,which proves that a larger number of analog objects does not always allow us to achieve better results,since the more analog objects there are.展开更多
The work proposes a distributed Kalman filtering(KF)algorithm to track a time-varying unknown signal process for a stochastic regression model over network systems in a cooperative way.We provide the stability analysi...The work proposes a distributed Kalman filtering(KF)algorithm to track a time-varying unknown signal process for a stochastic regression model over network systems in a cooperative way.We provide the stability analysis of the proposed distributed KF algorithm without independent and stationary signal assumptions,which implies that the theoretical results are able to be applied to stochastic feedback systems.Note that the main difficulty of stability analysis lies in analyzing the properties of the product of non-independent and non-stationary random matrices involved in the error equation.We employ analysis techniques such as stochastic Lyapunov function,stability theory of stochastic systems,and algebraic graph theory to deal with the above issue.The stochastic spatio-temporal cooperative information condition shows the cooperative property of multiple sensors that even though any local sensor cannot track the time-varying unknown signal,the distributed KF algorithm can be utilized to finish the filtering task in a cooperative way.At last,we illustrate the property of the proposed distributed KF algorithm by a simulation example.展开更多
Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/appr...Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.展开更多
To accurately model flows with shock waves using staggered-grid Lagrangian hydrodynamics, the artificial viscosity has to be introduced to convert kinetic energy into internal energy, thereby increasing the entropy ac...To accurately model flows with shock waves using staggered-grid Lagrangian hydrodynamics, the artificial viscosity has to be introduced to convert kinetic energy into internal energy, thereby increasing the entropy across shocks. Determining the appropriate strength of the artificial viscosity is an art and strongly depends on the particular problem and experience of the researcher. The objective of this study is to pose the problem of finding the appropriate strength of the artificial viscosity as an optimization problem and solve this problem using machine learning (ML) tools, specifically using surrogate models based on Gaussian Process regression (GPR) and Bayesian analysis. We describe the optimization method and discuss various practical details of its implementation. The shock-containing problems for which we apply this method all have been implemented in the LANL code FLAG (Burton in Connectivity structures and differencing techniques for staggered-grid free-Lagrange hydrodynamics, Tech. Rep. UCRL-JC-110555, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1992, in Consistent finite-volume discretization of hydrodynamic conservation laws for unstructured grids, Tech. Rep. CRL-JC-118788, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1994, Multidimensional discretization of conservation laws for unstructured polyhedral grids, Tech. Rep. UCRL-JC-118306, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1994, in FLAG, a multi-dimensional, multiple mesh, adaptive free-Lagrange, hydrodynamics code. In: NECDC, 1992). First, we apply ML to find optimal values to isolated shock problems of different strengths. Second, we apply ML to optimize the viscosity for a one-dimensional (1D) propagating detonation problem based on Zel’dovich-von Neumann-Doring (ZND) (Fickett and Davis in Detonation: theory and experiment. Dover books on physics. Dover Publications, Mineola, 2000) detonation theory using a reactive burn model. We compare results for default (currently used values in FLAG) and optimized values of the artificial viscosity for these problems demonstrating the potential for significant improvement in the accuracy of computations.展开更多
Carbon emissions have become a critical concern in the global effort to combat climate change,with each country or region contributing differently based on its economic structures,energy sources,and industrial activit...Carbon emissions have become a critical concern in the global effort to combat climate change,with each country or region contributing differently based on its economic structures,energy sources,and industrial activities.The factors influencing carbon emissions vary across countries and sectors.This study examined the factors influencing CO_(2)emissions in the 7 South American countries including Argentina,Brazil,Chile,Colombia,Ecuador,Peru,and Venezuela.We used the Seemingly Unrelated Regression(SUR)model to analyse the relationship of CO_(2)emissions with gross domestic product(GDP),renewable energy use,urbanization,industrialization,international tourism,agricultural productivity,and forest area based on data from 2000 to 2022.According to the SUR model,we found that GDP and industrialization had a moderate positive effect on CO_(2)emissions,whereas renewable energy use had a moderate negative effect on CO_(2)emissions.International tourism generally had a positive impact on CO_(2)emissions,while forest area tended to decrease CO_(2)emissions.Different variables had different effects on CO_(2)emissions in the 7 South American countries.In Argentina and Venezuela,GDP,international tourism,and agricultural productivity significantly affected CO_(2)emissions.In Colombia,GDP and international tourism had a negative impact on CO_(2)emissions.In Brazil,CO_(2)emissions were primarily driven by GDP,while in Chile,Ecuador,and Peru,international tourism had a negative effect on CO_(2)emissions.Overall,this study highlights the importance of country-specific strategies for reducing CO_(2)emissions and emphasizes the varying roles of these driving factors in shaping environmental quality in the 7 South American countries.展开更多
BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale c...BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale cannot be fully understood due to lack of information.AIM To identify key factors that may explain the variability in case lethality across countries.METHODS We identified 21 Potential risk factors for coronavirus disease 2019(COVID-19)case fatality rate for all the countries with available data.We examined univariate relationships of each variable with case fatality rate(CFR),and all independent variables to identify candidate variables for our final multiple model.Multiple regression analysis technique was used to assess the strength of relationship.RESULTS The mean of COVID-19 mortality was 1.52±1.72%.There was a statistically significant inverse correlation between health expenditure,and number of computed tomography scanners per 1 million with CFR,and significant direct correlation was found between literacy,and air pollution with CFR.This final model can predict approximately 97%of the changes in CFR.CONCLUSION The current study recommends some new predictors explaining affect mortality rate.Thus,it could help decision-makers develop health policies to fight COVID-19.展开更多
As maritime activities increase globally,there is a greater dependency on technology in monitoring,control,and surveillance of vessel activity.One of the most prominent systems for monitoring vessel activity is the Au...As maritime activities increase globally,there is a greater dependency on technology in monitoring,control,and surveillance of vessel activity.One of the most prominent systems for monitoring vessel activity is the Automatic Identification System(AIS).An increase in both vessels fitted with AIS transponders and satellite and terrestrial AIS receivers has resulted in a significant increase in AIS messages received globally.This resultant rich spatial and temporal data source related to vessel activity provides analysts with the ability to perform enhanced vessel movement analytics,of which a pertinent example is the improvement of vessel location predictions.In this paper,we propose a novel strategy for predicting future locations of vessels making use of historic AIS data.The proposed method uses a Linear Regression Model(LRM)and utilizes historic AIS movement data in the form of a-priori generated spatial maps of the course over ground(LRMAC).The LRMAC is an accurate low complexity first-order method that is easy to implement operationally and shows promising results in areas where there is a consistency in the directionality of historic vessel movement.In areas where the historic directionality of vessel movement is diverse,such as areas close to harbors and ports,the LRMAC defaults to the LRM.The proposed LRMAC method is compared to the Single-Point Neighbor Search(SPNS),which is also a first-order method and has a similar level of computational complexity,and for the use case of predicting tanker and cargo vessel trajectories up to 8 hours into the future,the LRMAC showed improved results both in terms of prediction accuracy and execution time.展开更多
Possible changes in the structure and seasonal variability of the subtropical ridge may lead to changes in the rainfall’s variability modes over Caribbean region. This generates additional difficulties around water r...Possible changes in the structure and seasonal variability of the subtropical ridge may lead to changes in the rainfall’s variability modes over Caribbean region. This generates additional difficulties around water resource planning, therefore, obtaining seasonal prediction models that allow these variations to be characterized in detail, it’s a concern, specially for island states. This research proposes the construction of statistical-dynamic models based on PCA regression methods. It is used as predictand the monthly precipitation accumulated, while the predictors (6) are extracted from the ECMWF-SEAS5 ensemble mean forecasts with a lag of one month with respect to the target month. In the construction of the models, two sequential training schemes are evaluated, obtaining that only the shorter preserves the seasonal characteristics of the predictand. The evaluation metrics used, where cell-point and dichotomous methodologies are combined, suggest that the predictors related to sea surface temperatures do not adequately represent the seasonal variability of the predictand, however, others such as the temperature at 850 hPa and the Outgoing Longwave Radiation are represented with a good approximation regardless of the model chosen. In this sense, the models built with the nearest neighbor methodology were the most efficient. Using the individual models with the best results, an ensemble is built that allows improving the individual skill of the models selected as members by correcting the underestimation of precipitation in the dynamic model during the wet season, although problems of overestimation persist for thresholds lower than 50 mm.展开更多
Efficient water quality monitoring and ensuring the safety of drinking water by government agencies in areas where the resource is constantly depleted due to anthropogenic or natural factors cannot be overemphasized. ...Efficient water quality monitoring and ensuring the safety of drinking water by government agencies in areas where the resource is constantly depleted due to anthropogenic or natural factors cannot be overemphasized. The above statement holds for West Texas, Midland, and Odessa Precisely. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models for the prediction of total dissolved solids (TDS) and sodium absorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: Edward-Trinity (plateau), and Ogallala aquifers. These two aquifers have contributed immensely to providing water for different uses ranging from domestic, agricultural, industrial, etc. The data was obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models used in this study gave an accurate prediction of observed data (TDS and SAR) for both the Edward-Trinity (plateau) and Ogallala aquifers with the R<sup>2</sup> values consistently greater than 0.83. The Random Forest model gave a better prediction of TDS and SAR concentration with an average R, MAE, RMSE and MSE of 0.977, 0.015, 0.029 and 0.00, respectively. For the XGBoost, an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037 and 0.00, respectively, were achieved. The overall performance of the models produced was impressive. From this study, we can clearly understand that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in an area of high hydrocarbon activities like Midland and Odessa and West Texas at large.展开更多
In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), ob...In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.展开更多
This study aims to analyze and predict the relationship between the average price per box in the cigarette market of City A and government procurement,providing a scientific basis and support for decision-making.By re...This study aims to analyze and predict the relationship between the average price per box in the cigarette market of City A and government procurement,providing a scientific basis and support for decision-making.By reviewing relevant theories and literature,qualitative prediction methods,regression prediction models,and other related theories were explored.Through the analysis of annual cigarette sales data and government procurement data in City A,a comprehensive understanding of the development of the tobacco industry and the economic trends of tobacco companies in the county was obtained.By predicting and analyzing the average price per box of cigarette sales across different years,corresponding prediction results were derived and compared with actual sales data.The prediction results indicate that the correlation coefficient between the average price per box of cigarette sales and government procurement is 0.982,implying that government procurement accounts for 96.4%of the changes in the average price per box of cigarettes.These findings offer an in-depth exploration of the relationship between the average price per box of cigarettes in City A and government procurement,providing a scientific foundation for corporate decision-making and market operations.展开更多
Firstly,based on the data of air quality and the meteorological data in Baoding City from 2017 to 2021,the correlations of meteorological elements and pollutants with O_(3)concentration were explored to determine the ...Firstly,based on the data of air quality and the meteorological data in Baoding City from 2017 to 2021,the correlations of meteorological elements and pollutants with O_(3)concentration were explored to determine the forecast factors of forecast models.Secondly,the O_(3)-8h concentration in Baoding City in 2021 was predicted based on the constructed models of multiple linear regression(MLR),backward propagation neural network(BPNN),and auto regressive integrated moving average(ARIMA),and the predicted values were compared with the observed values to test their prediction effects.The results show that overall,the MLR,BPNN and ARIMA models were able to forecast the changing trend of O_(3)-8h concentration in Baoding in 2021,but the BPNN model gave better forecast results than the ARIMA and MLR models,especially for the prediction of the high values of O_(3)-8h concentration,and the correlation coefficients between the predicted values and the observed values were all higher than 0.9 during June-September.The mean error(ME),mean absolute error(MAE),and root mean square error(RMSE)of the predicted values and the observed values of daily O_(3)-8h concentration based on the BPNN model were 0.45,19.11 and 24.41μg/m 3,respectively,which were significantly better than those of the MLR and ARIMA models.The prediction effects of the MLR,BPNN and ARIMA models were the best at the pollution level,followed by the excellent level,and it was the worst at the good level.In comparison,the prediction effect of BPNN model was better than that of the MLR and ARIMA models as a whole,especially for the pollution and excellent levels.The TS scores of the BPNN model were all above 66%,and the PC values were above 86%.The BPNN model can forecast the changing trend of O_(3)concentration more accurately,and has a good practical application value,but at the same time,the predicted high values of O_(3)concentration should be appropriately increased according to error characteristics of the model.展开更多
In recent years,there has been a concerted effort to improve anomaly detection tech-niques,particularly in the context of high-dimensional,distributed clinical data.Analysing patient data within clinical settings reve...In recent years,there has been a concerted effort to improve anomaly detection tech-niques,particularly in the context of high-dimensional,distributed clinical data.Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy,personalising treatment plans,and optimising resource allocation to enhance clinical outcomes.Nonetheless,this domain faces unique challenges,such as irregular data collection,inconsistent data quality,and patient-specific structural variations.This paper proposed a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data to address these challenges.The strategy combines HPO-based optimal Density-Based Spatial Clustering of Applications with Noise for clustering patient exercise data,facilitating efficient anomaly identification.Subsequently,a stochastic method based on the Interquartile Range filters unreliable data points,ensuring that medical tools and professionals receive only the most pertinent and accurate information.The primary objective of this study is to equip healthcare pro-fessionals and researchers with a robust tool for managing extensive,high-dimensional clinical datasets,enabling effective isolation and removal of aberrant data points.Furthermore,a sophisticated regression model has been developed using Automated Machine Learning(AutoML)to assess the impact of the ensemble abnormal pattern detection approach.Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML.Experimental results show that implementing this innovative hybrid model on patient rehabilitation data leads to a notable enhance-ment in AutoML performance,with an average improvement of 0.041 in the R2 score,surpassing the effectiveness of traditional regression models.展开更多
BACKGROUND Difficulty of colonoscopy insertion(DCI)significantly affects colonoscopy effectiveness and serves as a key quality indicator.Predicting and evaluating DCI risk preoperatively is crucial for optimizing intr...BACKGROUND Difficulty of colonoscopy insertion(DCI)significantly affects colonoscopy effectiveness and serves as a key quality indicator.Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.AIM To evaluate the predictive performance of machine learning(ML)algorithms for DCI by comparing three modeling approaches,identify factors influencing DCI,and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency.METHODS This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021.Demographic data,past medical history,medication use,and psychological status were collected.The endoscopist assessed DCI using the visual analogue scale.After univariate screening,predictive models were developed using multivariable logistic regression,least absolute shrinkage and selection operator(LASSO)regression,and random forest(RF)algorithms.Model performance was evaluated based on discrimination,calibration,and decision curve analysis(DCA),and results were visualized using nomograms.RESULTS A total of 712 patients(53.8%male;mean age 54.5 years±12.9 years)were included.Logistic regression analysis identified constipation[odds ratio(OR)=2.254,95%confidence interval(CI):1.289-3.931],abdominal circumference(AC)(77.5–91.9 cm,OR=1.895,95%CI:1.065-3.350;AC≥92 cm,OR=1.271,95%CI:0.730-2.188),and anxiety(OR=1.071,95%CI:1.044-1.100)as predictive factors for DCI,validated by LASSO and RF methods.Model performance revealed training/validation sensitivities of 0.826/0.925,0.924/0.868,and 1.000/0.981;specificities of 0.602/0.511,0.510/0.562,and 0.977/0.526;and corresponding area under the receiver operating characteristic curves(AUCs)of 0.780(0.737-0.823)/0.726(0.654-0.799),0.754(0.710-0.798)/0.723(0.656-0.791),and 1.000(1.000-1.000)/0.754(0.688-0.820),respectively.DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37.The RF model demonstrated superior diagnostic accuracy,reflected by perfect training sensitivity(1.000)and highest validation AUC(0.754),outperforming other methods in clinical applicability.CONCLUSION The RF-based model exhibited superior predictive accuracy for DCI compared to multivariable logistic and LASSO regression models.This approach supports individualized preoperative optimization,enhancing colonoscopy quality through targeted risk stratification.展开更多
High-dimensional heterogeneous data have acquired increasing attention and discussion in the past decade.In the context of heterogeneity,semiparametric regression emerges as a popular method to model this type of data...High-dimensional heterogeneous data have acquired increasing attention and discussion in the past decade.In the context of heterogeneity,semiparametric regression emerges as a popular method to model this type of data in statistics.In this paper,we leverage the benefits of expectile regression for computational efficiency and analytical robustness in heterogeneity,and propose a regularized partially linear additive expectile regression model with a nonconvex penalty,such as SCAD or MCP,for high-dimensional heterogeneous data.We focus on a more realistic scenario where the regression error exhibits a heavy-tailed distribution with only finite moments.This scenario challenges the classical sub-gaussian distribution assumption and is more prevalent in practical applications.Under certain regular conditions,we demonstrate that with probability tending to one,the oracle estimator is one of the local minima of the induced optimization problem.Our theoretical analysis suggests that the dimensionality of linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition of the regression error.Computationally,given the nonconvex and nonsmooth nature of the induced optimization problem,we have developed a two-step algorithm.Finally,our method’s effectiveness is demonstrated through its high estimation accuracy and effective model selection,as evidenced by Monte Carlo simulation studies and a real-data application.Furthermore,by taking various expectile weights,our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable,underscoring its utility in analyzing high-dimensional heterogeneous data.展开更多
The increasing frequency of extreme weather events raises the likelihood of forest wildfires.Therefore,establishing an effective fire prediction model is vital for protecting human life and property,and the environmen...The increasing frequency of extreme weather events raises the likelihood of forest wildfires.Therefore,establishing an effective fire prediction model is vital for protecting human life and property,and the environment.This study aims to build a prediction model to understand the spatial characteristics and piecewise effects of forest fire drivers.Using monthly grid data from 2006 to 2020,a modeling study analyzed fire occurrences during the September to April fire season in Fujian Province,China.We compared the fitting performance of the logistic regression model(LRM),the generalized additive logistic model(GALM),and the spatial generalized additive logistic model(SGALM).The results indicate that SGALMs had the best fitting results and the highest prediction accuracy.Meteorological factors significantly impacted forest fires in Fujian Province.Areas with high fire incidence were mainly concentrated in the northwest and southeast.SGALMs improved the fitting effect of fire prediction models by considering spatial effects and the flexible fitting ability of nonlinear interpretation.This model provides piecewise interpretations of forest wildfire occurrences,which can be valuable for relevant departments and will assist forest managers in refining prevention measures based on temporal and spatial differences.展开更多
In the physical model test of landslides,the selection of analogous materials is the key,and it is difficult to consider the similarity of mechanical properties and seepage performance at the same time.To develop a mo...In the physical model test of landslides,the selection of analogous materials is the key,and it is difficult to consider the similarity of mechanical properties and seepage performance at the same time.To develop a model material suitable for analysing the deformation and failure of reservoir landslides,based on the existing research foundation of analogous materials,5 materials and 5 physical-mechanical parameters were selected to design an orthogonal test.The factor sensitivity of each component ratio and its influence on the physical-mechanical indices were studied by range analysis and stepwise regression analysis,and the proportioning method was determined.Finally,the model material was developed,and a model test was carried out considering Huangtupo as the prototype application.The results showed that(1)the model material composed of sand,barite powder,glass beads,clay,and bentonite had a wide distribution of physical-mechanical parameters,which could be applied to model tests under different conditions;(2)the physical-mechanical parameters of analogous materials matched the application prototype;and(3)the mechanical properties and seepage performance of the model material sample met the requirements of reservoir landslide model tests,which could be used to simulate landslide evolution and analyse the deformation process.展开更多
BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms ...BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms for predicting the risk of inhospital mortality in children with dengue shock syndrome(DSS).AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS.METHODS This single-center retrospective study was conducted at tertiary Children’s Hospital No.2 in Viet Nam,between 2013 and 2022.The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit(PICU).Nine significant features were predetermined for further analysis using machine learning models.An oversampling method was used to enhance the model performance.Supervised models,including logistic regression,Naïve Bayes,Random Forest(RF),K-nearest neighbors,Decision Tree and Extreme Gradient Boosting(XGBoost),were employed to develop predictive models.The Shapley Additive Explanation was used to determine the degree of contribution of the features.RESULTS In total,1278 PICU-admitted children with complete data were included in the analysis.The median patient age was 8.1 years(interquartile range:5.4-10.7).Thirty-nine patients(3%)died.The RF and XGboost models demonstrated the highest performance.The Shapley Addictive Explanations model revealed that the most important predictive features included younger age,female patients,presence of underlying diseases,severe transaminitis,severe bleeding,low platelet counts requiring platelet transfusion,elevated levels of international normalized ratio,blood lactate and serum creatinine,large volume of resuscitation fluid and a high vasoactive inotropic score(>30).CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS.The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.展开更多
基金Supported by the Natural Science Foundation of Fujian Province(2022J011177,2024J01903)the Key Project of Fujian Provincial Education Department(JZ230054)。
文摘In clinical research,subgroup analysis can help identify patient groups that respond better or worse to specific treatments,improve therapeutic effect and safety,and is of great significance in precision medicine.This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers.We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold,and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables.Quantile regression is used to better characterize the global distribution of the response variable and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers.The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations.Finally,we demonstrate the application of this method by analyzing data from the PA.3 trial,further illustrating the practicality of the method proposed in this paper.
基金supported by the National Natural Science Foundation of China(62375013).
文摘As the core component of inertial navigation systems, fiber optic gyroscope (FOG), with technical advantages such as low power consumption, long lifespan, fast startup speed, and flexible structural design, are widely used in aerospace, unmanned driving, and other fields. However, due to the temper-ature sensitivity of optical devices, the influence of environmen-tal temperature causes errors in FOG, thereby greatly limiting their output accuracy. This work researches on machine-learn-ing based temperature error compensation techniques for FOG. Specifically, it focuses on compensating for the bias errors gen-erated in the fiber ring due to the Shupe effect. This work pro-poses a composite model based on k-means clustering, sup-port vector regression, and particle swarm optimization algo-rithms. And it significantly reduced redundancy within the sam-ples by adopting the interval sequence sample. Moreover, met-rics such as root mean square error (RMSE), mean absolute error (MAE), bias stability, and Allan variance, are selected to evaluate the model’s performance and compensation effective-ness. This work effectively enhances the consistency between data and models across different temperature ranges and tem-perature gradients, improving the bias stability of the FOG from 0.022 °/h to 0.006 °/h. Compared to the existing methods utiliz-ing a single machine learning model, the proposed method increases the bias stability of the compensated FOG from 57.11% to 71.98%, and enhances the suppression of rate ramp noise coefficient from 2.29% to 14.83%. This work improves the accuracy of FOG after compensation, providing theoretical guid-ance and technical references for sensors error compensation work in other fields.
基金financed as part of the project“Development of a methodology for instrumental base formation for analysis and modeling of the spatial socio-economic development of systems based on internal reserves in the context of digitalization”(FSEG-2023-0008)funded by the Russian Science Foundation(Agreement 23-41-10001,https://doi.org/https://rscf.ru/project/23-41-10001/).
文摘The results of mass appraisal in many countries are used as a basis for calculating the amount of real estate tax,therefore,regardless of the methods used to calculate it,the resulting value should be as close as possible to the market value of the real estate to maintain a balance of interests between the state and the rights holders.In practice,this condition is not always met,since,firstly,the quality of market data is often very low,and secondly,some markets are characterized by low activity,which is expressed in a deficit of information on asking prices.The aim of the work is ecological valuation of land use:how regression-based mass appraisal can inform ecological conservation,land degradation,and sustainable land management.Four multiple regression models were constructed for AI generated map of land plots for recreational use in St.Petersburg(Russia)with different volumes of market information(32,30,20 and 15 units of market information with four price-forming factors).During the analysis of the quality of the models,it was revealed that the best result is shown by the model built on the maximum sample size,then the model based on 15 analogs,which proves that a larger number of analog objects does not always allow us to achieve better results,since the more analog objects there are.
基金supported in part by Sichuan Science and Technology Program under Grant No.2025ZNSFSC151in part by the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No.XDA27030201+1 种基金the Natural Science Foundation of China under Grant No.U21B6001in part by the Natural Science Foundation of Tianjin under Grant No.24JCQNJC01930.
文摘The work proposes a distributed Kalman filtering(KF)algorithm to track a time-varying unknown signal process for a stochastic regression model over network systems in a cooperative way.We provide the stability analysis of the proposed distributed KF algorithm without independent and stationary signal assumptions,which implies that the theoretical results are able to be applied to stochastic feedback systems.Note that the main difficulty of stability analysis lies in analyzing the properties of the product of non-independent and non-stationary random matrices involved in the error equation.We employ analysis techniques such as stochastic Lyapunov function,stability theory of stochastic systems,and algebraic graph theory to deal with the above issue.The stochastic spatio-temporal cooperative information condition shows the cooperative property of multiple sensors that even though any local sensor cannot track the time-varying unknown signal,the distributed KF algorithm can be utilized to finish the filtering task in a cooperative way.At last,we illustrate the property of the proposed distributed KF algorithm by a simulation example.
文摘Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
基金This work was performed under the auspices of the National Nuclear Security Administration of the US Department of Energy at Los Alamos National Laboratory under Contract No.89233218CNA000001The Authors gratefully acknowledge the support of the US Department of Energy National Nuclear Security Administration Advanced Simulation and Computing Program.LA-UR-22-33159.
文摘To accurately model flows with shock waves using staggered-grid Lagrangian hydrodynamics, the artificial viscosity has to be introduced to convert kinetic energy into internal energy, thereby increasing the entropy across shocks. Determining the appropriate strength of the artificial viscosity is an art and strongly depends on the particular problem and experience of the researcher. The objective of this study is to pose the problem of finding the appropriate strength of the artificial viscosity as an optimization problem and solve this problem using machine learning (ML) tools, specifically using surrogate models based on Gaussian Process regression (GPR) and Bayesian analysis. We describe the optimization method and discuss various practical details of its implementation. The shock-containing problems for which we apply this method all have been implemented in the LANL code FLAG (Burton in Connectivity structures and differencing techniques for staggered-grid free-Lagrange hydrodynamics, Tech. Rep. UCRL-JC-110555, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1992, in Consistent finite-volume discretization of hydrodynamic conservation laws for unstructured grids, Tech. Rep. CRL-JC-118788, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1994, Multidimensional discretization of conservation laws for unstructured polyhedral grids, Tech. Rep. UCRL-JC-118306, Lawrence Livermore National Laboratory, Livermore, CA, 1992, 1994, in FLAG, a multi-dimensional, multiple mesh, adaptive free-Lagrange, hydrodynamics code. In: NECDC, 1992). First, we apply ML to find optimal values to isolated shock problems of different strengths. Second, we apply ML to optimize the viscosity for a one-dimensional (1D) propagating detonation problem based on Zel’dovich-von Neumann-Doring (ZND) (Fickett and Davis in Detonation: theory and experiment. Dover books on physics. Dover Publications, Mineola, 2000) detonation theory using a reactive burn model. We compare results for default (currently used values in FLAG) and optimized values of the artificial viscosity for these problems demonstrating the potential for significant improvement in the accuracy of computations.
文摘Carbon emissions have become a critical concern in the global effort to combat climate change,with each country or region contributing differently based on its economic structures,energy sources,and industrial activities.The factors influencing carbon emissions vary across countries and sectors.This study examined the factors influencing CO_(2)emissions in the 7 South American countries including Argentina,Brazil,Chile,Colombia,Ecuador,Peru,and Venezuela.We used the Seemingly Unrelated Regression(SUR)model to analyse the relationship of CO_(2)emissions with gross domestic product(GDP),renewable energy use,urbanization,industrialization,international tourism,agricultural productivity,and forest area based on data from 2000 to 2022.According to the SUR model,we found that GDP and industrialization had a moderate positive effect on CO_(2)emissions,whereas renewable energy use had a moderate negative effect on CO_(2)emissions.International tourism generally had a positive impact on CO_(2)emissions,while forest area tended to decrease CO_(2)emissions.Different variables had different effects on CO_(2)emissions in the 7 South American countries.In Argentina and Venezuela,GDP,international tourism,and agricultural productivity significantly affected CO_(2)emissions.In Colombia,GDP and international tourism had a negative impact on CO_(2)emissions.In Brazil,CO_(2)emissions were primarily driven by GDP,while in Chile,Ecuador,and Peru,international tourism had a negative effect on CO_(2)emissions.Overall,this study highlights the importance of country-specific strategies for reducing CO_(2)emissions and emphasizes the varying roles of these driving factors in shaping environmental quality in the 7 South American countries.
文摘BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale cannot be fully understood due to lack of information.AIM To identify key factors that may explain the variability in case lethality across countries.METHODS We identified 21 Potential risk factors for coronavirus disease 2019(COVID-19)case fatality rate for all the countries with available data.We examined univariate relationships of each variable with case fatality rate(CFR),and all independent variables to identify candidate variables for our final multiple model.Multiple regression analysis technique was used to assess the strength of relationship.RESULTS The mean of COVID-19 mortality was 1.52±1.72%.There was a statistically significant inverse correlation between health expenditure,and number of computed tomography scanners per 1 million with CFR,and significant direct correlation was found between literacy,and air pollution with CFR.This final model can predict approximately 97%of the changes in CFR.CONCLUSION The current study recommends some new predictors explaining affect mortality rate.Thus,it could help decision-makers develop health policies to fight COVID-19.
文摘As maritime activities increase globally,there is a greater dependency on technology in monitoring,control,and surveillance of vessel activity.One of the most prominent systems for monitoring vessel activity is the Automatic Identification System(AIS).An increase in both vessels fitted with AIS transponders and satellite and terrestrial AIS receivers has resulted in a significant increase in AIS messages received globally.This resultant rich spatial and temporal data source related to vessel activity provides analysts with the ability to perform enhanced vessel movement analytics,of which a pertinent example is the improvement of vessel location predictions.In this paper,we propose a novel strategy for predicting future locations of vessels making use of historic AIS data.The proposed method uses a Linear Regression Model(LRM)and utilizes historic AIS movement data in the form of a-priori generated spatial maps of the course over ground(LRMAC).The LRMAC is an accurate low complexity first-order method that is easy to implement operationally and shows promising results in areas where there is a consistency in the directionality of historic vessel movement.In areas where the historic directionality of vessel movement is diverse,such as areas close to harbors and ports,the LRMAC defaults to the LRM.The proposed LRMAC method is compared to the Single-Point Neighbor Search(SPNS),which is also a first-order method and has a similar level of computational complexity,and for the use case of predicting tanker and cargo vessel trajectories up to 8 hours into the future,the LRMAC showed improved results both in terms of prediction accuracy and execution time.
文摘Possible changes in the structure and seasonal variability of the subtropical ridge may lead to changes in the rainfall’s variability modes over Caribbean region. This generates additional difficulties around water resource planning, therefore, obtaining seasonal prediction models that allow these variations to be characterized in detail, it’s a concern, specially for island states. This research proposes the construction of statistical-dynamic models based on PCA regression methods. It is used as predictand the monthly precipitation accumulated, while the predictors (6) are extracted from the ECMWF-SEAS5 ensemble mean forecasts with a lag of one month with respect to the target month. In the construction of the models, two sequential training schemes are evaluated, obtaining that only the shorter preserves the seasonal characteristics of the predictand. The evaluation metrics used, where cell-point and dichotomous methodologies are combined, suggest that the predictors related to sea surface temperatures do not adequately represent the seasonal variability of the predictand, however, others such as the temperature at 850 hPa and the Outgoing Longwave Radiation are represented with a good approximation regardless of the model chosen. In this sense, the models built with the nearest neighbor methodology were the most efficient. Using the individual models with the best results, an ensemble is built that allows improving the individual skill of the models selected as members by correcting the underestimation of precipitation in the dynamic model during the wet season, although problems of overestimation persist for thresholds lower than 50 mm.
文摘Efficient water quality monitoring and ensuring the safety of drinking water by government agencies in areas where the resource is constantly depleted due to anthropogenic or natural factors cannot be overemphasized. The above statement holds for West Texas, Midland, and Odessa Precisely. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models for the prediction of total dissolved solids (TDS) and sodium absorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: Edward-Trinity (plateau), and Ogallala aquifers. These two aquifers have contributed immensely to providing water for different uses ranging from domestic, agricultural, industrial, etc. The data was obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models used in this study gave an accurate prediction of observed data (TDS and SAR) for both the Edward-Trinity (plateau) and Ogallala aquifers with the R<sup>2</sup> values consistently greater than 0.83. The Random Forest model gave a better prediction of TDS and SAR concentration with an average R, MAE, RMSE and MSE of 0.977, 0.015, 0.029 and 0.00, respectively. For the XGBoost, an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037 and 0.00, respectively, were achieved. The overall performance of the models produced was impressive. From this study, we can clearly understand that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in an area of high hydrocarbon activities like Midland and Odessa and West Texas at large.
文摘In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
基金National Social Science Fund Project“Research on the Operational Risks and Prevention of Government Procurement of Community Services Project System”(Project No.21CSH018)Research and Application of SDM Cigarette Supply Strategy Based on Consumer Data Analysis(Project No.2023ASXM07)。
文摘This study aims to analyze and predict the relationship between the average price per box in the cigarette market of City A and government procurement,providing a scientific basis and support for decision-making.By reviewing relevant theories and literature,qualitative prediction methods,regression prediction models,and other related theories were explored.Through the analysis of annual cigarette sales data and government procurement data in City A,a comprehensive understanding of the development of the tobacco industry and the economic trends of tobacco companies in the county was obtained.By predicting and analyzing the average price per box of cigarette sales across different years,corresponding prediction results were derived and compared with actual sales data.The prediction results indicate that the correlation coefficient between the average price per box of cigarette sales and government procurement is 0.982,implying that government procurement accounts for 96.4%of the changes in the average price per box of cigarettes.These findings offer an in-depth exploration of the relationship between the average price per box of cigarettes in City A and government procurement,providing a scientific foundation for corporate decision-making and market operations.
基金the Project of the Key Open Laboratory of Atmospheric Detection,China Meteorological Administration(2023KLAS02M)the Second Batch of Science and Technology Project of China Meteorological Administration("Jiebangguashuai"):the Research and Development of Short-term and Near-term Warning Products for Severe Convective Weather in Beijing-Tianjin-Hebei Region(CMAJBGS202307).
文摘Firstly,based on the data of air quality and the meteorological data in Baoding City from 2017 to 2021,the correlations of meteorological elements and pollutants with O_(3)concentration were explored to determine the forecast factors of forecast models.Secondly,the O_(3)-8h concentration in Baoding City in 2021 was predicted based on the constructed models of multiple linear regression(MLR),backward propagation neural network(BPNN),and auto regressive integrated moving average(ARIMA),and the predicted values were compared with the observed values to test their prediction effects.The results show that overall,the MLR,BPNN and ARIMA models were able to forecast the changing trend of O_(3)-8h concentration in Baoding in 2021,but the BPNN model gave better forecast results than the ARIMA and MLR models,especially for the prediction of the high values of O_(3)-8h concentration,and the correlation coefficients between the predicted values and the observed values were all higher than 0.9 during June-September.The mean error(ME),mean absolute error(MAE),and root mean square error(RMSE)of the predicted values and the observed values of daily O_(3)-8h concentration based on the BPNN model were 0.45,19.11 and 24.41μg/m 3,respectively,which were significantly better than those of the MLR and ARIMA models.The prediction effects of the MLR,BPNN and ARIMA models were the best at the pollution level,followed by the excellent level,and it was the worst at the good level.In comparison,the prediction effect of BPNN model was better than that of the MLR and ARIMA models as a whole,especially for the pollution and excellent levels.The TS scores of the BPNN model were all above 66%,and the PC values were above 86%.The BPNN model can forecast the changing trend of O_(3)concentration more accurately,and has a good practical application value,but at the same time,the predicted high values of O_(3)concentration should be appropriately increased according to error characteristics of the model.
文摘In recent years,there has been a concerted effort to improve anomaly detection tech-niques,particularly in the context of high-dimensional,distributed clinical data.Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy,personalising treatment plans,and optimising resource allocation to enhance clinical outcomes.Nonetheless,this domain faces unique challenges,such as irregular data collection,inconsistent data quality,and patient-specific structural variations.This paper proposed a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data to address these challenges.The strategy combines HPO-based optimal Density-Based Spatial Clustering of Applications with Noise for clustering patient exercise data,facilitating efficient anomaly identification.Subsequently,a stochastic method based on the Interquartile Range filters unreliable data points,ensuring that medical tools and professionals receive only the most pertinent and accurate information.The primary objective of this study is to equip healthcare pro-fessionals and researchers with a robust tool for managing extensive,high-dimensional clinical datasets,enabling effective isolation and removal of aberrant data points.Furthermore,a sophisticated regression model has been developed using Automated Machine Learning(AutoML)to assess the impact of the ensemble abnormal pattern detection approach.Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML.Experimental results show that implementing this innovative hybrid model on patient rehabilitation data leads to a notable enhance-ment in AutoML performance,with an average improvement of 0.041 in the R2 score,surpassing the effectiveness of traditional regression models.
基金the Chinese Clinical Trial Registry(No.ChiCTR2000040109)approved by the Hospital Ethics Committee(No.20210130017).
文摘BACKGROUND Difficulty of colonoscopy insertion(DCI)significantly affects colonoscopy effectiveness and serves as a key quality indicator.Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.AIM To evaluate the predictive performance of machine learning(ML)algorithms for DCI by comparing three modeling approaches,identify factors influencing DCI,and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency.METHODS This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021.Demographic data,past medical history,medication use,and psychological status were collected.The endoscopist assessed DCI using the visual analogue scale.After univariate screening,predictive models were developed using multivariable logistic regression,least absolute shrinkage and selection operator(LASSO)regression,and random forest(RF)algorithms.Model performance was evaluated based on discrimination,calibration,and decision curve analysis(DCA),and results were visualized using nomograms.RESULTS A total of 712 patients(53.8%male;mean age 54.5 years±12.9 years)were included.Logistic regression analysis identified constipation[odds ratio(OR)=2.254,95%confidence interval(CI):1.289-3.931],abdominal circumference(AC)(77.5–91.9 cm,OR=1.895,95%CI:1.065-3.350;AC≥92 cm,OR=1.271,95%CI:0.730-2.188),and anxiety(OR=1.071,95%CI:1.044-1.100)as predictive factors for DCI,validated by LASSO and RF methods.Model performance revealed training/validation sensitivities of 0.826/0.925,0.924/0.868,and 1.000/0.981;specificities of 0.602/0.511,0.510/0.562,and 0.977/0.526;and corresponding area under the receiver operating characteristic curves(AUCs)of 0.780(0.737-0.823)/0.726(0.654-0.799),0.754(0.710-0.798)/0.723(0.656-0.791),and 1.000(1.000-1.000)/0.754(0.688-0.820),respectively.DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37.The RF model demonstrated superior diagnostic accuracy,reflected by perfect training sensitivity(1.000)and highest validation AUC(0.754),outperforming other methods in clinical applicability.CONCLUSION The RF-based model exhibited superior predictive accuracy for DCI compared to multivariable logistic and LASSO regression models.This approach supports individualized preoperative optimization,enhancing colonoscopy quality through targeted risk stratification.
基金Supported by the Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of Chi-na(LHZY24A010002)the MOE Project of Humanities and Social Sciences(21YJCZH235).
文摘High-dimensional heterogeneous data have acquired increasing attention and discussion in the past decade.In the context of heterogeneity,semiparametric regression emerges as a popular method to model this type of data in statistics.In this paper,we leverage the benefits of expectile regression for computational efficiency and analytical robustness in heterogeneity,and propose a regularized partially linear additive expectile regression model with a nonconvex penalty,such as SCAD or MCP,for high-dimensional heterogeneous data.We focus on a more realistic scenario where the regression error exhibits a heavy-tailed distribution with only finite moments.This scenario challenges the classical sub-gaussian distribution assumption and is more prevalent in practical applications.Under certain regular conditions,we demonstrate that with probability tending to one,the oracle estimator is one of the local minima of the induced optimization problem.Our theoretical analysis suggests that the dimensionality of linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition of the regression error.Computationally,given the nonconvex and nonsmooth nature of the induced optimization problem,we have developed a two-step algorithm.Finally,our method’s effectiveness is demonstrated through its high estimation accuracy and effective model selection,as evidenced by Monte Carlo simulation studies and a real-data application.Furthermore,by taking various expectile weights,our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable,underscoring its utility in analyzing high-dimensional heterogeneous data.
基金supported by the Fujian Provincial Science and Technology Program“University-Industry Cooperation Project”(2024Y4015)National Key R&D Plan of Strategic International Scientific and Technological Innovation Cooperation Project(2018YFE0207800).
文摘The increasing frequency of extreme weather events raises the likelihood of forest wildfires.Therefore,establishing an effective fire prediction model is vital for protecting human life and property,and the environment.This study aims to build a prediction model to understand the spatial characteristics and piecewise effects of forest fire drivers.Using monthly grid data from 2006 to 2020,a modeling study analyzed fire occurrences during the September to April fire season in Fujian Province,China.We compared the fitting performance of the logistic regression model(LRM),the generalized additive logistic model(GALM),and the spatial generalized additive logistic model(SGALM).The results indicate that SGALMs had the best fitting results and the highest prediction accuracy.Meteorological factors significantly impacted forest fires in Fujian Province.Areas with high fire incidence were mainly concentrated in the northwest and southeast.SGALMs improved the fitting effect of fire prediction models by considering spatial effects and the flexible fitting ability of nonlinear interpretation.This model provides piecewise interpretations of forest wildfire occurrences,which can be valuable for relevant departments and will assist forest managers in refining prevention measures based on temporal and spatial differences.
基金supported by the Major Program of the National Natural Science Foundation of China(No.42090054)the National Key Scientific Instrument and Equipment Development Projects of China(No.41827808)+1 种基金the Major Program of the National Natural Science Foundation of China(No.42090055)the National Science Foundation of China(No.42107194)。
文摘In the physical model test of landslides,the selection of analogous materials is the key,and it is difficult to consider the similarity of mechanical properties and seepage performance at the same time.To develop a model material suitable for analysing the deformation and failure of reservoir landslides,based on the existing research foundation of analogous materials,5 materials and 5 physical-mechanical parameters were selected to design an orthogonal test.The factor sensitivity of each component ratio and its influence on the physical-mechanical indices were studied by range analysis and stepwise regression analysis,and the proportioning method was determined.Finally,the model material was developed,and a model test was carried out considering Huangtupo as the prototype application.The results showed that(1)the model material composed of sand,barite powder,glass beads,clay,and bentonite had a wide distribution of physical-mechanical parameters,which could be applied to model tests under different conditions;(2)the physical-mechanical parameters of analogous materials matched the application prototype;and(3)the mechanical properties and seepage performance of the model material sample met the requirements of reservoir landslide model tests,which could be used to simulate landslide evolution and analyse the deformation process.
文摘BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms for predicting the risk of inhospital mortality in children with dengue shock syndrome(DSS).AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS.METHODS This single-center retrospective study was conducted at tertiary Children’s Hospital No.2 in Viet Nam,between 2013 and 2022.The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit(PICU).Nine significant features were predetermined for further analysis using machine learning models.An oversampling method was used to enhance the model performance.Supervised models,including logistic regression,Naïve Bayes,Random Forest(RF),K-nearest neighbors,Decision Tree and Extreme Gradient Boosting(XGBoost),were employed to develop predictive models.The Shapley Additive Explanation was used to determine the degree of contribution of the features.RESULTS In total,1278 PICU-admitted children with complete data were included in the analysis.The median patient age was 8.1 years(interquartile range:5.4-10.7).Thirty-nine patients(3%)died.The RF and XGboost models demonstrated the highest performance.The Shapley Addictive Explanations model revealed that the most important predictive features included younger age,female patients,presence of underlying diseases,severe transaminitis,severe bleeding,low platelet counts requiring platelet transfusion,elevated levels of international normalized ratio,blood lactate and serum creatinine,large volume of resuscitation fluid and a high vasoactive inotropic score(>30).CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS.The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.