On the basis of experimental observations on animals, applications to clinical data on patients and theoretical statistical reasoning, the author developed a computer-assisted general mathematical model of the ‘probacent’-probability equation, Equation (1), and the death rate (mortality probability) equation, Equation (2), derivable from Equation (1), that may be applicable as a general approximation method to make useful predictions of probable outcomes in a variety of biomedical phenomena [1-4]. Equations (1) and (2) contain the constants γ and c, respectively. In the previous studies, the author used the least maximum-difference principle to determine these constants so that they would best fit the reported data, minimizing the deviation. In this study, the author uses the method of computer-assisted least sum of squares to determine the constants γ and c in constructing the ‘probacent’-related formulas best fitting the NCHS-reported data on survival probabilities and death rates in the US total adult population for 2001. The results of this study suggest that the method of computer-assisted mathematical analysis with the least sum of squares is simpler, more accurate and more convenient than the previously used least maximum-difference principle, and that it fits the NCHS-reported data on survival probabilities and death rates in the US total adult population better. The computer program of curved regression for the ‘probacent’-probability and death rate equations may be helpful in research in biomedicine.
Multinomial logistic regression (MNL) is an attractive statistical approach for modeling vehicle crash severity, as it does not require the assumptions of normality, linearity, or homoscedasticity, unlike other approaches such as discriminant analysis, which requires these assumptions to be met. Moreover, it produces sound estimates by transforming the dependent variable from a probability between 0.0 and 1.0 to log odds ranging from negative infinity to positive infinity, that is, to a continuous variable. The estimates are asymptotically consistent with the requirements of the nonlinear regression process. The results of MNL can be interpreted through the regression coefficient estimates, the odds ratios (the exponentiated coefficients), or both. In addition, MNL can be used to improve the fitted model by comparing the full model, which includes all predictors, to a restricted model that excludes the non-significant predictors. As such, this paper presents a detailed step-by-step overview of incorporating MNL in crash severity modeling, using vehicle crash data from Interstate I70 in the State of Missouri, USA, for the years 2013-2015.
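The transformation between log odds and probabilities, and the reading of exponentiated coefficients as odds ratios, can be sketched as follows. This is a minimal illustration only: the coefficient values, the single predictor and the three severity categories are hypothetical and are not taken from the Missouri crash data.

```python
import math

def mnl_probabilities(coefs, x):
    """Category probabilities for a multinomial logit model.

    coefs: one coefficient vector per non-reference category
           (the reference category has all coefficients fixed at 0).
    x:     predictor vector with a leading 1.0 for the intercept.
    """
    # Log odds of each category versus the reference category.
    log_odds = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in coefs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_odds)
    expd = [math.exp(z - m) for z in log_odds]
    total = sum(expd)
    return [e / total for e in expd]

def odds_ratios(beta):
    """Exponentiated coefficients, interpreted as odds ratios."""
    return [math.exp(b) for b in beta]

# Hypothetical coefficients: [intercept, predictor] for two
# non-reference severity categories.
coefs = [[-1.0, 0.5], [-2.0, 1.2]]
probs = mnl_probabilities(coefs, [1.0, 1.0])
print(probs)                  # three probabilities that sum to 1
print(odds_ratios(coefs[0]))  # odds ratios for the first category
```

Fixing the reference category's coefficients at zero is the usual identification choice; each remaining coefficient is then a log odds relative to that category.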
To meet the need for real-time crack monitoring of infrastructural facilities, a CNN-regression model is proposed to directly estimate crack properties from patches. RGB crack images and their corresponding masks obtained from a public dataset are cropped into patches of 256 square pixels that are classified with a pre-trained deep convolutional neural network; the true positives are segmented, and crack properties are extracted using two different methods. The first method is primarily based on active contour models and level-set segmentation, and the second method consists of the domain adaptation of a mathematical morphology-based method known as FIL-FINDER. A statistical test has been performed to compare the stated methods, and a database has been prepared with the more suitable method. An advanced convolutional neural network-based multi-output regression model has been proposed, which was trained with the prepared database and validated with the held-out dataset for the prediction of crack length, crack width, and width uncertainty directly from input image patches. The proposed model has been tested on crack patches collected from different locations. Huber loss has been used to ensure the robustness of the proposed model, which was selected from a set of 288 different variations of it. Additionally, an ablation study has been conducted on the top 3 models that demonstrated the influence of each network component on the prediction results. Finally, the best performing model among the top 3, HHc-X, has been proposed; it predicted crack properties in close agreement with the ground truths in the test data.
Satellite-based precipitation products have been widely used to estimate precipitation, especially over regions with sparse rain gauge networks. However, the low spatial resolution of these products has limited their application in localized regions and watersheds. This study investigated a spatial downscaling approach, Geographically Weighted Regression Kriging (GWRK), to downscale the Tropical Rainfall Measuring Mission (TRMM) 3B43 Version 7 over the Lancang River Basin (LRB) for 2001–2015. Downscaling was performed based on the relationships between the TRMM precipitation and the Normalized Difference Vegetation Index (NDVI), the Land Surface Temperature (LST), and the Digital Elevation Model (DEM). Geographical ratio analysis (GRA) was used to calibrate the annual downscaled precipitation data, and the monthly fractions derived from the original TRMM data were used to disaggregate annual downscaled and calibrated precipitation to monthly precipitation at 1 km resolution. The final downscaled precipitation datasets were validated against station-based observed precipitation in 2001–2015. Results showed that: 1) The TRMM 3B43 precipitation was highly accurate with slight overestimation at the basin scale (i.e., CC (correlation coefficient) = 0.91, Bias = 13.3%). Spatially, the accuracies of the upstream and downstream regions were higher than that of the midstream region. 2) The annual downscaled TRMM precipitation data at 1 km spatial resolution obtained by GWRK effectively captured the high spatial variability of precipitation over the LRB. 3) The annual downscaled TRMM precipitation with GRA calibration gave better accuracy compared with the original TRMM dataset. 4) The final downscaled and calibrated precipitation had significantly improved spatial resolution, and agreed well with data from the validated rain gauge stations, i.e., CC = 0.75, RMSE (root mean square error) = 182 mm, MAE (mean absolute error) = 142 mm, and Bias = 0.78% for annual precipitation and CC = 0.95, RMSE = 25 mm, MAE = 16 mm, and Bias = 0.67% for monthly precipitation.
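The four validation statistics quoted above (CC, RMSE, MAE and Bias) can be computed along the following lines. This is a sketch under stated assumptions: the abstract does not give its formulas, so Bias is taken here to be the relative bias in percent, and the station values in the usage lines are made up for illustration.

```python
import math

def validation_metrics(observed, estimated):
    """Return CC, RMSE, MAE and relative Bias (%) for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    errors = [e - o for o, e in zip(observed, estimated)]
    return {
        # Pearson correlation coefficient.
        "cc": cov / math.sqrt(var_o * var_e),
        # Root mean square error, in the units of the data (e.g. mm).
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / n),
        # Mean absolute error, in the units of the data.
        "mae": sum(abs(err) for err in errors) / n,
        # Assumed definition: total over- or underestimation as a
        # percentage of the observed total.
        "bias_pct": 100.0 * sum(errors) / sum(observed),
    }

# Made-up gauge observations and downscaled estimates (mm).
gauge = [100.0, 200.0, 300.0]
downscaled = [110.0, 210.0, 310.0]
print(validation_metrics(gauge, downscaled))
```

A positive `bias_pct` indicates overestimation relative to the gauges, which matches how the abstract describes the original TRMM product.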
This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 that employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selecting dependent and independent variables, model fitting, reporting and interpreting were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was sufficiently small to call into question the accuracy of the regression model. Also, most studies did not report on validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
BACKGROUND Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China’s Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among spouses of older mothers remains critically scarce, both in China and globally. AIM To investigate PPD and its influencing factors in Chinese advanced maternal age families. METHODS This cross-sectional study included 358 participants; it was conducted among fathers of pregnant women of advanced maternal age at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected via a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were adopted to analyze the latent PPD types and factors that influenced PPD. RESULTS The incidence of PPD was 16.48%, and three profiles were identified: Low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P<0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P<0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with monophasic disorder (P<0.05). CONCLUSION PPD incidence is high among Chinese fathers with advanced maternal age partners, and the characteristics of depression are varied. Healthcare practitioners should prioritize individuals with low levels of social support.
Bangladesh has a subtropical monsoon climate characterized by wide seasonal variations in rainfall, moderately warm temperatures, and high humidity. Rainfall is the main source of irrigation water throughout Bangladesh, where the inhabitants derive their income primarily from farming. Stochastic rainfall models, which are concerned with the occurrence of wet days and the depth of rainfall in different regions, have been used to model the daily occurrence of rainfall and have achieved satisfactory results around the world. In connection with Markov chains of different orders, logistic regression is conducted to examine the dependence of current rainfall upon the rainfall of the previous two time periods. It is shown that wet days in the previous two time periods, compared with dry days in the previous two time periods, positively influence a wet day in the current time period; that is, the occurrence of rain depends on the dry-wet spell in the rainy season from April to September in the study area. Daily rainfall data for Dhaka station, covering about 26 years from January 1985 to August 2011, were collected from the meteorological department to conduct the study. The test result shows that the occurrence of rainfall follows a second-order Markov chain, and logistic regression also indicates that dry followed by dry and wet followed by wet are more likely for the rainfall of Dhaka station; the model could also perform adequately for many applications of rainfall data.
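The dependence of today's state on the previous two days can be illustrated by estimating second-order transition probabilities from a wet/dry sequence. This is a minimal sketch, not the study's procedure: the function name and the short sequence are hypothetical, and the coding of 1 for a wet day and 0 for a dry day is an assumption.

```python
from collections import defaultdict

def second_order_transitions(states):
    """Estimate P(wet today | states of the previous two days).

    states: daily sequence with 1 for a wet day and 0 for a dry day.
    Returns a dict mapping (two days ago, yesterday) to the observed
    proportion of wet days that followed that pair.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [dry count, wet count]
    for a, b, c in zip(states, states[1:], states[2:]):
        counts[(a, b)][c] += 1
    return {pair: wet / (dry + wet) for pair, (dry, wet) in counts.items()}

# Made-up eight-day sequence, for illustration only.
sequence = [1, 1, 1, 0, 0, 0, 1, 1]
for pair, p_wet in sorted(second_order_transitions(sequence).items()):
    print(pair, p_wet)
```

With a long record, comparing these conditional proportions across the four possible pairs gives a first look at the persistence of wet and dry spells before any formal test of the chain's order.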
Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection as well as for handling the multicollinearity problem simultaneously in the linear regression model. LASSO produces estimates having high variance if the number of predictors is higher than the number of observations and if high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining LASSO and the Ridge estimator (RE). The solutions of LASSO and ENet have been obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm to overcome the issues in LASSO, which can combine LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithm using a Monte-Carlo simulation study and real-world examples. The results showed that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class estimator and the r-d class estimator, outperformed the other algorithms under moderate and severe multicollinearity.
As optimization of parameters greatly affects the prediction accuracy and generalization ability of support vector regression (SVR), and the predictive model is often mismatched in model predictive control of nonlinear systems, a multi-step model predictive control method based on online SVR (OSVR) optimized by a multi-agent particle swarm optimization algorithm (MAPSO) is put forward. By integrating the online learning ability of OSVR, the predictive model can self-correct and adapt well to the dynamic changes in the nonlinear process.
The approaches to discrete approximation of the Pareto front using multi-objective evolutionary algorithms suffer from heavy computational burden, long running time and missing Pareto optimal points. In order to overcome these problems, an approach to continuous approximation of the Pareto front using geometric support vector regression is presented. The regression model of a small approximate discrete Pareto front is constructed by geometric support vector regression modeling and is described as the approximate continuous Pareto front. In the process of geometric support vector regression modeling, considering the distribution characteristics of Pareto optimal points, separable augmented training sample sets are constructed by shifting the original training sample points along multiple coordinate axes. In addition, an interactive decision-making (DM) procedure, in which the continuous approximation of the Pareto front and decision-making are performed interactively, is designed to improve the accuracy of the preferred Pareto optimal point. The correctness of the continuous approximation of the Pareto front is demonstrated with a typical multi-objective optimization problem. In addition, combined with the interactive decision-making procedure, the continuous approximation of the Pareto front is applied in the multi-objective optimization of an industrial fed-batch yeast fermentation process. The experimental results show that the generated approximate continuous Pareto front has good accuracy and completeness. Compared with a multi-objective evolutionary algorithm with a large population, a more accurate preferred Pareto optimal point can be obtained from the approximate continuous Pareto front with less computation and shorter running time. The operation strategy corresponding to the final preferred Pareto optimal point generated by the interactive DM procedure can effectively improve the production indexes of the fermentation process.
Fuzzy regression analysis is an important regression analysis method for predicting uncertain information in the real world. In this paper, the input data are crisp with randomness, the output data are trapezoidal fuzzy numbers, and three different risk preferences and a chaos optimization algorithm are introduced to establish the fuzzy regression model. On the basis of the principle of the minimum total spread between the observed and the estimated values, risk-neutral, risk-averse, and risk-seeking fuzzy regression models are developed to obtain the parameters of the fuzzy linear regression model. The chaos optimization algorithm is used to determine the digital characteristics of the random variables. The mean absolute percentage error and the variance of errors are adopted to compare the modeling results. A stock rating case is used to evaluate the fuzzy regression models. Comparisons with five existing methods show that the proposed method has satisfactory performance.
We construct a fuzzy varying coefficient bilinear regression model to deal with interval financial data and then adopt the least-squares method based on symmetric fuzzy number space. Firstly, we propose a varying coefficient model on the basis of the fuzzy bilinear regression model. Secondly, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with the SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying coefficient bilinear regression models, as well as the forecasting of the three models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy compared with the other regression models for the capital market.
Multicollinearity constitutes shared variation among predictors that inflates standard errors of regression coefficients. Several years ago, it was proven that the common practice of mean centering in moderated regression cannot alleviate multicollinearity among variables comprising an interaction, but merely masks it. Residual centering (orthogonalizing) is unacceptable because it biases parameters for predictors from which the interaction derives, thus precluding interpretation of moderator effects. I propose and validate residual centering in sequential re-estimations of a moderated regression, termed sequential residual centering (SRC), by revealing unbiased multicollinearity conditioning across the interaction and its related terms. Across simulations, SRC reduces variance inflation factors (VIF) regardless of distribution shape or pattern of regression coefficients across predictors. For any predictor, the reduced VIF is used to derive a lower standard error of its regression coefficient. A cancer sample illustrates SRC, which allows unbiased interpretations of symptom clusters. SRC can be applied efficiently to alleviate multicollinearity after data collection and shows promise for advancing synergistic frontiers of research.
Accurate load prediction plays an important role in smart power management systems, whether for planning, meeting increasing load demand, maintenance, or power distribution. In order to achieve a reasonable prediction, the authors applied and compared two feature extraction techniques, kernel partial least squares regression and kernel principal component regression. Both are carried out with polynomial and Gaussian kernels to map the original features to a high-dimensional feature space and then derive new predictor variables known as scores and loadings. Kernel principal component regression constructs the new predictor variables without any consideration of the response vector; in contrast, kernel partial least squares regression does take the response vector into consideration. The models are simulated on electric load data from three different cities, using historical load data together with weekends and holidays as common predictor features for all models. In addition, temperature has been used for only one data set as a comparative study to measure its effect. The models’ results, evaluated by three statistical measures, show that Gaussian kernel partial least squares regression offers the more powerful features and can significantly improve load prediction performance compared with the other presented models.
The thermal and electrical conductivities of magnesium alloys are highly sensitive to composition and microstructure, with thermal conductivity varying by up to 20-fold across different as-cast alloy systems, making rapid and accurate prediction crucial for high-throughput screening and development of high-performance alloys. This study introduces a physics-informed symbolic regression approach that addresses the limitations of traditional methods, including the high computational cost of first-principles calculations and the poor interpretability of machine learning models. Comprehensive datasets comprising 1512 data points from 60 literature sources were analyzed, including thermal conductivity measurements from 52 alloy systems and electrical conductivity measurements from 36 systems. The derived symbolic regression model achieved Mean Absolute Percentage Errors (MAPEs) of 11.2% and 11.4% for thermal conductivity in low and high-component systems, respectively. When integrated with the Smith-Palmer equation, electrical conductivity predictions reached MAPEs of 15.6% and 16.4%. Independent validation on an entirely separate dataset of 554 data points from 53 additional literature sources, including 37 previously unseen alloy systems, confirmed model generalizability with MAPEs of 10.7%-15.2%. Shapley Additive Explanations (SHAP) analysis was employed to evaluate the relative importance of different features affecting conductivity, while equation decomposition quantified the contribution of individual functional terms. This methodology bridges data-driven prediction with mechanistic understanding, establishing a foundation for knowledge-based design of magnesium alloys with tailored transport properties.
Traditional oilfields face increasing extraction challenges, primarily due to reservoir quality degradation and production decline, which are further exacerbated by volatile international crude oil prices, as illustrated by Brent Crude’s trajectory from pandemic-induced negative pricing to geopolitically driven surges exceeding USD 100 per barrel. This study addresses these complexities through an integrated methodological framework applied to medium-permeability sandstone reservoirs in the Xinjiang oilfield by combining advanced numerical simulations with multivariate regression analysis. The methodology employs Latin Hypercube Sampling (LHS) to stratify geological parameter distributions and constructs heterogeneous reservoir models using Petrel software, rigorously validated through historical production data matching. Production forecasting integrates numerical simulation and Decline Curve Analysis (DCA), while investment estimation utilizes Ordinary Least Squares (OLS) regression to correlate engineering parameters with drilling and completion costs. Economic evaluation incorporates Discounted Cash Flow (DCF) modeling and breakeven analysis, establishing techno-economic boundaries via oil price sensitivity analysis ranging from USD 40 to 90 per barrel. Visualization tools, including 3D heatmaps, delineate nonlinear interactions among engineering, geological, and investment datasets under economic constraints. Key findings demonstrate that for the target reservoirs, as oil prices increase from USD 40 to USD 90 per barrel, the minimum economic thickness threshold decreases from approximately 5.7 m to about 2.5 m, with model prediction errors consistently below 25% across validation datasets. This framework provides scientifically grounded decision support for optimizing capital allocation and offers actionable insights to enhance undeveloped hydrocarbon development planning amid market uncertainty. Ultimately, it supports national energy security through technically robust and economically viable resource exploitation strategies.
The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution; 2) public transit accessibility showed a ‘core-periphery’ structure, with accessibility declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents’ living standards.
Prediction of blast-induced ground vibration using scaled distance regression analysis is one of the most popular methods and has been employed by engineers for many decades. It uses the maximum charge per delay and the distance of monitoring as the major factors for predicting the peak particle velocity (PPV). It is established that the PPV is caused by the maximum charge per delay and varies with the distance of monitoring and site geology. During production blasting, the waves induced by the blasting of different holes interfere with each other, which may result in a higher PPV than the value predicted by scaled distance regression analysis. This phenomenon of interference/superimposition of waves is not considered in scaled distance regression analysis. In this paper, an attempt has been made to compare the predicted values of blast-induced ground vibration from multi-hole trial blasting with those from single-hole blasting in an opencast coal mine under the same geological conditions. Further, the modified prediction equation for the multi-hole trial blasting was obtained using single-hole regression analysis. The error between predicted and actual values of multi-hole blast-induced ground vibration was found to be reduced by 8.5%.
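A scaled distance regression of the kind described above can be sketched as a log-log least-squares fit. The abstract does not state its equation, so this example assumes the commonly used square-root scaling, PPV = K · (D / √Q)^(−b), with D the monitoring distance and Q the maximum charge per delay; the function names and the data are hypothetical.

```python
import math

def fit_scaled_distance(distances, charges, ppvs):
    """Fit PPV = K * SD**(-b), with SD = D / sqrt(Q), by least squares
    on log(PPV) versus log(SD). Returns (K, b)."""
    xs = [math.log(d / math.sqrt(q)) for d, q in zip(distances, charges)]
    ys = [math.log(v) for v in ppvs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(PPV) = log(K) - b * log(SD), so K = exp(intercept), b = -slope.
    return math.exp(intercept), -slope

def predict_ppv(k, b, distance, charge):
    """Predicted PPV at a given distance and maximum charge per delay."""
    return k * (distance / math.sqrt(charge)) ** (-b)

# Synthetic records generated from assumed site constants K=500, b=1.5.
D = [50.0, 100.0, 200.0, 400.0]
Q = [25.0, 25.0, 100.0, 100.0]
V = [predict_ppv(500.0, 1.5, d, q) for d, q in zip(D, Q)]
K, b = fit_scaled_distance(D, Q, V)
print(round(K, 3), round(b, 3))
```

Because the fit uses only charge and distance, it cannot represent the superimposition of waves from multiple holes, which is the limitation the abstract points out.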
Rudraprayag, in the Garhwal Himalayan division, is one of the districts most vulnerable to landslides in India. Heavy rainfall, steep slopes and developmental activities are important factors in the occurrence of landslides in the district. Therefore, specific assessment of landslide susceptibility and its accuracy at the regional level is essential for disaster management and proper land use planning. The article evaluates the effectiveness of frequency ratio, fuzzy logic and logistic regression models for assessing landslide susceptibility in the Rudraprayag district of Uttarakhand state, India. A landslide inventory map was prepared and verified by field data. Fourteen landslide parameters and the generated inventory map were utilized to prepare landslide susceptibility maps through the frequency ratio, fuzzy logic and logistic regression models. The landslide susceptibility maps generated through these models were classified into very high, high, medium, low and very low categories using natural breaks classification. The receiver operating characteristics (ROC) curve, the spatially agreed area approach and the seed cell area index (SCAI) method were used to validate the landslide models. Validation results revealed that the fuzzy logic model was more effective in assessing landslide susceptibility in the study area. The landslide susceptibility map generated through the fuzzy logic model can be best utilized for landslide disaster management and effective land use planning.
The adjacent-categories, continuation-ratio and proportional odds logit-link regression models provide useful extensions of the multinomial logistic model to ordinal response data. We propose fitting these models with a logarithmic link to allow estimation of different forms of the risk ratio. Each of the resulting ordinal response log-link models is a constrained version of the log multinomial model, the log-link counterpart of the multinomial logistic model. These models can be estimated using software that allows the user to specify the log likelihood as the objective function to be maximized and to impose constraints on the parameter estimates. In example data with a dichotomous covariate, the unconstrained models produced valid coefficient estimates and standard errors, and the constrained models produced plausible results. Models with a single continuous covariate performed well in data simulations, with low bias and mean squared error on average and appropriate confidence interval coverage in admissible solutions. In an application to real data, practical aspects of the fitting of the models are investigated. We conclude that it is feasible to obtain adjusted estimates of the risk ratio for ordinal outcome data.
Abstract: To meet the need for real-time crack monitoring of infrastructural facilities, a CNN-regression model is proposed to estimate crack properties directly from patches. RGB crack images and their corresponding masks, obtained from a public dataset, are cropped into patches of 256 square pixels that are classified with a pre-trained deep convolutional neural network; the true positives are segmented, and crack properties are extracted using two different methods. The first method is primarily based on active contour models and level-set segmentation, and the second consists of the domain adaptation of a mathematical morphology-based method known as FIL-FINDER. A statistical test was performed to compare the two methods, and a database was prepared with the more suitable one. An advanced convolutional neural network-based multi-output regression model is proposed, which was trained with the prepared database and validated with the held-out dataset for predicting crack length, crack width, and width uncertainty directly from input image patches. The proposed model was tested on crack patches collected from different locations. Huber loss was used to ensure the robustness of the proposed model, which was selected from a set of 288 different variations. Additionally, an ablation study was conducted on the top 3 models, demonstrating the influence of each network component on the prediction results. Finally, the best-performing model among the top 3, HHc-X, is proposed; its predicted crack properties are in close agreement with the ground truths in the test data.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41661099) and the National Key Research and Development Program of China (No. 2016YFA0601601).
Abstract: Satellite-based precipitation products have been widely used to estimate precipitation, especially over regions with sparse rain gauge networks. However, the low spatial resolution of these products has limited their application in localized regions and watersheds. This study investigated a spatial downscaling approach, Geographically Weighted Regression Kriging (GWRK), to downscale the Tropical Rainfall Measuring Mission (TRMM) 3B43 Version 7 product over the Lancang River Basin (LRB) for 2001–2015. Downscaling was based on the relationships between the TRMM precipitation and the Normalized Difference Vegetation Index (NDVI), the Land Surface Temperature (LST), and the Digital Elevation Model (DEM). Geographical ratio analysis (GRA) was used to calibrate the annual downscaled precipitation data, and the monthly fractions derived from the original TRMM data were used to disaggregate the annual downscaled and calibrated precipitation to monthly precipitation at 1 km resolution. The final downscaled precipitation datasets were validated against station-based observed precipitation for 2001–2015. Results showed that: 1) The TRMM 3B43 precipitation was highly accurate, with slight overestimation at the basin scale (correlation coefficient (CC) = 0.91, Bias = 13.3%). Spatially, the accuracies of the upstream and downstream regions were higher than that of the midstream region. 2) The annual downscaled TRMM precipitation data at 1 km spatial resolution obtained by GWRK effectively captured the high spatial variability of precipitation over the LRB. 3) The annual downscaled TRMM precipitation with GRA calibration was more accurate than the original TRMM dataset. 4) The final downscaled and calibrated precipitation had significantly improved spatial resolution and agreed well with data from the validated rain gauge stations, i.e., CC = 0.75, root mean square error (RMSE) = 182 mm, mean absolute error (MAE) = 142 mm, and Bias = 0.78% for annual precipitation, and CC = 0.95, RMSE = 25 mm, MAE = 16 mm, and Bias = 0.67% for monthly precipitation.
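The monthly disaggregation step and the four validation statistics can be sketched as follows. This is only an illustration under stated assumptions: the function names and all numbers are hypothetical, and Bias is taken as the relative difference of totals in percent, which the abstract does not define explicitly.

```python
import math

def disaggregate_annual(annual_downscaled, monthly_original):
    """Split one annual downscaled value into months, using the monthly
    fractions of the original coarse-resolution series for the same year."""
    total = sum(monthly_original)
    return [annual_downscaled * m / total for m in monthly_original]

def validation_metrics(estimated, observed):
    """Return CC, RMSE, MAE and relative Bias (%) of estimates vs. gauges."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        "CC": cov / math.sqrt(var_e * var_o),
        "RMSE": math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n),
        "MAE": sum(abs(e - o) for e, o in zip(estimated, observed)) / n,
        # Assumed definition: relative difference of totals, in percent.
        "Bias": (sum(estimated) - sum(observed)) / sum(observed) * 100.0,
    }

# Hypothetical coarse monthly series (mm) summing to 700 mm, and a
# hypothetical calibrated annual value of 840 mm for one 1 km cell.
coarse_monthly = [5, 5, 10, 20, 60, 120, 180, 160, 90, 40, 7, 3]
monthly = disaggregate_annual(840.0, coarse_monthly)

# Hypothetical estimates and gauge observations (mm).
metrics = validation_metrics([110.0, 190.0, 330.0], [100.0, 200.0, 300.0])
print(monthly[6], metrics)
```

The disaggregation preserves the annual total by construction, so any error in the monthly values comes from the annual estimate and from the coarse-resolution monthly fractions.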
Abstract: This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 that employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selection of dependent and independent variables, model fitting, reporting and interpretation were presented. The literature review revealed considerable deficiencies in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was small enough to call into question the accuracy of the regression model. Also, most studies did not report validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
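The basic quantities named above (odds, logit, odds ratio, events per variable) can be written out in a few lines. The sketch below uses hypothetical counts, not the study's cohort, and it assumes the common convention of counting events in the less frequent outcome class when computing events per variable.

```python
import math

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Logit transformation: natural log of the odds."""
    return math.log(odds(p))

def inverse_logit(z):
    """Logistic curve: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(exposed_events, exposed_non_events,
               unexposed_events, unexposed_non_events):
    """Odds ratio from a 2x2 table; for a single binary predictor,
    its log equals the fitted logistic regression coefficient."""
    return (exposed_events * unexposed_non_events) / (
        exposed_non_events * unexposed_events)

def events_per_variable(n_events, n_non_events, n_predictors):
    """EPV, using the less frequent outcome class (assumed convention)."""
    return min(n_events, n_non_events) / n_predictors

# Hypothetical 2x2 table and model size, for illustration only.
or_value = odds_ratio(30, 70, 10, 90)
epv = events_per_variable(40, 160, 8)
print(odds(0.8), logit(0.5), or_value, math.log(or_value), epv)
```

An EPV of 5, as in this made-up case, is the kind of low ratio the review flags as a reason to doubt the stability of the fitted coefficients.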
Funding: Supported by High-level Professional Groups in Guangdong Province, No. GSPZYQ2020101; and the Guangdong Province Educational Research Planning Project, No. 2024GXJK742.
Abstract: BACKGROUND: Paternal perinatal depression (PPD) is closely associated with maternal mental health challenges, marital strain, and adverse child developmental outcomes. Despite its significant impact, PPD remains under-recognized in family-centered clinical practice. Concurrently, against the backdrop of rising rates of delayed marriage and China's Maternity Incentive Policy, the proportion of women giving birth at an advanced maternal age is increasing. Nevertheless, research specifically examining PPD among spouses of older mothers remains critically scarce, both in China and globally. AIM: To investigate PPD and its influencing factors in Chinese advanced maternal age families. METHODS: This cross-sectional study included 358 participants; it was conducted among fathers whose partners were pregnant women of advanced maternal age, at five hospitals in the Pearl River Delta region of China from September 2023 to June 2024. Data were collected via a general information questionnaire, the Social Support Rating Scale, and the Edinburgh Postnatal Depression Scale. Latent profile analysis and regression mixture models (RMMs) were adopted to analyze the latent PPD types and the factors that influenced PPD. RESULTS: The incidence of PPD was 16.48%, and three profiles were identified: low-symptomatic (175 cases, 48.89%), monophasic (140 cases, 39.10%), and high-symptomatic (43 cases, 12.01%). The RMM analysis revealed that first pregnancy, low income (<¥3000/month), part-time work, and a history of abnormal pregnancy were positively associated with the high-symptomatic type (P<0.05). Conversely, high subjective support and support utilization were negatively associated with the high-symptomatic type compared with the low-symptomatic type (P<0.05). Good couple relationships, high objective and subjective support, and high support utilization were negatively associated with monophasic disorder (P<0.05). CONCLUSION: PPD incidence is high among Chinese fathers with advanced maternal age partners, and the characteristics of depression are varied. Healthcare practitioners should prioritize individuals with low levels of social support.
Abstract: Bangladesh has a subtropical monsoon climate characterized by wide seasonal variations in rainfall, moderately warm temperatures, and high humidity. Rainfall is the main source of irrigation water throughout Bangladesh, where the inhabitants derive their income primarily from farming. Stochastic rainfall models, which address the occurrence of wet days and the depth of rainfall, have been used to model daily rainfall occurrence in different regions and have achieved satisfactory results around the world. In connection with Markov chains of different orders, logistic regression is conducted to examine the dependence of current rainfall on the rainfall of the previous two time periods. It is shown that wet days in the previous two time periods, compared with dry days in the previous two time periods, positively influence the occurrence of a wet day in the current time period; that is, rainfall occurrence in the rainy season from April to September in the study area depends on the preceding dry-wet spell. Daily rainfall data covering about 26 years, from January 1985 to August 2011, were collected from the meteorological department for the Dhaka station. The test results show that the occurrence of rainfall follows a second-order Markov chain, and logistic regression also indicates that dry followed by dry and wet followed by wet are more likely at the Dhaka station. The model could also perform adequately for many applications of rainfall data.
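A second-order Markov chain for rainfall occurrence conditions today's state on the states of the two preceding days. The sketch below estimates those conditional probabilities from a wet/dry sequence by counting triples; the function name and the toy sequence are hypothetical and say nothing about the Dhaka record.

```python
from collections import Counter

def second_order_wet_probabilities(wet_flags):
    """Estimate P(wet today | state two days ago, state yesterday).

    wet_flags is a sequence of 0 (dry) and 1 (wet) daily indicators.
    Returns a dict keyed by (two_days_ago, yesterday).
    """
    counts = Counter(zip(wet_flags, wet_flags[1:], wet_flags[2:]))
    probabilities = {}
    for two_days_ago in (0, 1):
        for yesterday in (0, 1):
            dry_next = counts[(two_days_ago, yesterday, 0)]
            wet_next = counts[(two_days_ago, yesterday, 1)]
            if dry_next + wet_next > 0:
                probabilities[(two_days_ago, yesterday)] = (
                    wet_next / (dry_next + wet_next))
    return probabilities

# Toy sequence, for illustration only.
sequence = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
probs = second_order_wet_probabilities(sequence)
print(probs)
```

If the probabilities for (0, 1) and (1, 1) differed little, the state two days back would add nothing and a first-order chain would suffice; testing that difference formally is what the order-selection test in the abstract addresses.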
Abstract: The Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection and for handling the multicollinearity problem simultaneously in the linear regression model. LASSO produces estimates with high variance if the number of predictors is higher than the number of observations and if high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining LASSO and the Ridge estimator (RE). The solutions of LASSO and ENet have been obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm to overcome the issues in LASSO; it can combine LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithm using a Monte-Carlo simulation study and real-world examples. The results showed that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class estimator and the r-d class estimator, outperformed the other algorithms under moderate and severe multicollinearity.
Funding: The National Natural Science Foundation of China (No. 60905066) and the Natural Science Foundation of Chongqing (No. cstc2018jcyjA0667).
Abstract: Because parameter optimization greatly affects the prediction accuracy and generalization ability of support vector regression (SVR), and because the predictive model is often mismatched in model predictive control of nonlinear systems, a multi-step model predictive control method based on online SVR (OSVR) optimized by a multi-agent particle swarm optimization algorithm (MAPSO) is put forward. By integrating the online learning ability of OSVR, the predictive model can self-correct and adapt well to dynamic changes in the nonlinear process.
Funding: Supported by the National Natural Science Foundation of China (20676013, 61240047).
Abstract: Approaches to discrete approximation of the Pareto front using multi-objective evolutionary algorithms suffer from a heavy computational burden, long running times and missing Pareto optimal points. To overcome these problems, an approach to continuous approximation of the Pareto front using geometric support vector regression is presented. The regression model of a small approximate discrete Pareto front is constructed by geometric support vector regression modeling and is described as the approximate continuous Pareto front. In the process of geometric support vector regression modeling, considering the distribution characteristics of Pareto optimal points, separable augmented training sample sets are constructed by shifting the original training sample points along multiple coordinate axes. In addition, an interactive decision-making (DM) procedure, in which the continuous approximation of the Pareto front and decision-making are performed interactively, is designed to improve the accuracy of the preferred Pareto optimal point. The correctness of the continuous approximation of the Pareto front is demonstrated with a typical multi-objective optimization problem. Combined with the interactive decision-making procedure, the continuous approximation of the Pareto front is also applied to the multi-objective optimization of an industrial fed-batch yeast fermentation process. The experimental results show that the generated approximate continuous Pareto front has good accuracy and completeness. Compared with a multi-objective evolutionary algorithm with a large population, a more accurate preferred Pareto optimal point can be obtained from the approximate continuous Pareto front with less computation and a shorter running time. The operation strategy corresponding to the final preferred Pareto optimal point generated by the interactive DM procedure can effectively improve the production indexes of the fermentation process.
Abstract: Fuzzy regression analysis is an important regression method for predicting uncertain information in the real world. In this paper, the input data are crisp with randomness, the output data are trapezoidal fuzzy numbers, and three different risk preferences and a chaos optimization algorithm are introduced to establish the fuzzy regression model. On the basis of the principle of minimum total spread between the observed and estimated values, risk-neutral, risk-averse, and risk-seeking fuzzy regression models are developed to obtain the parameters of the fuzzy linear regression model. The chaos optimization algorithm is used to determine the numerical characteristics of the random variables. The mean absolute percentage error and the variance of errors are adopted to compare the modeling results. A stock rating case is used to evaluate the fuzzy regression models. Comparisons with five existing methods show that the proposed method has satisfactory performance.
Abstract: We construct a fuzzy varying coefficient bilinear regression model to deal with interval financial data and then adopt the least-squares method based on symmetric fuzzy number space. Firstly, we propose a varying coefficient model on the basis of the fuzzy bilinear regression model. Secondly, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and we test the adaptability of the proposed model by means of a generalized likelihood ratio test with an SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting and forecasting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying coefficient bilinear regression models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy compared with the other regression models for the capital market.
Abstract: Multicollinearity constitutes shared variation among predictors that inflates the standard errors of regression coefficients. Several years ago, it was proven that the common practice of mean centering in moderated regression cannot alleviate multicollinearity among variables comprising an interaction, but merely masks it. Residual centering (orthogonalizing) is unacceptable because it biases parameters for the predictors from which the interaction derives, thus precluding interpretation of moderator effects. I propose and validate residual centering in sequential re-estimations of a moderated regression, termed sequential residual centering (SRC), by revealing unbiased multicollinearity conditioning across the interaction and its related terms. Across simulations, SRC reduces variance inflation factors (VIF) regardless of distribution shape or pattern of regression coefficients across predictors. For any predictor, the reduced VIF is used to derive a lower standard error of its regression coefficient. A cancer sample illustrates SRC, which allows unbiased interpretations of symptom clusters. SRC can be applied efficiently to alleviate multicollinearity after data collection and shows promise for advancing synergistic frontiers of research.
Abstract: Accurate load prediction plays an important role in smart power management systems, whether for planning, meeting increasing load demand, maintenance, or power distribution. To achieve a reasonable prediction, the authors applied and compared two feature extraction techniques, kernel partial least squares regression and kernel principal component regression. Both use polynomial and Gaussian kernels to map the original features to a high-dimensional feature space and then derive new predictor variables, known as scores and loadings. Kernel principal component regression constructs the new predictor variables from the predictor features without any consideration of the response vector; in contrast, kernel partial least squares regression does take the response vector into consideration. The models are simulated on electric load data from three different cities, using historical load data together with weekends and holidays as common predictor features for all models. Temperature was used for only one dataset, as a comparative study to measure its effect. The models' results, evaluated by three statistical measures, show that Gaussian kernel partial least squares regression offers more powerful features and can significantly improve load prediction performance compared with the other presented models.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFB3712401), the National Natural Science Foundation of China (No. 52274301), the Aeronautical Science Foundation of China (No. 2023Z0530S6005), the Academician Workstation of Kunming University of Science and Technology (2024), the Ningbo Yongjiang Talent-Introduction Program (No. 2022A-023C), and Zhejiang Phenomenological Materials Technology Co., Ltd., China.
Abstract: The thermal and electrical conductivities of magnesium alloys are highly sensitive to composition and microstructure, with thermal conductivity varying by up to 20-fold across different as-cast alloy systems, making rapid and accurate prediction crucial for high-throughput screening and development of high-performance alloys. This study introduces a physics-informed symbolic regression approach that addresses the limitations of traditional methods, including the high computational cost of first-principles calculations and the poor interpretability of machine learning models. Comprehensive datasets comprising 1512 data points from 60 literature sources were analyzed, including thermal conductivity measurements from 52 alloy systems and electrical conductivity measurements from 36 systems. The derived symbolic regression model achieved Mean Absolute Percentage Errors (MAPEs) of 11.2% and 11.4% for thermal conductivity in low- and high-component systems, respectively. When integrated with the Smith-Palmer equation, electrical conductivity predictions reached MAPEs of 15.6% and 16.4%. Independent validation on an entirely separate dataset of 554 data points from 53 additional literature sources, including 37 previously unseen alloy systems, confirmed model generalizability with MAPEs of 10.7%-15.2%. Shapley Additive Explanations (SHAP) analysis was employed to evaluate the relative importance of different features affecting conductivity, while equation decomposition quantified the contribution of individual functional terms. This methodology bridges data-driven prediction with mechanistic understanding, establishing a foundation for knowledge-based design of magnesium alloys with tailored transport properties.
Abstract: Traditional oilfields face increasing extraction challenges, primarily due to reservoir quality degradation and production decline, which are further exacerbated by volatile international crude oil prices, illustrated by Brent Crude's trajectory from pandemic-induced negative pricing to geopolitically driven surges exceeding USD 100 per barrel. This study addresses these complexities through an integrated methodological framework applied to medium-permeability sandstone reservoirs in the Xinjiang oilfield by combining advanced numerical simulations with multivariate regression analysis. The methodology employs Latin Hypercube Sampling (LHS) to stratify geological parameter distributions and constructs heterogeneous reservoir models using Petrel software, rigorously validated through historical production data matching. Production forecasting integrates numerical simulation and Decline Curve Analysis (DCA), while investment estimation utilizes Ordinary Least Squares (OLS) regression to correlate engineering parameters with drilling and completion costs. Economic evaluation incorporates Discounted Cash Flow (DCF) modeling and breakeven analysis, establishing techno-economic boundaries via oil price sensitivity analysis ranging from USD 40 to 90 per barrel. Visualization tools, including 3D heatmaps, delineate nonlinear interactions among engineering, geological, and investment datasets under economic constraints. Key findings demonstrate that for the target reservoirs, as oil prices increase from USD 40 to USD 90 per barrel, the minimum economic thickness threshold decreases from approximately 5.7 m to about 2.5 m, with model prediction errors consistently below 25% across validation datasets. This framework provides scientifically grounded decision support for optimizing capital allocation and offers actionable insights to enhance undeveloped hydrocarbon development planning amid market uncertainty. Ultimately, it supports national energy security through technically robust and economically viable resource exploitation strategies.
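The discounted cash flow and breakeven logic mentioned above can be sketched in a few lines. Everything in this example is a simplifying assumption for illustration: a single up-front investment, end-of-year cash flows, a constant operating cost per barrel, no taxes or royalties, and made-up volumes and costs that are unrelated to the Xinjiang reservoirs.

```python
def npv(cash_flows, rate):
    """Net present value; cash_flows[0] occurs at time 0 (undiscounted)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def well_cash_flows(price, yearly_bbl, opex_per_bbl, capex):
    """Up-front investment followed by yearly operating margins."""
    return [-capex] + [(price - opex_per_bbl) * q for q in yearly_bbl]

def breakeven_price(yearly_bbl, opex_per_bbl, capex, rate,
                    low=0.0, high=500.0, tol=1e-6):
    """Oil price at which NPV is zero, found by bisection.

    NPV rises monotonically with price, so bisection on [low, high] is safe
    as long as the breakeven lies inside that bracket.
    """
    while high - low > tol:
        mid = (low + high) / 2.0
        flows = well_cash_flows(mid, yearly_bbl, opex_per_bbl, capex)
        if npv(flows, rate) < 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Hypothetical production profile (bbl/year), costs (USD) and discount rate.
profile = [10000.0, 8000.0, 6000.0]
price_star = breakeven_price(profile, opex_per_bbl=20.0, capex=600000.0, rate=0.10)
print(round(price_star, 2))
```

Repeating the breakeven search over a range of reservoir thicknesses, with volumes and costs predicted by the simulation and regression steps, is one way a minimum economic thickness could be read off for each oil price scenario.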
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42271224, 41901193), the Ministry of Education Humanities and Social Sciences Research Planning Fund Project of China (No. 24YJAZH190), the Anhui Province Excellent Youth Research Project in Universities (No. 2022AH030019), and the Anhui Social Sciences Innovation Development Research Project (No. 2024CXQ503).
Abstract: The accessibility of urban public transit directly influences residents' quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a 'belt-like' distribution; 2) public transit accessibility showed a 'core-periphery' structure, with accessibility declining in a 'circumferential' pattern around the city center. Based on the 'housing price-accessibility' dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents' living standards.