We study the quasi-likelihood equation in generalized linear models (GLMs) with adaptive design, $\sum_{i=1}^n x_i (y_i - h(x_i'\beta)) = 0$, where $y_i$ is a $q$-vector and $x_i$ is a $p \times q$ random matrix. Under some assumptions, it is shown that the quasi-likelihood equation for the GLM has a solution that is asymptotically normal.
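As an illustration only, the sketch below solves the equation numerically in the simplest case $q = 1$ (scalar responses) using Newton-Raphson iterations. It assumes a Poisson-type mean function $h(t) = e^t$ and a fixed, non-adaptive design. The function names are hypothetical, and the sketch does not reproduce the paper's adaptive-design theory.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def quasi_score(X, y, beta, h):
    """Left-hand side sum_i x_i (y_i - h(x_i' beta)) for scalar y_i (q = 1)."""
    p = len(beta)
    score = [0.0] * p
    for xi, yi in zip(X, y):
        r = yi - h(sum(a * b for a, b in zip(xi, beta)))
        for j in range(p):
            score[j] += xi[j] * r
    return score

def solve_quasi_likelihood(X, y, h, dh, beta0, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations on the quasi-likelihood equation (hypothetical helper)."""
    beta = list(beta0)
    p = len(beta)
    for _ in range(max_iter):
        score = quasi_score(X, y, beta, h)
        # Negative Jacobian of the score: sum_i h'(x_i' beta) x_i x_i'
        J = [[0.0] * p for _ in range(p)]
        for xi in X:
            d = dh(sum(a * b for a, b in zip(xi, beta)))
            for j in range(p):
                for k in range(p):
                    J[j][k] += d * xi[j] * xi[k]
        step = solve_linear(J, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Example: h(t) = exp(t), intercept plus one binary covariate.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1.0, 3.0, 4.0, 4.0]
beta_hat = solve_quasi_likelihood(X, y, math.exp, math.exp, [0.0, 0.0])
print(beta_hat)  # group means are 2 and 4, so the root is close to [log 2, log 2]
```

Because the covariate only splits the data into two groups, the root matches the group means exactly: $e^{\beta_0} = 2$ and $e^{\beta_0 + \beta_1} = 4$.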
This article is concerned with a semiparametric generalized partial linear model (GPLM) with type II censored data. A sieve maximum likelihood estimator (MLE) is proposed to estimate the parametric component, allowing exploration of the nonlinear relationship between a certain covariate and the response function. Asymptotic properties of the proposed sieve MLEs are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. Moreover, the estimators of the unknown parameters are asymptotically normal and efficient, and the estimator of the nonparametric function has an optimal convergence rate.
In a linear regression model, testing the residuals for constant variance is an integral part of statistical analysis. Homoscedasticity is a crucial assumption that requires statistical confirmation, usually through a formal test carried out before the Analysis of Variance (ANOVA) technique is applied. Many researchers have published papers on tests for detecting variance heterogeneity in multiple linear regression models. These tests have been compared using criteria such as bias, error rates, and power. Beyond such comparisons, modifications of some of these tests have been reported in recent years. However, in the multiple linear regression setting, little work has compared selected tests for homoscedasticity when linear, quadratic, square-root, and exponential forms of heteroscedasticity are injected into the residuals. The present study aims to fill this gap. It provides a comprehensive comparative analysis of the asymptotic behaviour of selected tests for homoscedasticity, in order to identify the best test for detecting heteroscedasticity in a multiple linear regression scenario with varying variances and levels of significance.
Several tests for homoscedasticity are available in the literature, but only nine were considered in this study: the Breusch-Godfrey test, the studentized Breusch-Pagan test, White's test, the Nonconstant Variance Score test, the Park test, the Spearman rank test, the Glejser test, the Goldfeld-Quandt test, and the Harrison-McCabe test. Their asymptotic behaviour was examined by Monte Carlo simulation. The heteroscedastic structures were exponential (EHS) and linear (LHS), with the latter generalizing the square-root and quadratic structures. These were injected into the residuals of multiple linear regression models at sample sizes of 30, 50, 100, 200, 500, and 1000. Performance was evaluated in the R environment. Among other findings, the Glejser and Park tests performed best for detecting heteroscedasticity under EHS and LHS, respectively. The White and Harrison-McCabe tests performed best for checking homoscedasticity under EHS and LHS, respectively, for sample sizes below 50.
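To make the idea of a homoscedasticity test concrete, here is a minimal sketch of one of the nine tests: the studentized Breusch-Pagan statistic in Koenker's form. The statistic is $n$ times the $R^2$ of a regression of the squared OLS residuals on the regressors. The sketch is restricted to a single regressor so that it needs only the Python standard library. The data are synthetic and the function names are hypothetical; this is not the study's simulation code, which was run in R.

```python
import math

def simple_ols(x, y):
    """Intercept and slope of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def studentized_breusch_pagan(x, y):
    """Koenker's studentized Breusch-Pagan statistic n * R^2 and its chi-square(1) p-value."""
    a, b = simple_ols(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]  # squared OLS residuals
    c, d = simple_ols(x, e2)                                 # auxiliary regression on x
    mean_e2 = sum(e2) / len(e2)
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - c - d * xi) ** 2 for xi, v in zip(x, e2))
    lm = max(len(x) * (1.0 - ss_res / ss_tot), 0.0)  # guard against tiny negative rounding
    # Upper tail of chi-square with 1 df: P(chi2_1 > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Deterministic example whose error spread grows linearly with x.
x = [float(i) for i in range(1, 201)]
y = [2.0 + 0.5 * xi + xi * (-1) ** i for i, xi in enumerate(x, start=1)]
lm, p = studentized_breusch_pagan(x, y)
print(round(lm, 1), p)
```

With $k$ regressors, the reference distribution becomes $\chi^2_k$, and the closed-form `erfc` tail no longer applies.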
Changes in climate factors such as temperature, rainfall, humidity, and wind speed are natural processes that can significantly affect the incidence of infectious diseases. Dengue is a widespread disease whose links to climate change have often been documented. It has become a significant concern, especially for the Malaysian health authorities, because of its rapid spread and serious effects, including loss of life. Several statistical models have been applied to identify climatic factors associated with infectious diseases. However, the interactions between climate variables and disease components are complex and nonlinear, so modelling their relationships has become the main challenge in climate-health studies. Hence, this study proposed generalized linear models (GLMs) with Poisson and negative binomial distributions to examine the effects of climate factors on dengue incidence while accounting for collinearity between variables. The study focuses on dengue hot spots in Malaysia in 2014. Because the climate factors are collinear, the analysis was carried out separately using three different models. The study revealed that rainfall, temperature, humidity, and wind speed were significantly associated with dengue incidence, and most of them showed a negative effect. Of all the variables, wind speed had the most significant impact on dengue incidence. Given these relationships, policymakers should formulate better plans so that precautionary steps can be taken to reduce the spread of dengue.
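The abstract does not say how the choice between the Poisson and negative binomial GLMs was made. One common diagnostic, shown here only as an assumption-laden sketch, is the Pearson dispersion statistic. It is the Pearson chi-square of a fitted Poisson model divided by its residual degrees of freedom. Values well above 1 point toward the negative binomial. The counts and fitted means below are invented.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Under a correctly specified Poisson GLM this is close to 1; values well
    above 1 indicate overdispersion, the usual motivation for a negative
    binomial alternative.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts and fitted Poisson means (not the study's data).
y = [0, 2, 4]
mu = [2.0, 2.0, 2.0]
print(pearson_dispersion(y, mu, n_params=1))  # 2.0
```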
Stochastic models are derived to estimate the level of coliform count, expressed as the MPN (most probable number) index. This is one of the most important characteristics of groundwater quality, and the models estimate it from a set of water-source location and soil characteristics. The study is based on about twenty location and soil characteristics, most of which were obtained through laboratory analysis of soil and water samples. These samples were collected from nearly three hundred drinking water sources (wells and bore wells) selected at random from Kasaragod district. Water contamination is found to be relatively higher in wells than in bore wells. The study reveals that only 7% of the wells and 40% of the bore wells in the district are within the permissible limit of the WHO standard for drinking water quality. The level of contamination is very high in hospital premises and very low in forest areas. Two separate multiple ordinal logistic regression models are developed to predict the level of coliform count, one for wells and the other for bore wells. The study scientifically demonstrates that water quality depends on the distances from waste disposal areas, septic tanks, etc. A significant feature is that it also highlights the dependence on two other very significant soil characteristics: soil organic carbon and soil porosity. The models make it possible to predict the water quality at a location from its soil and location characteristics. One important use of the models is in fixing safe locations for waste dump areas, septic tanks, wells, etc. in town planning, in designing residential and industrial layouts, and in hospital and hostel construction.
This is the first study to describe groundwater quality in terms of location and soil characteristics.
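The abstract does not give the model equations. Assuming the ordinal logistic models take the usual proportional-odds (cumulative logit) form $\operatorname{logit} P(Y \le j \mid x) = \theta_j - x'\beta$, a coliform category is predicted as sketched below. The predictors, coefficients, and thresholds are invented for illustration.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(x, beta, thresholds):
    """Category probabilities under a proportional-odds model
    logit P(Y <= j | x) = theta_j - x'beta, with increasing thresholds theta_j."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cum = [logistic(t - eta) for t in thresholds] + [1.0]  # P(Y <= j), last is 1
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        prev = c
    return probs

# Hypothetical coefficients for two predictors (e.g. distance to a septic tank, soil porosity).
probs = category_probs(x=[1.0, 0.5], beta=[-0.8, 1.2], thresholds=[-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])  # four ordered contamination categories
```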
This paper proposes a new covariate-dependent zero-truncated bivariate Poisson model within the generalized linear model framework. A marginal-conditional approach is used to derive the bivariate model. The proposed model, its estimation procedure, and tests for goodness of fit and for under- (or over-) dispersion are presented and applied to road safety data. The two correlated outcome variables considered in this study are the number of cars involved in an accident and the number of casualties given the number of cars.
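The univariate building block of such a model is the zero-truncated Poisson distribution, $P(Y = y \mid Y > 0) = \lambda^y e^{-\lambda} / \{y!\,(1 - e^{-\lambda})\}$ for $y = 1, 2, \dots$, with mean $\lambda / (1 - e^{-\lambda})$. The sketch covers only this marginal piece with a fixed rate. The paper's bivariate marginal-conditional construction, covariates, and goodness-of-fit tests are not reproduced, and the function names are hypothetical.

```python
import math

def zt_poisson_pmf(y, lam):
    """P(Y = y | Y > 0) for a Poisson(lam) variable truncated at zero, y = 1, 2, ..."""
    if y < 1:
        return 0.0
    log_p = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.exp(log_p) / (1.0 - math.exp(-lam))

def zt_poisson_mean(lam):
    """E(Y | Y > 0) = lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))

# In a GLM, a log link would tie lam to covariates, lam = exp(x'beta); here lam is fixed.
lam = 1.5
total = sum(zt_poisson_pmf(y, lam) for y in range(1, 60))
mean = sum(y * zt_poisson_pmf(y, lam) for y in range(1, 60))
print(round(total, 10), round(mean, 6), round(zt_poisson_mean(lam), 6))
```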
Count data that exhibit overdispersion (the variance of the counts is larger than their mean) are commonly analyzed using discrete distributions such as the negative binomial, the Poisson inverse Gaussian, and other models. The Poisson distribution is characterized by the equality of mean and variance. The negative binomial and the Poisson inverse Gaussian have variance larger than the mean and are therefore more appropriate for modelling overdispersed count data. As an alternative to these two models, we use the generalized Poisson distribution for group comparisons in the presence of multiple covariates. This problem is known as analysis of covariance (ANCOVA) and has been solved for continuous data. Our objectives were to develop ANCOVA using the generalized Poisson distribution and to compare its goodness of fit with that of nonparametric generalized additive models. Using real-life data, we show that the model performs quite satisfactorily compared with nonparametric generalized additive models.
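The abstract does not state which parameterization of the generalized Poisson distribution is used. Assuming Consul's form, the pmf is $P(Y = y) = \theta(\theta + \lambda y)^{y-1} e^{-\theta - \lambda y} / y!$ with $\theta > 0$ and $0 \le \lambda < 1$. Its mean is $\theta/(1-\lambda)$ and its variance is $\theta/(1-\lambda)^3$, so $\lambda > 0$ makes the variance exceed the mean. The sketch checks these moments numerically. Linking $\theta$ or the mean to group indicators and covariates, as an ANCOVA would, is not shown.

```python
import math

def gp_pmf(y, theta, lam):
    """Consul's generalized Poisson pmf, theta > 0, 0 <= lam < 1 (assumed parameterization)."""
    log_p = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
             - theta - lam * y - math.lgamma(y + 1))
    return math.exp(log_p)

theta, lam = 2.0, 0.3
support = range(0, 300)  # tail beyond this is negligible for these parameters
probs = [gp_pmf(y, theta, lam) for y in support]
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(total, 8), round(mean, 6), round(var, 6))
print(theta / (1 - lam), theta / (1 - lam) ** 3)  # theoretical mean and variance
```

Setting `lam = 0` recovers the ordinary Poisson distribution with mean and variance both equal to `theta`.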
Over the past decades, the expansion of natural secondary forests has played a crucial role in offsetting the loss of primary forests and combating climate change. Despite this, there is a gap in our understanding of how the growth and mortality patterns of tree species vary with elevation in these secondary forests. In this study, we analyzed data from two censuses spanning a five-year interval. The censuses covered evergreen broadleaved forests (EBF) and temperate coniferous forests (TCF), which have been recovering for half a century, across elevation gradients in a subtropical mountain region, Mount Wuyi, China. The relative growth rate (RGR) of EBF (0.028 ± 0.001 cm·cm⁻¹·a⁻¹) was 27.3% higher than that of TCF, and its mortality rate (MR) (20.03% ± 1.70%) was 16.4% higher. Interestingly, the trade-off between RGR and MR in EBF weakened as elevation increased, a trend not observed in TCF. Conversely, TCF consistently showed a stronger trade-off between RGR and MR than EBF. Generalized linear mixed models revealed that elevation influences RGR both directly and indirectly. The indirect effects act through its interactions with slope, crown competition index (CCI), and tree canopy height (CH). However, tree mortality did not show a significant correlation with elevation. Additionally, diameter at breast height (DBH) significantly influenced both tree growth and mortality, whereas CH and CCI had opposite effects on tree growth in EBF and TCF. Our study underscores the importance of elevation in shaping the population dynamics and the biomass carbon sink balance of mountain forests. These insights enhance our understanding of the life strategies of tree species, enabling more accurate predictions of forest dynamics and their response to environmental changes.
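The abstract reports RGR in cm·cm⁻¹·a⁻¹ but does not give formulas. A common convention, assumed here, is $\mathrm{RGR} = (\ln D_2 - \ln D_1)/\Delta t$ for diameters $D_1, D_2$ measured $\Delta t$ years apart. Mortality is often reported either as the fraction of trees dying over the census interval or as an annualized (compound) rate. The abstract does not state which of these underlies the 20.03% figure. The helper names and the example tree are hypothetical.

```python
import math

def relative_growth_rate(dbh1, dbh2, years):
    """RGR in cm·cm^-1·a^-1 from two diameter measurements (assumed log-ratio convention)."""
    return (math.log(dbh2) - math.log(dbh1)) / years

def mortality(n_initial, n_survivors, years):
    """Interval mortality fraction and its annualized (compound) equivalent."""
    survival = n_survivors / n_initial
    interval = 1.0 - survival
    annual = 1.0 - survival ** (1.0 / years)
    return interval, annual

# Hypothetical tree: 10.0 cm DBH growing to 11.5 cm over a five-year census interval.
print(round(relative_growth_rate(10.0, 11.5, 5), 4))  # 0.028
print(mortality(100, 80, 5))  # 20% over the interval is about 4.4% per year
```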
Since its inception, the epsilon distribution has piqued the interest of statisticians and has been successfully used to solve a variety of statistical problems. In this article, we propose to extend this distribution using the quadratic rank transmutation map. This mechanism is not new; it has already been used to improve the modelling capabilities of a number of existing distributions, and we expect the same benefits for the original epsilon distribution. As a result, we obtain the transmuted epsilon distribution, a flexible three-parameter distribution with a bounded domain. We demonstrate its key features, focusing on the properties of its distributional mechanism and conducting quantile and moment analyses. Applications of the model are presented using two data sets. We also perform a regression analysis based on this distribution.
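The quadratic rank transmutation map itself is a one-line formula. Given a base CDF $F$ with density $f$ and a parameter $|\lambda| \le 1$, it gives $G(x) = (1+\lambda)F(x) - \lambda F(x)^2$ and $g(x) = f(x)\,[1 + \lambda - 2\lambda F(x)]$. The abstract does not restate the epsilon CDF, so the sketch uses a uniform base on $(0, 1)$ purely as a placeholder. Plugging in the epsilon distribution's CDF and density would give the transmuted epsilon distribution, with $\lambda$ as the parameter added by the transmutation.

```python
def transmute(base_cdf, base_pdf, lam):
    """Quadratic rank transmutation: G = (1 + lam) F - lam F^2, with |lam| <= 1."""
    if abs(lam) > 1:
        raise ValueError("lam must satisfy |lam| <= 1")

    def cdf(x):
        F = base_cdf(x)
        return (1.0 + lam) * F - lam * F * F

    def pdf(x):
        return base_pdf(x) * (1.0 + lam - 2.0 * lam * base_cdf(x))

    return cdf, pdf

# Placeholder base distribution: uniform on (0, 1), standing in for the epsilon distribution.
def unif_cdf(x):
    return min(max(x, 0.0), 1.0)

def unif_pdf(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0

G, g = transmute(unif_cdf, unif_pdf, lam=0.5)
print(G(0.0), G(0.5), G(1.0))  # 0.0 0.625 1.0
```

The endpoints stay at 0 and 1 for any admissible $\lambda$, so the bounded domain of the base distribution is preserved.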
The multiscale variability in summer extreme persistent precipitation (SEPP) in China from 1961 to 2020 was investigated via three extreme precipitation indices: consecutive wet days, total precipitation amount, and daily precipitation intensity. The relationships between precursory and concurrent global oceanic modes and SEPP were identified via a generalized linear model (GLM). The influence of oceanic modes on SEPP was then investigated via numerical simulations. The results revealed that climatological SEPP (≥ 14 days) mainly appears across the Tibetan Plateau, the Yunnan–Guizhou Plateau, and the South China coast. The first empirical orthogonal function (EOF) mode for all three indices showed strong signals over the Yangtze River. Further analysis via the GLM identified four precursory factors of SEPP that could serve as preceding signals for future predictions: the positive phases of the tropical North Atlantic (TNA) in autumn, ENSO in winter, the Indian Ocean Basin (IOB) in spring, and the western North Pacific (WNP) in summer. These contributed 30.2%, 36.4%, 38.0%, and 55.6%, respectively, to the GLM. Sensitivity experiments revealed that sea surface temperature (SST) forcing in all four seasons contributes to SEPP over China. Winter and summer SST warming over the Pacific and Indian Ocean (IO) contributes the most. Diagnosis of the hydrological cycle suggested that water vapor advection predominantly originates from the western Pacific and IO in summer, driven by the strengthened subtropical high and Asian summer monsoon (ASM). The enhanced vertical water vapor transport is attributed to stronger upward motion across all four seasons. These findings are helpful for better understanding SEPP variability and its prediction under SST warming.
Doline susceptibility mapping (DSM) in karst aquifers is important for estimating the vulnerability of the aquifer to pollutants, estimating the infiltration rate, and identifying infrastructure exposed to doline development. In this research, a doline susceptibility map was prepared for Saldaran mountain with a generalized linear model (GLM). The 14 candidate affecting parameters were extracted from satellite images, a digital elevation model, and a geology map. Only the 8 parameters that were correlated with dolines were input to the model. In total, 306 dolines were identified by photogrammetry from Unmanned Aerial Vehicles (UAV) over 600 hectares of the Saldaran lands. These data were then divided into training (70%) and testing (30%) sets for modelling. The DSM results classified the probability of doline occurrence in Saldaran mountain as follows: 16.5% of the area as high to very high, 72% as low to very low, and 5% as moderate. Locally, within Saldaran mountain, the Pirghar aquifer has the highest potential for doline development, followed by the Bagh Rostam and Sarab aquifers. The most important factors in the DSM modelling were, in decreasing order, precipitation, the digital elevation model, the Topographic Position Index, drainage density, slope, TRASP (circular aspect transformed into a radiation index), snow-covered days, and the vegetation cover index. Evaluation with the Receiver Operating Characteristic (ROC) curve indicates very good accuracy for the DSM model (AUC = 0.953).
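The reported AUC can be read as the probability that a randomly chosen doline location receives a higher susceptibility score than a randomly chosen non-doline location, with ties counted as one half. This is the Mann-Whitney form of the area under the ROC curve. A minimal sketch of that equivalence follows, using invented scores.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the Mann-Whitney probability P(pos > neg), ties count 1/2."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical susceptibility scores at test doline (positive) and non-doline (negative) cells.
print(auc([0.9, 0.8], [0.7, 0.85]))  # 0.75
```

The double loop is $O(mn)$; for large test sets, a rank-based computation gives the same value more cheaply.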
Multivariate regression models have been extensively studied in the literature and applied in practice. It is not unusual for some predictors to make the same nonnull contributions to all elements of the response vector, especially when the number of predictors is very large. For convenience, we call the set of such predictors the homogeneity set. In this paper, we consider sparse high-dimensional multivariate generalized linear models with coexisting homogeneity and heterogeneity sets of predictors. This setting helps in understanding the effects of different types of predictors and improves estimation efficiency. We propose a novel adaptive regularized method that easily identifies the homogeneity set of predictors, and we investigate the asymptotic properties of the parameter estimates. More importantly, the proposed method yields a smaller variance for parameter estimation than methods that do not account for a homogeneity set of predictors. We also provide a computational algorithm and present its theoretical justification. In addition, we perform extensive simulation studies and present real data examples to demonstrate the proposed method.
This study investigated the factors contributing to intravenous admixture preparation errors (IAPEs) within Pharmacy Intravenous Admixture Services (PIVAS). A retrospective analysis was conducted on IAPEs documented in the PIVAS unit of a large multi-specialty hospital in China with over 2000 beds, covering the period from January 1, 2015 to December 31, 2022. Drug preparation records were examined using a generalized linear mixed model (GLMM) to identify both univariate and multivariate factors associated with IAPE occurrence. A total of 824 IAPE cases were recorded during the study period, yielding an overall error rate of 0.018%. Univariate analysis identified three significant determinants (P < 0.05): drug category (general drugs, anti-infective drugs, and antineoplastic drugs), preparation time (workdays), and years of work experience. Multivariate analysis further confirmed that drug category (general and antineoplastic drugs), preparation time (workdays), and work experience remained statistically significant predictors of IAPE incidence (P < 0.05). IAPEs in PIVAS were influenced by multiple factors, predominantly those related to personnel and drug characteristics. Targeted interventions informed by multivariate analysis are essential to mitigating these errors and enhancing medication safety.
Background: Abiotic factors exert different impacts on the abundance of individual tree species in a forest. However, little is known about their impact on individual plant species, particularly in tropical forests. This study identified the impact of abiotic factors on the abundances of Podocarpus falcatus, Croton macrostachyus, Celtis africana, Syzygium guineense, Olea capensis, Diospyros abyssinica, Filicium decipiens, and Coffea arabica. A systematic sampling design was used in the Harana forest, where 1122 plots were established to record species abundance. Random forest (RF), artificial neural network (ANN), and generalized linear model (GLM) models were used to examine the impacts of topographic, climatic, and edaphic factors on the log abundances of woody species. The RF model was used to predict spatial distribution maps of the log abundance of each species. Results: The RF model achieved better prediction accuracy, with R² = 71% and a mean squared error (MSE) of 0.28 for Filicium decipiens. The RF model identified elevation, temperature, precipitation, clay, and potassium as the top variables influencing species abundance. The ANN model showed that elevation had a negative impact on the log abundances of all woody species. The GLM reaffirmed the negative impact of elevation on all woody species except Syzygium guineense and Olea capensis. The ANN model indicated that soil organic matter (SOM) could positively affect the log abundances of all woody species. The GLM showed a similar positive impact of SOM, except for a negative impact on the log abundance of Celtis africana (p < 0.05).
The log abundances of Coffea arabica, Filicium decipiens, and Celtis africana were spatially confined to the eastern parts of the forest, while that of Olea capensis was limited to the western parts. Conclusions: The impacts of abiotic factors on abundance may vary among woody species. This ecological understanding could guide restoration activities for individual species. The prediction maps in this study provide spatially explicit information that can enhance the successful implementation of species conservation.
Assume that, in the generalized linear model (GLM), the expectation of the response variable is correctly specified and some other smoothness conditions hold. Under these assumptions, it is shown that with probability one the quasi-likelihood equation for the GLM has a solution when the sample size n is sufficiently large. The rate at which this solution tends to the true value is determined. In an important special case, this rate is the same as that given by the law of the iterated logarithm (LIL) for partial sums of iid variables and thus cannot be improved further.
The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions widely used. When missing values are present, many existing methods in the literature depend heavily on an unverifiable assumption about the missing data mechanism, and they fail when that assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible. It covers both ignorable and nonignorable missingness, as well as missing values in either the response or the covariates. Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For identifiable parameters, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Simulation studies are conducted to evaluate the finite-sample performance of the proposed estimators as well as of estimators from some existing methods. Finally, the authors present a biomarker analysis in a prostate cancer study to illustrate the proposed method.
In this paper, we explore some weak consistency properties of quasi-maximum likelihood estimates (QMLE) for the quasi-likelihood equation $\sum_{i=1}^n X_i (y_i - \mu(X_i'\beta)) = 0$ in the univariate generalized linear model $E(y \mid X) = \mu(X'\beta)$. Given uncorrelated residuals $\{e_i = y_i - \mu(X_i'\beta_0),\ 1 \le i \le n\}$ and other conditions, we prove that $\hat\beta_n - \beta_0 = O_p(\underline{\lambda}_n^{-1/2})$. Here $\hat\beta_n$ is a root of the above equation, $\beta_0$ is the true value of the parameter $\beta$, and $\underline{\lambda}_n$ denotes the smallest eigenvalue of the matrix $S_n = \sum_{i=1}^n X_i X_i'$. We also show that this convergence rate is sharp, given an independent, non-asymptotically degenerate residual sequence and other conditions. Moreover, parallel to the elegant result of Drygas (1976) for classical linear regression models, we point out that the necessary condition guaranteeing the weak consistency of QMLE is $S_n^{-1} \to 0$ as the sample size $n \to \infty$.
Generalized linear measurement error models, such as Gaussian, Poisson, and logistic regression, are considered. To eliminate the effects of measurement error on parameter estimation, a corrected empirical likelihood method is proposed for statistical inference in a class of generalized linear measurement error models. The method is based on the moment identities of the corrected score function. Under some regularity conditions, the empirical log-likelihood ratio for the regression parameter is proved to have an asymptotic chi-squared distribution. The corresponding maximum empirical likelihood estimator of the regression parameter is derived, and its asymptotic normality is shown. Furthermore, we consider the construction of confidence intervals for one component of the regression parameter using the partial profile empirical likelihood. Simulation studies are conducted to assess finite-sample performance. A real data set from the ACTG 175 study is used to illustrate the proposed method.
The neon flying squid, Ommastrephes bartramii, is an economically important cephalopod species in the Northwest Pacific Ocean. Its short lifespan makes its distribution and abundance more susceptible to the direct impact of environmental conditions. This study applied the generalized linear model (GLM) and the generalized additive model (GAM) to commercial fishery data from Chinese squid-jigging fleets during 1995–2011. The aims were to examine the interannual and seasonal variability in the abundance of O. bartramii and to evaluate the influence of variables on abundance (catch per unit effort, CPUE). The GLM results suggested that year, month, latitude, sea surface temperature (SST), mixed layer depth (MLD), and the interaction term (SST×MLD) were significant factors. The optimal GAM included all six significant variables and explained 42.43% of the variance in nominal CPUE. In decreasing order of importance, the six variables were year, month, latitude, SST, MLD, and SST×MLD. The squid was mainly distributed in waters between 40°N and 44°N in the Northwest Pacific Ocean. The optimal ranges of SST and MLD were 14–20 °C and 10–30 m, respectively. Squid abundance fluctuated greatly from 1995 to 2011: CPUE was low during 1995–2002 and high during 2003–2008. Furthermore, squid abundance was typically high in August. The interannual and seasonal variability in squid abundance was associated with variations in marine environmental conditions and the life history characteristics of the squid.
There are many documented sex differences in the clinical course, symptom expression profile, and treatment response of Parkinson's disease, creating additional challenges for patient management. Although subthalamic nucleus deep brain stimulation is an established therapy for Parkinson's disease, the effects of sex on treatment outcome are still unclear. The aim of this retrospective observational study was to examine sex differences in motor symptoms, non-motor symptoms, and quality of life after subthalamic nucleus deep brain stimulation. Outcome measures were evaluated at 1 and 12 months post-operation in 90 patients with Parkinson's disease undergoing subthalamic nucleus deep brain stimulation. The patients were aged 63.00 ± 8.01 years (55 men and 35 women). Clinical evaluation outcomes were compared between sexes with Student's t-test and within each sex with a paired-sample t-test. Generalized linear models were established to identify factors associated with treatment efficacy and intensity for each sex. We found that subthalamic nucleus deep brain stimulation improved motor symptoms in men but not in women in the on-medication condition at 1 and 12 months post-operation. Restless legs syndrome was alleviated to a greater extent in men than in women. Women had poorer quality of life at baseline and achieved less improvement in quality of life than men after subthalamic nucleus deep brain stimulation. Furthermore, Hoehn-Yahr stage was positively correlated with the treatment response in men, while levodopa equivalent dose at 12 months post-operation was negatively correlated with motor improvement in women. In conclusion, women received less benefit from subthalamic nucleus deep brain stimulation than men in terms of motor symptoms, non-motor symptoms, and quality of life.
We found sex-specific factors, i.e., Hoehn-Yahr stage and levodopa equivalent dose, that were related to motor improvements. These findings may help to guide patient selection, prognosis, and stimulation programming in subthalamic nucleus deep brain stimulation for optimal therapeutic efficacy in Parkinson's disease.
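For the within-sex comparisons, the paired-sample t statistic is $t = \bar d / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom. Here $d$ are the per-patient differences between two time points. The sketch below computes only the statistic, because the t-distribution tail needed for a p-value is not in the Python standard library. The scores are hypothetical and do not come from the study.

```python
import math

def paired_t(before, after):
    """Paired-sample t statistic and degrees of freedom for the differences after - before."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))  # sample SD of differences
    return mean_d / (sd / math.sqrt(n)), n - 1

# Hypothetical scores for four patients at baseline and at 12 months.
t, df = paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 5.0])
print(round(t, 3), df)  # 5.196 3
```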
Funding (quasi-likelihood equation in GLMs with adaptive design): Supported by the National Natural Science Foundation of China (10371092).
Funding (semiparametric GPLM with type II censored data): The Talent Research Fund of Dalian University of Technology (3004-893325) and the National Natural Science Foundation of China (10271049).
Abstract: In a linear regression model, testing whether the residual variance is constant is an integral part of statistical analysis. Homoscedasticity is a crucial assumption that should be confirmed with a statistical test, usually before an Analysis of Variance (ANOVA) is carried out. Many researchers have published papers on tests for detecting variance heterogeneity in multiple linear regression models, and these tests have been compared using criteria such as bias, error rates, and power. Besides such comparisons, modifications of some of these tests have been reported in recent years. However, in multiple linear regression, little work has compared selected tests of the homoscedasticity assumption when linear, quadratic, square-root, and exponential forms of heteroscedasticity are injected into the residuals. The present study aims to fill this gap by providing a comprehensive comparative analysis of the asymptotic behaviour of selected tests for homoscedasticity, in order to identify the best test for detecting heteroscedasticity in multiple linear regression under varying variances and significance levels. Of the many tests available in the literature, nine were considered in this study: the Breusch-Godfrey test, the studentized Breusch-Pagan test, White's test, the Nonconstant Variance Score test, the Park test, the Spearman rank test, the Glejser test, the Goldfeld-Quandt test, and the Harrison-McCabe test. Their asymptotic behaviour was examined by Monte Carlo simulation.
Four heteroscedastic structures, exponential and linear (the latter generalized to square-root and quadratic forms), were injected into the residuals of the multiple linear regression models at sample sizes of 30, 50, 100, 200, 500, and 1000. Performance was evaluated in the R environment. Among other findings, the Glejser and Park tests were the best tests for detecting heteroscedasticity under the exponential (EHS) and linear (LHS) heteroscedastic structures, respectively, while the White and Harrison-McCabe tests were the best tests for checking homoscedasticity under EHS and LHS, respectively, for sample sizes below 50.
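As an illustration of one of the nine tests, the sketch below implements Koenker's studentized Breusch-Pagan statistic from its textbook definition: regress the squared OLS residuals on the regressors and use LM = nR², which is asymptotically chi-squared with k degrees of freedom under homoscedasticity. The function name and the simulated exponential structure (error standard deviation exp(x1)) are assumptions for illustration, not the paper's simulation design; in R, `lmtest::bptest` computes the studentized version by default.

```python
import numpy as np
from scipy import stats

def studentized_breusch_pagan(X, y):
    """Koenker's studentized Breusch-Pagan test.

    X: (n, k+1) design matrix whose first column is the intercept.
    Returns (LM, p_value) with LM = n * R^2 from regressing the squared
    OLS residuals on X; LM ~ chi-squared(k) asymptotically under H0.
    """
    n, m = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                         # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)   # auxiliary regression
    resid_aux = e2 - X @ gamma
    r2 = 1.0 - (resid_aux @ resid_aux) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=m - 1)

# Hypothetical exponential heteroscedasticity: error sd = exp(x1).
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.uniform(0, 3, n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - x2 + rng.normal(scale=np.exp(x1))
lm, p = studentized_breusch_pagan(X, y)
```

With variance growing this strongly in x1, the test should reject homoscedasticity decisively; repeating it over many replications and sample sizes is the Monte Carlo scheme the abstract describes.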
Abstract: Changes in climate factors such as temperature, rainfall, humidity, and wind speed are natural processes that can significantly affect the incidence of infectious diseases. Dengue is a widespread disease whose link to climate change has often been documented. It has become a significant concern, especially for the Malaysian health authorities, because of its rapid spread and serious effects, including loss of life. Several statistical models have been used to identify climatic factors associated with infectious diseases. However, because the interactions between climate variables and disease components are complex and nonlinear, modelling their relationships has become the main challenge in climate-health studies. Hence, this study proposed a generalized linear model (GLM) with Poisson and negative binomial distributions to examine the effects of climate factors on dengue incidence while accounting for collinearity between variables. The study focuses on dengue hot spots in Malaysia in 2014. Because the climate factors are collinear, the analysis was done separately using three different models. The study revealed that rainfall, temperature, humidity, and wind speed were significantly associated with dengue incidence, and most of them showed a negative effect. Of all variables, wind speed had the most significant impact on dengue incidence. Given these relationships, policymakers should formulate better plans so that precautionary steps can be taken to reduce the spread of dengue.
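A common first check when choosing between Poisson and negative binomial GLMs is the Pearson dispersion statistic: values near 1 are consistent with the Poisson assumption, and values well above 1 indicate overdispersion. The sketch below uses an intercept-only Poisson fit on simulated counts (an assumption; the paper's models include climate covariates, in which case one would use the fitted means and divide by n minus the number of parameters). The function name is hypothetical.

```python
import numpy as np

def pearson_dispersion(counts):
    """Pearson dispersion for an intercept-only Poisson fit.

    The fitted mean is the sample mean, so the statistic is
    sum((y - ybar)^2 / ybar) / (n - 1).
    """
    y = np.asarray(counts, dtype=float)
    mu = y.mean()
    return np.sum((y - mu) ** 2 / mu) / (y.size - 1)

rng = np.random.default_rng(2)
poisson_counts = rng.poisson(18.0, size=5000)
# Negative binomial with n=2, p=0.1: mean 18, variance 180.
nb_counts = rng.negative_binomial(2, 0.1, size=5000)
d_pois = pearson_dispersion(poisson_counts)
d_nb = pearson_dispersion(nb_counts)
```

Here `d_pois` should sit close to 1, while `d_nb` should be near 10 (the variance-to-mean ratio), which would point toward the negative binomial model.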
Abstract: Stochastic models are derived to estimate the level of coliform count, in terms of the MPN index, one of the most important water quality characteristics of groundwater, from a set of water-source location and soil characteristics. The study is based on about twenty location and soil characteristics, the majority of which were obtained through laboratory analysis of soil and water samples collected from nearly three hundred drinking water sources (wells and bore wells) selected at random from the district of Kasaragod. Water contamination is found to be relatively higher in wells than in bore wells. The study reveals that only 7% of the wells and 40% of the bore wells in the district are within the permissible limit of the WHO standard for drinking water quality. The level of contamination is very high in hospital premises and very low in forest areas. Two separate multiple ordinal logistic regression models are developed to predict the level of coliform count, one for wells and the other for bore wells. A significant feature of this study is that, in addition to scientifically establishing the dependence of water quality on the distances from waste disposal areas, septic tanks, etc., it highlights the dependence on two other very significant soil characteristics: soil organic carbon and soil porosity. The models make it possible to predict the water quality at a location from its soil and location characteristics. One important use of the models is in fixing safe locations for waste dumps, septic tanks, wells, etc. in town planning, residential and industrial layouts, and hospital or hostel construction. This is the first study to describe groundwater quality in terms of location and soil characteristics.
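The abstract does not give the model equations, so the following is only a sketch of how a standard ordinal (cumulative logit, proportional odds) logistic model turns predictors into probabilities for ordered coliform levels. The three levels, the two predictors, and all coefficient and cutpoint values are hypothetical and are not estimates from the study.

```python
import numpy as np

def cumulative_logit_probs(x, beta, cutpoints):
    """Category probabilities for a proportional-odds (cumulative logit) model.

    P(Y <= j | x) = 1 / (1 + exp(-(theta_j - x @ beta))) for j = 1..J-1,
    with increasing cutpoints theta_1 < ... < theta_{J-1}.
    Returns a length-J array of P(Y = j | x).
    """
    eta = float(np.dot(x, beta))
    theta = np.asarray(cutpoints, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(theta - eta)))
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)

# Hypothetical levels: low / medium / high coliform count.
# Hypothetical predictors: [distance to septic tank (m), soil porosity].
beta = np.array([-0.02, 3.0])
cutpoints = [-1.0, 1.0]
p_far = cumulative_logit_probs([100.0, 0.3], beta, cutpoints)
p_near = cumulative_logit_probs([10.0, 0.3], beta, cutpoints)
```

With these made-up coefficients, moving the source closer to the septic tank raises the linear predictor and shifts probability toward the highest contamination level.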
Abstract: A new covariate-dependent zero-truncated bivariate Poisson model is proposed in this paper within the generalized linear model framework. A marginal-conditional approach is used to construct the bivariate model. The proposed model, its estimation procedure, and tests for goodness of fit and under- (or over-) dispersion are presented and applied to road safety data. The two correlated outcome variables considered in this study are the number of cars involved in an accident and the number of casualties given the number of cars.
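The marginal-conditional construction builds on zero-truncated Poisson components. A zero-truncated component fits here because an accident involves at least one car. The sketch below shows only that univariate building block, P(Y = y | Y > 0) = e^(-λ) λ^y / (y! (1 - e^(-λ))) with mean λ/(1 - e^(-λ)), with covariates entering through an assumed log link λ = exp(x'β). The bivariate model itself is not reproduced, and the covariate and coefficient values are hypothetical.

```python
import math

def ztp_pmf(y, lam):
    """Zero-truncated Poisson pmf P(Y = y | Y > 0) for integer y >= 1."""
    if y < 1:
        return 0.0
    log_p = (-lam + y * math.log(lam) - math.lgamma(y + 1)
             - math.log1p(-math.exp(-lam)))   # log(1 - exp(-lam))
    return math.exp(log_p)

def ztp_mean(lam):
    """E[Y | Y > 0] = lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))

# Hypothetical covariate vector and coefficients, log link lam = exp(x' beta).
x, beta = [1.0, 0.5], [0.2, 0.8]
lam = math.exp(sum(a * b for a, b in zip(x, beta)))
probs = [ztp_pmf(k, lam) for k in range(1, 80)]
```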
Abstract: Count data that exhibit overdispersion (variance of counts larger than the mean) are commonly analyzed using discrete distributions such as the negative binomial, the Poisson inverse Gaussian, and other models. The Poisson distribution is characterized by equality of mean and variance, whereas the negative binomial and Poisson inverse Gaussian distributions have variance larger than the mean and are therefore more appropriate for modelling overdispersed count data. As an alternative to these two models, we use the generalized Poisson distribution for group comparisons in the presence of multiple covariates. This problem is known as analysis of covariance (ANCOVA) and has been solved for continuous data. Our objectives were to develop ANCOVA using the generalized Poisson distribution and to compare its goodness of fit with that of nonparametric generalized additive models. Using real-life data, we show that the model performs quite satisfactorily compared with nonparametric generalized additive models.
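For reference, one common form of the generalized Poisson distribution (the Consul-Jain parameterization, assumed here; the paper may use a different one) is P(Y = y) = θ(θ + λy)^(y-1) e^(-θ-λy) / y! for θ > 0 and 0 ≤ λ < 1. Its mean is θ/(1 - λ) and its variance θ/(1 - λ)³, so λ > 0 gives variance larger than the mean and λ = 0 recovers the Poisson. The sketch below checks these facts numerically with hypothetical parameter values.

```python
import math

def gp_pmf(y, theta, lam):
    """Generalized Poisson pmf (Consul-Jain form), theta > 0, 0 <= lam < 1.

    P(Y = y) = theta * (theta + lam*y)**(y-1) * exp(-theta - lam*y) / y!
    """
    return math.exp(math.log(theta) + (y - 1) * math.log(theta + lam * y)
                    - theta - lam * y - math.lgamma(y + 1))

theta, lam = 2.0, 0.3          # hypothetical parameter values
ys = range(0, 300)             # tail beyond 300 is negligible for these values
probs = [gp_pmf(y, theta, lam) for y in ys]
mean = sum(y * p for y, p in zip(ys, probs))
var = sum(y * y * p for y, p in zip(ys, probs)) - mean ** 2
```

The variance-to-mean ratio here is 1/(1 - λ)², about 2.04, which is the overdispersion that motivates the model.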
Funding: Funded by the National Natural Science Foundation of China (Grant No. 32271872).
Abstract: Over the past decades, the expansion of natural secondary forests has played a crucial role in offsetting the loss of primary forests and combating climate change. Despite this, there is a gap in our understanding of how the growth and mortality patterns of tree species vary with elevation in these secondary forests. In this study, we analyzed data from two censuses (spanning a five-year interval) conducted in both evergreen broadleaved forests (EBF) and temperate coniferous forests (TCF), which have been recovering for half a century, across elevation gradients in a subtropical mountain region, Mount Wuyi, China. The results indicated that the relative growth rate (RGR) of EBF (0.028 ± 0.001 cm·cm⁻¹·a⁻¹) and its mortality rate (MR) (20.03% ± 1.70%) were 27.3% and 16.4% higher, respectively, than those of TCF. Interestingly, the trade-off between RGR and MR in EBF weakened as elevation increased, a trend not observed in TCF. Conversely, TCF consistently showed a stronger trade-off between RGR and MR than EBF. Generalized linear mixed models revealed that elevation influences RGR both directly and indirectly through its interactions with slope, crown competition index (CCI), and tree canopy height (CH). However, tree mortality did not show a significant correlation with elevation. Additionally, DBH significantly influenced both tree growth and mortality, whereas CH and CCI had opposite effects on tree growth in EBF and TCF. Our study underscores the importance of elevation in shaping the population dynamics and the biomass carbon sink balance of mountain forests. These insights enhance our understanding of the life strategies of tree species, enabling more accurate predictions of forest dynamics and their response to environmental changes.
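The abstract reports RGR in cm·cm⁻¹·a⁻¹, which matches the usual log-difference definition RGR = (ln D₂ - ln D₁)/Δt for diameters D measured Δt years apart. Mortality between censuses is often annualized as m = 1 - (N_s/N₀)^(1/t). The sketch below shows both conventions with hypothetical census numbers. The abstract does not state whether its MR of 20.03% is per interval or annualized, so which convention the authors used is an open assumption.

```python
import math

def relative_growth_rate(d1, d2, years):
    """RGR = (ln d2 - ln d1) / years, in cm·cm^-1·a^-1 for DBH in cm."""
    return (math.log(d2) - math.log(d1)) / years

def annual_mortality_rate(n0, survivors, years):
    """Annualized mortality m = 1 - (survivors / n0) ** (1 / years)."""
    return 1.0 - (survivors / n0) ** (1.0 / years)

# Hypothetical tree: DBH 10.0 cm at census 1 and 11.5 cm five years later.
rgr = relative_growth_rate(10.0, 11.5, 5)
# Hypothetical plot: 200 stems, 160 survive the five-year interval.
m_interval = 1 - 160 / 200
m_annual = annual_mortality_rate(200, 160, 5)
```

The same survival data give 20% mortality per five-year interval but only about 4.4% per year, so the convention matters when comparing reported rates.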
Abstract: Since its inception, the epsilon distribution has attracted the interest of statisticians, and it has been used successfully to solve a variety of statistical problems. In this article, we propose to extend this distribution using the quadratic rank transmutation map. This mechanism is not new; it has already been used to improve the modelling capabilities of a number of existing distributions, and we expect the same benefits for the original epsilon distribution. As a result, we introduce the transmuted epsilon distribution, a flexible three-parameter distribution with bounded support. We present its key features, focusing on the properties of its distributional mechanism, and conduct quantile and moment analyses. Applications of the model are presented using two data sets. We also perform a regression analysis based on this distribution.
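The quadratic rank transmutation map turns any base CDF F into G(x) = (1 + λ)F(x) - λF(x)² with |λ| ≤ 1, giving the density g(x) = f(x)[1 + λ - 2λF(x)]. Inverting the quadratic in F yields the quantile function, since Q_G(u) = Q_F(F*) with F* = [(1 + λ) - √((1 + λ)² - 4λu)]/(2λ). The sketch below implements these generic formulas. The epsilon distribution's own CDF is not reproduced; a Uniform(0, 1) base is used as a placeholder, and all function names are hypothetical.

```python
import numpy as np

def transmuted_cdf(F, lam):
    """Quadratic rank transmutation map: G = (1 + lam) F - lam F^2, |lam| <= 1."""
    return (1.0 + lam) * F - lam * F ** 2

def transmuted_pdf(f, F, lam):
    """Density g = f * (1 + lam - 2 lam F), obtained by differentiating G."""
    return f * (1.0 + lam - 2.0 * lam * F)

def transmuted_inverse(u, lam):
    """Solve G(F) = u for F in [0, 1]; compose with the base quantile to get x."""
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return u
    return ((1.0 + lam) - np.sqrt((1.0 + lam) ** 2 - 4.0 * lam * u)) / (2.0 * lam)

# Placeholder base: Uniform(0, 1), so F(x) = x, f(x) = 1 and Q_F is the identity.
lam = 0.6
x = (np.arange(1000) + 0.5) / 1000            # midpoints of [0, 1]
g = transmuted_pdf(np.ones_like(x), x, lam)
rng = np.random.default_rng(3)
samples = transmuted_inverse(rng.uniform(size=10), lam)  # inverse-transform draws
```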
Funding: Jointly funded by the National Natural Science Foundation of China (Grant Nos. 42122035, 42288101, 42130605, 72293604, 42475179, and 42475020); the Guangdong Provincial Observation and Research Station for Tropical Ocean Environment in Western Coastal Waters (GSTOEW); the Key Laboratory of Space Ocean Remote Sensing and Application; the CMA-GDOU Joint Laboratory for Marine Meteorology; and the Key Laboratory of Climate Resources and Environment in Continental Shelf Sea and Deep Ocean (LCRE).
Abstract: The multiscale variability in summer extreme persistent precipitation (SEPP) in China from 1961 to 2020 was investigated via three extreme precipitation indices: consecutive wet days, total precipitation amount, and daily precipitation intensity. The relationships between precursory and concurrent global oceanic modes and SEPP were identified via a generalized linear model (GLM). The influence of oceanic modes on SEPP was then investigated via numerical simulations. The results revealed that climatological SEPP (≥14 days) mainly appears across the Tibetan Plateau, the Yunnan–Guizhou Plateau, and the South China coast. The first EOF mode of all three indices showed strong signals over the Yangtze River. Further analysis via the GLM suggested that the positive phases of the tropical North Atlantic (TNA) in autumn, ENSO in winter, the Indian Ocean Basin (IOB) in spring, and the western North Pacific (WNP) in summer emerged as the most effective precursory factors of SEPP, which could serve as preceding signals for future predictions, contributing 30.2%, 36.4%, 38.0%, and 55.6%, respectively, to the GLM. Sensitivity experiments revealed that SST forcing in all four seasons contributes to SEPP over China, whereas winter and summer SST warming over the Pacific and Indian Ocean (IO) contributes the most. Diagnosis of the hydrological cycle suggested that water vapor advection predominantly originates from the western Pacific and IO in summer, driven by the strengthened subtropical high and Asian summer monsoon (ASM). The enhanced vertical water vapor transport is attributed to stronger upward motion across all four seasons. These findings help in understanding SEPP variability and its prediction under SST warming.
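The "first EOF mode" mentioned above is the leading spatial pattern of an empirical orthogonal function decomposition. A minimal sketch computes it by SVD of the time-by-space anomaly matrix. It assumes a synthetic 60-year by 50-gridpoint field with one planted pattern and applies no area (cos-latitude) weighting, which real gridded analyses usually include. The function name is hypothetical.

```python
import numpy as np

def leading_eof(field):
    """First EOF of a (time, space) data matrix via SVD of the anomalies.

    Returns (eof1, pc1, explained_fraction). No area weighting is applied.
    """
    anomalies = field - field.mean(axis=0)       # remove time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eof1 = vt[0]                                 # spatial pattern (unit length)
    pc1 = u[:, 0] * s[0]                         # principal-component time series
    explained = s[0] ** 2 / np.sum(s ** 2)       # fraction of total variance
    return eof1, pc1, explained

# Synthetic field: one dominant spatial pattern with random yearly amplitude.
rng = np.random.default_rng(4)
pattern = np.sin(np.linspace(0, np.pi, 50))
amplitude = rng.normal(scale=3.0, size=60)
field = np.outer(amplitude, pattern) + rng.normal(scale=0.1, size=(60, 50))
eof1, pc1, explained = leading_eof(field)
```

The sign of an EOF is arbitrary, so comparisons with a known pattern should use the absolute value of the projection.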
Abstract: Doline susceptibility mapping (DSM) in karst aquifers is important for estimating the vulnerability of the aquifer to pollutants, estimating the infiltration rate, and identifying infrastructure exposed to doline development. In this research, a doline susceptibility map of the Saldaran mountain was prepared with a generalized linear model (GLM) using 14 affecting parameters extracted from satellite images, a digital elevation model, and a geology map. Only the 8 parameters that were correlated with dolines were input to the model. To this end, 306 dolines were identified by photogrammetry from Unmanned Aerial Vehicles (UAV) over 600 hectares of the Saldaran lands, and these data were divided into training (70%) and testing (30%) sets for modelling. The DSM results classified the probability of doline occurrence in the Saldaran mountain as follows: 16.5% of the area high to very high, 72% low to very low, and 5% moderate. Locally, within the Saldaran mountain, the Pirghar aquifer has the highest potential for doline development, followed by the Bagh Rostam and Sarab aquifers. Precipitation, the digital elevation model, the Topographic Position Index, drainage density, slope, TRASP (the circular aspect transformed to a radiation index), snow-covered days, and the vegetation cover index are, in that order, the most important parameters in the DSM modelling. Evaluation using the Receiver Operating Characteristic (ROC) curve indicates very good accuracy of the DSM model (AUC = 0.953).
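The reported AUC has a direct probabilistic reading. It is the probability that a randomly chosen doline location receives a higher predicted susceptibility than a randomly chosen non-doline location, with ties counting one half (the Mann-Whitney form). A minimal sketch of that computation follows. The function name and the score values are hypothetical, not outputs of the study's model.

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]   # column of positives
    neg = np.asarray(scores_neg, dtype=float)[None, :]   # row of negatives
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Hypothetical susceptibility scores at test doline / non-doline locations.
auc_example = auc_mann_whitney([0.9, 0.4], [0.5, 0.1])
```

Of the four doline/non-doline pairs in the example, three are ranked correctly, so the AUC is 0.75. An AUC of 0.953 means roughly 95% of such pairs are ranked correctly.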
Abstract: Multivariate regression models have been extensively studied in the literature and applied in practice. It is not unusual for some predictors to make the same nonnull contributions to all elements of the response vector, especially when the number of predictors is very large. For convenience, we call the set of such predictors the homogeneity set. In this paper, we consider sparse high-dimensional multivariate generalized linear models with coexisting homogeneity and heterogeneity sets of predictors, which helps in understanding the effects of different types of predictors and improves estimation efficiency. We propose a novel adaptive regularized method with which the homogeneity set of predictors can be identified easily, and we investigate the asymptotic properties of the parameter estimates. More importantly, the proposed method yields a smaller variance for parameter estimation than methods that do not consider the existence of a homogeneity set of predictors. We also provide a computational algorithm and present its theoretical justification. In addition, we perform extensive simulation studies and present real data examples to demonstrate the proposed method.
Funding: The National Natural Science Foundation of China (Grant No. 72474013); the Beijing Health Technology Promotion Project (Grant No. BHTPP2024007).
Abstract: This study investigated the factors contributing to intravenous admixture preparation errors (IAPEs) within Pharmacy Intravenous Admixture Services (PIVAS). A retrospective analysis was conducted on IAPEs documented in the PIVAS unit of a large multi-specialty hospital in China with over 2000 beds, covering the period from January 1, 2015 to December 31, 2022. Drug preparation records were examined using a generalized linear mixed model (GLMM) to identify univariate and multivariate factors associated with IAPE occurrence. A total of 824 IAPE cases were recorded during the study period, yielding an overall error rate of 0.018%. Univariate analysis identified drug category (general, anti-infective, and antineoplastic drugs), preparation time (workdays), and years of work experience as significant determinants (P < 0.05). Multivariate analysis further confirmed that drug category (general and antineoplastic drugs), preparation time (workdays), and work experience remained statistically significant predictors of IAPE incidence (P < 0.05). IAPEs in PIVAS were influenced by multiple factors, predominantly those related to personnel and drug characteristics. Targeted interventions, informed by multivariate analysis, are essential to mitigating these errors and enhancing medication safety.
Abstract:
Background: Abiotic factors exert different impacts on the abundance of individual tree species in a forest, but little is known about their impact on individual species, particularly in tropical forests. This study identified the impact of abiotic factors on the abundances of Podocarpus falcatus, Croton macrostachyus, Celtis africana, Syzygium guineense, Olea capensis, Diospyros abyssinica, Filicium decipiens, and Coffea arabica. A systematic sample design was used in the Harana forest, where 1122 plots were established to record species abundance. Random forest (RF), artificial neural network (ANN), and generalized linear model (GLM) models were used to examine the impacts of topographic, climatic, and edaphic factors on the log abundances of woody species. The RF model was used to predict spatial distribution maps of the log abundance of each species.
Results: The RF model achieved better prediction accuracy, with R² = 71% and a mean squared error (MSE) of 0.28 for Filicium decipiens. The RF model identified elevation, temperature, precipitation, clay, and potassium as the top variables influencing species abundance. The ANN model showed that elevation had a negative impact on the log abundances of all woody species. The GLM reaffirmed the negative impact of elevation on all woody species except Syzygium guineense and Olea capensis. The ANN model indicated that soil organic matter (SOM) could positively affect the log abundances of all woody species. The GLM showed a similar positive impact of SOM, except for a negative impact on the log abundance of Celtis africana at p < 0.05. The spatial distributions of the log abundances of Coffea arabica, Filicium decipiens, and Celtis africana were confined to the eastern parts, while the log abundance of Olea capensis was limited to the western parts.
Conclusions: The impacts of abiotic factors on the abundance of woody species may vary with species. This ecological understanding could guide restoration of individual species. The prediction maps in this study provide spatially explicit information that can support the successful implementation of species conservation.
Funding: This work was supported by the National Natural Science Foundation of China.
Abstract: Assuming that, in the generalized linear model (GLM), the expectation of the response variable is correctly specified and that some other smoothness conditions hold, it is shown that with probability one the quasi-likelihood equation for the GLM has a solution when the sample size n is sufficiently large. The rate at which this solution converges to the true value is determined. In an important special case, this rate is the same as that given by the law of the iterated logarithm (LIL) for i.i.d. partial sums and thus cannot be improved.
Funding: Supported by the Chinese 111 Project B14019; the US National Science Foundation under Grant Nos. DMS-1305474 and DMS-1612873; and the US National Institutes of Health Award UL1TR001412.
Abstract: The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions widely used. When missing values are present, many existing methods depend heavily on an unverifiable assumption about the missing data mechanism, and they fail when the assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible, covering both ignorable and nonignorable missingness as well as missing values in either the response or the covariates. Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate the unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For identifiable parameters, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Simulation studies are conducted to evaluate the finite-sample performance of the proposed estimators and of estimators from some existing methods. Finally, the authors present a biomarker analysis from a prostate cancer study to illustrate the proposed method.
Funding: Supported by the President Foundation (Grant No. Y1050); the Scientific Research Foundation (Grant No. KYQD200502) of GUCAS.
Abstract: In this paper, we explore some weak consistency properties of quasi-maximum likelihood estimates (QMLE) for the quasi-likelihood equation $\sum_{i=1}^{n} X_i\,(y_i - \mu(X_i'\beta)) = 0$ in the univariate generalized linear model $E(y \mid X) = \mu(X'\beta)$. Given uncorrelated residuals $\{e_i = y_i - \mu(X_i'\beta_0),\ 1 \le i \le n\}$ and other conditions, we prove that $\hat\beta_n - \beta_0 = O_p(\underline{\lambda}_n^{-1/2})$, where $\hat\beta_n$ is a root of the above equation, $\beta_0$ is the true value of the parameter $\beta$, and $\underline{\lambda}_n$ denotes the smallest eigenvalue of the matrix $S_n = \sum_{i=1}^{n} X_i X_i'$. We also show that this convergence rate is sharp, provided that the residual sequence is independent and not asymptotically degenerate and that other conditions hold. Moreover, paralleling the elegant result of Drygas (1976) for classical linear regression models, we point out that the necessary condition guaranteeing the weak consistency of the QMLE is $S_n^{-1} \to 0$ as the sample size $n \to \infty$.
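Two standard facts connect the rate to familiar quantities. Because $S_n$ is symmetric positive definite, $\|S_n^{-1}\| = 1/\underline{\lambda}_n$, so $S_n^{-1} \to 0$ is equivalent to $\underline{\lambda}_n \to \infty$. For i.i.d. regressors with positive definite $E[XX']$, the law of large numbers gives $\underline{\lambda}_n / n \to \lambda_{\min}(E[XX'])$, so the bound reduces to the usual $n^{-1/2}$ rate. The sketch below illustrates the second fact on a hypothetical design with an intercept and one standard normal regressor, for which $E[XX'] = I$. The function name is hypothetical.

```python
import numpy as np

def smallest_eigenvalue_of_design(X):
    """lambda_min of S_n = sum_i X_i X_i' (= X.T @ X when the rows are X_i')."""
    return np.linalg.eigvalsh(X.T @ X)[0]   # eigvalsh returns ascending eigenvalues

rng = np.random.default_rng(5)
for n in (100, 1000, 10000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    lam_min = smallest_eigenvalue_of_design(X)
    # lam_min / n should approach 1 here, so lam_min ** -0.5 behaves like n ** -0.5.
    print(n, lam_min / n, lam_min ** -0.5)
```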
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11301569, 11471029, and 11101014); the Beijing Natural Science Foundation (Grant No. 1142002); the Science and Technology Project of Beijing Municipal Education Commission (Grant No. KM201410005010); the Hong Kong Research Grant (Grant No. HKBU202711); and Hong Kong Baptist University FRG Grants (Grant Nos. FRG2/11-12/110 and FRG1/13-14/018).
Abstract: Generalized linear measurement error models, such as Gaussian, Poisson, and logistic regression, are considered. To eliminate the effects of measurement error on parameter estimation, a corrected empirical likelihood method is proposed for statistical inference in a class of generalized linear measurement error models, based on the moment identities of the corrected score function. The asymptotic distribution of the empirical log-likelihood ratio for the regression parameter is proved to be chi-squared under some regularity conditions. The corresponding maximum empirical likelihood estimator of the regression parameter is derived, and its asymptotic normality is shown. Furthermore, we construct confidence intervals for a single component of the regression parameter using the partial profile empirical likelihood. Simulation studies are conducted to assess finite-sample performance. A real data set from the ACTG 175 study is used to illustrate the proposed method.
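The empirical likelihood construction is not reproduced here. To show what a corrected score does, the sketch below uses the simplest member of the class, Gaussian (linear) regression with additive error W = X + U and a known error covariance Σ_uu. In that case the corrected score Σ_i [W_i(y_i - W_i'b) + Σ_uu b] = 0 has expectation zero at the true parameter and the closed-form root b = (W'W - nΣ_uu)⁻¹W'y. The data are simulated and all names and values are hypothetical.

```python
import numpy as np

def corrected_linear_estimator(W, y, sigma_uu):
    """Root of the corrected score sum_i [W_i (y_i - W_i' b) + Sigma_uu b] = 0.

    W: (n, p) error-prone design; sigma_uu: (p, p) known error covariance
    (zero rows/columns for error-free columns such as the intercept).
    """
    n = W.shape[0]
    return np.linalg.solve(W.T @ W - n * sigma_uu, W.T @ y)

rng = np.random.default_rng(6)
n = 20000
x = rng.normal(size=n)                        # true covariate (unobserved)
w = x + rng.normal(scale=0.5, size=n)         # observed with additive error
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
W = np.column_stack([np.ones(n), w])
sigma_uu = np.diag([0.0, 0.25])               # intercept measured without error
naive = np.linalg.lstsq(W, y, rcond=None)[0]  # slope attenuated toward 2 / 1.25 = 1.6
corrected = corrected_linear_estimator(W, y, sigma_uu)
```

The naive least-squares slope is biased toward zero by the reliability ratio, while the corrected root recovers the true slope; for Poisson or logistic members, the corrected score has no closed form and must be solved iteratively.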
Funding: Financially supported by the National High-Tech R&D Program (863 Program) of China (2012AA092303); the Project of Shanghai Science and Technology Innovation (12231203900); the Industrialization Program of National Development and Reform Commission (2159999); the National Key Technologies R&D Program of China (2013BAD13B00); the Shanghai Universities First-Class Disciplines Project (Fisheries A); and the Funding Program for Outstanding Dissertations in Shanghai Ocean University.
Abstract: The neon flying squid, Ommastrephes bartramii, is an economically important cephalopod in the Northwest Pacific Ocean. Its short lifespan makes its distribution and abundance more susceptible to the direct impact of environmental conditions. Using the generalized linear model (GLM) and the generalized additive model (GAM), commercial fishery data from Chinese squid-jigging fleets during 1995 to 2011 were used to examine the interannual and seasonal variability in the abundance of O. bartramii and to evaluate the influences of variables on abundance (catch per unit effort, CPUE). The GLM results suggested that year, month, latitude, sea surface temperature (SST), mixed layer depth (MLD), and the interaction term (SST×MLD) were significant factors. The optimal GAM included all six significant variables and could explain 42.43% of the variance in nominal CPUE. Ranked by decreasing importance, the six variables were year, month, latitude, SST, MLD, and SST×MLD. The squid was mainly distributed in waters between 40°N and 44°N in the Northwest Pacific Ocean. The optimal ranges of SST and MLD were 14 to 20 °C and 10 to 30 m, respectively. Squid abundance fluctuated greatly from 1995 to 2011: CPUE was low during 1995–2002 and high during 2003–2008. Furthermore, squid abundance was typically high in August. The interannual and seasonal variability in squid abundance was associated with variations in marine environmental conditions and the life history characteristics of the squid.
Funding: Supported by the National Natural Science Foundation of China, Nos. 81830033 and 61761166004 (both to JGZ).
Abstract: There are many documented sex differences in the clinical course, symptom expression profile, and treatment response of Parkinson's disease, creating additional challenges for patient management. Although subthalamic nucleus deep brain stimulation is an established therapy for Parkinson's disease, the effects of sex on treatment outcome are still unclear. The aim of this retrospective observational study was to examine sex differences in motor symptoms, nonmotor symptoms, and quality of life after subthalamic nucleus deep brain stimulation. Outcome measures were evaluated at 1 and 12 months post-operation in 90 patients with Parkinson's disease (55 men and 35 women; aged 63.00 ± 8.01 years) undergoing subthalamic nucleus deep brain stimulation. Outcomes of clinical evaluations were compared between sexes with Student's t-test and within sex with a paired-sample t-test, and generalized linear models were established to identify factors associated with treatment efficacy and intensity for each sex. We found that subthalamic nucleus deep brain stimulation improved motor symptoms in men but not in women in the on-medication condition at 1 and 12 months post-operation. Restless legs syndrome was alleviated to a greater extent in men than in women. Women had poorer quality of life at baseline and achieved less improvement in quality of life than men after subthalamic nucleus deep brain stimulation. Furthermore, Hoehn-Yahr stage was positively correlated with treatment response in men, while levodopa equivalent dose at 12 months post-operation was negatively correlated with motor improvement in women. In conclusion, women received less benefit from subthalamic nucleus deep brain stimulation than men in terms of motor symptoms, non-motor symptoms, and quality of life. We found sex-specific factors, i.e., Hoehn-Yahr stage and levodopa equivalent dose, that were related to motor improvement. These findings may help to guide patient selection, prognosis, and stimulation programming in subthalamic nucleus deep brain stimulation for optimal therapeutic efficacy in Parkinson's disease.