Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFA1003701 and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, under Grant No. SMDAYB2023004.
Abstract: Quantile regression (QR) has become an important tool for measuring the dependence of a response variable's quantiles on a number of predictors for heterogeneous data, especially data with heavy tails and outliers. However, it is quite challenging to make statistical inference on distributed high-dimensional QR with missing data, owing to the distributed nature, sparsity, and missingness of the data and the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method to select variables and estimate parameters by using a smooth function to approximate the non-differentiable quantile loss function and incorporating inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is both computationally and communicationally efficient because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations without constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
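As a rough illustration of the smoothing idea in the abstract above, the sketch below compares the quantile check loss with one possible smooth surrogate. The abstract does not say which smooth function the paper uses; the logistic-type surrogate, the bandwidth `h`, and all function names here are assumptions chosen for illustration. The inverse probability weight is taken to be 1 divided by an (assumed known) response probability for each observed case.

```python
import math

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def smooth_check_loss(u, tau, h):
    """Assumed smooth surrogate: tau*u + h*log(1 + exp(-u/h)).

    It is differentiable everywhere and exceeds the check loss
    by at most h*log(2), so it approaches the check loss as h -> 0.
    """
    z = -u / h
    # Numerically stable evaluation of log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return tau * u + h * softplus

def ipw_smooth_objective(residuals, observed, probs, tau, h):
    """Average smoothed loss over all n cases; only observed cases
    contribute, each weighted by 1 / (response probability)."""
    n = len(residuals)
    total = 0.0
    for r, d, p in zip(residuals, observed, probs):
        if d:
            total += smooth_check_loss(r, tau, h) / p
    return total / n
```

Because the surrogate has well-defined first and second derivatives, gradient and Hessian information can be computed on each machine, which is what makes the communication pattern described in the abstract possible. The penalty term for variable selection is omitted from this sketch.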
Funding: Supported by an R&D Grant from the University of Delhi, the DU-DST PURSE Grant, and ICMR Grant No. 3/1/3/JRF-2010/HRD-122 (35831).
Abstract: Lifecycle data often occur as counts of vital events and are recorded as integers. The purpose of this article is to model fertility behavior based on religious, educational, economic, and occupational characteristics. The responses of groups classified according to these determinants are examined for significant influence on fertility using the Poisson regression model (PRM), based on the National Family Health Survey-3 dataset. The observed and predicted probabilities under the PRM indicate a modal value of two children for the data modeled with the Poisson distribution. The dominance of two-child families in the data motivates the authors to adopt the multinomial regression model (MRM) in order to link fertility with the various socioeconomic indicators responsible for fertility variation. The choice of explanatory factors is limited by the availability of data. Trends and patterns of preference for birth counts suggest that religion, caste, wealth, female education, and occupation are the dominant factors shaping the observed birth process. Empirical analysis suggests that both models used in the study perform similarly on the sample data. However, fitting the MRM with a birth count of two as the comparison category yields improved Akaike information criterion and consistent Akaike information criterion values. The current work contributes to the existing literature by providing more insight into the determinants of Indian fertility using the Poisson model and the MRM.
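The two criteria named above can be computed from a fitted model's maximized log-likelihood. The sketch below uses the standard definitions AIC = 2k - 2 log L and CAIC = k(log n + 1) - 2 log L, together with a Poisson probability mass function to show how a modal count arises. The function names and any values passed to them are hypothetical and are not taken from the study.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_mode(lam, max_count=50):
    """Most probable count, found by scanning 0..max_count."""
    return max(range(max_count + 1), key=lambda y: poisson_pmf(y, lam))

def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik

def caic(log_lik, k, n):
    """Consistent AIC: penalizes each parameter by log(n) + 1."""
    return k * (math.log(n) + 1) - 2 * log_lik
```

For both criteria, a lower value indicates a better trade-off between fit and complexity, so two models fitted to the same data can be compared by evaluating each function on their respective log-likelihoods and parameter counts.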
Abstract: Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging attempts to achieve stable and improved prediction. To provide a better understanding of the available model averaging methods, their properties, and the relationships between them, this paper reviews recent progress in high-dimensional model averaging from the frequentist perspective. Some future research topics are also discussed.
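To make the idea of weighting competing models concrete, the sketch below uses smoothed-AIC weights, where model m receives weight proportional to exp(-AIC_m / 2). This is one simple, well-known frequentist scheme shown only as an illustration; the abstract does not say which methods the review covers, and the function names are hypothetical.

```python
import math

def smoothed_aic_weights(aic_values):
    """Weights proportional to exp(-AIC/2), normalized to sum to 1.

    Subtracting the smallest AIC first avoids underflow and does
    not change the normalized weights.
    """
    best = min(aic_values)
    raw = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(predictions, weights):
    """Weighted combination of the candidate models' predictions."""
    return sum(w * p for w, p in zip(weights, predictions))
```

Models with lower AIC receive larger weights, and models with equal AIC receive equal weights, so the averaged prediction degrades gracefully when no single candidate clearly dominates.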
Abstract: The present paper proposes a new robust estimator for Poisson regression models. We use weighted maximum likelihood estimators, which are regarded as Mallows-type estimators. We perform a Monte Carlo simulation study to assess the performance of the proposed estimator compared with the maximum likelihood estimator and some robust methods. The results show that, in general, all robust methods considered in this paper perform better than the classical maximum likelihood estimator when the data contain outliers. The proposed estimator shows the best performance among the robust estimators compared.
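For orientation, a Mallows-type weighted maximum likelihood estimator for a log-linear Poisson model can be written in the generic form below. This is a textbook form and not the paper's specific proposal: the weight function $w(\cdot)$, which depends on the covariates only and downweights high-leverage observations, is left unspecified here, and the notation is an assumption.

```latex
\hat{\beta}_w
  = \arg\max_{\beta} \sum_{i=1}^{n} w(x_i)
    \bigl[\, y_i\, x_i^{\top}\beta - \exp(x_i^{\top}\beta) - \log(y_i!) \,\bigr],
\qquad
\sum_{i=1}^{n} w(x_i)\,
    \bigl[\, y_i - \exp(x_i^{\top}\hat{\beta}_w) \,\bigr]\, x_i = 0 .
```

Setting $w(x_i) = 1$ for all $i$ recovers the classical maximum likelihood estimator, which is the baseline the simulation study compares against.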
Abstract: Stop frequency models, as one of the elements of activity-based models, represent an important part of travel behavior. Unobserved heterogeneity across travelers should be taken into consideration to prevent bias and inconsistency in the estimated parameters of stop frequency models. Additionally, previous studies on stop frequency have mostly been conducted in larger metropolitan areas, and less attention has been paid to less populous areas. This study addresses these gaps by using 2012 travel data from a medium-sized U.S. urban area, with the work tour as the case study. Stops in the work tour were classified into three groups: the outbound leg, the work-based subtour, and the inbound leg of the commute. Latent class Poisson regression models were used to analyze the data. The results indicate the presence of heterogeneity across commuters. Using latent class models significantly improves predictive power compared with regular one-class Poisson regression models. In contrast to one-class Poisson models, gender becomes insignificant in predicting the number of tours when unobserved heterogeneity is accounted for. Commuters are associated with increased stops on their work-based subtour when the employment density of service-related occupations increases in their work zone, but the density of retail employment does not significantly contribute to the stop-making likelihood of commuters. Additionally, an increase in the number of work tours was associated with fewer stops on the inbound leg of the commute. The results of this study suggest that unobserved heterogeneity should be considered in stop frequency models, and they can help transportation agencies and policy makers make better inferences from such models.
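The latent class structure described above can be sketched as a finite mixture of Poisson distributions: each commuter belongs to one of several unobserved classes, and each class has its own stop rate. In a full regression model each class rate would depend on covariates, typically through exp(x'β_k); the sketch below uses constant rates for brevity. The class shares, rates, and function names are hypothetical and are not estimates from the study.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def mixture_pmf(y, shares, rates):
    """Marginal probability of y stops, summing over latent classes."""
    return sum(p * poisson_pmf(y, lam) for p, lam in zip(shares, rates))

def class_posteriors(y, shares, rates):
    """Probability of each latent class given the observed count y
    (Bayes' rule), as used in the E-step of an EM fit."""
    joint = [p * poisson_pmf(y, lam) for p, lam in zip(shares, rates)]
    total = sum(joint)
    return [j / total for j in joint]
```

A one-class Poisson model is the special case with a single share equal to 1, which is why comparing the two specifications is a natural test for unobserved heterogeneity.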