Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (LY15A010019) and the National Natural Science Foundation of China (11501250).
Abstract: This paper deals with estimation and test procedures for restricted linear errors-in-variables (EV) models with nonignorable missing covariates. We develop a restricted weighted corrected least squares (WCLS) estimator based on the propensity score, which is fitted by an exponentially tilted likelihood method. The limiting distributions of the proposed estimators are discussed for both a known and an unknown tilting parameter. To test the validity of the constraints, we construct two test procedures, one based on the corrected residual sum of squares and one based on the empirical likelihood method, and derive their asymptotic properties. Numerical studies are conducted to examine the finite-sample performance of the proposed methods.
Funding: Supported by the General Project of the National Natural Science Foundation of China (Grant No. 12071416).
Abstract: Assessing the influence of individual observations in functional linear models is important and challenging, especially when the observations are subject to missingness. In this paper, we introduce three case-deletion diagnostic measures to identify influential observations in functional linear models when the covariate is functional and observations on the scalar response are subject to nonignorable missingness. The nonignorable missing data mechanism is modeled via an exponential tilting semiparametric functional model. A semiparametric imputation procedure is developed to mitigate the effects of missing data. Valid estimates of the functional coefficients are obtained by functional principal component analysis of the imputed dataset. A smoothed bootstrap sampling method is introduced to estimate the diagnostic probability for each proposed diagnostic measure, which helps reveal which observations have the larger influence on estimation and prediction. Simulation studies and a real data example illustrate the finite-sample performance of the proposed methods.
Abstract: Generalized additive partial linear models (GAPLM) have been widely used for flexible modeling of various types of response. In practice, missing data commonly occur in studies of economics, medicine, and public health. We address the problem of identifying and estimating a GAPLM when the response variable is nonignorably missing. Three types of monotone missing data mechanism are assumed: the logistic model, the probit model, and the complementary log-log model. In this situation, the likelihood based on the observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions, and then construct likelihood-based estimators of the unknown parameters and unknown functions by expanding the unknown functions as linear combinations of polynomial spline functions. We establish asymptotic normality for the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
Funding: Supported by the MEXT Project for Seismology toward Research Innovation with Data of Earthquake (STAR-E) [Grant Number JPJ010217].
Abstract: We consider a model identification problem in which an outcome variable contains nonignorable missing values. Statistical inference requires a guarantee of model identifiability to obtain estimators with theoretically reasonable properties such as consistency and asymptotic normality. Recently, instrumental or shadow variables, combined with the completeness condition in the outcome model, have been highlighted as a way to make a model identifiable. In this paper, we elucidate the relationship between the completeness condition and model identifiability when the instrumental variable is categorical. We first show that when both the outcome and instrumental variables are categorical, the two conditions are equivalent. However, when one of the outcome and instrumental variables is continuous, the completeness condition may not hold, even for simple models. Consequently, we provide a sufficient condition that guarantees the identifiability of models exhibiting a monotone-likelihood property, a condition particularly useful when establishing the completeness condition is difficult. Using observed data, we demonstrate that the proposed conditions are easy to check for many practical models and outline their usefulness in numerical experiments and real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10901162 and 10926073), the China Postdoctoral Science Foundation, and the Foundation of the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences; by the National Natural Science Foundation of China (Grant Nos. 10971007 and 11101015) and the fund from the government of Beijing (Grant No. 2011D005015000007); and by the National Science Foundation of US (Grant Nos. DMS0806097 and DMS1007167).
Abstract: We consider statistical inference for right-censored data when censoring indicators are missing and the missingness is nonignorable, and propose an adjusted imputation product-limit estimator. The proposed estimator is shown to be consistent and to converge to a Gaussian process. Furthermore, we develop an empirical-process-based testing method to check the MAR (missing at random) mechanism, and establish asymptotic properties for the proposed test statistic. To determine the critical value of the test, a consistent model-based bootstrap method is suggested. We conduct simulation studies to evaluate the numerical performance of the proposed method and compare it with existing methods. We also analyze a real data set from a breast cancer study as an illustration.
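For orientation, the classical product-limit (Kaplan-Meier) estimator that the abstract's adjusted version builds on can be sketched as follows. This is only the textbook estimator with fully observed censoring indicators, not the authors' adjusted imputation estimator; the function name `product_limit` and the toy data are illustrative assumptions.

```python
import numpy as np

def product_limit(times, events):
    """Classical Kaplan-Meier estimate.

    times  : observed times (event or censoring)
    events : 1 if the time is an event, 0 if it is censored
    Returns the distinct event times and the survival estimate at each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])  # sorted distinct event times
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # subjects still under observation
        deaths = np.sum((times == t) & events)   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk              # product-limit update
        surv.append(s)
    return event_times, np.array(surv)

# Toy data (hypothetical): five subjects, two of them censored.
t, s = product_limit([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

When some censoring indicators are missing, the `events` vector is only partly known, which is the gap the imputation step described in the abstract is meant to fill.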
Funding: This work was supported by the Division of Mathematical Sciences [1612873] and the Chinese Ministry of Education 111 Project [B14019].
Abstract: Tang et al. (2003. Analysis of multivariate missing data with nonignorable nonresponse. Biometrika, 90(4), 747–764) and Zhao & Shao (2015. Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. Journal of the American Statistical Association, 110(512), 1577–1590) proposed a pseudo likelihood approach to estimate unknown parameters in a parametric density of a response Y conditioned on a vector of covariates X, where Y is subject to nonignorable nonresponse, X is always observed, and the propensity of whether or not Y is observed conditioned on Y and X is completely unspecified. To identify parameters, Zhao & Shao (2015) assumed that X can be decomposed into U and Z, where Z can be excluded from the propensity but is related to Y even conditioned on U. The pseudo likelihood involves the estimation of the joint density of U and Z. When this density is estimated nonparametrically, in this paper we apply sufficient dimension reduction to reduce the dimension of U for efficient estimation. Consistency and asymptotic normality of the proposed estimators are established. Simulation results are presented to study the finite-sample performance of the proposed estimators.
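The role of the excluded covariate Z can be made concrete with a short derivation. The sketch below uses notation that is assumed here rather than taken from the abstract: R is the response indicator (R = 1 when Y is observed), f(y | u, z; θ) is the parametric density of Y given X = (U, Z), and p(z | u) is the conditional density implied by the joint density of U and Z.

```latex
% Z is excluded from the propensity: P(R = 1 | Y, U, Z) = P(R = 1 | Y, U),
% so conditioning on R = 1 does not change the law of Z given (Y, U):
\[
  p(z \mid y, u, R = 1) \;=\; p(z \mid y, u)
  \;=\; \frac{f(y \mid u, z; \theta)\, p(z \mid u)}
             {\int f(y \mid u, z'; \theta)\, p(z' \mid u)\, dz'} .
\]
% A pseudo likelihood over respondents, with p(z | u) replaced by an estimate:
\[
  \prod_{i:\, r_i = 1}
  \frac{f(y_i \mid u_i, z_i; \theta)\, \hat p(z_i \mid u_i)}
       {\int f(y_i \mid u_i, z; \theta)\, \hat p(z \mid u_i)\, dz} .
\]
```

The unspecified propensity cancels out of this ratio, which is why only the density of the covariates needs to be estimated; when U is high-dimensional, a nonparametric estimate of that density degrades, which motivates reducing the dimension of U.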
Funding: Supported by the National Natural Science Foundation of China (Nos. 10901162 and 10926073), the China Postdoctoral Science Foundation, the President Fund of GUCAS, and the foundation of the Key Laboratory of Random Complex Structures and Data Science, CAS; and by a research grant from the Research Committee, The Hong Kong Polytechnic University.
Abstract: In this paper, we investigate the model checking problem for a general linear model with nonignorable missing covariates. We show that, without any parametric model assumption for the response probability, the least squares method yields consistent estimators for the linear model even if only the complete data are used. This makes it feasible to propose two testing procedures for the corresponding model checking problem: a score-type lack-of-fit test and a test based on the empirical process. The asymptotic properties of the test statistics are investigated. Both tests are shown to have asymptotic power 1 for local alternatives converging to the null at the rate $n^{-r}$, $0 \le r < 1/2$. Simulation results show that both tests perform satisfactorily.
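The consistency of complete-case least squares can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not stated in the abstract: a single covariate, a logistic response probability that depends on the covariate itself but not on the response given the covariate, and hypothetical true coefficients (1, 2). The helper name `complete_case_ols` is illustrative.

```python
import numpy as np

def complete_case_ols(x, y, observed):
    """Least squares fit of y on (1, x) using only rows where x is observed."""
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # assumed true intercept 1 and slope 2

# Assumed nonignorable mechanism: the chance of observing x depends on x itself,
# so the missingness cannot be explained by always-observed quantities alone.
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))
observed = rng.random(n) < p_obs

beta_hat = complete_case_ols(x, y, observed)
```

In this setup the selection acts only through the covariate, so the conditional mean of y given x is unchanged among the complete cases and `beta_hat` should land close to (1, 2). If the response probability also depended on y, this argument would no longer apply.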
Abstract: We consider multivariate small area estimation under nonignorable, not missing at random (NMAR) nonresponse. We assume a response model that accounts for the different patterns of the observed outcomes (which values are observed and which ones are missing), and estimate the response probabilities by application of the Missing Information Principle (MIP). By this principle, we first derive the likelihood score equations for the case where the missing outcomes are actually observed, and then integrate out the unobserved outcomes from the score equations with respect to the distribution holding for the missing data. The latter distribution is defined by the distribution fitted to the observed data for the respondents and the response model. The integrated score equations are then solved with respect to the unknown parameters indexing the response model. Once the response probabilities have been estimated, we impute the missing outcomes from their appropriate distribution, yielding a complete data set with no missing values, which is used for predicting the target area means. A parametric bootstrap procedure is developed for assessing the mean squared errors (MSE) of the resulting predictors. We illustrate the approach by a small simulation study.
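The two-step construction described for the Missing Information Principle corresponds to a standard score identity, sketched here in notation that is assumed rather than taken from the abstract: y_obs and y_mis denote the observed and missing outcomes, r the response indicators, and γ the parameters indexing the response model.

```latex
% Observed-data score = conditional expectation of the complete-data score,
% taken over the missing outcomes given what was observed:
\[
  \frac{\partial}{\partial \gamma} \log f(y_{\mathrm{obs}}, r; \gamma)
  \;=\;
  E\!\left[
    \frac{\partial}{\partial \gamma}
    \log f(y_{\mathrm{obs}}, y_{\mathrm{mis}}, r; \gamma)
    \;\middle|\; y_{\mathrm{obs}}, r
  \right] .
\]
% Setting the right-hand side to zero gives the integrated score equations
% that are solved for \gamma.
```

The expectation is taken with respect to the distribution of the missing outcomes, which the abstract describes as determined by the model fitted to the respondents' data together with the response model.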
Funding: Supported by the National Natural Science Foundation of China (No. 10571008).
Abstract: Informative dropout often arises in longitudinal data. In this paper we propose a mixture model in which the responses follow a semiparametric varying coefficient random effects model and some of the regression coefficients depend on the dropout time in a nonparametric way. The local linear version of the profile-kernel method is used to estimate the parameters of the model. The proposed estimators are shown to be consistent and asymptotically normal, and the finite-sample performance of the estimators is evaluated by numerical simulation.
Funding: Supported by the Chinese 111 Project B14019, the US National Science Foundation under Grant Nos. DMS-1305474 and DMS-1612873, and the US National Institutes of Health Award UL1TR001412.
Abstract: The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions widely used. When missing values are present, many existing methods in the literature depend heavily on an unverifiable assumption about the missing data mechanism, and they fail when the assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible, which includes both ignorable and nonignorable missing data cases, as well as missing values in either the response or the covariates. Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For parameters that are identifiable, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Simulation studies are conducted to evaluate the finite-sample performance of the proposed estimators as well as estimators from some existing methods. Finally, the authors present a biomarker analysis in a prostate cancer study to illustrate the proposed method.