Quantile regression (QR) has become an important tool for measuring the dependence of a response variable's quantiles on a number of predictors for heterogeneous data, especially data with heavy tails and outliers. However, statistical inference for distributed high-dimensional QR with missing data is quite challenging because of the distributed nature, sparsity and missingness of the data and the non-differentiable quantile loss function. To overcome this challenge, this paper develops a communication-efficient method that selects variables and estimates parameters by using a smooth function to approximate the non-differentiable quantile loss function and by incorporating inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is both computationally and communicationally efficient because only the first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations, without any constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing data mechanism. Simulation studies and a real example are used to illustrate the effectiveness of the proposed methodologies.
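The two ingredients named in the abstract above, a smooth surrogate for the quantile loss and inverse probability weighting, can be sketched in Python as follows. The abstract does not say which smoothing function the paper uses, so the logistic-type surrogate, the bandwidth h, and every function name here are assumptions made for illustration. The response probabilities pi are taken as given rather than estimated, and the penalty term and the distributed communication step are left out.

```python
import numpy as np


def check_loss(u, tau):
    """Exact quantile (check) loss: u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def smoothed_check_loss(u, tau, h):
    """Assumed smooth surrogate: tau * u + h * log(1 + exp(-u / h)).

    It is differentiable everywhere and exceeds the check loss by at
    most h * log(2), so it approaches the check loss as h -> 0.
    """
    u = np.asarray(u, dtype=float)
    # logaddexp(0, x) evaluates log(1 + exp(x)) without overflow.
    return tau * u + h * np.logaddexp(0.0, -u / h)


def ipw_smoothed_objective(beta, X, y, delta, pi, tau, h):
    """Inverse-probability-weighted smoothed quantile objective.

    delta[i] = 1 if y[i] is observed and 0 otherwise; pi[i] is the
    (assumed known) probability that y[i] is observed.
    """
    # Residuals of unobserved responses are zeroed so NaNs cannot leak in;
    # their weight delta / pi is zero anyway.
    resid = np.where(delta == 1, y - X @ beta, 0.0)
    weights = delta / pi
    return np.mean(weights * smoothed_check_loss(resid, tau, h))
```

A smaller h tracks the check loss more closely at the cost of a less well-conditioned second derivative, which matters for any method that communicates second-order information.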
In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
Conditional independence is a basic assumption in many fields of statistics. Testing its validity is of great importance and has been studied extensively in the literature. Nevertheless, all existing methods focus on the case in which the data are fully observed, and none seems to have taken into account the scenario in which missing data are present. Motivated by this, this paper develops two test statistics for such a situation, based on inverse probability weighted and augmented inverse probability weighted techniques. The asymptotic distributions of the proposed statistics are derived under the null hypothesis. Simulation studies indicate that both test statistics perform well in terms of size and power.
This paper aims to develop a unified Bayesian approach for clustered data analysis when observations are subject to missingness at random. The authors consider a general framework in which the parameters of interest are defined through estimating equations, and the probability of missingness follows a general parametric form. The generalized method of moments framework is employed to derive an optimal combination of inverse-probability-weighted estimating equations for the parameters of interest and score equations for the propensity score. Using this framework, the authors develop a quasi-Bayesian analysis for clustered samples with missing values. A unified model selection approach is also proposed to compare models characterized by different moment conditions. The authors systematically evaluate the large-sample properties of the proposed quasi-posterior density with both fixed and shrinking priors and establish the selection consistency of the proposed model selection criterion. The proposed results are valid under very mild conditions and offer significant advantages for parameters defined through non-smooth estimating functions. Extensive numerical studies demonstrate that the proposed method performs exceptionally well in finite samples.
Missingness in mixed-type variables is commonly encountered in a variety of areas. The requirement of complete observations necessitates data imputation when a moderate or large proportion of data is missing. However, inappropriate imputation degrades the performance of machine learning algorithms, leading to bad predictions and unreliable statistical inference. For high-dimensional large-scale mixed-type missing data, we develop a computationally efficient imputation method, missing value imputation via generalized factor models (MIG), under missing at random. The proposed MIG method allows missing variables to be of different types, including continuous, binary, and count variables, and is scalable to both the data size n and the variable dimension p, whereas existing imputation methods rely on restrictive assumptions such as the same type of missing variables, low dimensionality of variables, and a limited sample size. We explicitly show that the imputation error of the proposed MIG method diminishes to zero at the rate O_p(max{n^(-1/2), p^(-1/2)}) as both n and p tend to infinity. Five real datasets demonstrate the superior empirical performance of the proposed MIG method over existing methods: the average normalized absolute imputation error is reduced by 5.3%–34.1%.
Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the dimension of the covariates is not low, the authors propose a two-stage estimation procedure that uses dimension-reduced kernel estimators in conjunction with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI) methods. The authors show that the resulting estimator achieves consistency and asymptotic normality. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 data set is also presented.
Missing covariate data arise frequently in biomedical studies. In this article, we propose a class of weighted estimating equations for the additive hazards regression model when some of the covariates are missing at random. Time-specific and subject-specific weights are incorporated into the formulation of the weighted estimating equations. Unified results are established for estimating selection probabilities that cover both parametric and non-parametric modelling schemes. The resulting estimators have closed forms and are shown to be consistent and asymptotically normal. Simulation studies indicate that the proposed estimators perform well in practical settings. An application to a mouse leukemia study is provided.
In this paper, we consider weighted local polynomial calibration estimation and imputation estimation of a non-parametric function when the data are right censored and the censoring indicators are missing at random, and we establish the asymptotic normality of these estimators. As applications, we derive the weighted local linear calibration estimators and imputation estimators of the conditional distribution function, the conditional density function and the conditional quantile function, and investigate their asymptotic normality. Finally, simulation studies are conducted to illustrate the finite-sample performance of the estimators.
In this thesis, we establish non-linear wavelet density estimators and study their asymptotic properties with data missing at random when covariates are present. The outstanding advantage of the non-linear wavelet method is that it can estimate non-smooth functions, which classical kernel estimation cannot do. We also study the large-sample properties of the integrated square error (ISE) for the hazard rate estimator.
In this article, nonlinear regression models with missing responses are studied with the aim of improving the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to the usual augmented inverse probability weighted estimators.
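For orientation, the textbook inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimators of a population mean with missing responses can be sketched in Python as below. This is the usual baseline form, not the CBPS-based estimator proposed in the article. The propensity scores pi and the outcome-model predictions m are assumed to be supplied by the caller, and all function and variable names are illustrative.

```python
import numpy as np


def ipw_mean(y, delta, pi):
    """IPW estimate of E[Y]: mean of delta * y / pi.

    delta[i] = 1 if y[i] is observed; pi[i] is the propensity score.
    """
    # Unobserved responses (possibly NaN) are replaced by 0; delta
    # removes them from the sum anyway.
    y_obs = np.where(delta == 1, y, 0.0)
    return np.mean(delta * y_obs / pi)


def aipw_mean(y, delta, pi, m):
    """AIPW estimate of E[Y]: mean of m + delta / pi * (y - m).

    m[i] is an outcome-model prediction of y[i]. The estimator is
    doubly robust: it is consistent if either pi or m is correctly
    specified.
    """
    # For unobserved responses the residual y - m is set to zero.
    y_obs = np.where(delta == 1, y, m)
    return np.mean(m + delta / pi * (y_obs - m))
```

When the outcome model reproduces every observed response exactly, the correction term vanishes and the AIPW estimate reduces to the average of the predictions m.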
In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting for missing data in the HIV/TB co-infection setting. We employ both empirical data analysis and an extensive simulation study to examine the effects of missing data on accuracy, sensitivity, specificity, and training and test error for the different approaches. The novelty of this work lies in the use of modern statistical learning algorithms for treating missingness. In the empirical analysis, imputations were performed for both the HIV data and the HIV/TB co-infection data, and the missing values were imputed using different approaches. In the simulation study, sets of 0% (complete case), 10%, 30%, 50% and 80% of the data were drawn randomly and replaced with missing values. The results show a co-infection rate (95% confidence interval) of 29% (25%, 33%) for complete cases only, 27% (23%, 31%) for the weighted method, 26% (24%, 28%) for the likelihood-based approach and 21% (20%, 22%) for the multiple imputation (MI) approach. In conclusion, MI remains the best approach for dealing with missing data, and failure to apply it results in overestimation of the HIV/TB co-infection rate by 8%.
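The masking step of the simulation design described above (drawing a fixed share of records at random and replacing them with missing values) can be sketched in Python as follows. The data here are synthetic, and the sample size of 200, the 30% prevalence, and the function names are assumptions made for illustration; the sketch does not use the study's data or reproduce its estimates.

```python
import numpy as np


def mask_at_random(values, rate, rng):
    """Return a copy of `values` with a share `rate` of entries set to NaN.

    The entries are chosen uniformly without replacement, so the
    missingness does not depend on the data.
    """
    values = np.asarray(values, dtype=float).copy()
    n_missing = int(round(rate * values.size))
    idx = rng.choice(values.size, size=n_missing, replace=False)
    values[idx] = np.nan
    return values


def complete_case_rate(values):
    """Proportion of positives among the records that are still observed."""
    return np.nanmean(values)


rng = np.random.default_rng(0)
# Hypothetical binary co-infection indicator for 200 synthetic subjects.
status = rng.binomial(1, 0.3, size=200).astype(float)
for rate in (0.0, 0.1, 0.3, 0.5, 0.8):
    masked = mask_at_random(status, rate, rng)
    print(rate, int(np.isnan(masked).sum()), complete_case_rate(masked))
```

Because this masking ignores the data entirely, the complete-case rate in the sketch varies only through sampling noise. Systematic differences between methods, such as those reported in the abstract, require missingness that depends on the data.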
Within the sufficient dimension reduction framework, research on nonignorable missing data remains relatively scarce, primarily due to the associated identifiability issues. This paper considers the problem of sufficient dimension reduction when the response is subject to nonignorable missingness. By adopting a flexible semiparametric missingness mechanism to ensure identifiability, the authors construct three classes of estimating equations based on inverse probability weighting, regression imputation and augmented inverse probability weighting. The novel aspects of the proposed methods also include the incorporation of sufficient dimension reduction techniques in the implementation of these estimating equations to mitigate the high-dimensional effect, and the construction of the estimator for the conditional expectation of the estimating functions given both the covariates and the missingness indicator. The authors prove that the resulting three estimators are asymptotically normally distributed. Comprehensive simulation studies are conducted to assess the finite-sample performance of the proposed methods, and an application to PM2.5 concentration data is also presented.
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regularity conditions, the corresponding Wilks phenomena are obtained, and confidence regions for the parameters can be constructed easily.
This paper considers two estimators of θ = g(x) in a nonparametric regression model Y = g(x) + ε (x ∈ (0, 1)^p) with missing responses: imputation and inverse probability weighted estimators. Asymptotic normality of the two estimators is established, which is used to construct normal-approximation-based confidence intervals for θ.
In this article, empirical likelihood inference for estimating equations with missing data is considered. Based on the weighted-corrected estimating function, an empirical log-likelihood ratio is proved to follow a standard chi-square distribution asymptotically under some suitable conditions. This result differs from those derived before, and it makes it convenient to construct confidence regions for the parameters of interest. We also prove that our proposed maximum empirical likelihood estimator of θ is asymptotically normal and attains the semiparametric efficiency bound for missing data. Some simulations indicate that the proposed method performs best.
Multiply robust inference has attracted much attention recently in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models while the resulting estimator remains consistent as long as one of the candidate models is correctly specified. This property is appealing, since it gives the user a flexible modeling strategy with better protection against model misspecification. We explore this attractive property for regression models with a binary covariate that is missing at random. We start from a reformulation of the celebrated augmented inverse probability weighted estimating equation and, based on this reformulation, propose a novel combination of least squares and empirical likelihood to handle separately each of the two types of multiple candidate models, one for the missing variable regression and the other for the missingness mechanism. Due to the separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of our procedure is illustrated both through simulation studies and through the analysis of a dementia data set collected by the National Alzheimer's Coordinating Center.
In this paper, we investigate the two-sample quantile difference by the empirical likelihood method when the responses with high-dimensional covariates of the two populations are missing at random. In particular, based on the sufficient dimension reduction technique, we construct three empirical log-likelihood ratios for the quantile difference between two samples by using inverse probability weighting imputation, regression imputation and augmented inverse probability weighting imputation, respectively, and prove their asymptotic distributions. We also give a test to check whether the two populations have the same distribution. A simulation study is carried out to investigate the finite-sample behavior of the proposed methods.
This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations; in particular, Goetghebeur and Ryan considered the situation in which both the failure time and the censoring time follow proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for the regression parameters that does not require the proportionality assumption. An EM algorithm is developed, and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well in practical situations. An illustrative example is provided.
Consider two linear models X_i = U_i'β + e_i and Y_j = V_j'γ + η_j with response variables missing at random (MAR). In this paper, we use inverse probability weighted imputation to produce 'complete' data sets for X and Y. Based on these data sets, we construct an empirical likelihood (EL) statistic for the difference of X and Y (denoted as Δ), and show that the EL statistic has a limiting chi-square distribution, which is used to construct a confidence interval for Δ. Results of a simulation study on the finite-sample performance of EL-based confidence intervals for Δ are reported.
The problem of hazard rate estimation under right censoring has been investigated extensively. The integrated square error (ISE) of estimation is one of the most widely accepted measures of global performance for nonparametric kernel estimation. However, no results have so far been available for the ISE of hazard rate estimation under the right-censored model with censoring indicators missing at random (MAR). This paper constructs an imputation estimator of the hazard rate function and establishes asymptotic normality of the ISE for the kernel hazard rate estimator with censoring indicators MAR. An asymptotic representation of the mean integrated square error (MISE) is also presented. The finite-sample behavior of the estimator is investigated via a simple simulation.
Funding (distributed quantile regression with missing data): Supported by the National Key R&D Program of China under Grant No. 2022YFA1003701 and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University under Grant No. SMDAYB2023004.
Funding (conditional independence testing with missing data): Supported by the Fundamental Research Funds for the Central Universities (17CX02035A); the NNSF of China (11601197, 11461029, 61563018); the China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475); the NSF of Jiangxi Province (20171ACB21030, 20161BAB201024, 20161ACB200009); and the Key Science Fund Project of the Jiangxi Provincial Education Department (GJJ150439).
Funding (quasi-Bayesian analysis of clustered data with missingness): Supported by the National Key R&D Program of China under Grant No. 2022YFA1003701; the National Natural Science Foundation of China under Grant Nos. 12331009 and 12071416; and the Yunnan Fundamental Research Projects under Grant No. 202201AV070006.
Funding (MIG imputation for mixed-type missing data): Supported by the National Key R&D Program of China (Grant No. 2022YFA1003702) and the National Natural Science Foundation of China (Grant Nos. 11931014 and 12271441).
Funding (empirical likelihood with AIPW-MI): Supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144 and 11801359; the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100; the Fundamental Research Funds for the Central Universities; and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Funding (additive hazards regression with missing covariates): Supported by the National Natural Science Foundation of China (Grant Nos. 11771431, 11690015, 11926341, 11601080 and 11671275); the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences (Grant No. 2008DP173182); and the Fundamental Research Funds for the Central Universities in University of International Business and Economics (Grant No. CXTD10-09).
Funding (local polynomial estimation with censoring indicators missing at random): Supported in part by the National Social Science Foundation of China (Grant No. 20BTJ049).
Funding: Supported by the Youth Program of the National Natural Science Foundation of China under Grant No. 12401368, the Youth Talent Special Support Program of Yunnan Provincial Xingdian Talent Support Plan, the Scientific Research Fund Project of Yunnan Provincial Department of Education under Grant No. 2020J0373, and the Scientific Research Fund Project of Yunnan University of Finance and Economics under Grant No. 2022D11; supported by the General Programs of the National Natural Science Foundation of China under Grant Nos. 12271510 and 11871460, the Innovative Research Group Program under Grant No. 61621003, and a grant from the Key Laboratory of Random Complex Structures and Data Science, Chinese Academy of Sciences.
Abstract: Within the sufficient dimension reduction framework, research on nonignorable missing data remains relatively scarce, primarily due to the associated identifiability issues. This paper considers the problem of sufficient dimension reduction when the response is subject to nonignorable missingness. By adopting a flexible semiparametric missingness mechanism to ensure identifiability, the authors construct three classes of estimating equations based on inverse probability weighting, regression imputation and augmented inverse probability weighting. The novel aspects of the proposed methods also include the incorporation of sufficient dimension reduction techniques in the implementation of these estimating equations to mitigate the high-dimensional effect, and the construction of the estimator for the conditional expectation of the estimating functions given both the covariates and the missingness indicator. The authors prove that the resulting three estimators are asymptotically normally distributed. Comprehensive simulation studies are conducted to assess the finite-sample performance of the proposed methods, and an application to PM2.5 concentration data is also presented.
Abstract: In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing responses are suggested. Under some regularity conditions, the corresponding Wilks phenomena are obtained, and confidence regions for the parameters can be constructed easily.
Funding: This research is supported by the National Natural Science Foundation of China under Grant Nos. 10661003 and 10971038, and the Natural Science Foundation of Guangxi under Grant No. 2010GXNSFA013117.
Abstract: This paper considers two estimators of θ = g(x) in a nonparametric regression model Y = g(x) + ε (x ∈ (0, 1)^p) with missing responses: imputation and inverse probability weighted estimators. Asymptotic normality of the two estimators is established and is used to construct normal-approximation-based confidence intervals for θ.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11171188, 11201499 and 10921101), the Natural Science Foundation of Shandong Province (Grant Nos. ZR2010AZ001 and ZR2011AQ007), the Shandong Provincial Scientific Research Reward Foundation for Excellent Young and Middle-Aged Scientists (Grant No. BS2011SF006), and the K.C. Wong-HKBU Fellowship Program for Mainland Visiting Scholars 2010-11.
Abstract: In this article, empirical likelihood inference for estimating equations with missing data is considered. Based on the weighted-corrected estimating function, an empirical log-likelihood ratio is proved to follow a standard chi-square distribution asymptotically under some suitable conditions. This result differs from those derived previously, and it makes the construction of confidence regions for the parameters of interest convenient. We also prove that the proposed maximum empirical likelihood estimator of θ is asymptotically normal and attains the semiparametric efficiency bound for missing data. Simulations indicate that the proposed method performs best.
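To make the idea of an empirical log-likelihood ratio built from a weighted estimating function concrete, here is a minimal sketch for a scalar mean with responses missing at random. It is a simplified illustration, not the article's procedure: the response probabilities are assumed known, the estimating function is the plain inverse-probability-weighted one, and the data and the names `el_log_ratio` and `ipw_g` are hypothetical.

```python
import numpy as np

def el_log_ratio(g, n_iter=200):
    """-2 log empirical likelihood ratio for H0: E[g] = 0, with g a 1-D array.

    Solves sum(g / (1 + lam * g)) = 0 for the Lagrange multiplier lam by bisection.
    """
    g = np.asarray(g, dtype=float)
    if g.min() >= 0 or g.max() <= 0:
        return np.inf  # zero lies outside the convex hull of g
    # lam must keep every 1 + lam * g_i strictly positive.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    width = hi - lo
    lo, hi = lo + 1e-10 * width, hi - 1e-10 * width
    for _ in range(n_iter):
        lam = 0.5 * (lo + hi)
        score = np.sum(g / (1.0 + lam * g))  # strictly decreasing in lam
        if score > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * g))

# Hypothetical data: true mean of Y is 2.0, responses missing at random given x.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * x)))  # assumed known response probability
delta = rng.uniform(size=n) < pi              # True where y is observed

def ipw_g(mu):
    """Inverse-probability-weighted estimating function for the mean mu."""
    return delta * (np.where(delta, y, 0.0) - mu) / pi

stat_true = el_log_ratio(ipw_g(2.0))   # statistic at the true mean
stat_wrong = el_log_ratio(ipw_g(3.0))  # statistic at a wrong value, much larger
mu_hat = np.sum(delta * y / pi) / np.sum(delta / pi)  # root of sum(ipw_g(mu)) = 0
print(stat_true, stat_wrong, el_log_ratio(ipw_g(mu_hat)))
```

A confidence region for the mean would collect the values of mu whose statistic falls below a chi-square quantile; under the known-propensity assumption made here the relevant reference distribution has one degree of freedom.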
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11301031).
Abstract: Multiply robust inference has attracted much attention recently in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models while the resulting estimator remains consistent as long as one of the candidate models is correctly specified. This property is appealing, since it provides the user with a flexible modeling strategy and better protection against model misspecification. We explore this attractive property for regression models with a binary covariate that is missing at random. We start from a reformulation of the celebrated augmented inverse probability weighted estimating equation, and based on this reformulation, we propose a novel combination of least squares and empirical likelihood to handle separately each of the two types of multiple candidate models, one for the missing variable regression and the other for the missingness mechanism. Due to the separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of our procedure is illustrated through both simulation studies and the analysis of a dementia data set collected by the National Alzheimer's Coordinating Center.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the National Social Science Foundation of China (Grant No. 17BTJ032).
Abstract: In this paper, we investigate the two-sample quantile difference by the empirical likelihood method when the responses of the two populations, with high-dimensional covariates, are missing at random. In particular, based on the sufficient dimension reduction technique, we construct three empirical log-likelihood ratios for the quantile difference between the two samples using inverse probability weighting imputation, regression imputation and augmented inverse probability weighting imputation, respectively, and derive their asymptotic distributions. We also give a test to check whether the two populations have the same distribution. A simulation study is carried out to investigate the finite-sample behavior of the proposed methods.
Abstract: This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations. In particular, Goetghebeur and Ryan considered the situation where both the failure time and the censoring time follow proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well in practical situations. An illustrative example is provided.
Funding: Supported by the National Natural Science Foundation of China (Nos. 11271088, 11361011, 11201088) and the Natural Science Foundation of Guangxi (Nos. 2013GXNSFAA(019004 and 019007), 2013GXNSFBA019001).
Abstract: Consider two linear models X_i = U_i'β + ε_i, Y_j = V_j'γ + η_j with response variables missing at random. In this paper, we assume that X and Y are missing at random (MAR) and use inverse probability weighted imputation to produce 'complete' data sets for X and Y. Based on these data sets, we construct an empirical likelihood (EL) statistic for the difference of X and Y (denoted as Δ), and show that the EL statistic has a limiting χ²_1 distribution, which is used to construct a confidence interval for Δ. Results of a simulation study on the finite-sample performance of EL-based confidence intervals for Δ are reported.
Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2019M651422, the National Natural Science Foundation of China under Grant Nos. 71701127, 11831008 and 11971171, the National Social Science Foundation Key Program under Grant No. 17ZDA091, the 111 Project of China under Grant No. B14019, the Natural Science Foundation of Shanghai under Grant Nos. 17ZR1409000 and 20ZR1423000, and the Project of Humanities and Social Science Foundation of Ministry of Education under Grant No. 20YJC910003.
Abstract: The problem of hazard rate estimation under right censoring has been investigated extensively. The integrated square error (ISE) of an estimator is one of the most widely accepted measures of global performance for nonparametric kernel estimation. However, no results are available so far for the ISE of hazard rate estimation under the right-censored model with censoring indicators missing at random (MAR). This paper constructs an imputation estimator of the hazard rate function and establishes asymptotic normality of the ISE for the kernel hazard rate estimator with censoring indicators MAR. An asymptotic representation of the mean integrated square error (MISE) is also presented. The finite-sample behavior of the estimator is investigated via a simple simulation.