Funding: Funded by the National Natural Science Foundation of China (NSFC No. 11771241) and the Natural Science Foundation of Anhui Province (No. 1708085QA14).
Abstract: Relative-risk models are often used to characterize the relationship between survival time and time-dependent covariates. When the covariates are fully observed, estimation procedures and asymptotic theory for the parameters of interest are available, but challenges remain when missingness occurs. A popular approach is to jointly model the survival data and the longitudinal data. This appears efficient because it uses more information, but rigorous theoretical studies have long been lacking. For both additive risk models and relative-risk models, we allow the missing data to be nonignorable. Under general regularity conditions, we prove the asymptotic normality of the nonparametric maximum likelihood estimators.
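For reference, the two model classes named above are commonly written as follows. Here λ_0(t) is a baseline hazard, β a coefficient vector, and Z(t) the time-dependent covariate process. The abstract does not state its exact parametrization, so these textbook forms are an assumption.

```latex
\begin{aligned}
\text{relative risk (Cox form):}\quad & \lambda(t \mid Z) = \lambda_0(t)\,\exp\{\beta^{\top} Z(t)\},\\
\text{additive risk:}\quad & \lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z(t).
\end{aligned}
```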
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11861042) and the China Statistical Research Project (Grant No. 2020LZ25).
Abstract: Missing covariate data is one of the prominent topics in modern statistical analysis. It often arises in surveys and interviews, and the problem becomes more complex when the data are heavy-tailed, skewed, or heteroscedastic. In such settings, robust quantile regression methods are of particular interest. This paper presents an inverse weighted quantile regression method for exploring the relationship between the response and the covariates. The method has several advantages over the naive estimator. First, it uses all available data, and the missing covariates are allowed to be strongly correlated with the response. Second, the estimator is asymptotically normal uniformly across all quantile levels. Simulations verify the effectiveness of the method. Finally, we extend the method to more general settings, namely the multivariate case and the nonparametric case.
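The weighting idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' estimator. It makes three assumptions. There is a single covariate. The selection probabilities pi_i = P(covariate observed | data) are treated as known, although in practice they would be estimated. The exact minimizer of the weighted check loss is found by searching lines through pairs of complete cases, which is only practical for small samples.

```python
from itertools import combinations


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ipw_quantile_regression(x, y, observed, pi, tau):
    """Fit y ~ a + b * x at quantile level tau from complete cases weighted by 1 / pi.

    x        : covariate values (None where the covariate is missing)
    y        : responses (always observed in this sketch)
    observed : True where x_i is observed
    pi       : assumed-known selection probabilities for each observation
    Returns (a, b) minimizing sum_i observed_i / pi_i * rho_tau(y_i - a - b * x_i).
    """
    cases = [(xi, yi, 1.0 / p) for xi, yi, ok, p in zip(x, y, observed, pi) if ok]

    def objective(a, b):
        return sum(w * check_loss(yi - a - b * xi, tau) for xi, yi, w in cases)

    best, best_val = None, float("inf")
    # With two parameters, some optimal solution passes through two data points,
    # so an exhaustive search over pairs of complete cases is exact.
    for (x1, y1, _), (x2, y2, _) in combinations(cases, 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        val = objective(a, b)
        if val < best_val:
            best, best_val = (a, b), val
    if best is None:
        raise ValueError("need at least two complete cases with distinct x")
    return best
```

For example, `ipw_quantile_regression([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 100.0], [True] * 5, [1.0] * 5, 0.5)` returns the median line `(0.0, 1.0)` and ignores the outlying response. A least-squares fit would be pulled toward it.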
Abstract: Given a sample of regression data from (Y, Z), a new diagnostic plotting method is proposed for checking the hypothesis H0: the data come from a given Cox model with time-dependent covariates Z. The method compares two estimates of the marginal distribution F_Y of Y. One is an estimate of the modified expression for F_Y under H0, based on a consistent estimate of the parameter under H0 and on the baseline distribution of the data. The other is the Kaplan-Meier estimator of F_Y, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test of H0. Its main advantage over existing residual tests arises when the data do not follow any Cox model or the Cox model is misspecified. In that case, the new test remains valid but the residual tests do not, and they often make a type II error with very high probability.
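One ingredient of the marginal distribution plot is the Kaplan-Meier estimate of F_Y. Below is a minimal pure-Python sketch, assuming right-censored data given as observed times with event indicators. The confidence band and the Cox-model-based estimate under H0 are not shown.

```python
def kaplan_meier_cdf(times, events):
    """Return [(t, F_hat(t))] at each distinct event time.

    times  : observed times min(Y_i, C_i)
    events : 1 if the event was observed (uncensored), 0 if censored
    F_hat(t) = 1 - prod_{t_j <= t} (1 - d_j / n_j), where d_j is the number of
    events at t_j and n_j the number of subjects still at risk just before t_j.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times: all events at t share one factor of the product.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, 1.0 - surv))
        # Both events and censorings at t leave the risk set afterwards.
        at_risk -= removed
    return curve
```

With no censoring, the estimate reduces to the empirical distribution function. A censored observation does not create a jump but does shrink the risk set for later times.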
Abstract: Empirical likelihood-based inference for varying coefficient models with missing covariates is investigated. An imputed empirical likelihood ratio function for the coefficient functions is proposed, and its limiting distribution is shown to be standard chi-squared. The corresponding confidence intervals for the regression coefficients are then constructed. Simulations show that the proposed procedure attenuates the effect of the missing data and performs well in finite samples.
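The chi-squared calibration can be illustrated in the simplest case: empirical likelihood for a scalar mean with fully observed data. This is a sketch under that simplifying assumption only. In the paper, the estimating functions involve imputed values and coefficient functions, and neither is reproduced here.

```python
import math


def el_statistic(z, iterations=200):
    """-2 log empirical likelihood ratio for H: E[z] = 0 (scalar estimating function).

    The Lagrange multiplier lam solves sum z_i / (1 + lam * z_i) = 0. The left-hand
    side is decreasing on the interval where every 1 + lam * z_i > 0, so plain
    bisection on that open interval finds the root.
    """
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # 0 is not inside the convex hull of the z_i
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)


def el_mean_statistic(x, mu):
    """EL statistic for the mean of x evaluated at the candidate value mu."""
    return el_statistic([xi - mu for xi in x])


CHI2_1_95 = 3.841  # 95% quantile of the chi-squared distribution with 1 df
```

An approximate 95% confidence interval is the set of `mu` values with `el_mean_statistic(x, mu) <= CHI2_1_95`. The statistic is 0 at the sample mean and grows as `mu` moves away from it.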
Abstract: Estimating covariance matrices is important in many fields, including statistics. In real applications, data are frequently high-dimensional and noisy, yet most relevant studies assume complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. Next, the minimax lower bound is derived, from which it follows that the proposed estimator is rate-optimal. Finally, a numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
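A minimal sketch of the two building blocks: a generalized sample covariance for incomplete data and entrywise hard thresholding. It assumes the pairwise form, where each entry is averaged over the rows in which both coordinates are observed, centred by observed-data means. The paper's modification for additive noise and its choice of threshold are not reproduced. The threshold `lam` is a hypothetical tuning value.

```python
def generalized_sample_covariance(rows):
    """Pairwise-complete covariance for rows that use None for missing entries."""
    p = len(rows[0])
    means = []
    for j in range(p):
        obs = [r[j] for r in rows if r[j] is not None]
        if not obs:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(obs) / len(obs))
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            # Only rows observing both coordinates j and k contribute to entry (j, k).
            prods = [(r[j] - means[j]) * (r[k] - means[k])
                     for r in rows if r[j] is not None and r[k] is not None]
            if not prods:
                raise ValueError(f"no rows observe both columns {j} and {k}")
            cov[j][k] = sum(prods) / len(prods)
    return cov


def hard_threshold(cov, lam):
    """Keep the diagonal and zero out off-diagonal entries with |s_jk| <= lam."""
    p = len(cov)
    return [[cov[j][k] if j == k or abs(cov[j][k]) > lam else 0.0
             for k in range(p)] for j in range(p)]
```

Thresholding only pays off when the true matrix is sparse, which is the setting in which the abstract reports an advantage.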
Abstract: In this article, we study nonlinear regression models with missing responses, with the aim of improving the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators of the regression coefficients and of the population mean are obtained, and they are proved to be asymptotically normal. In simulation studies, the proposed estimators perform better than the usual augmented inverse probability weighted estimators.
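For reference, here is a sketch of the usual augmented inverse probability weighted (AIPW) estimator of the population mean, the benchmark mentioned above. It is not the authors' proposed estimator. The propensity scores and outcome-model predictions are taken as given inputs; the propensities could come from a logistic model or from a CBPS fit, neither of which is implemented here.

```python
def aipw_mean(y, observed, propensity, outcome_pred):
    """AIPW estimate of E[Y] when some responses are missing.

    y            : responses (None where the response is missing)
    observed     : True where Y_i is observed
    propensity   : fitted P(observed_i = 1 | X_i), assumed given
    outcome_pred : fitted E[Y_i | X_i] from a working regression model, assumed given
    Estimator: mean over i of  delta_i * y_i / pi_i - (delta_i - pi_i) / pi_i * m_i.
    """
    total = 0.0
    for yi, d, pi, m in zip(y, observed, propensity, outcome_pred):
        delta = 1.0 if d else 0.0
        ipw_term = yi / pi if d else 0.0
        # The augmentation term corrects the IPW term using the outcome model.
        total += ipw_term - (delta - pi) / pi * m
    return total / len(y)
```

The double robustness is visible in the formula. If the outcome predictions equal the true responses, the estimator returns their average whatever the propensities are. If every response is observed with propensity 1, it returns the sample mean whatever the outcome model is.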
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11301031).
Abstract: Multiply robust inference has recently attracted much attention in the context of missing response data. An estimation procedure is multiply robust if it can incorporate information from multiple candidate models and the resulting estimator is consistent as long as one of the candidate models is correctly specified. This property is appealing because it gives the user a flexible modeling strategy with better protection against model misspecification. We explore this property for regression models with a binary covariate that is missing at random. Starting from a reformulation of the well-known augmented inverse probability weighted estimating equation, we propose a novel combination of least squares and empirical likelihood. It handles each of the two types of multiple candidate models separately: one type models the regression of the missing variable, and the other models the missingness mechanism. Because of this separation, all the working models are fused concisely and effectively. The asymptotic normality of our estimator is established through the theory of estimating functions with plugged-in nuisance parameter estimates. The finite-sample performance of the procedure is illustrated through simulation studies and an analysis of dementia data collected by the National Alzheimer's Coordinating Center.
Funding: Supported by the National Natural Science Foundation of China (Nos. 10901162, 10926073), the China Postdoctoral Science Foundation, the President Fund of GUCAS, and the foundation of the Key Laboratory of Random Complex Structures and Data Science, CAS; also supported by a research grant from the Research Committee, The Hong Kong Polytechnic University.
Abstract: In this paper, we investigate the model checking problem for a general linear model with nonignorable missing covariates. We show that, without any parametric model assumption for the response probability, the least squares method yields consistent estimators for the linear model even when only the complete cases are used. This makes it feasible to propose two testing procedures for the corresponding model checking problem: a score-type lack-of-fit test and a test based on the empirical process. The asymptotic properties of the test statistics are investigated. Both tests are shown to have asymptotic power 1 for local alternatives converging to the null at the rate n^{-r}, 0 ≤ r < 1/2. Simulation results show that both tests perform satisfactorily.
Funding: This research was supported by the Key Projects of Philosophy and Social Science in Beijing (15ZDA47), the National Natural Science Foundation of China (Grant Nos. 11571340, 11971045), the Beijing Natural Science Foundation (1202001), and the Open Project of the Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences.
Abstract: In recent years, there has been a large body of literature on missing data, most of which focuses on situations in which only the response or only the covariates are missing. In this paper, we consider checking the adequacy of a linear regression model when the response and the covariates are missing simultaneously. We apply model adjustment and inverse probability weighting to handle the missingness in the response and in the covariates, respectively. To avoid the curse of dimensionality, we propose an empirical process test with a linear indicator weighting function. The asymptotic properties of the proposed test under the null hypothesis and under local and global alternatives are rigorously investigated. A consistent wild bootstrap method is developed to approximate the critical value. Finally, simulation studies and a real data analysis show that the proposed method performs well.
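The core of such a test, a residual-marked empirical process calibrated by a wild bootstrap, can be sketched as follows. The sketch simplifies heavily. It uses one covariate and fully observed data, so the model adjustment and inverse probability weighting steps are omitted. It also assumes that in one dimension the linear indicator weight reduces to 1{x_i <= u}. The Cramer-von Mises norm and Rademacher multipliers are common choices rather than ones confirmed by the abstract.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cvm_statistic(x, y):
    """Cramer-von Mises statistic of the residual-marked empirical process.

    R_n(u) = n^{-1/2} * sum_i e_i * 1{x_i <= u}, evaluated at every observed u = x_j;
    T_n = (1/n) * sum_j R_n(x_j)^2.
    """
    a, b = ols_fit(x, y)
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    running, total, i = 0.0, 0.0, 0
    while i < n:
        j = i
        # Tied x values share the same indicator set, hence the same R_n(u).
        while j < n and x[order[j]] == x[order[i]]:
            k = order[j]
            running += y[k] - a - b * x[k]
            j += 1
        total += (j - i) * running * running / n
        i = j
    return total / n


def wild_bootstrap_test(x, y, n_boot=200, seed=0):
    """Return (T_n, bootstrap p-value) for H0: E[Y | X] is linear in X."""
    rng = random.Random(seed)
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    t_obs = cvm_statistic(x, y)
    exceed = 0
    for _ in range(n_boot):
        # Rademacher multipliers: y* follows the fitted null model while keeping
        # each observation's residual scale (robust to heteroscedasticity).
        y_star = [fi + ei * rng.choice((-1.0, 1.0)) for fi, ei in zip(fitted, resid)]
        if cvm_statistic(x, y_star) >= t_obs:
            exceed += 1
    return t_obs, exceed / n_boot
```

Refitting the linear model inside every bootstrap replicate matters. It makes the bootstrap statistic mimic the estimated-parameter process rather than the one with known coefficients.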
Abstract: Feature screening with missing data is a critical problem that has not been well addressed in the literature. In this discussion, we propose a new screening index based on “information value” and apply it to feature screening with missing covariates.
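The abstract does not define its index, so this sketch uses the standard information value from scorecard modelling. It assumes a binary response and categorical or pre-binned features. Missing covariate values are treated as their own bin, so incomplete features can still be scored. The smoothing constant `eps` is a hypothetical choice that keeps the logarithm finite.

```python
import math
from collections import Counter

_MISSING = object()  # sentinel bin for missing covariate values


def information_value(feature, label, eps=0.5):
    """IV = sum_b (p_b - q_b) * ln(p_b / q_b) for a binary label.

    p_b and q_b are the (smoothed) shares of label-1 and label-0 observations in bin b.
    """
    pos, neg = Counter(), Counter()
    for f, y in zip(feature, label):
        key = _MISSING if f is None else f
        if y == 1:
            pos[key] += 1
        else:
            neg[key] += 1
    bins = set(pos) | set(neg)
    n_pos = sum(pos.values()) + eps * len(bins)
    n_neg = sum(neg.values()) + eps * len(bins)
    iv = 0.0
    for b in bins:
        p = (pos[b] + eps) / n_pos  # Counter returns 0 for absent keys
        q = (neg[b] + eps) / n_neg
        iv += (p - q) * math.log(p / q)
    return iv


def screen_features(columns, label, top_k):
    """Rank features (name -> column) by information value and keep the top_k."""
    scores = {name: information_value(col, label) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Each term of the sum is non-negative, because p_b - q_b and ln(p_b / q_b) always share a sign. A feature whose bins split both classes in the same proportions therefore scores exactly zero.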
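Abstract: Repairing missing multi-time measurement data in distribution systems loses accuracy because errors accumulate over time. To address this, a repair method is proposed that fuses multi-step long-short term memory (MLSTM) neural networks with covariance intersection (CI). First, the historical measurement data of the distribution system, such as currents and powers, are reduced in dimension. Input vector matrices and feature label matrices of different dimensions are then constructed as model inputs, and multiple long-short term memory (LSTM) measurement-data repair models with different step lengths are trained. On this basis, the CI algorithm fuses these LSTM repair models of different step lengths into a repair model for missing multi-time measurement data. Case studies show that the proposed method effectively suppresses error accumulation during the repair of multi-time measurement data and improves the repair accuracy for missing multi-time data.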
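The fusion step can be sketched with the standard covariance intersection rule. The sketch makes three assumptions. Each step-length LSTM model is summarized by an estimate vector and an error covariance matrix; how those are obtained is not shown. The state is two-dimensional, which keeps the matrix algebra dependency-free. The weight w is picked by grid search to minimize the trace of the fused covariance. The paper's exact choices are not stated in the abstract.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def covariance_intersection(x1, p1, x2, p2, grid=101):
    """Fuse two 2-D estimates (x1, P1) and (x2, P2) with unknown cross-correlation.

    P^-1 = w P1^-1 + (1 - w) P2^-1,   x = P (w P1^-1 x1 + (1 - w) P2^-1 x2),
    with w in [0, 1] chosen on an evenly spaced grid to minimize trace(P).
    Returns (w, fused estimate, fused covariance).
    """
    i1, i2 = inv2(p1), inv2(p2)
    best = None
    for k in range(grid):
        w = k / (grid - 1)
        info = [[w * i1[r][c] + (1 - w) * i2[r][c] for c in range(2)] for r in range(2)]
        p = inv2(info)
        tr = p[0][0] + p[1][1]
        if best is None or tr < best[0]:
            # Information-weighted combination of the two estimates.
            v = [sum(w * i1[r][c] * x1[c] + (1 - w) * i2[r][c] * x2[c] for c in range(2))
                 for r in range(2)]
            x = [p[r][0] * v[0] + p[r][1] * v[1] for r in range(2)]
            best = (tr, w, x, p)
    _, w, x, p = best
    return w, x, p
```

Because the grid includes w = 0 and w = 1, the fused covariance never has a larger trace than the better of the two inputs. CI also stays consistent when the errors of the two models are correlated, a likely situation for models trained on the same history.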