We study the asymptotic properties of adaptive lasso estimators when some components of the parameter of interest $\beta$ are strictly different from zero, while other components may be zero or may converge to zero at rate $n^{-\delta}$, with $\delta > 0$, where $n$ denotes the sample size. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. First, we derive conditions that allow selecting tuning parameters in order to ensure that adaptive lasso estimates of $n^{-\delta}$-components indeed collapse to zero. Second, in this case, we also derive asymptotic distributions of adaptive lasso estimators for nonzero components. When $\delta > 1/2$, we obtain the usual $n^{1/2}$-asymptotic normal distribution, while when $0 < \delta \le 1/2$, we show $n^{\delta}$-consistency combined with (biased) $n^{1/2-\delta}$-asymptotic normality for nonzero components. We call these properties Extended Oracle Properties. These results allow practitioners to exclude asymptotically negligible variables from their model and to make inferences on the asymptotically relevant variables.
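To make the collapse-to-zero behaviour in the adaptive lasso abstract above concrete, here is a minimal sketch in the simplest possible setting: an orthonormal design, where the estimator reduces to a coordinate-wise weighted soft-thresholding of the least-squares estimate. The function name, the weight exponent `gamma`, the tuning value `lam`, and the input numbers are all illustrative assumptions; the paper itself treats general designs through the first-order conditions.

```python
import math

def adaptive_lasso_orthonormal(z, lam, gamma=1.0):
    """Adaptive lasso under an orthonormal design (illustrative assumption).

    For each coordinate j this minimises
        0.5 * (z_j - b)^2 + lam * w_j * |b|,   w_j = 1 / |z_j|^gamma,
    where z_j is the least-squares estimate of the j-th coefficient.
    """
    estimates = []
    for z_j in z:
        if z_j == 0.0:
            estimates.append(0.0)  # infinite weight: coefficient stays at zero
            continue
        threshold = lam / abs(z_j) ** gamma  # small |z_j| gives a large threshold
        shrunk = max(abs(z_j) - threshold, 0.0)
        # Restore the sign only when something survives the thresholding.
        estimates.append(math.copysign(shrunk, z_j) if shrunk > 0.0 else 0.0)
    return estimates

# Hypothetical least-squares estimates: two clear signals and two small components.
ols = [2.0, -1.5, 0.05, -0.02]
fit = adaptive_lasso_orthonormal(ols, lam=0.1)
```

In this toy input the two small components are set exactly to zero, while the two large ones are shrunk only slightly, because the data-driven weights penalise small estimates much more heavily than large ones.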
The seamless-L0 (SELO) penalty is a smooth function on [0, ∞) that very closely resembles the L0 penalty, which has been demonstrated theoretically and practically to be effective in nonconvex penalization for variable selection. In this paper, we first generalize SELO to a class of penalties retaining the good features of SELO, and then propose variable selection and estimation in linear models using the proposed generalized SELO (GSELO) penalized least squares (PLS) approach. We show that the GSELO-PLS procedure possesses the oracle property and consistently selects the true model under some regularity conditions in the presence of a diverging number of variables. The entire path of GSELO-PLS estimates can be efficiently computed through a smoothing quasi-Newton (SQN) method. A modified BIC coupled with a continuation strategy is developed to select the optimal tuning parameter. Simulation studies and an analysis of a clinical data set are carried out to evaluate the finite sample performance of the proposed method. In addition, numerical experiments involving simulation studies and an analysis of a microarray data set are also conducted for GSELO-PLS in high-dimensional settings.
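As a rough illustration of how a smooth penalty can mimic the L0 penalty, the sketch below evaluates the SELO form as it is usually stated in the literature, $p(\beta) = (\lambda/\log 2)\,\log\big(|\beta|/(|\beta|+\tau) + 1\big)$. The abstract above does not give this formula, so treat it, along with the function names and the values of `lam` and `tau`, as assumptions; the generalized (GSELO) family is not reproduced here.

```python
import math

def selo_penalty(beta, lam, tau):
    """SELO penalty (assumed form): (lam / log 2) * log(|beta| / (|beta| + tau) + 1)."""
    a = abs(beta)
    return (lam / math.log(2.0)) * math.log(a / (a + tau) + 1.0)

def l0_penalty(beta, lam):
    """L0 penalty: lam for any nonzero coefficient, 0 otherwise."""
    return lam if beta != 0 else 0.0

lam, tau = 1.0, 0.01  # illustrative values; a small tau makes SELO hug the L0 penalty
values = [(b, selo_penalty(b, lam, tau), l0_penalty(b, lam))
          for b in (0.0, 0.1, 1.0, 10.0)]
```

With a small `tau`, the SELO value is exactly zero at the origin, rises quickly, and stays just below `lam` for coefficients away from zero, which is the sense in which it resembles the L0 penalty while remaining continuous.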
Nonconvex penalties, including the smoothly clipped absolute deviation penalty and the minimax concave penalty, enjoy the properties of unbiasedness, continuity and sparsity, and ridge regression can deal with the collinearity problem. Combining the strengths of nonconvex penalties and ridge regression (abbreviated as NPR), we study the oracle property of the NPR estimator in high-dimensional settings with highly correlated predictors, where the dimensionality of covariates $p_n$ is allowed to increase exponentially with the sample size $n$. Simulation studies and a real data example are presented to verify the performance of the NPR method.
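The two nonconvex penalties named in the abstract above have standard closed forms, sketched below together with one plausible way of adding a ridge term. The SCAD and MCP formulas follow their usual textbook definitions; the combined `npr_penalty`, its `0.5 * ridge * t^2` scaling, and the default constants are assumptions for illustration and may differ from the paper's exact formulation.

```python
def scad_penalty(t, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty (standard form, a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t                      # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant: no shrinkage of large signals

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (standard form, gamma > 1)."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2            # constant beyond gamma * lam

def npr_penalty(t, lam, ridge, base=scad_penalty):
    """Assumed NPR-style combination: nonconvex penalty plus a ridge term."""
    return base(t, lam) + 0.5 * ridge * t * t
```

Both penalties flatten out for large coefficients, which is the source of the unbiasedness property mentioned above, while the quadratic ridge term keeps growing and is what stabilises the fit when predictors are highly correlated.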
In this paper, we study high-dimensional sparse multiplicative models for positive response data and propose a variable sorted active set (VSAS) algorithm for finding the $L_0$-regularized least product relative error (LPRE) estimator. The VSAS algorithm is derived from the local quadratic approximation based on the Karush-Kuhn-Tucker (KKT) conditions of the $L_0$-penalized LPRE objective function. Under the condition of restricted invertibility, we establish an explicit $L_\infty$ upper bound for the sequence of solutions generated by the VSAS algorithm. We further obtain an optimal convergence rate for the proposed estimator with high probability in finitely many iterations. In addition, our estimator enjoys the oracle property with high probability if the target signal exceeds the detectable level. Finally, extensive simulations and two real-world applications are conducted to illustrate the effectiveness of the proposal.
This paper proposes a penalized method for high-dimensional variable selection and subgroup identification in the Tobit model. Based on Olsen’s [(1978). Note on the uniqueness of the maximum likelihood estimator for the Tobit model. Econometrica: Journal of the Econometric Society, 46(5), 1211-1215. https://doi.org/10.2307/1911445] convex reparameterization of the Tobit negative log-likelihood, we develop an efficient algorithm for minimizing the objective function by combining the alternating direction method of multipliers (ADMM) and generalised coordinate descent (GCD). We also establish the oracle properties of our proposed estimator under some mild regularity conditions. Furthermore, extensive simulations and an empirical data study are conducted to demonstrate the performance of the proposed approach.
In this paper, the authors investigate three aspects of statistical inference for partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and the GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on nonconcave penalization and corrected profile least squares. As in "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
In this paper, based on spline approximation, the authors propose a unified variable selection approach for the single-index model via an adaptive L1 penalty. The calculation methods of the proposed estimators are given on the basis of the known LARS algorithm. Under some regularity conditions, the authors demonstrate the asymptotic properties of the proposed estimators and the oracle properties of adaptive LASSO (aLASSO) variable selection. Simulations are used to investigate the performances of the proposed estimator and illustrate that it is effective for simultaneous variable selection as well as estimation of single-index models.
This paper considers variable selection for moment restriction models. We propose a penalized empirical likelihood (PEL) approach that has desirable asymptotic properties comparable to the penalized likelihood approach, which relies on a correct parametric likelihood specification. In addition to being consistent and having the oracle property, PEL admits inference on the parameters without having to estimate the estimator's covariance. An approximate algorithm, along with a consistent BIC-type criterion for selecting the tuning parameters, is provided for PEL. The proposed algorithm enjoys considerable computational efficiency and overcomes the drawback of the local quadratic approximation of nonconcave penalties. Simulation studies to evaluate and compare the performances of our method with those of the existing ones show that PEL is competitive and robust. The proposed method is illustrated with two real examples.
We consider the problem of variable selection for the single-index varying-coefficient model, and present a regularized variable selection procedure by combining basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and local significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. Finite sample performances of the proposed method are illustrated by a simulation study and a real data analysis.
The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high-dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to the underlying model, regardless of the large number of unimportant variables. In order to improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. The two approaches are evaluated using simulation and demonstrated on a data example.
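The contrast between the group Lasso and the group MCP drawn in the abstract above can be seen in a small sketch of their group-wise thresholding rules. It assumes an orthonormal design within each group, so that each update acts directly on the group's least-squares coefficient vector `z`; the function names, `lam`, `gamma`, and the example vectors are illustrative assumptions, and the basis expansion step of the paper is not reproduced.

```python
import math

def group_norm(z):
    """Euclidean norm of one group's coefficient vector."""
    return math.sqrt(sum(v * v for v in z))

def group_lasso_threshold(z, lam):
    """Group soft-thresholding: shrink the whole group towards zero by lam."""
    norm = group_norm(z)
    if norm <= lam:
        return [0.0] * len(z)           # entire group is dropped
    scale = 1.0 - lam / norm
    return [scale * v for v in z]

def group_mcp_threshold(z, lam, gamma=3.0):
    """Group firm-thresholding for MCP (gamma > 1): no shrinkage of strong groups."""
    if group_norm(z) > gamma * lam:
        return list(z)                  # strong group is left unshrunk
    shrunk = group_lasso_threshold(z, lam)
    return [v / (1.0 - 1.0 / gamma) for v in shrunk]

strong, weak = [3.0, 4.0], [0.3, 0.4]   # hypothetical groups with norms 5 and 0.5
lasso_strong = group_lasso_threshold(strong, lam=1.0)
mcp_strong = group_mcp_threshold(strong, lam=1.0)
lasso_weak = group_lasso_threshold(weak, lam=1.0)
mcp_weak = group_mcp_threshold(weak, lam=1.0)
```

Both rules remove the weak group entirely, but only the group Lasso shrinks the strong group; that persistent shrinkage of important groups is one intuition for why the group Lasso lacks the oracle selection property while the group MCP can attain it.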
In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. Finite sample performances of the proposed method are illustrated by some simulation studies and a real data analysis.
In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension, where the missing observations include either responses or covariates, or the responses and part of the covariates are missing at random, and the ultra-high dimension implies that the dimension of the parameter is much larger than the sample size. Based on the B-spline method for the varying coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates whose coefficients are nonzero. At the same time, we discuss the asymptotic normality of the oracle estimator for the linear parameter. Because the active covariates are unknown in practice, a nonconvex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. The finite sample behavior of the proposed methods is investigated via simulations and real data analysis.
In practice, predictors often possess natural grouping structures. Incorporating such information can improve statistical modeling and inference. In addition, high dimensionality often leads to the collinearity problem. The elastic net is an ideal method which is inclined to reflect a grouping effect. In this paper, we consider the problem of group selection and estimation in the sparse linear regression model in which predictors can be grouped. We investigate a group adaptive elastic-net and derive oracle inequalities and model consistency for the cases where the group number is larger than the sample size. The oracle property is addressed for the case of a fixed group number. We revise the locally approximated coordinate descent algorithm to carry out the computation. Simulation and real data studies indicate that the group adaptive elastic-net is an alternative and competitive method for model selection in high-dimensional problems where the group number is larger than the sample size.
This paper employs the SCAD-penalized least squares method to simultaneously select variables and estimate the coefficients for high-dimensional covariate adjusted linear regression models. The distorted variables are assumed to be contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. The authors show that under some appropriate conditions, the SCAD-penalized least squares estimator has the so-called "oracle property". In addition, the authors also suggest a BIC criterion to select the tuning parameter, and show that the BIC criterion is able to identify the true model consistently for the covariate adjusted linear regression models. Simulation studies and a real data set are used to illustrate the efficiency of the proposed estimation algorithm.
In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the number of all parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When the field-specific knowledge is preliminary, and in view of the recent and potential affluence of data from genetics, finance, on-line social networks, etc., such (s, T, p)-triply diverging models enjoy ultimate flexibility in terms of modeling, and they can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. On a simple linear regression model endowed with a weakly dependent sequence, this paper applies a penalized least squares (PLS) approach. Under regularity conditions, we show sign consistency, derive a finite-sample bound with high probability for the estimation error, and prove that the PLS estimate is consistent in $L_2$ norm with rate $(s \log s/T)^{1/2}$.
In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures. Then, a local linear regression technique is employed to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties for the nonparametric kernel estimators and oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of our proposed model averaging method.
In many applications, covariates can be naturally grouped. For example, in gene expression data analysis, genes belonging to the same pathway might be viewed as a group. This paper studies the variable selection problem for censored survival data in the additive hazards model when covariates are grouped. A hierarchical regularization method is proposed to simultaneously estimate parameters and select important variables at both the group level and the within-group level. For situations in which the number of parameters tends to ∞ as the sample size increases, we establish an oracle property and an asymptotic normality property of the proposed estimators. Numerical results indicate that the hierarchically penalized method performs better than some existing methods such as lasso, smoothly clipped absolute deviation (SCAD) and adaptive lasso.
When there are outliers or heavy-tailed distributions in the data, traditional penalized least squares is no longer applicable. In addition, with the rapid development of science and technology, a lot of data with high dimension, strong correlation and redundancy has been generated in real life. It is therefore necessary to find an effective robust variable selection method that can deal with collinearity. This paper proposes a penalized M-estimation method based on the standard error adjusted adaptive elastic-net, which uses M-estimators and the corresponding standard errors as weights. The consistency and asymptotic normality of this method are proved theoretically. For regularization in high-dimensional space, the authors use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale which is less than the sample size, and then use the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real data analyses to examine the finite sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
In personalised medicine, the goal is to make a treatment recommendation for each patient with a given set of covariates to maximise the treatment benefit measured by the patient’s response to the treatment. In application, such a treatment assignment rule is constructed using a training sample consisting of patients’ responses and covariates. Instead of modelling responses using treatments and covariates, an alternative approach is maximising a response-weighted target function whose value directly reflects the effectiveness of treatment assignments. Since the target function involves a loss function, efforts have been made recently on the choice of the loss function to ensure a computationally feasible and theoretically sound solution. We propose to use a smooth hinge loss function so that the target function is convex and differentiable, which possesses good asymptotic properties and numerical advantages. To further simplify the computation and improve interpretability, we focus on rules that are linear functions of covariates and discuss their asymptotic properties. We also examine the performance of our method with simulation studies and real data analysis.
This paper is concerned with statistical inference for the partially linear varying coefficient dynamic panel data model with incidental parameters, including efficient estimation of the parametric and nonparametric components and consistent determination of the lagged order. For the parametric component, we propose an efficient semiparametric generalized method-of-moments (GMM) estimator and establish its asymptotic normality. For the nonparametric component, B-spline series approximation is employed to estimate the unknown coefficient functions, which are shown to achieve the optimal nonparametric convergence rate. A consistent estimator of the variance of the error component is also constructed. In addition, by using the smooth-threshold GMM estimating equations, we propose a variable selection method to identify the significant order of lagged terms automatically and remove the irrelevant regressors by setting their coefficients to zero. As a result, it can consistently determine the true lagged order and specify the significant exogenous variables. Further studies show that the resulting estimator has the same asymptotic properties as if the true lagged order and significant regressors were known a priori, i.e., it achieves the oracle property. Numerical experiments are conducted to evaluate the finite sample performance of our procedures. An example application is also presented.
Funding for the GSELO-PLS study: Supported by the National Natural Science Foundation of China (11501578, 11501579, 11701571, 41572315) and the Fundamental Research Funds for the Central Universities (CUGW150809).
Funding for the NPR study: Supported by the National Natural Science Foundation of China (Grant No. 11401340), the China Postdoctoral Science Foundation (Grant No. 2014M561892), and the Foundation of Qufu Normal University (Grant Nos. bsqd2012041, xkj201304).
Funding for the VSAS study: Supported by the Fundamental Research Funds for the Central Universities [grant number 2021CDJQY-047] and the National Natural Science Foundation of China [grant number 11801202].
Funding for the single-index adaptive LASSO study: Supported by the National Natural Science Foundation of China under Grant No. 61272041.
Funding for the PEL study: Supported partly by the National Natural Science Foundation of China (Grant No. 11071045) and the Shanghai Leading Academic Discipline Project (Grant No. B210).
Abstract: We consider the problem of variable selection for the single-index varying-coefficient model, and present a regularized variable selection procedure that combines basis function approximations with the SCAD penalty. The proposed procedure simultaneously selects significant covariates with functional coefficients and locally significant variables with parametric coefficients. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. The proposed method can naturally be applied to the pure single-index model and the varying-coefficient model. The finite-sample performance of the proposed method is illustrated by a simulation study and a real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 71271128 and 11101442); the State Key Program of National Natural Science Foundation of China (Grant No. 71331006); the National Center for Mathematics and Interdisciplinary Sciences (NCMIS); Shanghai Leading Academic Discipline Project A, in Ranking Top of Shanghai University of Finance and Economics (IRTSHUFE); and the Scientific Research Innovation Fund for PhD Studies (Grant No. CXJJ-2011-434).
Abstract: The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high-dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to that of the underlying model, regardless of the large number of unimportant variables. To improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property, in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. Both approaches, the group Lasso and the group MCP, are evaluated using simulation and demonstrated on a data example.
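The abstract above names the minimax concave penalty (MCP) without stating it. As a rough illustration only, the sketch below evaluates the standard scalar MCP and applies it to the Euclidean norm of each coefficient group. The function names, the default `gamma=3.0`, and the choice to leave group norms unweighted are assumptions made for this example and are not taken from the paper.

```python
import math


def mcp_penalty(t, lam, gamma=3.0):
    """Standard scalar MCP: lam*|t| - t^2/(2*gamma) up to |t| = gamma*lam, then flat.

    gamma=3.0 is an arbitrary illustrative default (assumption).
    """
    if lam < 0 or gamma <= 1:
        raise ValueError("need lam >= 0 and gamma > 1")
    x = abs(t)
    if x <= gamma * lam:
        return lam * x - x * x / (2.0 * gamma)
    # Beyond gamma*lam the penalty is constant, so large coefficients
    # are not shrunk any further.
    return gamma * lam * lam / 2.0


def group_mcp_penalty(groups, lam, gamma=3.0):
    """Sum of MCP applied to the Euclidean norm of each coefficient group.

    `groups` is a list of lists of coefficients. Using unweighted norms
    is a simplifying assumption of this sketch.
    """
    total = 0.0
    for g in groups:
        norm = math.sqrt(sum(b * b for b in g))
        total += mcp_penalty(norm, lam, gamma)
    return total


if __name__ == "__main__":
    # A group of all zeros contributes nothing; a large group hits the flat part.
    print(group_mcp_penalty([[3.0, 4.0], [0.0, 0.0]], lam=1.0))
```

Because the penalty flattens out once a group norm exceeds `gamma * lam`, large groups are penalized by a constant, which is the usual intuition for why concave penalties reduce the bias that the group Lasso incurs.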
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11501522, 11101014, 11001118 and 11171012); National Statistical Research Projects (Grant No. 2014LZ45); the Doctoral Fund of Innovation of Beijing University of Technology; the Science and Technology Project of the Faculty Adviser of Excellent PhD Degree Thesis of Beijing (Grant No. 20111000503); and the Beijing Municipal Education Commission Foundation (Grant No. KM201110005029).
Abstract: In this paper, we consider the problem of variable selection and model detection in varying coefficient models with longitudinal data. We propose a combined penalization procedure to select the significant variables, detect the true structure of the model and estimate the unknown regression coefficients simultaneously. With appropriate selection of the tuning parameters, we show that the proposed procedure is consistent in both variable selection and the separation of varying and constant coefficients, and that the penalized estimators have the oracle property. The finite-sample performance of the proposed method is illustrated by simulation studies and a real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071348) and the Fundamental Research Funds for Central Universities, China (Grant No. 2023-3-2D-04).
Abstract: In this paper, we focus on partially linear varying-coefficient quantile regression with missing observations under ultra-high dimension. Here the responses, the covariates, or both the responses and part of the covariates may be missing at random, and ultra-high dimension means that the dimension of the parameter is much larger than the sample size. Based on the B-spline method for the varying coefficient functions, we study the consistency of the oracle estimator, which is obtained using only the active covariates, that is, those whose coefficients are nonzero. We also discuss the asymptotic normality of the oracle estimator for the linear parameter. Since the active covariates are unknown in practice, a non-convex penalized estimator is investigated for simultaneous variable selection and estimation, and its oracle property is also established. The finite-sample behavior of the proposed methods is investigated via simulations and a real data analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11571219); the Open Research Fund Program of Key Laboratory of Mathematical Economics (SUFE) (Grant No. 201309KF02), Ministry of Education; and Changjiang Scholars and Innovative Research Team in University (Grant No. IRT13077).
Abstract: In practice, predictors often possess natural grouping structures. Incorporating such information can improve statistical modeling and inference. In addition, high dimensionality often leads to collinearity. The elastic net is well suited to this setting because it tends to produce a grouping effect. In this paper, we consider the problem of group selection and estimation in the sparse linear regression model in which predictors can be grouped. We investigate a group adaptive elastic-net and derive oracle inequalities and model consistency for the case where the number of groups is larger than the sample size. The oracle property is addressed for the case of a fixed number of groups. We adapt the locally approximated coordinate descent algorithm to carry out the computation. Simulation and real data studies indicate that the group adaptive elastic-net is a competitive alternative for model selection in high-dimensional problems where the number of groups is larger than the sample size.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11471029, 11101014, 61273221 and 11171010; the Beijing Natural Science Foundation under Grant Nos. 1142002 and 1112001; the Science and Technology Project of Beijing Municipal Education Commission under Grant No. KM201410005010; and the Research Fund for the Doctoral Program of Beijing University of Technology under Grant No. 006000543114550.
Abstract: This paper employs the SCAD-penalized least squares method to simultaneously select variables and estimate the coefficients for high-dimensional covariate-adjusted linear regression models. The distorted variables are assumed to be contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. The authors show that, under some appropriate conditions, the SCAD-penalized least squares estimator has the so-called "oracle property". In addition, the authors suggest a BIC criterion to select the tuning parameter, and show that this criterion is able to identify the true model consistently for covariate-adjusted linear regression models. Simulation studies and a real data set are used to illustrate the efficiency of the proposed estimation algorithm.
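The abstract above relies on the SCAD penalty but does not write it out. The following is a minimal sketch of the standard SCAD penalty function as it is usually stated, not the paper's estimation procedure. The function name is hypothetical, and the default `a=3.7` is the value commonly used in the literature, assumed here rather than taken from the abstract.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at a single coefficient t.

    Piecewise form: linear (Lasso-like) near zero, quadratic in the
    middle, and constant for large |t|. a=3.7 is an assumed default.
    """
    if lam < 0 or a <= 2:
        raise ValueError("need lam >= 0 and a > 2")
    x = abs(t)
    if x <= lam:
        # Small coefficients are penalized exactly like the Lasso.
        return lam * x
    if x <= a * lam:
        # Quadratic transition that tapers the penalty rate to zero.
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    # Large coefficients receive a constant penalty.
    return lam * lam * (a + 1.0) / 2.0


if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0):
        print(t, scad_penalty(t, lam=1.0))
```

The three pieces join continuously at `|t| = lam` and `|t| = a * lam`, which can be checked by evaluating the function on either side of each breakpoint.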
Funding: Supported by the National Science Foundation of USA (Grant Nos. DMS1206464 and DMS1613338) and the National Institutes of Health of USA (Grant Nos. R01GM072611, R01GM100474 and R01GM120507).
Abstract: In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the total number of parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When field-specific knowledge is preliminary, and in view of the recent and potential abundance of data from genetics, finance, online social networks and so on, such (s, T, p)-triply diverging models offer great modeling flexibility and can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. This paper applies a penalized least squares (PLS) approach to a simple linear regression model endowed with a weakly dependent sequence. Under regularity conditions, we show sign consistency, derive a finite-sample bound that holds with high probability for the estimation error, and prove that the PLS estimate is consistent in L<sub>2</sub> norm with rate (s log s/T)<sup>1/2</sup>.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12201091); the Natural Science Foundation of Chongqing (Grant Nos. CSTB2022NSCQ-MSX0852, cstc2021jcyj-msxmX0502); the Innovation Support Program for Chongqing Overseas Returnees (Grant No. cx2020025); the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJQN202100526, KJQN201900511); the National Statistical Science Research Program (Grant No. 2022LY019); and the Chongqing University Innovation Research Group Project: Nonlinear Optimization Method and Its Application (Grant No. CXQT20014).
Abstract: In this paper, we develop a flexible semiparametric model averaging marginal regression procedure to forecast the joint conditional quantile function of the response variable for ultrahigh-dimensional data. First, we approximate the joint conditional quantile function by a weighted average of one-dimensional marginal conditional quantile functions that have varying coefficient structures, and employ a local linear regression technique to derive consistent estimates of the marginal conditional quantile functions. Second, based on the estimated marginal conditional quantile functions, we estimate and select the significant model weights involved in the approximation by a nonconvex penalized quantile regression. Under some relaxed conditions, we establish the asymptotic properties of the nonparametric kernel estimators and of the oracle estimators of the model averaging weights. We further derive the oracle property for the proposed nonconvex penalized model averaging procedure. Finally, simulation studies and a real data analysis are conducted to illustrate the merits of the proposed model averaging method.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11171112, 11101114 and 11201190) and the National Statistical Science Research Major Program of China (Grant No. 2011LZ051).
Abstract: In many applications, covariates can be naturally grouped. For example, in gene expression data analysis, genes belonging to the same pathway might be viewed as a group. This paper studies the variable selection problem for censored survival data in the additive hazards model when covariates are grouped. A hierarchical regularization method is proposed to simultaneously estimate parameters and select important variables at both the group level and the within-group level. For situations in which the number of parameters tends to ∞ as the sample size increases, we establish the oracle property and asymptotic normality of the proposed estimators. Numerical results indicate that the hierarchically penalized method performs better than some existing methods such as lasso, smoothly clipped absolute deviation (SCAD) and adaptive lasso.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12271294, 12171225 and 12071248.
Abstract: When the data contain outliers or follow heavy-tailed distributions, traditional penalized least squares is no longer applicable. In addition, with the rapid development of science and technology, large amounts of data with high dimension, strong correlation and redundancy are generated in real life. It is therefore necessary to find an effective variable selection method that handles collinearity and is based on a robust method. This paper proposes a penalized M-estimation method based on a standard-error-adjusted adaptive elastic-net, which uses M-estimators and the corresponding standard errors as weights. The consistency and asymptotic normality of this method are proved theoretically. For regularization in high-dimensional space, the authors use the multi-step adaptive elastic-net to reduce the dimension to a relatively large scale that is less than the sample size, and then use the proposed method to select variables and estimate parameters. Finally, the authors carry out simulation studies and two real data analyses to examine the finite-sample performance of the proposed method. The results show that the proposed method has some advantages over other commonly used methods.
Funding: Research reported in this article was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Award [ME-1409-21219]. The second author's research was also partially supported by the Chinese 111 Project [B14019] and the US National Science Foundation [grant number DMS-1612873].
Abstract: In personalised medicine, the goal is to make a treatment recommendation for each patient with a given set of covariates so as to maximise the treatment benefit, measured by the patient's response to the treatment. In application, such a treatment assignment rule is constructed using a training sample consisting of patients' responses and covariates. Instead of modelling responses using treatments and covariates, an alternative approach is to maximise a response-weighted target function whose value directly reflects the effectiveness of treatment assignments. Since the target function involves a loss function, recent efforts have focused on choosing the loss function to ensure a computationally feasible and theoretically sound solution. We propose to use a smooth hinge loss function so that the target function is convex and differentiable, which yields good asymptotic properties and numerical advantages. To further simplify computation and interpretation, we focus on rules that are linear functions of covariates and discuss their asymptotic properties. We also examine the performance of our method with simulation studies and a real data analysis.
Funding: Supported by SHUFE Graduate Innovation and Creativity Funds (No. 2011130151); grants from the National Natural Science Foundation of China (NSFC) (No. 11071154); and, in part, the Leading Academic Discipline Program, 211 Project for Shanghai University of Finance and Economics.
Abstract: This paper is concerned with statistical inference for the partially linear varying coefficient dynamic panel data model with incidental parameters, including efficient estimation of the parametric and nonparametric components and consistent determination of the lagged order. For the parametric component, we propose an efficient semiparametric generalized method-of-moments (GMM) estimator and establish its asymptotic normality. For the nonparametric component, B-spline series approximation is employed to estimate the unknown coefficient functions, which are shown to achieve the optimal nonparametric convergence rate. A consistent estimator of the variance of the error component is also constructed. In addition, using smooth-threshold GMM estimating equations, we propose a variable selection method that automatically identifies the significant order of lagged terms and removes irrelevant regressors by setting their coefficients to zero. As a result, it can consistently determine the true lagged order and identify the significant exogenous variables. Further study shows that the resulting estimator has the same asymptotic properties as if the true lagged order and significant regressors were known a priori, i.e., it achieves the oracle property. Numerical experiments are conducted to evaluate the finite-sample performance of our procedures. An application example is also provided.