Funding: Supported by the Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China (LHZY24A010002) and the MOE Project of Humanities and Social Sciences (21YJCZH235).
Abstract: High-dimensional heterogeneous data have attracted increasing attention and discussion in the past decade. In the context of heterogeneity, semiparametric regression has emerged as a popular method for modeling this type of data in statistics. In this paper, we leverage the computational efficiency and analytical robustness of expectile regression under heterogeneity, and propose a regularized partially linear additive expectile regression model with a nonconvex penalty, such as SCAD or MCP, for high-dimensional heterogeneous data. We focus on a more realistic scenario where the regression error exhibits a heavy-tailed distribution with only finite moments. This scenario challenges the classical sub-Gaussian distribution assumption and is more prevalent in practical applications. Under certain regularity conditions, we demonstrate that, with probability tending to one, the oracle estimator is one of the local minima of the induced optimization problem. Our theoretical analysis suggests that the dimensionality of linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition of the regression error. Computationally, given the nonconvex and nonsmooth nature of the induced optimization problem, we have developed a two-step algorithm. Finally, our method's effectiveness is demonstrated through its high estimation accuracy and effective model selection, as evidenced by Monte Carlo simulation studies and a real-data application. Furthermore, by taking various expectile weights, our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable, underscoring its utility in analyzing high-dimensional heterogeneous data.
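For readers unfamiliar with the two nonconvex penalties named in this abstract, the following is a minimal sketch of their standard textbook forms. The function names and the default shape parameters (`a=3.7` for SCAD, `gamma=3.0` for MCP) are assumptions made for illustration; the abstract does not state which values the authors use.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at |t| (requires a > 2).

    a=3.7 is a commonly used default, assumed here for illustration.
    """
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                   # |t| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    constant = lam**2 * (a + 1) / 2                    # |t| > a*lam
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))

def mcp_penalty(t, lam, gamma=3.0):
    """Standard MCP penalty evaluated at |t| (requires gamma > 1).

    gamma=3.0 is an illustrative default, not a value from the paper.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),      # tapering region
                    gamma * lam**2 / 2)                # flat region

# Both penalties behave like the lasso near zero and level off for large
# coefficients, so large signals are not shrunk further.
grid = np.array([0.0, 0.5, 1.0, 2.0, 5.0, 10.0])
print(scad_penalty(grid, lam=1.0))
print(mcp_penalty(grid, lam=1.0))
```

Because both penalties are flat beyond a threshold, they avoid the bias that the lasso imposes on large coefficients, at the cost of a nonconvex objective.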
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11771032), the Natural Science Foundation of Shanxi Province of China (Grant No. 201901D111279), and the Research Grants Council of the Hong Kong Special Administrative Region (Grant Nos. 14301918 and 14302519).
Abstract: As extensions of means, expectiles capture all the distributional information of a random variable. Expectile regression is computationally friendlier because the asymmetric least squares loss function is differentiable everywhere. This regression also enables effective estimation of the expectiles of a response variable when potential explanatory variables are given. In this study, we propose the partial functional linear expectile regression model. The slope function and constant coefficients are estimated by using the functional principal component basis. The convergence rate of the slope function and the asymptotic normality of the parameter vector are established. To inspect the effect of the parametric component on the response variable, we develop Wald-type and expectile rank score tests and establish their asymptotic properties. The finite-sample performance of the proposed estimators and test statistics is evaluated through a simulation study. Results indicate that the proposed estimators are comparable to competing estimation methods and that the newly proposed expectile rank score test is useful. The methodologies are illustrated by using two real data examples.
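The computational point made in this abstract can be illustrated with a short sketch of the asymmetric least squares loss and a plain linear expectile fit by iteratively reweighted least squares. This covers only an ordinary linear model, not the partial functional linear model the authors propose; the function names `expectile_loss` and `fit_linear_expectile` are hypothetical.

```python
import numpy as np

def expectile_loss(residual, tau):
    """Asymmetric least squares loss: |tau - 1{r < 0}| * r^2."""
    r = np.asarray(residual, dtype=float)
    weight = np.where(r < 0, 1 - tau, tau)
    return weight * r**2

def fit_linear_expectile(X, y, tau, n_iter=100, tol=1e-10):
    """Fit a linear tau-expectile by iteratively reweighted least squares.

    Illustrative sketch for a plain linear model only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting value
    for _ in range(n_iter):
        # Weights depend only on the sign of the current residuals.
        w = np.where(y - X @ beta < 0, 1 - tau, tau)
        # Solve the weighted normal equations X'WX beta = X'Wy.
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Usage on synthetic data (all values here are made up for illustration).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=200)
print(fit_linear_expectile(X, y, tau=0.5))   # coincides with the OLS fit
print(fit_linear_expectile(X, y, tau=0.9))   # upper expectile
```

Since the loss is differentiable and piecewise quadratic, each iteration is an ordinary weighted least squares solve, which is what makes expectile regression cheaper than quantile regression with its nondifferentiable check loss.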
Abstract: In this paper, we study large-scale inference for a linear expectile regression model. To mitigate the computational challenges of classical asymmetric least squares (ALS) estimation under massive data, we propose a communication-efficient divide-and-conquer algorithm that combines the information from sub-machines through confidence distributions. The resulting pooled estimator has a closed-form expression, and its consistency and asymptotic normality are established under mild conditions. Moreover, we derive the Bahadur representation of the ALS estimator, which serves as an important tool for studying the relationship between the number of sub-machines K and the sample size. Numerical studies, including both synthetic and real data examples, are presented to illustrate the finite-sample performance of our method and support the theoretical results.
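The divide-and-conquer idea in this abstract can be sketched generically as follows. Each sub-machine fits ALS locally and sends back only its estimate and a weighting matrix, and the centre combines them in closed form. Weighting by the local matrix X'WX is an assumption chosen for simplicity here; it is not the confidence-distribution construction of the paper, whose exact pooled estimator the abstract does not give. The names `als_fit` and `pooled_estimate` are hypothetical.

```python
import numpy as np

def als_fit(X, y, tau, n_iter=100, tol=1e-10):
    """Local ALS fit; returns (beta, X'WX evaluated at the solution)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.where(y - X @ beta < 0, 1 - tau, tau)
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    w = np.where(y - X @ beta < 0, 1 - tau, tau)
    return beta, X.T @ (X * w[:, None])

def pooled_estimate(X, y, tau, n_machines):
    """One-shot combination of local estimates (illustrative weighting).

    Each machine communicates only a p-vector and a p-by-p matrix.
    """
    gram_sum = 0.0
    rhs = 0.0
    blocks = zip(np.array_split(X, n_machines), np.array_split(y, n_machines))
    for X_k, y_k in blocks:
        beta_k, gram_k = als_fit(X_k, y_k, tau)
        gram_sum = gram_sum + gram_k
        rhs = rhs + gram_k @ beta_k
    # Closed-form matrix-weighted average of the local estimates.
    return np.linalg.solve(gram_sum, rhs)

# Usage on synthetic data split across K = 4 hypothetical sub-machines.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4000), rng.normal(size=(4000, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=4000)
print(pooled_estimate(X, y, tau=0.8, n_machines=4))
print(als_fit(X, y, tau=0.8)[0])   # full-data fit, for comparison
```

With this weighting the pooled estimate reproduces the full-data least squares fit exactly when tau = 0.5, which gives a convenient sanity check; for other values of tau it only approximates the full-data ALS estimate.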