High-dimensional data are generally heterogeneous because of heteroskedasticity or non-homogeneous covariates. Quantile regression and expectile regression are powerful tools for analyzing heterogeneous high-dimensional data, but the former is computationally challenging because its loss function is non-smooth, while the latter is not robust to outliers. This paper uses a class of robust asymmetric loss functions to study robust expectile regression for partially linear additive models. The nonparametric part is approximated with B-spline basis functions, and a regularization method with a non-convex penalty is used to perform variable selection and parameter estimation. The advantages of this method are: (1) taking different quantile levels yields a more complete picture of the conditional distribution of the response variable, which makes it possible to explore the heterogeneity of the data; (2) the partially linear model structure accommodates both linear and nonlinear explanatory variables, which increases the flexibility of the model while retaining a degree of interpretability; (3) robust expectile regression is computationally more efficient than quantile regression and more robust than expectile regression. Both numerical simulations and a real-data analysis show the advantages of the proposed method in model estimation and computational efficiency.
High-dimensional heterogeneous data have attracted increasing attention and discussion in the past decade. In the context of heterogeneity, semiparametric regression has emerged as a popular method in statistics for modeling this type of data. In this paper, we leverage the benefits of expectile regression for computational efficiency and analytical robustness under heterogeneity, and propose a regularized partially linear additive expectile regression model with a nonconvex penalty, such as SCAD or MCP, for high-dimensional heterogeneous data. We focus on a more realistic scenario where the regression error exhibits a heavy-tailed distribution with only finite moments. This scenario challenges the classical sub-Gaussian distribution assumption and is more prevalent in practical applications. Under certain regularity conditions, we demonstrate that, with probability tending to one, the oracle estimator is one of the local minima of the induced optimization problem. Our theoretical analysis suggests that the dimensionality of linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition of the regression error. Computationally, given the nonconvex and nonsmooth nature of the induced optimization problem, we have developed a two-step algorithm. Finally, our method's effectiveness is demonstrated through its high estimation accuracy and effective model selection, as evidenced by Monte Carlo simulation studies and a real-data application. Furthermore, by taking various expectile weights, our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable, underscoring its utility in analyzing high-dimensional heterogeneous data.
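To illustrate how varying the expectile weight can reveal heterogeneity, here is a minimal sketch. It is not the regularized partially linear additive estimator of the abstract: it only computes sample expectiles of a response within two covariate groups, using the standard iteratively reweighted mean. The function name `sample_expectile` and the simulated heteroskedastic data are assumptions made for illustration.

```python
import numpy as np

def sample_expectile(y, tau, tol=1e-10, max_iter=1000):
    """Sample tau-expectile: minimizer of sum |tau - 1(y_i < e)| * (y_i - e)^2.

    Solved by a fixed-point iteration on the weighted mean (illustrative helper).
    """
    y = np.asarray(y, dtype=float)
    e = y.mean()  # tau = 0.5 gives exactly the mean
    for _ in range(max_iter):
        # Observations above the current value get weight tau, the rest 1 - tau.
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

# Hypothetical heteroskedastic data: the noise scale grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=2000)
y = 1.0 + (0.5 + 2.0 * x) * rng.standard_normal(2000)

low, high = x < 0.5, x >= 0.5
for tau in (0.1, 0.5, 0.9):
    print(tau, sample_expectile(y[low], tau), sample_expectile(y[high], tau))
```

If the error were homoskedastic, the gap between the 0.9 and 0.1 expectiles would be roughly the same in both groups; a gap that widens with the covariate is the kind of heterogeneity that fitting several expectile weights is meant to expose.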
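Identifying risk factors and determining their contributions is an important topic in the risk management of assets and portfolios. Over the past decade, downside risk has attracted growing attention, and Value at Risk (VaR) and Expected Shortfall (ES) are two commonly used risk measures in portfolio risk management. Kuan et al. [1] proposed EVaR, an expectile-based VaR measure, under a class of conditional autoregressive expectile (CARE) models. This paper extends the CARE model of Kuan et al. [2] to heteroskedastic data by introducing ARCH effects, and proposes a linear ARCH-Expectile model that aims to identify the sources of risk of an asset or portfolio, assess the contribution of each risk factor, and use expectiles to evaluate VaR and ES indirectly. A two-step estimation algorithm for the parameters is given, and the large-sample theory of the parameter estimators is established. Finally, the proposed method is applied to a risk analysis of the profit and loss of Minsheng Bank stock, with risk factors selected from three aspects: company fundamentals, market liquidity, and the macroeconomic level. The results show that the sources, magnitudes, and directions of the risk factors change with the level of extreme loss of the stock.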
This paper develops the theory of kth power expectile estimation and considers the associated hypothesis tests for the coefficients of linear regression models. We prove that the asymptotic covariance matrix of kth power expectile regression converges to that of quantile regression as k converges to one, which yields a moment estimator of the asymptotic covariance matrix of quantile regression. The kth power expectile regression is then used to test for homoskedasticity and conditional symmetry of the data. Detailed comparisons of local power among the kth power expectile regression tests, the quantile regression test, and the expectile regression test are provided. When the underlying distribution is not standard normal, the results show that the optimal k is often larger than 1 and smaller than 2, which suggests that the general kth power expectile regression is necessary. Finally, the methods are illustrated with a real example.
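The link between k and the two familiar special cases can be sketched in a few lines. The abstract does not state the loss function, so the form used here, |τ − 1(u < 0)| · |u|^k, is an assumption based on the usual definition of kth power expectiles; the function name is hypothetical.

```python
import numpy as np

def kth_power_expectile_loss(u, tau, k):
    """Assumed kth power expectile loss: |tau - 1(u < 0)| * |u|**k."""
    u = np.asarray(u, dtype=float)
    # Negative residuals are weighted by 1 - tau, non-negative ones by tau.
    weight = np.where(u < 0, 1.0 - tau, tau)
    return weight * np.abs(u) ** k

u = np.linspace(-2.0, 2.0, 9)
tau = 0.3

# k = 1 recovers the quantile check loss u * (tau - 1(u < 0)).
check_loss = u * (tau - (u < 0))
print(np.allclose(kth_power_expectile_loss(u, tau, 1), check_loss))

# k = 2 recovers the asymmetric squared loss of expectile regression.
asym_sq_loss = np.where(u < 0, 1.0 - tau, tau) * u ** 2
print(np.allclose(kth_power_expectile_loss(u, tau, 2), asym_sq_loss))
```

Values of k strictly between 1 and 2 interpolate between the two losses, which is the range the abstract reports as often optimal for non-normal errors.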
Funding: Supported by the Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China (LHZY24A010002) and the MOE Project of Humanities and Social Sciences (21YJCZH235).