In the statistics and machine learning communities, the last fifteen years have witnessed a surge of high-dimensional models backed by penalized methods and other state-of-the-art variable selection techniques. The high-dimensional models we refer to differ from conventional models in that the total number of parameters p and the number of significant parameters s are both allowed to grow with the sample size T. When field-specific knowledge is preliminary, and in view of the recent and potential abundance of data from genetics, finance, online social networks, etc., such (s, T, p)-triply diverging models offer great modeling flexibility and can be used as a data-guided first step of investigation. However, model selection consistency and other theoretical properties have been addressed only for independent data, leaving time series largely uncovered. This paper applies a penalized least squares (PLS) approach to a simple linear regression model endowed with a weakly dependent sequence. Under regularity conditions, we show sign consistency, derive a finite-sample bound that holds with high probability for the estimation error, and prove that the PLS estimate is consistent in L_2 norm with rate (s log s/T)^{1/2}.
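The setting described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes an L1 (lasso) penalty as the PLS penalty, AR(1) noise as the weakly dependent sequence, and an i.i.d. Gaussian design. The function names `lasso_cd` and `simulate`, the default sizes, and the choice `lam=0.3` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2T)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    T, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / T   # per-column curvature
    resid = y.copy()                    # residual y - X @ beta (beta = 0 at start)
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            rho = X[:, j] @ resid / T + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep residual in sync
            beta[j] = new
    return beta


def simulate(T=400, p=50, s=5, phi=0.5, seed=0):
    """Sparse linear model with AR(1) noise (an assumed form of weak dependence)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, p))
    beta = np.zeros(p)
    beta[:s] = np.where(np.arange(s) % 2 == 0, 1.0, -1.0)  # alternating signs
    innov = rng.standard_normal(T)
    eps = np.zeros(T)
    eps[0] = innov[0]
    for t in range(1, T):
        eps[t] = phi * eps[t - 1] + innov[t]
    return X, X @ beta + eps, beta


if __name__ == "__main__":
    X, y, beta = simulate()
    beta_hat = lasso_cd(X, y, lam=0.3)
    print("signs recovered:", np.array_equal(np.sign(beta_hat), np.sign(beta)))
    print("L2 estimation error:", np.linalg.norm(beta_hat - beta))
```

Comparing `np.sign(beta_hat)` with `np.sign(beta)` is one way to check sign consistency empirically on a single simulated draw; the L2 error can be tracked as T grows to get a rough feel for the stated rate.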
The objective of this paper is to quantify the complexity of rank and nuclear norm constrained methods for low rank matrix estimation problems. Specifically, we derive analytic forms of the degrees of freedom for these types of estimators in several common settings. These results provide efficient ways of comparing different estimators and choosing tuning parameters. Moreover, our analyses reveal new insights into the behavior of these low rank matrix estimators. These observations are of great theoretical and practical importance. In particular, they suggest that, contrary to conventional wisdom, the total number of free parameters underestimates the degrees of freedom for rank constrained estimators, whereas it overestimates the degrees of freedom for nuclear norm penalization. In addition, when using most model selection criteria to choose the tuning parameter for nuclear norm penalization, it often suffices to consider a finite number of candidates rather than a continuum of choices. Numerical examples are also presented to illustrate the practical implications of our results.
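The two estimator families mentioned above can be sketched numerically. The block below is not the paper's analytic result: it assumes a denoising model Y = M + E with i.i.d. Gaussian noise, uses the standard covariance definition of degrees of freedom, and estimates it by Monte Carlo. The function names and the parameter `n_rep` are hypothetical.

```python
import numpy as np


def rank_constrained(Y, r):
    """Best rank-r approximation of Y (hard thresholding of singular values)."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * d[:r]) @ Vt[:r]


def nuclear_norm_penalized(Y, lam):
    """Minimiser of 0.5 * ||Y - M||_F^2 + lam * ||M||_* (soft thresholding)."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt


def mc_degrees_of_freedom(estimator, M, sigma=1.0, n_rep=500, seed=0):
    """Monte Carlo estimate of sum_ij Cov(M_hat_ij, Y_ij) / sigma^2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rep):
        E = sigma * rng.standard_normal(M.shape)
        # Cov(M_hat_ij, Y_ij) = E[M_hat_ij * E_ij] because E has mean zero
        total += np.sum(estimator(M + E) * E)
    return total / (n_rep * sigma ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n, r = 12, 10, 2
    # rank-r signal, scaled so that it stands out from the noise
    M = 3.0 * rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    df_rank = mc_degrees_of_freedom(lambda Y: rank_constrained(Y, r), M)
    df_nuc = mc_degrees_of_freedom(lambda Y: nuclear_norm_penalized(Y, 3.0), M)
    print("naive parameter count r(m+n-r):", r * (m + n - r))
    print("Monte Carlo df, rank constrained:", df_rank)
    print("Monte Carlo df, nuclear norm penalized:", df_nuc)
```

Printing the Monte Carlo estimates next to the naive count r(m+n-r) gives a rough way to explore the comparison the abstract describes; the estimates are noisy and depend on the assumed signal, noise level, and tuning parameters.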
Funding: Supported by the National Science Foundation of USA (Grant Nos. DMS1206464 and DMS1613338) and the National Institutes of Health of USA (Grant Nos. R01GM072611, R01GM100474 and R01GM120507).
Funding: Supported by the National Science Foundation of USA (Grant No. DMS1265202) and the National Institutes of Health of USA (Grant No. 1-U54AI117924-01).