Nowadays a common problem when processing data sets with the large number of covariates compared to small sample sizes (fat data sets) is to estimate the parameters associated with each covariate. When the number of c...Nowadays a common problem when processing data sets with the large number of covariates compared to small sample sizes (fat data sets) is to estimate the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, the parameter estimation becomes very difficult. Researchers in many fields such as text categorization deal with the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling which utilizes double exponentials prior to induce shrinkage and reduce the number of covariates in the model. The method was evaluated using ten domains such as mathematics, the corpuses of which were downloaded from Wikipedia. From the downloaded corpuses, we created the TFIDF matrix corresponding to all domains and divided the whole data set randomly into training and testing groups of size 300. To make the model more robust we performed 50 re-samplings on selection of training and test groups. The model was implemented in R and the Gibbs sampler ran for 60 k iterations and the first 20 k was discarded as burn in. We performed classification on training and test groups by calculating P (yi = 1) and according to [1] [2] the threshold of 0.5 was used as decision rule. Our model’s performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.展开更多
针对混频数据的建模问题,提出自回归U-MIDAS(unrestricted mixed data sampling)分位回归模型.首先,结合嵌套Lasso惩罚方法及spike-and-slab先验进行Bayes参数估计和变量选择;其次,通过数值模拟证明该方法的优越性;最后,将该方法用于美...针对混频数据的建模问题,提出自回归U-MIDAS(unrestricted mixed data sampling)分位回归模型.首先,结合嵌套Lasso惩罚方法及spike-and-slab先验进行Bayes参数估计和变量选择;其次,通过数值模拟证明该方法的优越性;最后,将该方法用于美国名义国内生产总值(GDP)年化季度增长率的预测,结果表明,该方法预测精度较好.展开更多
In this paper, we consider the problem of estimating a high dimensional precision matrix of Gaussian graphical model. Taking advantage of the connection between multivariate linear regression and entries of the precis...In this paper, we consider the problem of estimating a high dimensional precision matrix of Gaussian graphical model. Taking advantage of the connection between multivariate linear regression and entries of the precision matrix, we propose Bayesian Lasso together with neighborhood regression estimate for Gaussian graphical model. This method can obtain parameter estimation and model selection simultaneously. Moreover, the proposed method can provide symmetric confidence intervals of all entries of the precision matrix.展开更多
文摘Nowadays a common problem when processing data sets with the large number of covariates compared to small sample sizes (fat data sets) is to estimate the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, the parameter estimation becomes very difficult. Researchers in many fields such as text categorization deal with the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling which utilizes double exponentials prior to induce shrinkage and reduce the number of covariates in the model. The method was evaluated using ten domains such as mathematics, the corpuses of which were downloaded from Wikipedia. From the downloaded corpuses, we created the TFIDF matrix corresponding to all domains and divided the whole data set randomly into training and testing groups of size 300. To make the model more robust we performed 50 re-samplings on selection of training and test groups. The model was implemented in R and the Gibbs sampler ran for 60 k iterations and the first 20 k was discarded as burn in. We performed classification on training and test groups by calculating P (yi = 1) and according to [1] [2] the threshold of 0.5 was used as decision rule. Our model’s performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
文摘针对混频数据的建模问题,提出自回归U-MIDAS(unrestricted mixed data sampling)分位回归模型.首先,结合嵌套Lasso惩罚方法及spike-and-slab先验进行Bayes参数估计和变量选择;其次,通过数值模拟证明该方法的优越性;最后,将该方法用于美国名义国内生产总值(GDP)年化季度增长率的预测,结果表明,该方法预测精度较好.
基金Supported by the National Natural Science Foundation of China(No.11571080)
文摘In this paper, we consider the problem of estimating a high dimensional precision matrix of Gaussian graphical model. Taking advantage of the connection between multivariate linear regression and entries of the precision matrix, we propose Bayesian Lasso together with neighborhood regression estimate for Gaussian graphical model. This method can obtain parameter estimation and model selection simultaneously. Moreover, the proposed method can provide symmetric confidence intervals of all entries of the precision matrix.