Traditional estimation of the Gaussian mixture model is sensitive to heavy-tailed errors; therefore, in this article we propose a robust mixture regression model that assumes the error terms follow a Laplace distribution. For variable selection in the new robust mixture regression model, we introduce the adaptive sparse group Lasso penalty, which achieves sparsity at both the group level and the within-group level. Numerical experiments show that, compared with alternative methods, our method performs better in variable selection and parameter estimation. Finally, we apply the proposed method to analyze NBA salary data from 2018 to 2019.
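The adaptive sparse group Lasso penalty can be written as lam1 * sum_j w_j |b_j| + lam2 * sum_g v_g ||b_g||_2. Below is a minimal Python sketch of that penalty and of its proximal operator on a single group. It assumes generic adaptive weights `w` and `v`, which are hypothetical inputs typically derived from an initial estimate. The proximal step shows where the two levels of sparsity come from. Element-wise soft-thresholding zeros individual coefficients, and group-wise shrinkage then zeros whole groups. This is not the authors' estimation procedure for the Laplace mixture model, which the abstract does not detail.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Scalar soft-thresholding: sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def adaptive_sgl_penalty(groups: Sequence[Sequence[float]],
                         w: Sequence[Sequence[float]],
                         v: Sequence[float],
                         lam1: float, lam2: float) -> float:
    """lam1 * sum_j w_j |b_j|  +  lam2 * sum_g v_g ||b_g||_2."""
    total = 0.0
    for b, wg, vg in zip(groups, w, v):
        total += lam1 * sum(wj * abs(bj) for wj, bj in zip(wg, b))
        total += lam2 * vg * math.sqrt(sum(bj * bj for bj in b))
    return total


def adaptive_sgl_prox(b: Sequence[float], w: Sequence[float], v: float,
                      lam1: float, lam2: float, step: float = 1.0) -> List[float]:
    """Proximal operator of the (hypothetically weighted) penalty on one group."""
    # Within-group sparsity: element-wise soft-thresholding with adaptive weights.
    u = [soft_threshold(bj, step * lam1 * wj) for bj, wj in zip(b, w)]
    norm = math.sqrt(sum(uj * uj for uj in u))
    # Group-level sparsity: the whole group is zeroed if its norm is small enough.
    if norm <= step * lam2 * v:
        return [0.0] * len(u)
    scale = 1.0 - step * lam2 * v / norm
    return [scale * uj for uj in u]
```

For example, `adaptive_sgl_prox([3.0, 0.5], w=[1.0, 1.0], v=1.0, lam1=1.0, lam2=1.0)` returns `[1.0, 0.0]`. The element-wise step removes the second coefficient, while the group survives with a shrunken first coefficient.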
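Functional hypernetworks are widely used in the diagnosis and classification of brain diseases. However, existing methods for constructing hypernetworks either cannot account for grouping effects or consider only group-level information between brain regions. As a result, the brain functional hypernetworks they build may lose useful connections or contain spurious ones. To account for the group structure among brain regions, we introduce the sparse group Lasso (sgLasso) method to improve hypernetwork construction. First, the hypernetwork is constructed with sgLasso. Next, two sets of hypernetwork-specific attributes are introduced for feature extraction and feature selection: clustering coefficients based on a single node and clustering coefficients based on a pair of nodes. Finally, multi-kernel learning fuses and classifies the two sets of significantly different features obtained after feature selection. Experimental results show that, with multi-feature fusion, the proposed method achieves a classification accuracy of 87.88%. This result indicates that improving the construction of brain functional hypernetworks requires taking group information into account. Entire groups should not be forced into the model, however, and the group structure can be appropriately extended.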
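The sketch below shows one way to picture sgLasso-based hypernetwork construction and the two kinds of clustering coefficients. It assumes that a sparse regression of each brain region's time series on all other regions has already been fitted; the fit itself is not shown, and the coefficient matrix in the test is made up. Each hyperedge is taken to be the region plus every region with a non-zero coefficient. The pairwise coefficient is taken to be the Jaccard overlap of two nodes' hyperedge sets, and the single-node coefficient is its average over the node's neighbors. These definitions are assumptions for illustration, and the paper may use different formulas.

```python
from typing import FrozenSet, List, Sequence, Set


def hyperedges_from_coefs(coefs: Sequence[Sequence[float]],
                          tol: float = 1e-8) -> List[FrozenSet[int]]:
    """Row i holds hypothetical sparse-regression coefficients of region i on all
    other regions; its hyperedge is region i plus every region selected."""
    edges = []
    for i, row in enumerate(coefs):
        members = {i} | {j for j, c in enumerate(row) if j != i and abs(c) > tol}
        if len(members) > 1:  # skip regions with no selected partners
            edges.append(frozenset(members))
    return edges


def incident(edges: Sequence[FrozenSet[int]], node: int) -> Set[int]:
    """Indices of the hyperedges that contain `node`."""
    return {k for k, e in enumerate(edges) if node in e}


def pair_clustering(edges: Sequence[FrozenSet[int]], u: int, v: int) -> float:
    """Assumed pairwise coefficient: Jaccard overlap of the hyperedge sets of u and v."""
    mu, mv = incident(edges, u), incident(edges, v)
    union = mu | mv
    return len(mu & mv) / len(union) if union else 0.0


def node_clustering(edges: Sequence[FrozenSet[int]], u: int) -> float:
    """Assumed single-node coefficient: mean pairwise value over u's neighbors."""
    neighbors = set().union(*(edges[k] for k in incident(edges, u))) - {u}
    if not neighbors:
        return 0.0
    return sum(pair_clustering(edges, u, v) for v in neighbors) / len(neighbors)
```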
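Commercial banks face extremely complex personal credit risk, so managing that risk is very important, and modeling it is a key step. Using credit card data from a commercial bank, we build a credit scoring model to predict customers' probability of default. The ROSE (random over-sampling examples) method handles class imbalance, Group Lasso (with an AUC criterion) performs variable selection, and the credit scoring model is based on logistic regression. Empirical results show that, after class imbalance in the sample data is handled this way, the model discriminates and predicts more effectively than other models. A model built with this method can serve as an effective basis for customer credit evaluation decisions. It can also guide banks and other financial institutions in assessing customers' personal credit risk, and it is readily applicable in practice.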
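ROSE generates synthetic examples by a smoothed bootstrap. It picks a class with equal probability, draws one observation of that class, and perturbs it with Gaussian noise whose per-feature scale depends on that class's spread. The following is a minimal stdlib-only illustration of that idea, not the ROSE R package. The rule-of-thumb bandwidth factor is an assumption, and the helper name `rose_like_sample` is hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def rose_like_sample(X: Sequence[Sequence[float]], y: Sequence[int], n_new: int,
                     seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Draw n_new synthetic rows by a smoothed bootstrap with Gaussian kernels."""
    rng = random.Random(seed)
    d = len(X[0])
    by_class: Dict[int, List[Sequence[float]]] = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    # Per-class, per-feature kernel widths: an assumed Silverman-style rule of thumb.
    widths: Dict[int, List[float]] = {}
    for k, rows in by_class.items():
        factor = (4.0 / ((d + 2) * len(rows))) ** (1.0 / (d + 4))
        widths[k] = [factor * statistics.stdev(col) if len(rows) > 1 else 0.0
                     for col in zip(*rows)]
    classes = sorted(by_class)
    X_new: List[List[float]] = []
    y_new: List[int] = []
    for _ in range(n_new):
        k = rng.choice(classes)            # every class is equally likely
        centre = rng.choice(by_class[k])   # one observed row of that class
        X_new.append([rng.gauss(m, h) for m, h in zip(centre, widths[k])])
        y_new.append(k)
    return X_new, y_new
```

Because classes are drawn with equal probability, the synthetic sample is roughly balanced even when defaults are rare. The variable selection and logistic regression steps would then use this balanced sample.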
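Acute leukemia can be divided into two major subtypes, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), and accurate diagnosis is the prerequisite and key to treating it. This paper uses gene microarray data for acute leukemia to identify genes that are significant for classifying its subtypes (AML vs. ALL). It combines the two-sample t-test, the Wilcoxon rank-sum test, hierarchical clustering, and the variable selection method supervised group lasso (SGLasso). A logistic regression model for subtype classification is built on the training set. Patients' subtypes in the training and test sets are then fitted and predicted to verify the model's prediction accuracy.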
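The univariate screening step can be illustrated with a Welch two-sample t statistic and a Wilcoxon rank-sum statistic under the normal approximation. The abstract does not state the cutoffs or how the two tests are combined. This sketch therefore assumes that a gene must pass both, using the hypothetical defaults `t_cut=3.0` and `z_cut=1.96`. The hierarchical clustering and SGLasso steps are not shown, and the paper may have used the pooled-variance t-test rather than Welch's.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic without assuming equal variances (Welch)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_sum_z(a: Sequence[float], b: Sequence[float]) -> float:
    """Wilcoxon rank-sum statistic of sample a, standardised with the normal
    approximation (ties get average ranks; no tie correction to the variance)."""
    pooled = sorted((x, i) for i, x in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])  # positions 0..n1-1 belong to sample a
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def screen(genes: Dict[str, Tuple[Sequence[float], Sequence[float]]],
           t_cut: float = 3.0, z_cut: float = 1.96) -> List[str]:
    """Keep genes whose ALL vs AML expression differs under both statistics."""
    return [name for name, (all_vals, aml_vals) in genes.items()
            if abs(welch_t(all_vals, aml_vals)) > t_cut
            and abs(rank_sum_z(all_vals, aml_vals)) > z_cut]
```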
The lack of validated tools to predict how long sow farms will remain PRRS virus-free after successful elimination of the virus has deterred veterinarians and producers from attempting to eliminate PRRS virus from sow farms. The aim of this study was to use the database of PRRS Risk Assessments for the Breeding Herd in PADRAP to develop and validate an objective risk scoring system for predicting the likelihood of virus introduction into PRRS virus-free sow farms in the US. To cope with the large number of variables, group lasso for logistic regression (GLLR) was applied to a retrospective dataset of PRRS Risk Assessment for the Breeding Herd surveys completed for 704 farms to develop the risk scoring system. The validity of the GLLR risk scoring system was then evaluated by testing its predictive ability on a dataset from a long-term prospective study of 196 sow farms. That study assessed risk factors associated with how long PRRS virus-free sow farms remained PRRS virus-free. Receiver operating characteristic (ROC) curves were estimated to compare the GLLR risk scoring system with the expert opinion (EO) risk scoring system currently used in the PRRS Risk Assessment for the Breeding Herd, for predicting whether herds remained PRRS virus-free for 130 weeks. The GLLR risk scoring system (AUC, 0.76; 95% CI, 0.67-0.84) performed significantly better than the EO risk scoring system (AUC, 0.36; 95% CI, 0.27-0.46) in predicting whether sow farms in the prospective study survived for 130 weeks (p < 0.001).
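A group lasso for logistic regression can be fitted by proximal gradient descent. Each iteration takes a gradient step on the mean logistic loss, then shrinks each group of coefficients toward zero as a block. The sketch below assumes that groups are given as lists of column indices and uses the common sqrt(group size) weighting. It is an illustration, not the software or tuning used in the study. The `auc` helper computes the area under the ROC curve as the fraction of (positive, negative) pairs ranked correctly, counting ties as one half. This is the quantity reported when comparing the GLLR and EO scores.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def group_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                         groups: Sequence[Sequence[int]], lam: float,
                         step: float = 0.1, iters: int = 2000) -> Tuple[float, List[float]]:
    """Minimise mean logistic loss + lam * sum_g sqrt(|g|) * ||b_g||_2.

    The intercept and any column not listed in `groups` are left unpenalised.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals of the current fit: predicted probability minus label.
        r = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
             for xi, yi in zip(X, y)]
        # Gradient step on the smooth (logistic) part.
        b0 -= step * sum(r) / n
        z = [b[j] - step * sum(ri * xi[j] for ri, xi in zip(r, X)) / n
             for j in range(p)]
        # Proximal step: block soft-thresholding of each group.
        b = z[:]
        for idx in groups:
            norm = math.sqrt(sum(z[j] ** 2 for j in idx))
            thresh = step * lam * math.sqrt(len(idx))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in idx:
                b[j] = scale * z[j]
    return b0, b


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```

A larger `lam` zeros more groups, so `lam` would typically be tuned, for example by cross-validation.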
Dividing farms into 3 risk groups (low, medium and high) using low and high cutoff values for the GLLR risk score was informative. The differences between the Kaplan-Meier (KM) survival curves of the 3 groups were both clinically meaningful and statistically significant. The GLLR risk scoring system, used in conjunction with the PRRS Risk Assessment for the Breeding Herd survey delivered through PADRAP, appears to have the potential to help veterinarians predict the likelihood of virus introduction into PRRS virus-free sow farms in the US.
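The grouping and survival comparison can be sketched with a few small helpers. One assigns a farm to the low, medium or high group from two cutoffs. Another computes the Kaplan-Meier estimate of the proportion of farms still PRRS virus-free over time, and a third applies the estimate to each risk group. The abstract does not report the cutoff values, so the cutoffs and function names here are hypothetical. The test used to judge statistical significance (for example, a log-rank test) is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def risk_group(score: float, low_cut: float, high_cut: float) -> str:
    """Map a risk score to 'low', 'medium' or 'high' using two cutoffs."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, S(t)) pairs at each distinct event time.

    events[i] == 1: farm i became infected at times[i];
    events[i] == 0: farm i was still virus-free when follow-up ended (censored).
    """
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    curve: List[Tuple[float, float]] = []
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Count every farm whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def km_by_risk_group(scores: Sequence[float], times: Sequence[float],
                     events: Sequence[int], low_cut: float,
                     high_cut: float) -> Dict[str, List[Tuple[float, float]]]:
    """Split farms by risk group and estimate one survival curve per group."""
    grouped: Dict[str, Tuple[List[float], List[int]]] = {}
    for s, t, e in zip(scores, times, events):
        ts, es = grouped.setdefault(risk_group(s, low_cut, high_cut), ([], []))
        ts.append(t)
        es.append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in grouped.items()}
```

Reading each group's curve at 130 weeks gives the estimated proportion of farms in that group that stayed virus-free over the study's horizon.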