Abstract
To improve the prediction accuracy of protein O-glycosylation sites, a method combining kernel principal component analysis (KPCA) with support vector machines (SVM) is proposed. The experimental samples were encoded by sparse coding with window size w = 21. First, the kernel principal components (features) of the samples were extracted by KPCA; then prediction (classification) was carried out in the feature space by an improved support vector machine (ISVM). In the ISVM, a bound coefficient αc was defined to reduce the computational complexity of the model. The experimental results show that KPCA + ISVM performs better than PCA + SVM and than SVM alone, with a prediction accuracy of about 87%. Furthermore, samples of various window sizes (w = 5, 7, 9, 11, 21, 31, 41, 51) taken from the same protein sequences were investigated, and a majority-vote scheme was used to combine the sub-classifiers. The results indicate that the ensemble of KPCA + ISVM classifiers outperforms the individual sub-classifiers, with a prediction accuracy of about 88%.
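The two steps of the abstract that do not depend on a particular KPCA or SVM implementation, the sparse encoding of a sequence window and the majority vote over sub-classifiers, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the 21-symbol alphabet (20 standard residues plus a hypothetical padding symbol `X` for positions beyond the sequence ends), the +1/-1 label convention, the tie-breaking rule, and the function names `sparse_encode_window` and `majority_vote` are all assumptions that the abstract does not specify.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumption: placeholder for positions outside the sequence
ALPHABET = AMINO_ACIDS + PAD


def sparse_encode_window(sequence, center, w=21):
    """One-hot (sparse) encode the w residues centred on index `center`.

    Each position becomes a 21-bit vector with a single 1, so the
    result has w * 21 entries (441 for w = 21).
    """
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    half = w // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else PAD
        if residue not in ALPHABET:
            residue = PAD  # assumption: unknown symbols are treated as padding
        bits = [0] * len(ALPHABET)
        bits[ALPHABET.index(residue)] = 1
        vector.extend(bits)
    return vector


def majority_vote(predictions):
    """Combine +1 / -1 labels from the sub-classifiers by majority vote.

    Assumption: with an even number of voters (eight window sizes are
    listed in the abstract) a tie is resolved as the negative class.
    """
    return 1 if sum(predictions) > 0 else -1


# Hypothetical usage: encode the window around index 3 of a toy sequence,
# then combine eight made-up sub-classifier outputs.
features = sparse_encode_window("MKTSAAPL", center=3, w=5)
label = majority_vote([1, 1, 1, -1, -1, 1, -1, 1])
```

In a full pipeline the encoded vectors would be passed to KPCA for feature extraction and then to the classifier, with one such classifier trained per window size before voting.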
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal
2013, Issue 25, pp. 7371-7376 (6 pages)
Funding
Supported by the 2013 Scientific Research Program of the Shaanxi Provincial Department of Education (2013JK1125)
Keywords
prediction
protein
kernel principal component analysis (KPCA)
improved support vector machines (ISVM)
ensemble classifier