Abstract
To improve the prediction accuracy of protein O-glycosylation sites, a method combining independent component analysis (ICA) with support vector machines (SVM) is proposed. The experimental samples (protein sequences) are encoded with sparse coding using a window length of w=21. For both the training samples and the test samples, ICA first extracts 120 independent components (features); these components are then used as the input to an SVM, which performs the prediction (classification) in feature space. The experimental results show that ICA+SVM outperforms both PCA+SVM and SVM alone, with a prediction accuracy of about 88%. Furthermore, experiments on samples taken from the same protein sequence at different window lengths indicate that the longer the window, the higher the prediction accuracy.
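The sparse window encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives only the window length (w=21), so several details here are assumptions rather than the authors' method: a 20-letter amino acid alphabet, an all-zero vector for positions that fall outside the sequence, and serine (S) and threonine (T) as the candidate sites, since O-linked glycosylation occurs on these residues. The function names `encode_window` and `candidate_windows` and the example sequence are hypothetical.

```python
# Assumed alphabet: the 20 standard amino acids (not specified in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_window(sequence, center, w=21):
    """Sparse (one-hot) encode the w residues centred on `center`.

    Each position becomes a 20-bit vector with a single 1. Positions outside
    the sequence, or unknown residues, stay all-zero (an assumed convention).
    """
    half = w // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        bits = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in INDEX:
            bits[INDEX[sequence[pos]]] = 1
        vector.extend(bits)
    return vector


def candidate_windows(sequence, w=21):
    """Return (position, encoded window) for every S or T residue."""
    return [
        (i, encode_window(sequence, i, w))
        for i, aa in enumerate(sequence)
        if aa in "ST"
    ]


# Hypothetical example sequence, for illustration only.
samples = candidate_windows("MKTAYSAGT")
positions = [pos for pos, _ in samples]
```

With w=21 and this alphabet, each sample is a 420-dimensional binary vector. Vectors of this kind would then be reduced to 120 components by ICA and classified by an SVM, for instance with library implementations such as scikit-learn's `FastICA` and `SVC`; that choice of tooling is an assumption, not something the abstract states.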
Source
Computer & Digital Engineering (《计算机与数字工程》)
2012, Issue 8, pp. 32-34, 41 (4 pages)
Funding
Supported by the Scientific Research Program of the Shaanxi Provincial Department of Education (No. 11JK1050)