Parallel remote homology detection approach based on semi-supervised support vector machine

Abstract: In bioinformatics, classifying proteins from their amino acid sequences and detecting subtle sequence similarity, or remote homology, are both important for accurately predicting protein function and structure. This paper proposes a new remote homology detection method based on a semi-supervised support vector machine (SVM). The method defines probabilistic sequence profiles so that it can make full use of the unlabeled data in large databases, constructs the SVM kernel function in parallel, and combines the SVM with a nearest-neighbor classifier so that every input sequence is covered. Experiments show that the method substantially improves both the performance and the efficiency of the protein sequence classifier. Parallel computation keeps the overall running time within a bounded range, which supports wider use of semi-supervised SVM classifiers.
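The parallel kernel construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the profile kernel, so a simple k-mer spectrum kernel stands in for it, and the names `kmer_counts`, `spectrum_kernel`, `gram_row` and `parallel_gram` are hypothetical. The sketch only shows how the rows of a kernel (Gram) matrix can be handed to separate workers. A thread pool is used for portability; a real speedup on CPU-bound work would need processes or a cluster framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def kmer_counts(seq, k=2):
    """Count overlapping k-mers (stand-in feature map, not the paper's profile)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b):
    """Inner product of two k-mer count vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(count * b[kmer] for kmer, count in a.items())


def gram_row(i, feats):
    """Compute the upper-triangle part of row i of the kernel matrix."""
    return i, [spectrum_kernel(feats[i], feats[j]) for j in range(i, len(feats))]


def parallel_gram(seqs, k=2, workers=4):
    """Build the symmetric kernel matrix, one row per worker task."""
    feats = [kmer_counts(s, k) for s in seqs]
    n = len(feats)
    gram = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, row in pool.map(lambda i: gram_row(i, feats), range(n)):
            for offset, value in enumerate(row):
                # Fill both halves, since the kernel is symmetric.
                gram[i][i + offset] = gram[i + offset][i] = value
    return gram
```

A matrix built this way could be passed to any SVM trainer that accepts a precomputed kernel. Sequences the SVM cannot score would, per the abstract, fall back to a nearest-neighbor classifier, which is not shown here.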

Authors: Wang Dong (王栋), Sun Jizhou (孙济洲), Li Fuchao (李福超)

Affiliations: School of Computer Science and Technology, Tianjin University; Network Center, Henan Agricultural University

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 12, pp. 4624-4627 (4 pages)

Funding: Tianjin Key Science and Technology Support Project (09ZCKFGX00400); Henan Province Higher Education Informatization Project (2008xxh011)

Keywords: semi-supervised learning; support vector machine; parallel computing; classifier

Classification code: TP338.6 [Automation and Computer Technology: Computer System Architecture]
