pepReap: A Peptide Identification Algorithm Using Support Vector Machines (cited by: 2)
Abstract: Identifying peptides and proteins by biological mass spectrometry is a key problem in proteomics research. This paper presents pepReap, a peptide identification algorithm based on support vector machines (SVM). The algorithm uses a two-layer scoring scheme, coarse and then fine. Coarse scoring produces a list of candidate peptide sequences from the total intensity and number of matched peaks and from the peptide length. Fine scoring then uses an SVM to evaluate the coarse results by combining several match indicators, such as correlations between ions, ion match error, and peptide sequence information, which yields more reliable peptide identifications. During SVM parameter selection, the Matthews correlation coefficient is used to measure classification performance so as to accommodate unbalanced datasets. Experiments on a published dataset of tandem mass spectra show that, compared with the popular commercial software SEQUEST, which uses threshold-based evaluation, pepReap achieves higher identification sensitivity at comparable identification precision.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 9, pp. 1511-1518 (8 pages).
Funding: National Basic Research Program of China ("973" Program) (2002CB713807); National Key Technologies R&D Program of China (2004BA711A21).
Keywords: support vector machines; classification; proteomics; peptide identification; unbalanced dataset; parameter selection
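The abstract notes that the Matthews correlation coefficient (MCC) is used during SVM parameter selection because the data are unbalanced. The sketch below is not the authors' implementation; it only illustrates the standard MCC formula and why it is more informative than plain accuracy when correct matches are rare. The confusion counts, the parameter labels such as `"C=1"`, and the helper `select_by_mcc` are hypothetical and chosen purely for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def select_by_mcc(results):
    """Pick the parameter setting whose confusion counts give the highest MCC.

    `results` maps a (hypothetical) parameter label to (tp, tn, fp, fn).
    """
    return max(results, key=lambda name: matthews_corrcoef(*results[name]))


# Hypothetical validation set: 50 correct and 950 incorrect peptide matches.
results = {
    "C=1": (0, 950, 0, 50),     # rejects everything, yet accuracy looks high
    "C=10": (40, 920, 30, 10),  # actually finds most correct matches
}

for name, counts in results.items():
    print(name, "accuracy:", accuracy(*counts), "MCC:", matthews_corrcoef(*counts))

print("selected:", select_by_mcc(results))
```

In this made-up example the two settings have nearly identical accuracy, but only the second has an MCC well above zero, so selecting by MCC avoids the degenerate classifier that rejects every candidate.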