pepReap: A Peptide Identification Algorithm Using Support Vector Machines

Chinese title: pepReap:基于支持向量机的肽鉴定算法 (cited by: 2)

Abstract: Identifying peptides and proteins by mass spectrometry is a key problem in proteomics research. This paper presents pepReap, a peptide identification algorithm based on support vector machines (SVM) that uses a two-layer scoring scheme. First, coarse scoring uses the total intensity and number of matched peaks, together with peptide length, to produce a list of candidate peptide sequences. Second, an SVM-based fine scoring step evaluates these candidates using further match features, such as correlations between ions, ion match error, and peptide sequence information, to yield more reliable peptide identifications. During SVM parameter selection, the Matthews correlation coefficient is used to measure classification performance so as to accommodate unbalanced datasets. Experiments on a published dataset of tandem mass spectra show that, compared with the popular commercial software SEQUEST, which uses threshold-based evaluation, pepReap achieves higher identification sensitivity at comparable precision.

Authors: Wang Haipeng (王海鹏), Fu Yan (付岩), Sun Ruixiang (孙瑞祥), He Simin (贺思敏), Zeng Rong (曾嵘), Gao Wen (高文)

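Affiliations: Digital Technology Laboratory, Institute of Computing Technology, Chinese Academy of Sciences; Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences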

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2005, No. 9, pp. 1511-1518 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Key Basic Research Program of China ("973" Program) (2002CB713807); National Key Technologies R&D Program of China (2004BA711A21)

Keywords: support vector machines; classification; proteomics; peptide identification; unbalanced dataset; parameter selection

Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; Q811.4 [Biology: Bioengineering]
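
The abstract states that the Matthews correlation coefficient (MCC) is used to judge classifier quality during SVM parameter selection because the data are unbalanced. The sketch below illustrates why that choice matters; it is not the authors' code. The confusion-matrix counts, the parameter names `C` and `gamma`, and the helper `select_by_mcc` are all hypothetical and chosen only for illustration.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Plain accuracy, shown for contrast with MCC."""
    return (tp + tn) / (tp + tn + fp + fn)

def select_by_mcc(results):
    """Return the parameter setting whose validation counts give the highest MCC."""
    return max(results, key=lambda params: matthews_corrcoef(**results[params]))

# Hypothetical unbalanced validation set: 50 correct identifications (positives)
# and 950 incorrect ones (negatives). Keys are made-up (C, gamma) settings.
results = {
    # Rejects every candidate: high accuracy, but no positives are found.
    (0.1, 0.01): dict(tp=0, tn=950, fp=0, fn=50),
    # Finds 40 of the 50 positives at the cost of 20 false positives.
    (10.0, 0.1): dict(tp=40, tn=930, fp=20, fn=10),
}

for params, counts in results.items():
    print(params,
          "accuracy=%.3f" % accuracy(**counts),
          "mcc=%.3f" % matthews_corrcoef(**counts))

best = select_by_mcc(results)
print("selected:", best)
```

In this made-up example the reject-everything setting still reaches 95% accuracy, while its MCC is zero; ranking settings by MCC therefore favors the classifier that actually recovers correct identifications.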
