Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids


Abstract Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, it is often inaccurate when the sequence identity between the sequences to be aligned is below 30%. This is because, in such sequences, different residues may play similar structural roles and are aligned incorrectly when a substitution matrix over all 20 residue types is used. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. Such simplified alphabets reduce the complexity of protein sequences while retaining the key information encoded in them. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for grouping residues, and substitution matrices are constructed at different levels of simplification. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are then applied to the recognition of structurally conserved or similar regions in proteins by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N around 9.
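
To make the idea of a reduced alphabet concrete, the following is a minimal Python sketch of re-encoding two pre-aligned sequences with a grouped alphabet and comparing their identity before and after. The 9-group partition `ILLUSTRATIVE_GROUPS`, the helper names, and the two toy sequences are assumptions chosen for illustration; they are not the grouping, matrices, or data derived in the paper.

```python
# Hypothetical 9-group partition of the 20 amino acids, chosen by rough
# physicochemical similarity. This is NOT the grouping obtained in the paper.
ILLUSTRATIVE_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KR", "H", "G", "P"]


def build_mapping(groups):
    """Map every residue to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Re-encode a sequence in the reduced alphabet.

    Characters outside the mapping (gaps, 'X', ...) are left unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in seq.upper())


def identity(a, b):
    """Fraction of identical positions in two equal-length aligned strings."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


MAPPING = build_mapping(ILLUSTRATIVE_GROUPS)

# Toy, ungapped example sequences (made up for this sketch).
seq_a = "MKVLDE"
seq_b = "LRIIEE"

full_identity = identity(seq_a, seq_b)
reduced_identity = identity(
    reduce_sequence(seq_a, MAPPING), reduce_sequence(seq_b, MAPPING)
)
print(full_identity, reduced_identity)
```

In this toy case the two sequences share few identical residues in the 20-letter alphabet but become identical once residues with similar properties are merged, which is the effect the abstract relies on for low-identity pairs. How the groups are actually chosen, from substitution scores in DAPS, is the subject of the paper and is not reproduced here.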

Authors LI Jing (1) & WANG Wei (1,2). (1) National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing 210093, China; (2) Interdisciplinary Center of Theoretical Studies, Chinese Academy of Sciences, Beijing 100080, China

Source Science China (Life Sciences) [中国科学（生命科学英文版）], indexed in SCIE and CAS, 2007, Issue 3, pp. 392-402 (11 pages)

Funding Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM

Keywords Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Classification Q51 [Biology: Biochemistry]


