
Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Abstract: Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, it is often inaccurate when the sequence identity between the sequences to be aligned is below 30%. This is because, in such sequences, different residues may play similar structural roles, and they are incorrectly aligned when a substitution matrix over all 20 residue types is used. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. Such simplified alphabets reduce the complexity of protein sequences while retaining the key information encoded in them. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for grouping residues, and substitution matrices are constructed at different levels of simplification. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are then applied to the recognition of structurally conserved or similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N (the alphabet size) around 9.
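To make the idea of a reduced alphabet concrete, the following is a minimal Python sketch, not the method of the paper. It assumes a hypothetical 9-group partition (`EXAMPLE_GROUPS`), a toy table of aligned-pair frequencies (`TOY_PAIRS`), and the standard log-odds form of relative entropy, H = sum over i,j of q_ij * log2(q_ij / (p_i * p_j)); the abstract does not give the actual grouping, the DAPS data, or the exact formula used, so all names and numbers here are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical 9-group partition of the 20 residues, for illustration only.
# It is NOT the grouping derived in the paper.
EXAMPLE_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KR", "H", "G", "P"]

# Toy aligned-pair frequencies over ordered pairs (sums to 1). Invented numbers.
TOY_PAIRS = {
    ("L", "L"): 0.2, ("I", "I"): 0.2, ("L", "I"): 0.1, ("I", "L"): 0.1,
    ("D", "D"): 0.3, ("L", "D"): 0.025, ("D", "L"): 0.025,
    ("I", "D"): 0.025, ("D", "I"): 0.025,
}


def build_mapping(groups):
    """Map every residue to the first letter of its group."""
    return {res: grp[0] for grp in groups for res in grp}


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown symbols pass through."""
    return "".join(mapping.get(res, res) for res in seq)


def merge_pairs(pair_freq, mapping):
    """Pool pair frequencies of residues that fall into the same group."""
    merged = defaultdict(float)
    for (a, b), q in pair_freq.items():
        merged[(mapping.get(a, a), mapping.get(b, b))] += q
    return dict(merged)


def relative_entropy(pair_freq):
    """Relative entropy (bits) of pair frequencies against their marginals."""
    row = defaultdict(float)
    col = defaultdict(float)
    for (a, b), q in pair_freq.items():
        row[a] += q
        col[b] += q
    return sum(
        q * math.log2(q / (row[a] * col[b]))
        for (a, b), q in pair_freq.items()
        if q > 0
    )


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_GROUPS)
    print(reduce_sequence("MKVLDE", mapping))
    print(relative_entropy(TOY_PAIRS))
    print(relative_entropy(merge_pairs(TOY_PAIRS, mapping)))
```

Merging residues can only keep or lower this relative entropy, so comparing the value before and after grouping is one way to judge how much alignment-relevant information a given reduced alphabet retains.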
Source: Science China (Life Sciences), indexed in SCIE and CAS, 2007, No. 3, pp. 392-402 (11 pages). Science in China (Life Sciences, English edition)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM

