Abstract
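By analyzing the dipeptide composition of 3216 thermophilic and 4007 mesophilic proteins, we found that thermophilic proteins contain more dipeptides such as EE, EK, KE, VE, EI, KI, EV, KK, VK and IE, and fewer dipeptides such as AA, LL, LA, AL, QA, QL, AQ, LT, TL and EQ. On this basis, a statistical method for discriminating thermophilic from mesophilic proteins was developed. Applied to two test sets totalling 853 protein sequences, the method reached average accuracies of 89.0% and 89.6%, respectively. The influence of certain specific dipeptides on discrimination performance is also discussed.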
In this work, the dipeptide composition of 3216 thermophilic and 4007 mesophilic protein sequences was systematically analyzed. We found that thermophilic proteins contained more dipeptides such as EE, EK, KE, VE, EI, KI, EV, KK, VK and IE, and fewer dipeptides such as AA, LL, LA, AL, QA, QL, AQ, LT, TL and EQ. Based on this information, a statistical method for discriminating thermophilic from mesophilic proteins was developed. Our approach correctly identified thermophilic proteins with accuracies of 94.0% and 89%, respectively, on test sets of 382 and 73 thermophilic proteins. On the test sets of 325 and 73 mesophilic proteins, the accuracies were 85.2% and 89%, respectively. The influence of specific dipeptides on discrimination was also discussed.
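As a rough illustration of what "dipeptide composition" means in practice, the following Python sketch computes the fraction of each of the 400 overlapping dipeptides in a sequence. The two dipeptide lists are taken from the abstract. The `thermo_score` function and its scoring rule (enriched fraction minus depleted fraction) are hypothetical and for illustration only; the abstract does not describe the statistical method the authors actually used.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

# Dipeptides the abstract reports as more / less frequent in thermophilic proteins.
THERMO_ENRICHED = ["EE", "EK", "KE", "VE", "EI", "KI", "EV", "KK", "VK", "IE"]
THERMO_DEPLETED = ["AA", "LL", "LA", "AL", "QA", "QL", "AQ", "LT", "TL", "EQ"]


def dipeptide_composition(sequence):
    """Return the fraction of each standard dipeptide among overlapping pairs."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only pairs made of the 20 standard residues are counted.
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: counts[d] / total for d in DIPEPTIDES}


def thermo_score(sequence):
    """Hypothetical score (an assumption, not the paper's method):
    enriched-dipeptide fraction minus depleted-dipeptide fraction."""
    comp = dipeptide_composition(sequence)
    enriched = sum(comp[d] for d in THERMO_ENRICHED)
    depleted = sum(comp[d] for d in THERMO_DEPLETED)
    return enriched - depleted
```

Under this toy rule, a positive score would suggest a composition closer to the thermophilic profile reported above, and a negative score a composition closer to the mesophilic one. A real classifier would need weights and a threshold fitted on training data.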
Source
Chinese Journal of Biotechnology (《生物工程学报》)
CAS
CSCD
PKU Core Journal (北大核心)
2006, Issue 2, pp. 293-298 (6 pages)
Funding
National Natural Science Foundation of China (No. 20276026)
Research Fund of the Overseas Chinese Affairs Office of the State Council (No. 05Q0018)
Keywords
dipeptide composition
discrimination
thermophilic protein
protein thermostability