Abstract
Splice site recognition is an important step in gene recognition. Existing gene recognition algorithms focus mainly on the global properties of coding regions rather than on the information carried by individual sites, so they have difficulty identifying splice sites accurately. On the assumption that adjacent bases in the conserved sequences near splice sites are correlated, a first-order Markov chain was used to model this correlation, and on that basis a hidden Markov model (HMM) method was designed specifically for splice site recognition. Experimental results show that an HMM describes the sequences around splice sites realistically, and that the method effectively extracts the statistical features of the conserved sequences near the sites, in both their marginal distributions and their conditional distributions (transition probabilities). When the method was applied to both true and false splice sites, the recognition rates exceeded 90%.
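The abstract does not give the model's structure or training procedure, so the following is only a minimal sketch of the general idea it describes: a position-specific first-order Markov chain over aligned windows around a candidate site, scored by a log-likelihood ratio between a "true site" model and a "false site" model. It is a plain Markov chain classifier and not the authors' HMM. The function names, the add-one pseudocount, the zero threshold, and the eight-base toy windows are all assumptions made for illustration and are not data from the paper.

```python
import math

BASES = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate a position-specific first-order Markov chain from aligned,
    equal-length sequences. Returns (init, trans), where init[b] is the
    marginal probability of base b at position 0 and trans[i][a][b] is
    P(x[i+1] = b | x[i] = a). The pseudocount is an assumed smoothing choice."""
    length = len(seqs[0])
    init_counts = {b: pseudocount for b in BASES}
    for s in seqs:
        init_counts[s[0]] += 1
    total = sum(init_counts.values())
    init = {b: c / total for b, c in init_counts.items()}

    trans = []
    for i in range(length - 1):
        counts = {a: {b: pseudocount for b in BASES} for a in BASES}
        for s in seqs:
            counts[s[i]][s[i + 1]] += 1
        trans.append({
            a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()
        })
    return init, trans

def log_likelihood(seq, model):
    """Log-probability of seq under a model returned by train_markov."""
    init, trans = model
    ll = math.log(init[seq[0]])
    for i in range(len(seq) - 1):
        ll += math.log(trans[i][seq[i]][seq[i + 1]])
    return ll

def is_true_site(seq, true_model, false_model, threshold=0.0):
    """Classify by log-likelihood ratio; the threshold is an assumed default."""
    ratio = log_likelihood(seq, true_model) - log_likelihood(seq, false_model)
    return ratio > threshold

# Invented toy windows (not from the paper): donor-like sites with a GT at
# positions 2-3, and decoys that also contain GT but lack the flanking pattern.
TRUE_SITES = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
FALSE_SITES = ["TCGTCCAT", "ACGTTTGC", "GAGTCCTA", "TTGTACCG"]

true_model = train_markov(TRUE_SITES)
false_model = train_markov(FALSE_SITES)

if __name__ == "__main__":
    for window in ["AGGTAAGT", "TCGTCCAT"]:
        print(window, is_true_site(window, true_model, false_model))
```

Training one chain on true sites and another on false sites lets both the marginal distribution at the first position and the per-position transition probabilities contribute to the score, which corresponds to the two kinds of statistical features the abstract mentions.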
Source
《清华大学学报(自然科学版)》
Indexed in: EI, CAS, CSCD, PKU Core Journals (北大核心)
2002, No. 9, pp. 1214-1217 (4 pages)
Journal of Tsinghua University(Science and Technology)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 69935020)