Abstract
Identifying splice sites in the non-coding regions of genes is one of the most challenging problems in gene structure recognition, particularly for splice sites embedded in human 5' untranslated regions (5' UTRs). Unlike ordinary splice sites, those in 5' UTRs are not flanked by a transition from coding to non-coding sequence, so conventional splice site prediction methods perform poorly there. In this paper, support vector machines (SVMs) are used to identify splice sites in 5' UTRs. To improve recognition accuracy, the kernel function parameters are selected using a matrix similarity measure as the criterion, which determines suitable parameters simply and quickly and thereby improves identification performance. Experimental results show that an SVM with parameters selected in this way identifies 5' UTR splice sites well.
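The abstract does not say which matrix similarity measure the authors use, so the following is only a minimal sketch under stated assumptions. It swaps in kernel-target alignment (the cosine similarity between the kernel matrix and the ideal label matrix y yᵀ) as one well-known measure of this kind, and uses it to pick the width `gamma` of a Gaussian kernel without training an SVM for every candidate. The one-hot encoding, the candidate grid, and the toy sequences are all hypothetical and are not taken from the paper.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat indicator vector of length 4 * len(seq)."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def alignment(K, y):
    """Kernel-target alignment: cosine similarity between K and y y^T
    under the Frobenius inner product (an assumed similarity measure)."""
    T = np.outer(y, y)
    return np.sum(K * T) / (np.linalg.norm(K) * np.linalg.norm(T))

def select_gamma(X, y, grid):
    """Return the gamma in `grid` whose kernel matrix best aligns with the labels."""
    scores = [alignment(rbf_kernel(X, g), y) for g in grid]
    return grid[int(np.argmax(scores))], scores

# Hypothetical toy windows: +1 = candidate donor site (GT at positions 3-4),
# -1 = decoy. Real data would be windows cut from annotated 5' UTRs.
positives = ["CAGGTAAG", "AAGGTGAG", "CAGGTGAG", "AAGGTAAG"]
negatives = ["CACCTTAC", "TTGCATCA", "GACCTTCA", "TTCCATAC"]
X = np.array([one_hot(s) for s in positives + negatives])
y = np.array([1.0] * len(positives) + [-1.0] * len(negatives))

grid = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
best_gamma, scores = select_gamma(X, y, grid)
print("selected gamma:", best_gamma)
```

The selected `gamma` would then be passed to an SVM trainer with a Gaussian kernel. Because each candidate costs only one kernel matrix and one inner product, this kind of criterion is much cheaper than cross-validating a trained SVM per candidate, which fits the abstract's claim that the parameters are determined simply and quickly.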
Source
《生物物理学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2005, Issue 4, pp. 284-288 (5 pages)
Acta Biophysica Sinica
Funding
National Natural Science Foundation of China (60471003)