Abstract
Biological sequence data are an important part of bioinformatics data. Analyzing these sequences to uncover their hidden biological meaning is both a central focus and a major challenge of bioinformatics research. Data mining is currently one of the most effective tools for analyzing large-scale data. It has been widely applied to biological sequence analysis and has produced many research results. This paper reviews the key techniques of biological sequence data mining: sequence alignment algorithms; DNA sequential pattern mining; association, classification and clustering analysis; RNA secondary structure prediction; and protein sequence classification and clustering. It concludes with directions for future research.
Source
Journal of Hefei University of Technology (Natural Science)
CAS
CSCD
Peking University Core Journal
2012, No. 9, pp. 1212-1216 (5 pages)
Keywords
biological sequence
data mining
bioinformatics
sequence alignment