DNA Sequence Classification Method Based on Latent Dirichlet Allocation
Abstract: Classification of DNA sequences is a basic task in bioinformatics; its goal is to predict the category of a DNA sequence from similarity of structure or function. This paper proposes a topic-feature representation for DNA sequences based on the LDA (Latent Dirichlet Allocation) topic model, which transforms each gene sequence into a probability vector over a set of topics. A k-NN classifier is then built on this representation to classify DNA sequences. Experiments were carried out on four real-world sequence sets and compared with existing gram-based and Markov model-based methods. The results show that the new representation improves the learning efficiency of the representation model while reducing the feature dimensionality, reflects the structural information shared among DNA sequences more fully, and consequently achieves better classification accuracy.
Author: FENG Chao (冯超), College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China
Source: Journal of Fujian Computer (《福建电脑》), 2020, No. 2, pp. 35-37 (3 pages)
Funding: Supported by the National Natural Science Foundation of China project "Kernel Learning Methods for High-Dimensional Sequence Data and Their Applications" (No. 61672157)
Keywords: DNA sequences; classification; feature representation; Latent Dirichlet Allocation
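The pipeline summarized in the abstract (sequence, then topic probability vector, then k-NN) can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation, and several details are assumptions that the abstract does not state: sequences are tokenized into overlapping k-mers that serve as "words", LDA is fitted with a small collapsed Gibbs sampler, the hyperparameters `alpha` and `beta` are arbitrary, k-NN uses squared Euclidean distance, and the topics are fitted on training and query sequences together for brevity. The function names `kmers`, `lda_gibbs`, and `knn_predict` are hypothetical.

```python
import random
from collections import Counter


def kmers(seq, k=3):
    """Split a DNA string into overlapping k-mers (assumed to be the 'words')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns one topic-probability vector (summing to 1) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]               # counts per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]    # counts per (topic, word)
    topic_total = [0] * n_topics                             # tokens per topic
    assign = []                                              # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            t = rng.randrange(n_topics)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][word_id[w]] += 1
            topic_total[t] += 1
        assign.append(row)

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = assign[d][i], word_id[w]
                doc_topic[d][t] -= 1
                topic_word[t][v] -= 1
                topic_total[t] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][v] += 1
                topic_total[t] += 1

    # Smoothed, normalized document-topic counts are the feature vectors.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vecs, train_labels)
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    seqs = ["ACGTACGTACGT", "ACGTTGCAACGT", "GGGCCCGGGCCC", "GGCCGGCCGGCC", "ACGTACGAACGT"]
    labels = ["A", "A", "B", "B"]
    thetas = lda_gibbs([kmers(s) for s in seqs], n_topics=2)
    print(knn_predict(thetas[:4], labels, thetas[4], k=3))
```

Each sequence ends up as a vector whose length equals the number of topics, which is typically far smaller than the number of distinct k-mers; this is one way to read the abstract's claim about reduced feature dimensionality. In practice an established LDA implementation would likely replace the hand-written sampler.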