期刊文献+

基于组合minimizer种子的泛基因组图序列比对算法

Sequence Alignment Algorithm Based on Combined minimizer Seeds on Pan-Genome Graph
在线阅读 下载PDF
导出
摘要 随着测序技术的发展和应用,人类基因组序列的研究已从个体分析逐步扩展到群体分析。为更好地展示种群不同样本之间的遗传变异信息,泛基因组图模型开始取代传统的线性多序列参考基因组模型,序列到图的比对成为生物序列分析的关键问题之一。现有比对算法通常采用种子与扩展策略,但由于图中组合的路径较多,定位和验证阶段的时间成本高,需要进一步优化单种子选取方法,减少候选位置的数量。为此,提出一种基于组合minimizer种子的序列比对算法,在定位阶段通过对minimizer种子的组合hash,扩展单个种子的覆盖范围。同时,通过序列和相对位置两方面信息查找种子,减少假阳性匹配位置的数量,从而降低后续筛选和验证的工作量。实验结果表明,与主流算法相比,该算法能够减少约80%的候选位置,时间性能提升1~3倍,同时保持相当的索引内存占用和精确比对能力。 With advancements in sequencing technology,human genome analysis has shifted from individual analysis to population analysis.To better demonstrate the genetic variation information between different samples within a population,the pan-genome graph model has replaced the traditional linear multi-sequence reference genome model,and sequence-to-graph alignment has become a key issue in biological sequence analyses.Existing alignment algorithms employ seed-and-extend strategies.However,owing to the numerous paths formed by graph combinations,localization and verification phases become time-consuming,necessitating further optimization and improvement of single-seed selection methods.To address this issue,this paper proposes a sequence alignment algorithm based on a combined minimizer seed.In the localization phase,the algorithm enhances the coverage range of a single seed through the combined hashing of minimizer seeds.Simultaneously,seeds are located through both sequence and relative position information,which significantly reducing the number of false-positive matching positions,thus lowering the workload of the subsequent filtering and verification processes.Experimental results demonstrate that the proposed algorithm can reduce candidate positions by approximately 80%,optimize time performance by one to three times,and have index memory and precise comparison capabilities comparable to mainstream alignment algorithms.
作者 高佳 徐云 GAO Jia;XU Yun(School of Computer Science and Technology,University of Science and Technology of China,Hefei 230027,Anhui,China;Key Laboratory of High Performance Computing of Anhui Province,Hefei 230027,Anhui,China)
出处 《计算机工程》 北大核心 2025年第8期53-61,共9页 Computer Engineering
基金 国家自然科学基金面上项目(61672480) 高等学校学科创新引智计划(BP0719016)。
关键词 泛基因组图 序列比对 种子与扩展策略 minimizer种子 pan-genome graph sequence alignment seed-and-extend strategy minimizer seed
  • 相关文献

参考文献2

二级参考文献4

共引文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部