Abstract
With advances in sequencing technology, the study of human genome sequences has expanded from individual-level to population-level analysis. To better represent the genetic variation among samples in a population, pan-genome graph models have begun to replace the traditional linear multi-sequence reference genome model, and sequence-to-graph alignment has become a key problem in biological sequence analysis. Existing alignment algorithms typically adopt a seed-and-extend strategy. However, because the graph contains many combinatorial paths, the localization and verification phases are time-consuming, so single-seed selection needs further optimization to reduce the number of candidate positions. To address this issue, this paper proposes a sequence alignment algorithm based on combined minimizer seeds. In the localization phase, the algorithm extends the coverage of a single seed by hashing combinations of minimizer seeds. Seeds are also looked up using both sequence and relative position information, which reduces the number of false-positive matching positions and thus lowers the workload of subsequent filtering and verification. Experimental results show that, compared with mainstream algorithms, the proposed algorithm reduces candidate positions by approximately 80% and improves time performance by 1 to 3 times, while maintaining comparable index memory usage and precise alignment capability.
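The idea of looking up seeds by both sequence and relative position can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a plain linear reference rather than a pan-genome graph, picks minimizers by lexicographic order rather than by a hash function, and pairs only adjacent minimizers. The function names (`minimizers`, `combined_seeds`, `build_index`, `candidate_positions`) and the parameters `k`, `w`, and `max_gap` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # (first minimizer, second minimizer, gap between them)


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return (position, k-mer) for the smallest k-mer in each window of
    w consecutive k-mers (lexicographic order, leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: List[Tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Consecutive windows often share a minimizer; record it once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


def combined_seeds(seq: str, k: int, w: int, max_gap: int) -> List[Tuple[int, Key]]:
    """Pair each minimizer with the next one (assumption: adjacent pairs only).
    The key carries both sequences and their relative position."""
    mins = minimizers(seq, k, w)
    seeds: List[Tuple[int, Key]] = []
    for (p1, m1), (p2, m2) in zip(mins, mins[1:]):
        gap = p2 - p1
        if gap <= max_gap:
            seeds.append((p1, (m1, m2, gap)))
    return seeds


def build_index(ref: str, k: int, w: int, max_gap: int) -> Dict[Key, List[int]]:
    """Map each combined key to the reference positions where it occurs.
    The dict hashes the whole tuple, standing in for a combined hash."""
    index: Dict[Key, List[int]] = {}
    for pos, key in combined_seeds(ref, k, w, max_gap):
        index.setdefault(key, []).append(pos)
    return index


def candidate_positions(index: Dict[Key, List[int]], read: str,
                        k: int, w: int, max_gap: int) -> List[int]:
    """Return candidate start offsets of the read in the reference."""
    hits = set()
    for read_pos, key in combined_seeds(read, k, w, max_gap):
        for ref_pos in index.get(key, []):
            hits.add(ref_pos - read_pos)
    return sorted(hits)


ref = "ACGTTGCAATGCCGTAGGCTAACGTTAGCCTAGGATCCGATACGGTTAACC"
index = build_index(ref, k=3, w=4, max_gap=4)
read = ref[10:30]
print(candidate_positions(index, read, k=3, w=4, max_gap=4))
```

Because a hit requires two minimizers to match at the same spacing, a single k-mer that occurs by chance elsewhere in the reference no longer produces a candidate on its own, which is the intuition behind the reduction in false-positive positions described above.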
Authors
GAO Jia (高佳)
XU Yun (徐云)
GAO Jia; XU Yun (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China; Key Laboratory of High Performance Computing of Anhui Province, Hefei 230027, Anhui, China)
Source
Computer Engineering (《计算机工程》)
Peking University Core Journal (北大核心)
2025, No. 8, pp. 53-61 (9 pages)
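Funding
General Program of the National Natural Science Foundation of China (61672480)
Program of Introducing Talents of Discipline to Universities (BP0719016)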