Research on Filtering Algorithm of Approximate String Matching

Cited by: 1

Abstract: Approximate string matching is widely used in many research areas, such as text retrieval and bioinformatics. This paper surveys filter-based approximate string matching algorithms in Off-line mode. First, the fundamentals of string matching and the classification of approximate string matching applications are introduced. Next, the index structures commonly used by Off-line approximate string matching algorithms are described. Then the state of research on filtering algorithms for approximate string matching is reviewed in detail, and the filtering principles of several classical filter algorithms are explained. Finally, experiments compare the performance of these classical filter algorithms; the experimental data show that improving filtration efficiency and reducing filtration time are the key issues in speeding up matching. The study indicates that filter algorithms based on gapped q-grams are a direction for future research in approximate string matching.

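Authors: Sun Decai, Wang Xiaoxia

Affiliations: College of Information Science and Technology, Bohai University; University Computer Teaching and Research Department, Bohai University

Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 4, pp. 171-176 (6 pages)

Funding: National Natural Science Foundation of China (61173142); 2014 Liaoning Province Doctoral Research Start-up Fund Program (20141138); Liaoning Federation of Social Sciences 2014 Key Project on Liaoning Economic and Social Development (2014lslktzdian-04)

Keywords: string matching; approximate string matching; filter algorithm; q-gram filter

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]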

