
高通量DNA测序数据压缩研究进展 (Cited by: 4)

Advances in the compression of high-throughput DNA sequencing data
Abstract: The rapid development of high-throughput DNA sequencing technology has caused the volume of DNA sequencing data to grow sharply, and data compression is one of the important approaches to the resulting storage and transmission problems. This paper reviews traditional DNA sequence compression methods, including substitution-based and statistical methods, as well as reference-genome-based compression methods for high-throughput DNA sequencing data. Representative state-of-the-art algorithms for resequencing data compression, de novo sequencing data compression, quality score compression, and compressed data indexing are introduced and compared. The challenges facing high-throughput DNA sequencing data compression and its future prospects are also discussed.
Source: Journal of Shenzhen University (Science and Engineering) (《深圳大学学报(理工版)》), indexed in EI, CAS, and the Peking University Core Journals list; 2013, No. 4, pp. 409-415 (7 pages).
Funding: National Natural Science Foundation of China (grants 61211130120 and 61001185).
Keywords: computer application; DNA sequencing; next-generation sequencing; resequencing; de novo sequencing; high-throughput sequencing; data compression

References (45 in total; first 10 shown)

  • [1] Sanger F, Nicklen S, Coulson A R. DNA sequencing with chain-terminating inhibitors [J]. Proceedings of the National Academy of Sciences of the United States of America, 1977, 74(12): 5463-5467.
  • [2] Margulies M, Egholm M, Altman W E, et al. Genome sequencing in microfabricated high-density picolitre reactors [J]. Nature, 2005, 437(7057): 376-380.
  • [3] Ikai A. STM and AFM of bio/organic molecules and structures [J]. Surface Science Reports, 1996, 26(8): 263-332.
  • [4] Hibbs X, Krstic S, Mastrangelo A, et al. The potential and challenges of nanopore sequencing [J]. Nature Biotechnology, 2008, 26(10): 1146-1153.
  • [5] Kahn S D. On the future of genomic data [J]. Science, 2011, 331(6018): 728-729.
  • [6] Grumbach S, Tahi F. Compression of DNA sequences [C]// Proceedings of the Data Compression Conference. Snowbird (USA): IEEE Computer Society, 1993: 340-350.
  • [7] Giancarlo R, Scaturro D, Utro F. Textual data compression in computational biology: a synopsis [J]. Bioinformatics, 2009, 25(13): 1575-1586.
  • [8] Matsumoto T, Sadakane K, Imai H. Biological sequence compression algorithms [C]// Proceedings of Genome Informatics Workshop, Tokyo, 2000: 43-52.
  • [9] Chen X, Li M, Ma B, et al. DNACompress: fast and effective DNA sequence compression [J]. Bioinformatics, 2002, 18(12): 1696-1698.
  • [10] Loewenstern D, Yianilos P N. Significantly lower entropy estimates for natural DNA sequences [J]. Journal of Computational Biology, 1999, 6(1): 125-142.

Co-cited literature (26 in total; first 10 shown)

  • [1] Korodi G, Tabus I. An efficient normalized maximum likelihood algorithm for DNA sequence compression [J]. ACM Transactions on Information Systems, 2005, 23(1): 3-34.
  • [2] Zhu Zexuan, Zhou Jiarui, et al. DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm [J]. IEEE Transactions on Evolutionary Computation, 2011, 15(5): 643-658.
  • [3] Kuruppu S, Puglisi S J, et al. Optimized relative Lempel-Ziv compression of genomes. Proceedings of the 34th Australasian Computer Science Conference [C]. Australia: ACSC, 2011. 91-98.
  • [4] Wang Congmao, Zhang Dabing. A novel compression tool for efficient storage of genome resequencing data [J]. Nucleic Acids Research, 2011, 39(7): E45-U74.
  • [5] Jones D, Ruzzo W, et al. Compression of next-generation sequencing reads aided by highly efficient de novo assembly [J]. Nucleic Acids Research, 2012, 40(22): E171.
  • [6] Kuruppu S, Beresford-Smith B, et al. Iterative dictionary construction for compression of large DNA data sets [J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012, 9(1): 137-149.
  • [7] Li Cong, Ji Zhenzhou, et al. Efficient parallel design for BWT-based DNA sequences data multicompression algorithm. Proceedings of the International Conference on Automatic Control and Artificial Intelligence [C]. Xiamen: ACAI, 2012. 967-970.
  • [8] Wikipedia. Move-to-front Transformation [DB/OL]. http://en.wikipedia.org/wiki/Move-to-front-transform, 2013-12-09.
  • [9] Wikipedia. FASTA [DB/OL]. Http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml, 2013-12-09.
  • [10] Shamir G I. Universal lossless compression with unknown alphabets-the average case [J]. IEEE Transactions on Information Theory, 2006, 52(11): 4915-4944.

Citing documents: 4

Second-level citing documents: 11
