
Verbumculus and the Discovery of Unusual Words (cited by: 1)

Abstract: Measures relating word frequencies and expectations have been of constant interest in bioinformatics studies. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, yet it poses a significant computational burden even when limited to words of bounded maximum length. In addition, displaying the huge tables that may result from these counts poses practical problems of visualization and inference. VERBUMCULUS is a suite of software tools for the efficient and fast detection of over- or under-represented words in nucleotide sequences. The inner core of VERBUMCULUS rests on subtly interwoven properties of statistics, pattern matching and combinatorics on words that enable one to limit, drastically and a priori, the set of over- or under-represented candidate words of all lengths in a given sequence, thereby making it more feasible both to detect and to visualize such words in a fast and practically useful way. This paper describes the facility outlined above and reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes.
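To make "measures relating word frequencies and expectations" concrete, here is a minimal sketch, not the VERBUMCULUS implementation, that scores each word by a z-score comparing its observed count with its expected count under an assumed i.i.d. (Bernoulli) model whose symbol probabilities are estimated from the sequence itself. The function names (`z_score`, `unusual_words`, etc.) are hypothetical, and the variance uses a binomial approximation that ignores self-overlap of the word, so it is not necessarily the score the paper uses. The sketch deliberately brute-forces every word up to `max_len`, i.e. the exhaustive enumeration whose cost the abstract describes.

```python
from collections import Counter
from math import prod, sqrt


def symbol_probs(seq: str) -> dict[str, float]:
    """Estimate i.i.d. symbol probabilities from the sequence (assumed model)."""
    n = len(seq)
    return {c: k / n for c, k in Counter(seq).items()}


def count_occurrences(seq: str, word: str) -> int:
    """Count possibly overlapping occurrences of word in seq."""
    m = len(word)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == word)


def z_score(seq: str, word: str) -> float:
    """(observed - expected) / std under an i.i.d. model.

    Assumption: binomial variance over the n - m + 1 start positions,
    which ignores the dependence caused by self-overlapping words.
    """
    probs = symbol_probs(seq)
    positions = len(seq) - len(word) + 1
    p_word = prod(probs.get(c, 0.0) for c in word)
    expected = positions * p_word
    variance = positions * p_word * (1 - p_word)
    if variance == 0:
        # p_word is 0 or 1: the count is deterministic, so no surprise.
        return 0.0
    return (count_occurrences(seq, word) - expected) / sqrt(variance)


def unusual_words(seq: str, max_len: int, threshold: float) -> list[tuple[str, float]]:
    """Exhaustively score every word of length 1..max_len that occurs in seq."""
    results = []
    for m in range(1, max_len + 1):
        present = {seq[i:i + m] for i in range(len(seq) - m + 1)}
        for w in sorted(present):
            z = z_score(seq, w)
            if abs(z) >= threshold:
                results.append((w, z))
    return results
```

For example, `unusual_words("ACGT" * 25, 4, 3.0)` flags `"ACGT"` as strongly over-represented, while the single letters score 0 because their counts match expectation exactly. The suffix-tree-based approach described in the abstract aims to avoid this exhaustive loop by restricting the candidate set a priori; the sketch does not attempt that, and because it only enumerates words present in the sequence, it cannot report under-represented words that never occur (call `z_score` directly for those).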
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2004, Issue 1, pp. 22-41 (20 pages). Indexed in SCIE, EI, CSCD.
Funding: U.S. National Science Foundation, Purdue Research Foundation, the Italian Ministry of University and Research, and the Research Program of the University of Padova; and Purdue Research Foundation, the Italian Ministry of University and Research, the Research Program of the University of Padova, and Bourns College of Engineering, University of California, Riverside.
Keywords: verbumculus; unusual words; subword statistics; pattern discovery; regulatory elements; suffix trees