Using Classifiers to Find Domain-Specific Online Databases Automatically (article in English)

Cited by: 14
Abstract In the hidden Web (deep Web) domain, general-purpose search engines (e.g., Google and Yahoo) have notable shortcomings. Each covers less than one-third of the data stored in hidden Web databases. Unlike on the surface Web, combining several engines barely increases coverage, because they cover roughly the same data. The hidden Web is becoming a highly important information source, since the content provided by many hidden Web sites is plentiful and often of very high quality. This paper proposes a three-step framework, built on three classifiers, to automatically identify domain-specific hidden Web entries. The query interfaces obtained this way can be integrated into a unified interface that is presented to users for querying. Eight large-scale experiments demonstrate that the technique finds domain-specific hidden Web entries accurately and efficiently.
Source Journal of Software (软件学报), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, Issue 2, pp. 246-256 (11 pages)
Funding Supported by the National Natural Science Foundation of China under Grant No.60373099 and the Science and Technology Development Program of Jilin Province of China under Grant No.20070533
Keywords deep Web; hidden Web; surface Web; hidden Web entry; searchable form
