On Deep Web Information Extraction (Deep Web信息抽取研究). Cited by: 5
Abstract: To address the problem of making use of Deep Web information resources, this paper explains why information extraction matters for them. It analyzes and compares the techniques used in the two main steps of the extraction process, handling query interfaces and extracting structured data, and then runs an extraction experiment on patent databases using keyword-based querying and a document object model. Analysis of the experimental results verifies the accuracy of the extraction method. The paper also points out the method's shortcomings and ways to address them, with the aim of making full use of Deep Web information resources.
Authors: Dong Min (董旻), Fang Shu (方曙)
Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University Core Journals list, 2007, No. 10, pp. 25-28 (4 pages)
Keywords: Deep Web; information extraction; query interface; named entity recognition; document object model
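
The two steps named in the abstract (a keyword query against a search interface, then extraction of structured data through a document object model) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The endpoint `SEARCH_URL`, the parameter names `q` and `page`, the helper names, and the sample result page are all hypothetical. It also assumes a well-formed result page, which real sites rarely return, so a lenient HTML parser would probably be needed in practice.

```python
from urllib.parse import urlencode
from xml.dom.minidom import parseString

# Hypothetical search endpoint and parameter names; not taken from the paper.
SEARCH_URL = "http://patents.example.org/search"


def build_query_url(keyword, page=1):
    """Fill the (assumed) single-keyword query interface and return the request URL."""
    return SEARCH_URL + "?" + urlencode({"q": keyword, "page": page})


def node_text(node):
    """Concatenate all text found below a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(node_text(child))
    return "".join(parts).strip()


def extract_records(result_page):
    """Build a DOM from a result page and read one record per table row."""
    dom = parseString(result_page)
    rows = dom.getElementsByTagName("tr")
    if not rows:
        return []
    # Header cells (<th>) in the first row name the fields.
    fields = [node_text(th) for th in rows[0].getElementsByTagName("th")]
    records = []
    for row in rows[1:]:
        cells = [node_text(td) for td in row.getElementsByTagName("td")]
        # Skip rows whose cell count does not match the header.
        if len(cells) == len(fields):
            records.append(dict(zip(fields, cells)))
    return records


# Made-up, well-formed sample standing in for a returned result page.
SAMPLE_PAGE = """<html><body><table>
<tr><th>Number</th><th>Title</th></tr>
<tr><td>P-001</td><td>Example <b>widget</b> A</td></tr>
<tr><td>P-002</td><td>Example widget B</td></tr>
</table></body></html>"""

if __name__ == "__main__":
    print(build_query_url("deep web"))
    for record in extract_records(SAMPLE_PAGE):
        print(record)
```

Keeping the query construction separate from the DOM traversal mirrors the two-step split described in the abstract: the first function only needs to change when the search interface changes, the second only when the result page layout changes.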