Extraction and integration of intensive Web information based on XML (基于XML的密集型Web信息抽取与集成研究)

Cited by: 2
Abstract: To address the problem of extracting data from intensive Web information, a tree-structured extraction rule is proposed that suits XML structure while remaining fairly general. The rule extracts data from intensive Web pages and integrates it into XML documents of a specified schema. Using a sample-based learning method for semi-structured Web information extraction, the authors developed an XML-based prototype system for querying new books on the Web. The prototype extracts Web pages with good results. It can be applied directly to extracting information from specialized Web sites, and it can also serve as the data-preparation stage of other related applications.
Source: Journal of Zhengzhou University of Light Industry: Natural Science (《郑州轻工业学院学报(自然科学版)》), CAS, 2008, No. 3, pp. 31-35 (5 pages)
Funding: Natural Science Foundation of Henan Province (grant 0411010500)
Keywords: XML; intensive Web data; data extraction; information integration
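
The abstract describes a tree-structured extraction rule that maps data on a Web page into an XML document of a specified schema, but it does not give the rule format. The following is only a minimal sketch of that general idea, not the authors' method: the `Rule` class, the path expressions, the sample page, and the `books`/`book`/`title`/`price` schema are all hypothetical, and the sample-based learning step that would produce such a rule is not shown. The page is assumed to be well-formed XHTML so that Python's standard `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One node of a hypothetical tree-structured extraction rule."""
    tag: str    # element name to emit in the output XML
    path: str   # ElementTree path, relative to the parent rule's match
    children: list = field(default_factory=list)  # nested Rule objects


def apply_rule(rule, context, out_parent):
    """Emit one output element for every node that rule.path matches."""
    for node in context.findall(rule.path):
        out = ET.SubElement(out_parent, rule.tag)
        if rule.children:
            # Inner rule node: recurse with the matched node as new context.
            for child in rule.children:
                apply_rule(child, node, out)
        else:
            # Leaf rule node: copy the matched node's text content.
            out.text = "".join(node.itertext()).strip()


def extract(page, rule, root_tag="books"):
    """Parse a well-formed page and build the target XML tree."""
    root = ET.Element(root_tag)
    apply_rule(rule, ET.fromstring(page), root)
    return root


# Hypothetical sample page: a dense listing with one table row per book.
PAGE = """<html><body><table>
<tr class="book"><td class="title">XML Basics</td><td class="price">25.00</td></tr>
<tr class="book"><td class="title">Web Mining</td><td class="price">38.50</td></tr>
</table></body></html>"""

# Hypothetical rule tree mirroring the target schema books/book/{title, price}.
BOOK_RULE = Rule("book", ".//tr[@class='book']", [
    Rule("title", "td[@class='title']"),
    Rule("price", "td[@class='price']"),
])

result = extract(PAGE, BOOK_RULE)
print(ET.tostring(result, encoding="unicode"))
```

Because the rule tree has the same shape as the target schema, each repeated record on the page becomes one `book` element, and adding a field only requires adding a leaf rule. Real Web pages are rarely well-formed, so a practical system would need a tolerant HTML parser in front of this step.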