
Study and Design of an Adaptive Extraction Engine Based on Massive Semi-structural Information (自适应式的海量半结构化数据采集引擎研究与设计)
Abstract: In today's information age, massive amounts of digital information are created, stored, propagated, and transformed on the Internet every day. This trend inevitably makes valuable information harder to find, and making effective use of this information has become a pressing problem. This paper presents the framework of AEEMSI (Adaptive Extraction Engine based on Massive Semi-structural Information) and introduces the concepts of the adaptive data template and the adaptive data gateway. Using this framework, the authors designed and developed a running system suitable for practical commercial use, which extracts and reintegrates massive semi-structured information from the Web.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2003, No. 9, pp. 65-68, 90 (5 pages in total)
Keywords: Information Extraction; Semi-structural Information; Adaptive Data Template; Adaptive Data Gateway

References (12)

  • 1. http://gate.ac.uk/ie
  • 2. http://ciir.cs.umass.edu
  • 3. http://www.haifa.il.ibm.com/webir
  • 4. Line Eikvil. Information Extraction from World Wide Web: A Survey [EB/OL]. http://citeseer.nj.nec.com/eikvil99information.html, 1999.
  • 5. Nicholas Kushmerick. Wrapper Induction for Information Extraction [EB/OL]. Intl. Joint Conference on Artificial Intelligence. http://citeseer.nj.nec.com/kushmerick97wrapper.html, 1997.
  • 6. Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval [M]. Boston: Addison Wesley Longman Limited, 1999. 1-17.
  • 7. Stephen Soderland. Learning to Extract Text-based Information from the World Wide Web [EB/OL]. In Proceedings of Third International Conference on Knowledge Discovery and Data Mining. http://citeseer.nj.nec.com/soderland97learning.html, 1997.
  • 8. Naveen Ashish, Craig Knoblock. Semi-automatic Wrapper Generation for Internet Information Sources [EB/OL]. Conference on Cooperative Information Systems. http://citeseer.nj.nec.com/ashish97semiautomatic.html, 1997.
  • 9. Naveen Ashish, Craig Knoblock. Wrapper Generation for Semistructured Internet Sources [EB/OL]. Proc. Workshop on Management of Semistructured Data. http://citeseer.nj.nec.com/78296.html, 1997.
  • 10. Nicholas Kushmerick. Wrapper Induction: Efficiency and Expressiveness [J]. Artificial Intelligence, 2000, 118: 15-68.
