Research on XML-Based Automatic Extraction Technology for Deep Web Information


Abstract: With the rapid growth of the Internet in recent years, the Deep Web has become an important part of online information resources. Users reach the vast amount of information it holds by submitting queries through query interfaces to the back-end Web databases, which return the results as dynamically generated Web pages. Because Deep Web resources are spread across many Deep Web sites and are heterogeneous, dynamic, and large in volume, they are inconvenient to use directly, which has led to the development of Deep Web data integration systems. This paper studies the data extraction technology used in Deep Web data integration systems and proposes an XML-based method for automatically extracting Deep Web data, together with a detailed technical analysis. According to the authors, the method extracts Deep Web resources quickly and effectively, with high extraction accuracy and fine extraction granularity.

Authors: Peng Yuanyuan (彭媛媛), Xu Jianchao (许建潮)

Affiliation: Changchun University of Technology (长春工业大学)

Source: Science & Technology Information (《科技信息》), 2009, No. 33, pp. 85, 104 (2 pages)

Keywords: information extraction; Deep Web; Deep Web data integration; XML

Classification numbers: TP391, TP393 [Automation and Computer Technology: Computer Application Technology]



