Abstract
With the rapid growth of the Internet in recent years, the Deep Web has become an important part of networked information resources. Users obtain the vast amount of information it holds dynamically, by submitting queries through query interfaces to the back-end Web databases, and the results are returned as dynamically generated Web pages. Because Deep Web resources are scattered across many Deep Web sites and are heterogeneous, dynamic, and large in volume, they are inconvenient to use directly, which has motivated the development of Deep Web data integration systems. This paper studies the data extraction technology used in Deep Web data integration systems and proposes an XML-based method for automatically extracting Deep Web data, together with a detailed technical analysis. The method extracts Deep Web resources quickly and effectively, with high extraction accuracy and fine extraction granularity.
Source
《科技信息》
2009, Issue 33, pp. 85, 104 (2 pages)
Science & Technology Information