Abstract
To address the problem of making use of Deep Web information resources, this paper explains why information extraction matters for them, and analyzes and compares the techniques used in the two main steps of the extraction process: handling query interfaces and extracting structured data. It then runs an extraction experiment on patent databases, using a method based on keyword queries and a document object model. Analysis of the experimental results verifies the accuracy of the extraction method. The paper closes by pointing out the method's shortcomings and ways to address them, with the aim of making full use of Deep Web information resources.
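The two steps named in the abstract, submitting a keyword query through a search interface and then reading structured records out of a document object model, can be illustrated with a minimal sketch. It is not the authors' implementation: the endpoint URL, the parameter name `q`, the sample result page, and the function names are all hypothetical, and the sample page is deliberately well-formed so that Python's standard `xml.dom.minidom` can parse it.

```python
from urllib.parse import urlencode
from xml.dom import minidom

# Hypothetical search endpoint; not taken from the paper.
SEARCH_URL = "https://patents.example.org/search"


def build_query_url(keyword: str) -> str:
    """Fill the query interface with a keyword (a GET-style form with a
    single parameter named 'q' is an assumption)."""
    return SEARCH_URL + "?" + urlencode({"q": keyword})


# Made-up stand-in for a result page. Real result pages are rarely
# well-formed XML and would need an HTML-tolerant parser first.
SAMPLE_PAGE = """<html><body><table id="results">
<tr><th>Number</th><th>Title</th></tr>
<tr><td>P-001</td><td>Sample record A</td></tr>
<tr><td>P-002</td><td>Sample record B</td></tr>
</table></body></html>"""


def cell_text(node) -> str:
    """Concatenate all text nodes found under a DOM element."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(cell_text(child))
    return "".join(parts).strip()


def extract_records(page: str) -> list[dict]:
    """Build a DOM tree and turn each data row into a field dictionary,
    using the header row as the field names."""
    dom = minidom.parseString(page)
    rows = dom.getElementsByTagName("tr")
    headers = [cell_text(th) for th in rows[0].getElementsByTagName("th")]
    records = []
    for row in rows[1:]:
        cells = [cell_text(td) for td in row.getElementsByTagName("td")]
        records.append(dict(zip(headers, cells)))
    return records


if __name__ == "__main__":
    print(build_query_url("deep web"))
    for record in extract_records(SAMPLE_PAGE):
        print(record)
```

In this sketch the header row supplies the field names, so the same walk works for any result table with one header row; pages with nested or irregular tables would need additional rules.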
Source
Library and Information Service (《图书情报工作》)
CSSCI
PKU Core Journal (北大核心)
2007, Issue 10, pp. 25-28 (4 pages)
Keywords
Deep Web
information extraction
query interface
named entity recognition
document object model