Abstract
In today's information age, massive amounts of digital information are created, stored, disseminated, and transformed on the Internet every day. This trend inevitably makes valuable information harder to find, and making effective use of such volumes of information has become a pressing problem. This paper presents the framework of AEEMSI (Adaptive Extraction Engine based on Massive Semistructural Information), an adaptive engine for collecting massive semi-structured data, and introduces the concepts of the adaptive data template and the adaptive data gateway. Based on this framework, a running system suitable for practical commercial use was designed and developed; it extracts and reintegrates massive semi-structured information from the Web.
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2003, No. 9, pp. 65-68, 90 (5 pages in total)
Keywords
Information Extraction
Semi-structural Information
Adaptive Data Template
Adaptive Data Gateway