Abstract
Web databases (WDBs) expose several kinds of query interfaces. The keyword-based simple query interface (SQI) is among the most widely used, yet existing work mainly addresses query probing and schema identification through complex query interfaces. This paper therefore proposes a method for instance-based query probing and schema identification of WDBs through the SQI. Based on the query characteristics of the SQI, it defines the full-conditioned query for the SQI and a strategy for generating such queries, which are used to identify the interface schema. For result schema identification, it extends identification to the non-query keywords in result pages, which improves the attribute recall rate. Experiments on 35 WDBs across three domains (books, movies, and mobile phones) show that the method identifies database schemas accurately and efficiently.
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal
2010, No. 4, pp. 551-555 (5 pages)
Journal of Tsinghua University(Science and Technology)
Funding
Key Project jointly funded by the National Natural Science Foundation of China and Microsoft (60833003)