Web database schema identification through a simple query interface (cited by: 3)
Abstract: Web databases (WDBs) expose their data through several kinds of query interfaces. The keyword-based simple query interface (SQI) is among the most widely used, yet existing work mainly addresses query probing and schema identification through complex query interfaces. This paper therefore proposes a method for query probing and schema identification of WDBs through the SQI. Based on the query characteristics of the SQI, it defines the full-conditioned query for the SQI and gives a strategy for generating it, which is used to identify the interface schema. For result schema identification, it extends identification to the non-query keywords in result pages, which improves the attribute recall rate. Experiments on 35 WDBs across three domains (books, movies, and mobile phones) show that the method identifies database schemas accurately and efficiently.
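Authors: Lin Ling (林玲), Zhou Lizhu (周立柱)
Source: Journal of Tsinghua University (Science and Technology), 2010, No. 4, pp. 551-555 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Key project jointly funded by the National Natural Science Foundation of China and Microsoft (60833003)
Keywords: Web database; schema identification; simple query interface; instance-based query probing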
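
The abstract reports an improved attribute recall rate but this record does not give the formula. A minimal sketch, assuming the usual definition of recall (correctly identified attributes divided by all true attributes), is shown below. The function name, the attribute names, and the numbers are hypothetical illustrations and are not taken from the paper.

```python
def attribute_recall(identified, ground_truth):
    """Fraction of ground-truth schema attributes that were identified.

    Assumes the standard recall definition; the paper's exact metric
    is not stated in this record.
    """
    truth = {a.lower() for a in ground_truth}
    if not truth:
        raise ValueError("ground_truth must not be empty")
    found = {a.lower() for a in identified}
    return len(found & truth) / len(truth)


# Hypothetical book-domain schema (attribute names invented for illustration).
truth = ["title", "author", "publisher", "price"]
query_only = ["title", "author"]          # attributes matched by query keywords only
extended = ["title", "author", "price"]   # after also recognizing non-query keywords

recall_query_only = attribute_recall(query_only, truth)
recall_extended = attribute_recall(extended, truth)
```

In this made-up case, recognizing one extra attribute from non-query keywords raises recall from 2 of 4 to 3 of 4 attributes, which is the kind of gain the abstract describes.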