
Incremental Algorithm for Detecting Approximately Duplicate Database Records Based on Priority Queue Strategy

Cited by: 7
Abstract: This paper introduces the priority queue strategy (PQS). Building on it, the paper studies the detection of approximately duplicate records when the data source grows dynamically while the data schema and the matching model remain unchanged, and proposes an incremental algorithm, IPQS (Incremental PQS). Experimental results are reported at the end.
Author: 佘春红
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2003, No. 9, pp. 61-63 (3 pages)
Keywords: data cleaning; approximately duplicate record; incremental detection; representative record
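
The abstract names the priority queue strategy and its incremental variant but does not spell out either algorithm. As a rough illustration only, the sketch below shows one common way such a scheme can be organized: a bounded queue of clusters, each with a representative record, is scanned for a match before a new cluster is created, and the queue is kept between batches so a later increment is compared against earlier representatives. Every name here (`PriorityQueueDetector`, `similar`, `capacity`, `threshold`) is hypothetical, the string similarity from `difflib` is a stand-in for the paper's matching model, and keeping only the first record as representative is a simplifying assumption. It is not the author's IPQS.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float) -> bool:
    """Stand-in matching model (assumption): case-insensitive similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


class PriorityQueueDetector:
    """Bounded queue of clusters; index 0 holds the highest-priority cluster."""

    def __init__(self, capacity: int = 4, threshold: float = 0.8):
        self.capacity = capacity
        self.threshold = threshold
        self.queue = []     # clusters currently eligible for matching
        self.clusters = []  # every cluster created so far, evicted ones included

    def add(self, record: str) -> None:
        # Look for a queued cluster whose representative matches the record.
        hit = None
        for i, cluster in enumerate(self.queue):
            if similar(record, cluster["rep"], self.threshold):
                hit = i
                break
        if hit is not None:
            # Join the cluster and move it to the front of the queue.
            cluster = self.queue.pop(hit)
            cluster["members"].append(record)
            self.queue.insert(0, cluster)
            return
        # No match: start a new cluster with this record as its representative.
        cluster = {"rep": record, "members": [record]}
        self.clusters.append(cluster)
        self.queue.insert(0, cluster)
        if len(self.queue) > self.capacity:
            self.queue.pop()  # evict the lowest-priority cluster

    def add_batch(self, records) -> None:
        # Sorting brings likely duplicates close together within one batch.
        for record in sorted(records, key=str.lower):
            self.add(record)

    def duplicate_groups(self):
        """Return the member lists of clusters holding more than one record."""
        return [c["members"] for c in self.clusters if len(c["members"]) > 1]


detector = PriorityQueueDetector(capacity=4, threshold=0.8)
detector.add_batch(["John Smith", "Jon Smith", "Mary Jones"])  # initial data
detector.add_batch(["John Smyth", "Alice Brown"])              # later increment
print(detector.duplicate_groups())
```

In this sketch the second call to `add_batch` does not rescan the first batch; it only compares the new records against representatives still in the queue. A cluster that has been evicted can no longer absorb later records, so `capacity` trades detection quality against the number of comparisons per record.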

