Abstract
This paper introduces the priority queue strategy (PQS). Building on it, we study the problem of detecting approximately duplicate records when new data sources are added dynamically while the data schema and the matching model remain unchanged. We propose an incremental algorithm, IPQS (Incremental PQS), and present experimental results.
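The abstract names PQS without describing it. For orientation, below is a minimal sketch of the general priority-queue approach to approximate duplicate detection, as commonly attributed to Monge and Elkan. It is not this paper's IPQS. The sketch sorts records by a key and keeps a small queue of recently active clusters, each holding a few representative records. Every incoming record is compared only against those representatives, and matches are merged with union-find. All names and parameters (`pqs_detect`, `threshold`, `queue_size`, `max_reps`) are hypothetical. The similarity function, which uses Python's `difflib.SequenceMatcher`, is an assumed stand-in for a real matching model.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumed stand-in matching model: difflib's similarity ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class UnionFind:
    """Disjoint sets tracking which records have been judged duplicates."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def pqs_detect(records, key, threshold=0.8, queue_size=4, max_reps=3):
    """Return groups (lists of record indices) judged to be duplicates.

    Hypothetical sketch: one pass over records sorted by `key`. `queue` holds
    (cluster_root, representative_indices); position 0 is highest priority.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    uf = UnionFind(len(records))
    queue = []
    for i in order:
        matched = None
        # Compare the new record only with representatives of queued clusters.
        for pos, (_root, reps) in enumerate(queue):
            if any(similarity(records[i], records[r]) >= threshold for r in reps):
                matched = pos
                break
        if matched is not None:
            root, reps = queue.pop(matched)
            uf.union(root, i)
            if len(reps) < max_reps:
                reps.append(i)  # keep a few representatives per cluster
            queue.insert(0, (uf.find(root), reps))  # promote to top priority
        else:
            queue.insert(0, (i, [i]))  # start a new cluster
            if len(queue) > queue_size:
                queue.pop()  # evict the lowest-priority cluster
    groups = {}
    for i in range(len(records)):
        groups.setdefault(uf.find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    data = ["John Smith, 12 Main St", "Jon Smith, 12 Main Street",
            "Mary Jones, 5 Oak Ave", "John Smith, 12 Main St."]
    print(pqs_detect(data, key=str.lower))
```

Bounding the queue limits each record's comparisons to at most `queue_size * max_reps` representatives instead of every earlier record. This is why representative records (listed among the keywords) matter for efficiency.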
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2003, No. 9, pp. 61-63 (3 pages)
Keywords
data cleaning
approximately duplicate record
incremental detection
representative record