Abstract
As data volumes grow, the efficiency of duplicate record detection often stops improving. To address this, the paper proposes a duplicate record detection method based on a BP neural network optimized by particle swarm optimization (PSO), which exploits both the non-linear mapping ability of neural networks and the global search capability of PSO. Applying learning-based and evolutionary ideas to duplicate record detection avoids the attribute-weight computation that traditional methods require. Theoretical analysis and experiments indicate that the method achieves good detection accuracy and good time efficiency, and can effectively handle approximately duplicate record detection on large datasets.
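The abstract does not give the network structure, the PSO parameters, or the similarity features, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes that each record pair is turned into a vector of per-attribute string similarities, and that PSO searches the weight vector of a small one-hidden-layer sigmoid network directly (no back-propagation fine-tuning step is shown). The names `pair_features`, `forward`, `pso_train`, the toy records, and all constants (inertia 0.7, acceleration 1.5, weight bounds of ±10) are hypothetical choices made for illustration.

```python
import math
import random
from difflib import SequenceMatcher


def pair_features(a, b):
    """Per-attribute string similarity in [0, 1] for two records (tuples of str)."""
    return [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x, n_hidden):
    """One-hidden-layer sigmoid network; w is a flat weight vector."""
    n_in = len(x)
    out = w[-1]  # output bias
    for h in range(n_hidden):
        base = h * (n_in + 1)
        z = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        out += w[n_hidden * (n_in + 1) + h] * sigmoid(z)
    return sigmoid(out)


def mse(w, data, n_hidden):
    return sum((forward(w, x, n_hidden) - y) ** 2 for x, y in data) / len(data)


def pso_train(data, n_hidden=3, n_particles=20, iters=60, seed=0):
    """Search the network weights with a basic global-best PSO (assumed constants)."""
    rng = random.Random(seed)
    dim = n_hidden * (len(data[0][0]) + 1) + n_hidden + 1
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [mse(p, data, n_hidden) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    initial_loss = gbest_loss
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * rng.random() * (pbest[k][d] - pos[k][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[k][d]))
                # Clamp weights so the sigmoid arguments stay in a safe range.
                pos[k][d] = max(-10.0, min(10.0, pos[k][d] + vel[k][d]))
            loss = mse(pos[k], data, n_hidden)
            if loss < pbest_loss[k]:
                pbest[k], pbest_loss[k] = pos[k][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = pos[k][:], loss
    return gbest, initial_loss, gbest_loss


# Hypothetical toy records (name, address, city); label 1 means "duplicate".
pairs = [
    (("John Smith", "12 Main St", "Boston"), ("Jon Smith", "12 Main Street", "Boston"), 1),
    (("Mary Jones", "4 Oak Ave", "Denver"), ("Mary Jones", "4 Oak Avenue", "Denver"), 1),
    (("John Smith", "12 Main St", "Boston"), ("Alice Wong", "77 Pine Rd", "Seattle"), 0),
    (("Mary Jones", "4 Oak Ave", "Denver"), ("Robert Lee", "9 Elm Blvd", "Austin"), 0),
]
data = [(pair_features(a, b), y) for a, b, y in pairs]
weights, initial_loss, final_loss = pso_train(data)
score = forward(weights, data[0][0], 3)  # network output for the first pair
```

Because the network learns how to combine the per-attribute similarities, no attribute weights have to be set by hand, which is the point the abstract makes about avoiding weight computation. In a fuller treatment the PSO result would typically serve as the starting point for gradient-based BP training; that step is omitted here.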
Source
Journal of Liaoning Technical University (Natural Science) (《辽宁工程技术大学学报(自然科学版)》)
CAS
Peking University Core Journals (北大核心)
2010, No. 5, pp. 959-962 (4 pages)
Keywords
duplicate record detection
BP neural network
particle swarm optimization
intelligent detection