
Particle swarm optimization BP neural network for detection of duplicate records (cited by: 4)
Abstract: As data volumes grow, the efficiency of duplicate record detection often stops improving. To address this, the paper proposes a detection method based on a BP (backpropagation) neural network optimized by particle swarm optimization, combining the non-linear mapping ability of neural networks with the global search capability of the particle swarm algorithm. Applying learning-based and evolutionary ideas to duplicate record detection avoids the attribute-weight calculation that traditional methods require. Theoretical analysis and experiments indicate that the method achieves good detection accuracy and good time efficiency, and that it can effectively detect approximately duplicate records in large datasets.
Author: Ma Xiang (马翔)
Source: Journal of Liaoning Technical University (Natural Science) (《辽宁工程技术大学学报(自然科学版)》), 2010, No. 5, pp. 959-962 (4 pages); indexed in CAS and the Peking University Core Journals list (北大核心)
Keywords: duplicate records detection; BP neural network; particle swarm algorithm; intelligent detection
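The abstract names the technique but gives no implementation details, so the following is only a minimal sketch of the general idea, not the author's method. It assumes each record pair has already been reduced to a vector of per-field similarity scores in [0, 1], and it lets a standard global-best particle swarm search the weights of a small one-hidden-layer network directly; a backpropagation fine-tuning stage could follow but is omitted here. The function names, network size, swarm constants, and toy data are all hypothetical.

```python
import math
import random

def sigmoid(x):
    # Clamp the input so math.exp cannot overflow on extreme values.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, features, n_hidden):
    """One-hidden-layer network whose weights are stored as a flat list."""
    n_in = len(features)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # bias of this hidden unit
        for j in range(n_in):
            s += weights[idx + j] * features[j]
        hidden.append(sigmoid(s))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # bias of the output unit
    for k in range(n_hidden):
        out += weights[idx + k] * hidden[k]
    return sigmoid(out)

def mse(weights, samples, n_hidden):
    """Mean squared error over (features, label) samples; the PSO fitness."""
    total = sum((forward(weights, x, n_hidden) - y) ** 2 for x, y in samples)
    return total / len(samples)

def pso_train(samples, n_in, n_hidden=3, n_particles=20, iters=150, seed=0):
    """Global-best PSO over the flat weight vector (assumed settings)."""
    rng = random.Random(seed)
    dim = n_hidden * (n_in + 1) + n_hidden + 1
    w, c1, c2 = 0.7, 1.5, 1.5  # common textbook values, not taken from the paper
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [mse(p, samples, n_hidden) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = mse(pos[i], samples, n_hidden)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err

# Hypothetical toy data: (per-field similarity vector, 1 = duplicate pair).
TOY_PAIRS = [
    ([0.95, 0.90, 1.00], 1), ([0.88, 0.97, 0.92], 1), ([0.91, 0.85, 0.89], 1),
    ([0.20, 0.35, 0.10], 0), ([0.40, 0.15, 0.30], 0), ([0.10, 0.25, 0.45], 0),
]

weights, err = pso_train(TOY_PAIRS, n_in=3)

def is_duplicate(features, threshold=0.5):
    # Classify a record pair from its similarity vector using the trained net.
    return forward(weights, features, 3) >= threshold
```

Because the network learns how to combine the field similarities, no per-attribute weights have to be set by hand, which is the property the abstract highlights. In practice the similarity vectors would come from string or numeric comparison of corresponding fields, and the 0.5 threshold would be tuned on labelled data.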