Improved PSO-based algorithm for outlier detection
Abstract The outlier detection method based on Particle Swarm Optimization (PSO) recently proposed by Mohemmed et al. (MOHEMMED A, ZHANG M, BROWNE W. Particle swarm optimisation for outlier detection[C]//GECCO'10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. Portland, Oregon: ACM, 2010: 83-84) can produce fitness values that do not match the outlying degree of the corresponding data objects. This paper analyzes the cause of this mismatch and proposes an improved fitness function. The new fitness function weakens the penalty imposed on unreasonable estimates of the neighborhood radius, which narrows the deviation between a particle's fitness and the outlying degree of the corresponding object. The algorithm searches the solution space for an approximately optimal particle to determine a suitable neighborhood radius estimate, and then measures the outlying degree of each data object using that radius. Experiments on several UCI datasets show that the outlier detection algorithm with the new fitness function outperforms both the original method and the LOF method. The proposed algorithm resolves the mismatch and also detects outliers more effectively, which indicates that a reasonably defined fitness function helps improve detection quality.
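Source Journal of Computer Applications (《计算机应用》), CSCD, PKU Core, 2012, Issue A01, pp. 139-143 (5 pages)
Funding Natural Science Foundation of Fujian Province (2010J01329); Major Industry-University-Research Project of Fujian Provincial Universities (2010H6012)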
Keywords data mining; outlier detection; Particle Swarm Optimization (PSO); outlying degree; fitness function
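
The pipeline described in the abstract (search for a neighborhood radius with PSO, then score each object with that radius) can be illustrated with a minimal sketch. The abstract does not give the paper's fitness function or its definition of outlying degree, so everything below is an assumption made for illustration: `outlying_degree` uses a hypothetical score of 1 / (1 + neighbor count), `fitness` is a placeholder (the variance of those scores, which is zero for radii that are far too small or far too large), and the PSO coefficients are common textbook defaults rather than values from the paper.

```python
import math
import random
import statistics


def neighbor_counts(points, radius):
    """Count, for each point, the other points within `radius` (Euclidean)."""
    return [
        sum(1 for j, q in enumerate(points) if i != j and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def outlying_degree(points, radius):
    """Hypothetical outlying degree: fewer neighbors gives a score closer to 1."""
    return [1.0 / (1 + c) for c in neighbor_counts(points, radius)]


def fitness(points, radius):
    """Placeholder fitness (NOT the paper's): spread of the outlying degrees.

    It is 0 when every point is isolated or every point neighbors all others,
    so unreasonable radius estimates score poorly.
    """
    return statistics.pvariance(outlying_degree(points, radius))


def pso_radius(points, n_particles=10, n_iters=30, seed=0):
    """One-dimensional PSO over the radius; returns the best radius found."""
    rng = random.Random(seed)
    lo = 1e-6
    hi = max(math.dist(p, q) for p in points for q in points)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_fit = [fitness(points, r) for r in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    for _ in range(n_iters):
        for i in range(n_particles):
            vel[i] = (w * vel[i]
                      + c1 * rng.random() * (pbest[i] - pos[i])
                      + c2 * rng.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep radius in bounds
            f = fitness(points, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i], f
    return gbest


# Toy data: eight clustered points plus one distant point (index 8).
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (-0.2, 0.2), (0.4, -0.1),
          (-0.3, -0.2), (0.2, -0.3), (-0.1, 0.1), (10.0, 10.0)]
radius = pso_radius(points)
degrees = outlying_degree(points, radius)
most_outlying = max(range(len(points)), key=lambda i: degrees[i])
print(radius, most_outlying)
```

On the toy data the distant point should receive the highest score for any radius that connects the cluster while leaving that point isolated. The sketch only shows how the fitness definition steers the radius search; it does not reproduce the original or the improved fitness function from the paper.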