Local Outlier Mining Algorithm Using Subspace Partitioning

Cited by: 4
Abstract: Most current local outlier mining algorithms require the user to set parameters or thresholds in advance, and they are difficult to apply to high-dimensional data sets. This paper presents a new local outlier mining algorithm, PSO-SPLOF. The algorithm works in three stages.

First, it partitions the data set into disjoint subspaces. It uses the skew of a partition to judge the quality of each subspace partition, and applies particle swarm optimization (PSO) to search for the optimal set of partitioned subspaces.

Second, for each subspace in the optimal partition, it computes the local outlier factor (SPLOF) of every data object. The SPLOF value measures how far the object deviates locally.

Finally, experiments on discretized star spectral data show that PSO-SPLOF is little affected by human factors, scales well, and runs efficiently.
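The abstract does not define either the skew of a partition or the SPLOF factor precisely. The Python below is a rough sketch of the second step only.

It assumes that SPLOF behaves like the classic LOF of Breunig et al. (2000), computed separately inside each subspace of a fixed, already chosen partition of the attributes. The PSO search for that partition is not attempted. The helper names `lof_in_subspace` and `scores_per_subspace`, the Euclidean distance, and the neighborhood size `k` are illustrative assumptions, not the authors' definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point, dims: Sequence[int]) -> float:
    """Euclidean distance using only the attribute indices in `dims`."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def lof_in_subspace(data: List[Point], dims: Sequence[int], k: int) -> List[float]:
    """Classic LOF (Breunig et al., 2000) evaluated on the attributes in `dims` only.

    Assumption: this stands in for SPLOF, whose exact definition is not given here.
    """
    n = len(data)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of objects")
    # Pairwise distances in the projected subspace (O(n^2), fine for a sketch).
    dist = [[_dist(data[i], data[j], dims) for j in range(n)] for i in range(n)]

    k_dist: List[float] = []
    neighbors: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # k-distance neighborhood: every object no farther than kd (ties included)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density = 1 / mean reachability distance to the neighbors.
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # epsilon guards exact duplicates

    # LOF = mean ratio of each neighbor's density to the object's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def scores_per_subspace(
    data: List[Point], partition: List[List[int]], k: int
) -> Dict[Tuple[int, ...], List[float]]:
    """Score every object separately in each subspace of a disjoint attribute partition."""
    used = [d for block in partition for d in block]
    if len(used) != len(set(used)):
        raise ValueError("subspaces in a partition must be disjoint")
    return {tuple(block): lof_in_subspace(data, block, k) for block in partition}


if __name__ == "__main__":
    # Made-up data: object 4 deviates only in attributes 2 and 3.
    data = [
        (0.0, 0.0, 5.0, 5.0),
        (0.1, 0.1, 5.1, 5.0),
        (0.0, 0.2, 5.0, 5.1),
        (0.2, 0.0, 5.2, 5.1),
        (0.1, 0.2, 9.0, 9.0),
    ]
    for block, scores in scores_per_subspace(data, [[0, 1], [2, 3]], k=2).items():
        print(block, [round(s, 2) for s in scores])
```

On this made-up data, object 4 should score far above 1 in subspace (2, 3) and close to 1 in subspace (0, 1). A deviation confined to a few attributes is easier to see when each subspace is scored on its own than when all attributes are mixed into one distance.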
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list. 2011, No. 8, pp. 1628-1632 (5 pages).
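Funding: National Natural Science Foundation of China (61073145); Natural Science Foundation of Shanxi Province (2010011021-2); Shanxi Province Research Project for Returned Overseas Scholars (2009-77).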
Keywords: outlier mining; particle swarm optimization; subspace; skew of partition; star spectral data