
Soft Subspace Clustering Algorithm for Streaming Data

Cited by: 31
Abstract: Research on clustering high-dimensional data shows that the samples in different clusters are often associated with different, cluster-specific subsets of features, which has drawn increasing attention to subspace clustering. However, existing soft subspace clustering algorithms are batch algorithms and cannot be applied effectively to high-dimensional data streams or large-scale data. To address this, the fuzzy scalable clustering framework is combined with entropy-weighting soft subspace clustering to form an entropy-weighting streaming subspace clustering algorithm, EWSSC. The algorithm retains the properties of conventional soft subspace clustering, in particular revealing the important local subspace characteristics of high-dimensional data, while using the fuzzy scalable clustering strategy to handle streaming data. Experiments on both artificial and real-world datasets show that, on high-dimensional data streams, EWSSC produces results approximately consistent with those of batch soft subspace clustering methods.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue 11, pp. 2610-2627 (18 pages)
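Funding: National Natural Science Foundation of China (61273258, 61272437, 61073189); Natural Science Foundation of Shanghai (13ZR1417500); Scientific Research Innovation Project of the Shanghai Municipal Education Commission (14YZ131)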
Keywords: subspace clustering; data stream clustering; scalable clustering; fuzzy clustering; document clustering
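
The abstract names entropy weighting as the mechanism that gives each cluster its own soft feature subspace. This record does not reproduce the paper's update formulas, so the following is only a minimal sketch of the generic entropy-weighting rule used in entropy-weighted k-means style methods, not the EWSSC update itself. The function name `entropy_weights`, the parameter `gamma`, and the sample dispersions are assumptions made for illustration.

```python
import math

def entropy_weights(dispersion, gamma=1.0):
    """Turn one cluster's per-feature dispersions into soft feature weights.

    Assumed generic rule (not taken from the paper): weights are
    proportional to exp(-D_j / gamma) and sum to 1, so features on which
    the cluster is compact receive larger weights.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    shift = min(dispersion)  # subtract the minimum for numerical stability
    raw = [math.exp(-(d - shift) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical cluster: compact on the first two features, spread on the third.
weights = entropy_weights([0.1, 0.1, 5.0], gamma=0.5)
```

A larger `gamma` spreads the weight more evenly across features, while a smaller one concentrates it on the most compact features. A streaming variant would recompute such weights chunk by chunk, but how EWSSC carries information between chunks is not described in this record.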
