
A Distributed k-Means Clustering Algorithm Based on the Vector Inner-Product Inequality (cited by: 15)

An Effective Distributed k-Means Clustering Algorithm Based on the Pretreatment of Vectors' Inner-Product
Abstract Cluster analysis is an important research topic in data mining. As data volumes grow rapidly, clustering large data sets has become a difficult problem. Although the k-means algorithm is easy to implement and its complexity is linear in the size of the data set, it remains inefficient when applied to large data sets. Distributed clustering is an effective way to address this problem. Building on the existing distributed clustering algorithm k-DMeans (written kDMeans in the Chinese abstract), this paper uses the vector inner-product inequality to optimize it and proposes the distributed clustering algorithm k-DCBIP (kDCBIP). Theoretical analysis and experimental results show that k-DCBIP outperforms k-DMeans and that it is an effective and feasible way to cluster large data sets.
Source Journal of Computer Research and Development (《计算机研究与发展》), 2005, No. 9, pp. 1493-1497 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
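Funding National Natural Science Foundation of China (70371015); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)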
Keywords distributed clustering; norm of a data point; vector inner product; vector inner-product inequality
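
The abstract does not say how the inner-product inequality is applied, so the following is only a sketch under an assumption: that the Cauchy-Schwarz inequality |x·c| ≤ ‖x‖‖c‖ is used to bound distances from below, since it implies ‖x − c‖² ≥ (‖x‖ − ‖c‖)². With point and centroid norms precomputed, a centroid whose bound already exceeds the best distance found so far can be skipped without computing the full distance. The function names (`norm`, `sq_dist`, `assign_with_norm_pruning`) are hypothetical and are not taken from k-DCBIP; the sketch covers a single-node assignment step, not the distributed protocol.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_with_norm_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    Assumed pruning rule (not confirmed by the source): by Cauchy-Schwarz,
    (|x| - |c|)^2 <= |x - c|^2, so a centroid whose lower bound is not
    below the current best squared distance cannot be strictly closer.
    Returns the labels and the number of distance computations skipped.
    """
    centroid_norms = [norm(c) for c in centroids]  # computed once per iteration
    labels = []
    skipped = 0
    for p in points:
        p_norm = norm(p)
        best_j, best_d = -1, math.inf
        for j, c in enumerate(centroids):
            lower_bound = (p_norm - centroid_norms[j]) ** 2
            if lower_bound >= best_d:
                skipped += 1  # full distance not needed
                continue
            d = sq_dist(p, c)
            if d < best_d:
                best_j, best_d = j, d
        labels.append(best_j)
    return labels, skipped


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 1.0)]
centroids = [(0.5, 0.0), (10.5, 0.5)]
labels, skipped = assign_with_norm_pruning(points, centroids)
print(labels, skipped)
```

The bound costs one subtraction and one multiplication per centroid, compared with a full pass over all dimensions for the exact distance, so it tends to pay off when the data are high-dimensional and the norms of points and centroids differ widely.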

