
Kernel Density Estimation and Its Application to Clustering Algorithm Construction (核密度估计及其在聚类算法构造中的应用)

Cited by: 70
Abstract: Kernel density estimation, a classical tool of mathematical statistics, provides the theoretical foundation for constructing clustering algorithms based on the density function of a data set. Binned approximation, an efficient mechanism for fast kernel density computation, likewise offers a basis for constructing efficient, robust clustering algorithms. This paper discusses the formation and accuracy of binned kernel density estimators and presents mean squared error bounds for the closeness of such estimators to the unbinned kernel density estimators. To improve the accuracy of the binning method, a naive grid-level approximated density estimator is constructed, followed by a detailed proof of its mean squared error bounds. The improved approach constructs the binned density estimator by substituting the gravity center of the data points in a grid cell for the center of the cell, which yields better estimation accuracy without loss of computational efficiency. This idea offers guidance for constructing efficient clustering algorithms for large-scale data, and the close relation between density-based clustering algorithms and kernel estimation methods is revealed.
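Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2004, No. 10, pp. 1712-1719 (8 pages)
Funding: National Natural Science Foundation of China (70371015); Innovation Fund for Small and Medium-sized Enterprises, Ministry of Science and Technology of China (02C26213210070); Natural Science Foundation of the Jiangsu Provincial Department of Education (02KJB520012)
Keywords: kernel density estimation; binning rule; clustering algorithm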
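
The binning idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its error analysis: the one-dimensional setting, the Gaussian kernel, the bandwidth `h`, the grid `edges`, and the function names `kde_exact` and `kde_binned` are all assumptions made here for illustration. The sketch compares an ordinary kernel density estimate with two binned approximations, one that represents each bin by its geometric center and one that represents it by the gravity center (mean) of the points that fall into it.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_exact(x_eval, data, h):
    """Unbinned kernel density estimate: every data point is visited."""
    u = (x_eval[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)

def kde_binned(x_eval, data, h, edges, use_gravity_center=False):
    """Binned approximation: each bin contributes one point weighted by its count."""
    counts, _ = np.histogram(data, bins=edges)
    reps = 0.5 * (edges[:-1] + edges[1:])  # geometric bin centers
    if use_gravity_center:
        # Per-bin sum of the data values, then mean of each non-empty bin.
        sums, _ = np.histogram(data, bins=edges, weights=data)
        nonempty = counts > 0
        reps[nonempty] = sums[nonempty] / counts[nonempty]
    u = (x_eval[:, None] - reps[None, :]) / h
    return (gaussian_kernel(u) * counts[None, :]).sum(axis=1) / (data.size * h)

# Hypothetical demo data: a two-component Gaussian mixture.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 2000), rng.normal(1.5, 1.0, 3000)])
h = 0.3
edges = np.linspace(data.min(), data.max(), 31)  # 30 equal-width bins covering the data
x_eval = np.linspace(-4.0, 5.0, 200)

exact = kde_exact(x_eval, data, h)
mse_center = np.mean((kde_binned(x_eval, data, h, edges) - exact) ** 2)
mse_gravity = np.mean(
    (kde_binned(x_eval, data, h, edges, use_gravity_center=True) - exact) ** 2
)
print("MSE vs. exact KDE, bin centers:    ", mse_center)
print("MSE vs. exact KDE, gravity centers:", mse_gravity)
```

In this sketch the cost of evaluating either binned estimator depends on the number of bins rather than the number of data points, and the gravity centers come from one extra histogram pass, which is in line with the abstract's statement that the computational complexity is unchanged. How much the gravity-center variant reduces the error depends on the data and the grid; the gain is largest when the points inside a bin are concentrated away from its center.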
