Abstract
To reduce the effect of the "curse of dimensionality" on clustering results for high-dimensional data, this paper proposes and implements a new subspace clustering algorithm based on a genetic algorithm. The algorithm combines feature selection with the global search capability of the genetic algorithm to search over all feature subspaces. The solution space is encoded with real-number encoding, and a fitness function based on distance and information entropy is designed to evaluate both the clustering results and the feature dimensions contained in each subspace. Experiments on artificial and real datasets verify the efficiency and robustness of the algorithm. The results show that the proposed algorithm clusters high-dimensional data effectively and reduces the influence of the curse of dimensionality.
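The abstract names the ingredients (real-number encoding, feature selection, a fitness built from distance and information entropy) but not the exact formulas or operators. The sketch below is a minimal, hypothetical illustration of how those ingredients can fit together, not the paper's method. Every concrete choice is an assumption: one gene in [0, 1] per feature with gene >= 0.5 meaning "feature selected"; k-means run in the selected subspace; a distance term equal to mean between-center distance divided by mean within-cluster distance; an entropy term equal to the sum over selected features of (1 minus normalized histogram entropy); their product as fitness; and tournament selection, arithmetic crossover, Gaussian mutation and elitism as GA operators. All function names (`fitness`, `evolve`, `make_data`, and so on) are illustrative.

```python
import math
import random

def normalized_entropy(values, bins=10):
    """Shannon entropy of a histogram of `values`, divided by log(bins) so it lies in [0, 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything lands in bin 0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    h = -sum(c / n * math.log(c / n) for c in counts if c)
    return h / math.log(bins)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=20):
    # Farthest-first initialization (assumption) keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers

def fitness(genes, data, k=2, threshold=0.5):
    """Hypothetical fitness: cluster separation times entropy-based feature relevance."""
    dims = [j for j, g in enumerate(genes) if g >= threshold]
    if not dims:
        return 0.0
    sub = [[row[j] for j in dims] for row in data]
    labels, centers = kmeans(sub, k)
    within = sum(dist(p, centers[lab]) for p, lab in zip(sub, labels)) / len(sub)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    between = sum(dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)
    separation = between / (within + 1e-9)
    # Low-entropy (clumped) features count as relevant; near-uniform noise adds ~0.
    relevance = sum(1.0 - normalized_entropy([row[j] for row in data]) for j in dims)
    return separation * relevance

def evolve(data, pop_size=20, generations=30, seed=1):
    """Real-coded GA over feature-selection vectors; returns (selected dims, fitness)."""
    rng = random.Random(seed)
    d = len(data[0])
    cache = {}

    def score(genes):
        key = tuple(g >= 0.5 for g in genes)  # fitness depends only on the mask
        if key not in cache:
            cache[key] = fitness(genes, data)
        return cache[key]

    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [pop[0]]  # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            a = rng.random()  # arithmetic (blend) crossover
            child = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
            # Gaussian mutation, clipped back into [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.2))) if rng.random() < 0.3 else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return [j for j, g in enumerate(best) if g >= 0.5], score(best)

def make_data(seed=42):
    """Toy data: dims 0-1 hold two clusters, dims 2-3 are uniform noise."""
    rng = random.Random(seed)
    return [[rng.gauss(cx, 1), rng.gauss(cx, 1), rng.uniform(0, 10), rng.uniform(0, 10)]
            for cx in (0.0, 10.0) for _ in range(30)]

if __name__ == "__main__":
    dims, best = evolve(make_data())
    print("selected dims:", dims, "fitness:", round(best, 3))
```

On the toy data, the subspace made of the two informative dimensions scores higher than the full space or the noise-only subspace, which is the behavior a subspace clustering fitness is meant to reward. Multiplying the two terms (rather than adding them) is one simple way to require that a subspace be both well separated and made of structured features; the paper's actual weighting may differ.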
Source
Electronic Design Engineering (《电子设计工程》)
2013, No. 5, pp. 180-183 (4 pages)
Funding
Science and Technology Research Project of the Hubei Provincial Department of Education (Q20091112)
Keywords
genetic algorithm
high-dimensional space
clustering
feature dimension