Abstract
This paper proposes a subspace clustering algorithm for categorical data that exploits attribute correlations and the distribution characteristics of attribute values. The algorithm measures attribute correlation using mutual information and joint entropy and combines this measure with the distribution of each attribute's values, which refines the measurement granularity of attribute subspaces. With maximizing intra-cluster data similarity as the clustering objective, it introduces the concept of attribute value force, which strengthens the compacting effect of key attribute values on the data within a cluster and speeds up the clustering iterations. Experiments on artificial datasets and UCI datasets verify the algorithm's correctness, scalability, and reliability.
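The abstract does not state the exact correlation formula, so the following is only a minimal sketch of one common way to build an attribute correlation measure from mutual information and joint entropy: the ratio I(X;Y) / H(X,Y). The function names `entropy` and `attribute_correlation`, and the choice of this particular normalization, are assumptions made for illustration and may differ from the paper's definition.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def attribute_correlation(x, y):
    """Mutual information I(X;Y) divided by joint entropy H(X,Y).

    Assumed normalization (not taken from the paper): the ratio lies in
    [0, 1], is 0 for independent attributes and 1 when each attribute
    fully determines the other.
    """
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    if h_xy == 0:  # both attributes are constant, no information to share
        return 0.0
    mutual_info = entropy(x) + entropy(y) - h_xy  # I(X;Y)
    return mutual_info / h_xy
```

Under this assumption, two attribute columns whose values always co-occur score 1, and two statistically independent columns score 0; a subspace method could then use such scores to decide which attributes to group together.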
Authors
PANG Ning (庞宁)
REN Yan-hao (任彦豪)
PANG Ning; REN Yan-hao (School of Applied Science, Taiyuan University of Science and Technology, Taiyuan 030024, China; College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 2186-2192 (7 pages)
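Funding
General Program of the Natural Science Research Foundation of Shanxi Province (20210302123224)
Doctoral Research Start-up Fund of Taiyuan University of Science and Technology (20202066)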
Keywords
attribute correlation
attribute value weight
subspace clustering
categorical data
attribute value force
mutual information
frequency distribution