Abstract
Discretization of continuous features is an important data preprocessing step, and data analysis based on rough set theory requires that the discretized result preserve, as far as possible, the discernibility relation of the original information system. This paper proposes a new discretization algorithm based on the DBSCAN clustering algorithm and feature dependency. The dependency of the decision feature on the condition feature set in the decision information system serves as the evaluation function for dynamically adjusting the parameters of DBSCAN, and the adjustment continues until the dependency after discretization reaches a pre-specified threshold. Algorithm analysis and experiments show that the algorithm is feasible.
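The abstract does not give the algorithm's details, so the following Python is only a minimal sketch of the general idea, not the authors' implementation. It computes the standard rough-set dependency (the fraction of objects in the positive region) and uses it to decide when to stop refining a DBSCAN-based discretization. The helper names `dependency`, `dbscan_1d`, and `discretize`, the per-column clustering, the halving schedule for `eps`, and the treatment of noise points as singleton intervals are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def dependency(cond_rows, decisions):
    """Rough-set dependency: |positive region| / |universe|."""
    classes = defaultdict(list)
    for row, d in zip(cond_rows, decisions):
        classes[tuple(row)].append(d)
    # An equivalence class lies in the positive region when all of its
    # objects share one decision value.
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(decisions)


def dbscan_1d(values, eps, min_pts):
    """DBSCAN on one numeric column; returns one interval code per value."""
    xs = sorted(values)

    def is_core(v):
        # Neighbourhood count includes the point itself.
        return bisect_right(xs, v + eps) - bisect_left(xs, v - eps) >= min_pts

    cores = [v for v in sorted(set(xs)) if is_core(v)]
    core_label, label, prev = {}, -1, None
    for v in cores:
        # In one dimension, consecutive core values within eps are
        # density-connected and therefore share a cluster.
        if prev is None or v - prev > eps:
            label += 1
        core_label[v] = label
        prev = v

    labels, noise = [], {}
    for v in values:
        if v in core_label:
            labels.append(core_label[v])
            continue
        i = bisect_left(cores, v)
        near = [c for c in cores[max(i - 1, 0):i + 1] if abs(c - v) <= eps]
        if near:
            # Border point: joins the cluster of the nearest core value.
            labels.append(core_label[min(near, key=lambda c: abs(c - v))])
        else:
            # Assumption: each distinct noise value becomes its own interval,
            # coded with a unique negative label.
            labels.append(noise.setdefault(v, -1 - len(noise)))
    return labels


def discretize(columns, decisions, threshold=1.0, min_pts=2,
               shrink=0.5, max_iter=20):
    """Shrink eps until the dependency reaches the threshold (assumed schedule)."""
    eps = [max(c) - min(c) or 1.0 for c in columns]
    for _ in range(max_iter):
        coded = [dbscan_1d(c, e, min_pts) for c, e in zip(columns, eps)]
        gamma = dependency(list(zip(*coded)), decisions)
        if gamma >= threshold:
            break
        eps = [e * shrink for e in eps]
    return coded, gamma


if __name__ == "__main__":
    columns = [[1.0, 1.2, 1.1, 5.0, 5.1, 5.3]]
    decisions = [0, 0, 0, 1, 1, 1]
    coded, gamma = discretize(columns, decisions)
    print(coded, gamma)
```

Starting from a coarse `eps` and refining only while the dependency is below the threshold keeps the number of intervals small, which is one plausible reading of "adjusting parameters dynamically"; the paper itself may adjust `eps` and `min_pts` differently.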
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University Core Journals list)
2006, No. 13, pp. 149-151 (3 pages)
Computer Engineering and Applications
Keywords
discretization
DBSCAN clustering algorithm
feature dependency