Abstract
Discretizing the continuous attribute values of a decision system is of great importance to the subsequent machine learning or data mining process. This paper systematically studies data discretization methods based on Rough Set theory. First, an algorithm is proposed to compute the set of candidate cuts. Second, a new concept, Selection Possibility, is defined to measure and distinguish the relative importance of candidate cuts reasonably and effectively. Finally, a heuristic algorithm based on this concept is proposed to determine the subset of result cuts that are ultimately used in the discretization process. Theoretical analysis and simulation results show that the overall performance of these algorithms is better than that of comparable algorithms reported in the literature.
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2004, Issue 1, pp. 60-64 (5 pages)
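Funding
National Natural Science Foundation of China (69803014)
Climbing Program special support fund
Ministry of Education Funding Program for Key Teachers in Higher Education Institutions (GG-520-10617-1001)
Ministry of Education Scientific Research Foundation for Returned Overseas Scholars
Chongqing Science and Technology Commission Key Research Fund
Chongqing Municipal Fund for Outstanding Young and Middle-Aged Key Teachers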
Keywords
Rough Set
Discretization
Candidate Cuts
Result Cuts
Selection Possibility