Abstract
This paper presents a decision tree construction method for mining rules from outlier data. It gives a new definition of average denseness for a data distribution and analyzes the mechanism that produces outliers. From this analysis it argues that the denseness of outliers is often higher than that of normal samples, and that outlier data are in essence also imbalanced data. On this basis, an algorithm that automatically tags outliers, ATO, is proposed. Building on ATO and part of the functionality of C4.5, the paper then proposes FDTM, a fuzzy decision tree construction method based on automatic outlier tagging, which builds the final tree with the tree-forming function of C4.5. Simulation experiments indicate that FDTM outperforms C4.5 in the efficiency of outlier rule mining, the confidence of the mined rules, the handling of imbalanced data, and the optimization of the decision tree structure, and that it has some practical value.
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2011, No. 5, pp. 1781-1784 (4 pages)
Computer Engineering and Design