Abstract
A decision tree can be pruned either while it is being built or afterwards; pruning during construction is called pre-pruning. Every node of a decision tree corresponds to a sample set. By analyzing the number of samples in that set or its purity, two pre-pruning algorithms are proposed: PDTBS (pre-pruning decision tree based on support) and PDTBP (pre-pruning decision tree based on purity). To prune the tree, PDTBS prevents nodes with small sample sets from being expanded, and PDTBP prevents nodes with high-purity sample sets from being expanded. Analysis shows that the time complexity of both algorithms is linear. Experiments on UCI data sets show that PDTBS and PDTBP can prune a decision tree substantially with very little loss of classification accuracy.
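The two stopping rules described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the base tree-growing algorithm, the exact definitions of support and purity, or any threshold values. The sketch assumes an ID3-style split on categorical features, takes support to be the fraction of the training set that reaches a node, and takes purity to be the fraction of a node's samples in its majority class. The names `min_support`, `max_purity`, `build`, and `count_nodes`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log2

def support(labels, total):
    """Assumed definition: fraction of the training set that reaches this node."""
    return len(labels) / total

def purity(labels):
    """Assumed definition: fraction of the node's samples in its majority class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def build(rows, labels, features, total, min_support=0.0, max_purity=1.0):
    """Grow a tree; return a class label (leaf) or (feature, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    # Support-based pre-pruning (in the spirit of PDTBS): do not expand small nodes.
    if support(labels, total) < min_support:
        return majority
    # Purity-based pre-pruning (in the spirit of PDTBP): do not expand pure-enough nodes.
    if purity(labels) >= max_purity:
        return majority
    if not features:
        return majority

    def split_entropy(f):
        # Weighted entropy of the children produced by splitting on feature f.
        groups = {}
        for r, y in zip(rows, labels):
            groups.setdefault(r[f], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=split_entropy)  # lowest entropy = highest information gain
    parts = {}
    for r, y in zip(rows, labels):
        rs, ys = parts.setdefault(r[best], ([], []))
        rs.append(r)
        ys.append(y)
    rest = [f for f in features if f != best]
    return (best, {v: build(rs, ys, rest, total, min_support, max_purity)
                   for v, (rs, ys) in parts.items()})

def count_nodes(tree):
    """Count internal nodes plus leaves."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(t) for t in tree[1].values())

# Toy data (hypothetical), eight samples with two categorical features.
ROWS = [{"outlook": "sunny", "windy": "no"}] * 3 + [{"outlook": "sunny", "windy": "yes"}] \
     + [{"outlook": "rain", "windy": "no"}] * 2 + [{"outlook": "rain", "windy": "yes"}] * 2
LABELS = ["play"] * 3 + ["stay"] * 5
FEATURES = ["outlook", "windy"]

full = build(ROWS, LABELS, FEATURES, len(ROWS))
by_support = build(ROWS, LABELS, FEATURES, len(ROWS), min_support=0.6)
by_purity = build(ROWS, LABELS, FEATURES, len(ROWS), max_purity=0.75)
print(count_nodes(full), count_nodes(by_support), count_nodes(by_purity))
```

Each check inspects only the labels at the current node, so it adds a single pass over that node's samples before any split is evaluated. On this toy data both rules turn the "sunny" subtree into a leaf; the one "stay" sample it contains is then misclassified, which is the size-versus-accuracy trade-off the abstract refers to.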
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 3, pp. 670-672 (3 pages)
Keywords
decision tree
pre-pruning
support
purity