Abstract
Decision tree classification is an important data mining problem, and the key issue in building a decision tree over a data stream is computing the best splitting criterion for each internal node. Some existing algorithms cannot handle numerical attributes, while others are too computationally expensive. This paper divides the values of each numerical attribute into suitable intervals and, exploiting a special property of their Gini index values, identifies the interval with the largest Gini index gradient. The best split point can therefore be computed quickly, making fast decision tree construction on streaming data possible.
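To make the interval idea concrete, here is a minimal Python sketch. It assumes equal-width intervals over a value range known in advance, and every name in it (`IntervalStats`, `boundary_ginis`, `mixed_intervals`) is hypothetical. Because the abstract does not define the Gini gradient criterion, the sketch does not reproduce the paper's method; it swaps in a simpler, well-known interval property instead. Each interval keeps only per-class counts, so the split Gini index (the size-weighted average of 1 - Σ p_j² over the two partitions) can be scored at every interval boundary from prefix counts. And since the split Gini is concave in the cut position across a run of same-class records, only intervals holding two or more classes can hide a cut point better than the boundaries.

```python
from collections import Counter
from typing import List, Tuple


def gini(counts: Counter) -> float:
    """Gini index 1 - sum_j p_j^2 of a class-count table (0.0 if empty)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left: Counter, right: Counter) -> float:
    """Size-weighted Gini index of a binary split into left/right partitions."""
    n_l, n_r = sum(left.values()), sum(right.values())
    return (n_l * gini(left) + n_r * gini(right)) / (n_l + n_r)


class IntervalStats:
    """Hypothetical per-attribute stream summary: k equal-width intervals over an
    assumed known range [lo, hi), each holding only per-class counts."""

    def __init__(self, lo: float, hi: float, k: int) -> None:
        self.lo, self.width, self.k = lo, (hi - lo) / k, k
        self.counts = [Counter() for _ in range(k)]

    def add(self, x: float, label: str) -> None:
        # O(1) update per record; out-of-range values are clamped to the end intervals.
        i = min(max(int((x - self.lo) // self.width), 0), self.k - 1)
        self.counts[i][label] += 1

    def boundary_ginis(self) -> List[Tuple[float, float]]:
        """(boundary, split Gini) for each interior boundary, via running prefix counts."""
        total = Counter()
        for c in self.counts:
            total.update(c)
        if not total:
            return []
        left, result = Counter(), []
        for i in range(1, self.k):
            left.update(self.counts[i - 1])  # left = all intervals below boundary i
            result.append((self.lo + i * self.width, split_gini(left, total - left)))
        return result

    def mixed_intervals(self) -> List[int]:
        """Indices of intervals holding two or more classes. Inside a single-class
        interval the split Gini is concave in the cut position, so no interior cut
        beats the better of its two edges, which boundary_ginis() already scores."""
        return [i for i, c in enumerate(self.counts) if len(c) > 1]


if __name__ == "__main__":
    # Toy stream: class A below 5.0, class B at or above 5.0.
    stats = IntervalStats(0.0, 10.0, 5)
    for x in [i + 0.5 for i in range(10)]:
        stats.add(x, "A" if x < 5.0 else "B")
    print(min(stats.boundary_ginis(), key=lambda t: t[1]))
    print(stats.mixed_intervals())  # only interval 2, i.e. [4, 6), needs exact cut points
```

On this toy stream the best boundary cut scores about 1/6, while the perfect cut at 5.0 lies inside interval 2, the only interval flagged for exact evaluation. Buffering raw values only for such flagged intervals would be one way to keep memory low on a stream; that refinement step is an assumption here, not something the abstract describes.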
Source
Journal of Lianyungang Technical College (《连云港职业技术学院学报》)
2005, No. 2, pp. 61-64 (4 pages)