Abstract
In this paper, a random forest algorithm based on training-set splitting is proposed. First, the majority class is divided into multiple disjoint subsets. Each subset is then combined with the minority class to train a decision tree. Finally, a random forest is constructed using an average weighting strategy, from which the final classification rules are obtained. The proposed method avoids losing information from the original samples and keeps the training set balanced for each decision tree. Experiments on artificial imbalanced datasets show that the method is effective.
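The splitting and averaging steps described above can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the function names, the equal per-tree weights, and the toy data are assumptions for demonstration, and the per-subset tree training is elided.

```python
import random

def split_balanced_subsets(majority, minority, seed=0):
    """Split the majority class into disjoint subsets of roughly minority
    size, then pair each subset with the full minority class so every
    decision tree sees a balanced training set."""
    rng = random.Random(seed)
    idx = list(range(len(majority)))
    rng.shuffle(idx)
    k = max(1, len(majority) // len(minority))   # number of disjoint subsets
    chunks = [idx[i::k] for i in range(k)]       # disjoint index chunks
    return [[majority[i] for i in c] + minority for c in chunks]

def average_vote(probas):
    """Combine per-tree class probabilities with equal (average) weights."""
    n = len(probas)
    return [sum(p[c] for p in probas) / n for c in range(len(probas[0]))]

# Toy data: 9 majority samples, 3 minority samples -> 3 balanced sets of 6.
maj = [("maj", i) for i in range(9)]
mino = [("min", i) for i in range(3)]
balanced_sets = split_balanced_subsets(maj, mino)
```

One decision tree would be trained on each element of `balanced_sets`, and their predicted class probabilities combined with `average_vote`; because the subsets are disjoint, every majority sample is used exactly once across the ensemble, so no original sample information is discarded.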
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journal
2013, No. 10, pp. 124-126 (3 pages)
Keywords
random forest
disjoint subsets
decision tree
average weighting