Abstract
Partitioning data into a training set and a test set is a basic step in the preprocessing stage of data mining, and the choice of training set is an important factor in deriving good classification rules. Traditional association rule mining methods split the data set into training and test sets using statistical algorithms. This paper proposes an approach based on a genetic algorithm: the original data set is first divided into a sample set and a validation set, and a genetic algorithm is then used to find a suitable split of the sample set into a training set and a test set. Experiments show that when the training set obtained this way is used as input to an association rule mining algorithm, the resulting classification rules have high accuracy.
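The abstract does not give the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of how a genetic algorithm could search for a train/test split. Everything in it is an assumption made for illustration: the bit-mask encoding, truncation selection with elitism, one-point crossover, the `min_frac` size constraint, and above all `rule_accuracy`, a simple one-attribute rule classifier standing in for the association rule miner used in the paper.

```python
import random
from collections import Counter, defaultdict


def rule_accuracy(train, test):
    """Hypothetical stand-in for an association-rule classifier: one rule per
    value of the first attribute, predicting the majority training label."""
    if not train or not test:
        return 0.0
    by_value = defaultdict(Counter)
    for features, label in train:
        by_value[features[0]][label] += 1
    default = Counter(label for _, label in train).most_common(1)[0][0]
    correct = 0
    for features, label in test:
        votes = by_value.get(features[0])
        predicted = votes.most_common(1)[0][0] if votes else default
        correct += predicted == label
    return correct / len(test)


def split_by_mask(samples, mask):
    """Bit 1 sends a sample to the training set, bit 0 to the test set."""
    train = [s for s, bit in zip(samples, mask) if bit]
    test = [s for s, bit in zip(samples, mask) if not bit]
    return train, test


def ga_split(samples, pop_size=20, generations=30,
             mutation_rate=0.05, min_frac=0.2, seed=0):
    """Search for a train/test split of `samples` (needs at least 2 samples)."""
    rng = random.Random(seed)
    n = len(samples)

    def fitness(mask):
        train, test = split_by_mask(samples, mask)
        # Assumed constraint: reject splits where either part is too small,
        # otherwise a tiny test set could score perfectly by chance.
        if len(train) < min_frac * n or len(test) < min_frac * n:
            return 0.0
        return rule_accuracy(train, test)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]         # bit-flip mutation
            children.append(child)
        population = children

    best = max(population, key=fitness)
    return split_by_mask(samples, best)
```

Each sample here is a `(features, label)` pair, for example `(("a",), "yes")`. In the setting the abstract describes, `samples` would be the sample set only; the validation set would be held out beforehand and used to check the rules mined from the chosen training set.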
Source
《科技通报》 (Bulletin of Science and Technology)
Peking University Core Journal (北大核心)
2013, No. 12, pp. 211-213 (3 pages)
Keywords
preprocessing
data mining
genetic algorithm