Abstract
Real-world problems often involve continuous numeric attributes, yet many inductive learning algorithms assume a discrete attribute space. Discretization, applied as a data preprocessing step before inductive learning, has therefore long received attention. Global discretization, which takes the relationships among all attributes into account, is an important approach. This paper proposes a global discretization algorithm based on data partitioning. The algorithm first applies a uniform enlargement (scaling) to the values of the example set on every continuous attribute. It then selects the attribute that carries the most clustering information and uses it to divide the whole example set roughly into several partitions. Finally, it clusters and merges the examples separately within each partition. The method improves on the basic global discretization algorithm and is tested on soil classification data from an agricultural expert system.
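The abstract only outlines the procedure, so the Python sketch below shows one possible reading of it. Every concrete choice here is an assumption and not the paper's method. Min-max scaling onto a common range stands in for the uniform "enlarge" step. The widest empty gap on an axis is a hypothetical proxy for "clustering information". Sections are split at gaps of at least `min_gap`. Clusters inside each section are found by single-linkage clustering with a distance `threshold`. Cut points are placed midway between non-overlapping cluster projections. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Rows = List[List[float]]


def scale_attributes(data: Sequence[Sequence[float]], scale: float) -> Tuple[Rows, List[float], List[float]]:
    """Assumption: min-max scale every attribute onto [0, scale] as the 'enlarge' step."""
    n = len(data[0])
    lo = [min(r[j] for r in data) for j in range(n)]
    hi = [max(r[j] for r in data) for j in range(n)]
    scaled = [[(r[j] - lo[j]) * scale / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
               for j in range(n)] for r in data]
    return scaled, lo, hi


def largest_gap(values: Sequence[float]) -> float:
    """Hypothetical 'clustering information' score: widest empty stretch on the axis."""
    v = sorted(values)
    return max((b - a for a, b in zip(v, v[1:])), default=0.0)


def split_on_gaps(values: Sequence[float], min_gap: float) -> List[List[int]]:
    """Sort row indices by value and start a new section at every gap >= min_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] >= min_gap:
            groups.append([])
        groups[-1].append(cur)
    return groups


def cluster_rows(scaled: Rows, idx: List[int], threshold: float) -> List[List[int]]:
    """Assumption: single-linkage clustering inside one section (union-find on close pairs)."""
    parent = {i: i for i in idx}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if math.dist(scaled[a], scaled[b]) <= threshold:
                parent[find(a)] = find(b)
    clusters: Dict[int, List[int]] = {}
    for i in idx:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def cut_points(scaled: Rows, clusters: List[List[int]], attr: int) -> List[float]:
    """Project clusters onto one attribute, merge overlapping intervals, cut between the rest."""
    spans = sorted((min(scaled[i][attr] for i in c), max(scaled[i][attr] for i in c))
                   for c in clusters)
    merged = [list(spans[0])]
    for lo, hi in spans[1:]:
        if lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # overlapping projections are merged
        else:
            merged.append([lo, hi])
    return [(a[1] + b[0]) / 2 for a, b in zip(merged, merged[1:])]


def discretize(data, scale=100.0, min_gap=10.0, threshold=20.0):
    """Return the partitioning attribute and per-attribute cut points in original units."""
    scaled, lo, hi = scale_attributes(data, scale)
    n = len(scaled[0])
    attr = max(range(n), key=lambda j: largest_gap([r[j] for r in scaled]))
    sections = split_on_gaps([r[attr] for r in scaled], min_gap)
    clusters = [c for sec in sections for c in cluster_rows(scaled, sec, threshold)]
    cuts = {j: [lo[j] + c * (hi[j] - lo[j]) / scale for c in cut_points(scaled, clusters, j)]
            for j in range(n)}
    return attr, cuts


data = [
    [0.0, 0.0], [0.5, 1.0], [1.0, 0.5],    # group A
    [9.0, 0.5], [9.5, 0.0], [10.0, 1.0],   # group B
    [9.5, 8.0], [9.0, 10.0], [10.0, 9.0],  # group C
]
print(discretize(data))  # (0, {0: [5.0], 1: [4.5]})
```

On this toy data, attribute 0 has the widest gap, so it is used to split the examples into two sections. Clustering inside the second section then separates groups B and C, which is why attribute 1 still receives a cut point. Partitioning first keeps the pairwise clustering local to each section rather than running over the whole example set, which is one plausible motivation for the data-partitioning step.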
Source
Journal of Hangzhou Dianzi University: Natural Sciences
2006, No. 1, pp. 18-21 (4 pages)
Keywords
inductive learning
discretization
data partitioning
global discretization