Abstract
To make data mining more efficient, the original data set usually has to be reduced to a representation that is much smaller in volume yet yields the same, or almost the same, analytical results. This paper proposes Sodra, a self-organizing data reduction algorithm that iterates between unsupervised and supervised learning to generate a reduced data set suited to classification. Two classification experiments, on real and artificial data sets, show that the reduced data set generated by Sodra achieves the same generalization accuracy as the original while requiring much less storage. In addition, Relif-P, a feature relevance analysis algorithm that operates on the reduced data, further increases the algorithm's tolerance to irrelevant features.
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2003, No. 24, pp. 189-192 (4 pages)