Abstract
Current data mining algorithms, such as support vector machines (SVM) and mean-based clustering, almost all rely on the correlation between data to perform data matching. When a database contains a large amount of redundant data, the correlation between data decreases and their association is destroyed, which lowers the efficiency of traditional data mining algorithms. To avoid this defect, this paper proposes a weakened-association-rule repair mining algorithm. It uses a weak clustering method: during data selection, it does not perform initial classification on every element. Instead, it only computes the probability that an element belongs to a given category, determines multiple weak clustering centers, and computes the weak clustering relevance between different data. This enables accurate data mining in redundant environments where association rules are weak. Experimental results show that the algorithm effectively improves data mining efficiency in massively redundant environments and achieves satisfactory results.
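The abstract gives no formulas for the weak clustering step, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that the probability of an element belonging to each category is a softmax over negative squared Euclidean distances to a fixed set of weak clustering centers, and that the "weak clustering relevance" between two records is the Bhattacharyya overlap of their membership distributions. The function names `membership_probs` and `weak_relevance` and the `temperature` parameter are hypothetical. The point it illustrates is that no element is hard-assigned to a class; only membership probabilities are computed and compared.

```python
import math

def membership_probs(point, centers, temperature=1.0):
    """Hypothetical soft membership: probability that `point` belongs to each
    weak clustering center, as a softmax over negative squared Euclidean
    distances. `temperature` must be > 0; larger values give softer memberships."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    logits = [-d / temperature for d in d2]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weak_relevance(a, b, centers, temperature=1.0):
    """Hypothetical 'weak clustering relevance' of two records: the
    Bhattacharyya coefficient of their membership distributions, in [0, 1].
    Equals 1 when both records have identical membership probabilities."""
    pa = membership_probs(a, centers, temperature)
    pb = membership_probs(b, centers, temperature)
    return sum(math.sqrt(x * y) for x, y in zip(pa, pb))

if __name__ == "__main__":
    # Two assumed weak clustering centers in a 2-D feature space.
    centers = [(0.0, 0.0), (5.0, 5.0)]
    near = weak_relevance((0.1, 0.2), (0.3, -0.1), centers)
    far = weak_relevance((0.1, 0.2), (5.1, 4.9), centers)
    print(near > far)  # → True
```

A softmax was chosen here only because it yields valid probabilities from distances with a single tunable parameter; fuzzy c-means memberships or a Gaussian mixture would be equally plausible stand-ins for the unspecified weak clustering step.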
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 8, pp. 220-222 (3 pages)
Funding
Supported by the National Natural Science Foundation of China (11171112)
Keywords
Massive redundancy
Data mining
Association rules