Abstract
Association rule discovery is an important research topic in data mining. As databases keep growing, association rule discovery on large data sets is receiving increasing attention, and distributed association rule mining is an effective way to address it. In association rule mining algorithms for distributed databases, the time cost comes mainly from two sources: (1) determining the frequent itemsets; (2) the network communication volume. To address the first, the article proposes a binary-representation method for generating candidate frequent itemsets, together with a corresponding algorithm for computing support counts. The method needs only logical operations such as "or", "and", and "xor" on the data being mined, which significantly reduces implementation difficulty. Combining this method with the existing distributed association rule mining algorithm DMA yields the improved algorithm FDMA. Theoretical analysis and experimental results indicate that FDMA greatly improves the efficiency of association rule mining and that the algorithm is effective and feasible.
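The abstract does not give the algorithm itself, so the following is only a minimal sketch of how binary-encoded itemsets allow candidate generation and support counting with "and", "or", and "xor" alone. It is not the FDMA or DMA implementation, and it runs on a single site with no distributed communication. The names `encode`, `support`, `join`, and `frequent_itemsets` are hypothetical, and the join rule (merge two k-itemsets whose XOR has exactly two bits set) is an assumption in the spirit of Apriori.

```python
from itertools import combinations


def encode(items, index):
    """Pack an iterable of items into an integer bitmask (one bit per item)."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def support(candidate, transactions):
    """Count the transactions that contain every item of the candidate (AND test)."""
    return sum(1 for t in transactions if t & candidate == candidate)


def join(frequent_k):
    """Build (k+1)-item candidates from frequent k-item bitmasks."""
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        # Two k-itemsets share k-1 items exactly when their XOR has two bits set.
        if bin(a ^ b).count("1") == 2:
            candidates.add(a | b)  # OR merges the two itemsets
    return candidates


def frequent_itemsets(transactions, n_items, min_count):
    """Level-wise search; returns {bitmask: support count} for frequent itemsets."""
    level = {1 << i for i in range(n_items)}  # all 1-item candidates
    result = {}
    while level:
        counted = {c: support(c, transactions) for c in level}
        kept = {c: n for c, n in counted.items() if n >= min_count}
        result.update(kept)
        level = join(kept)
    return result


# Hypothetical toy data: five transactions over the items a, b, c, d.
index = {"a": 0, "b": 1, "c": 2, "d": 3}
transactions = [encode(t, index) for t in ("abc", "ab", "ac", "bc", "abcd")]
found = frequent_itemsets(transactions, n_items=4, min_count=3)
```

With a minimum support count of 3, `found` maps the bitmasks of {a}, {b}, {c}, {a,b}, {a,c}, and {b,c} to their counts. The sketch skips the usual subset-pruning step and lets the support count filter out infrequent candidates, which keeps the code short at the cost of some extra counting.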
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University core journal list)
2006, No. 4, pp. 165-167, 194 (4 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (Grant No. 70371015)
Jiangsu University Research Start-up Fund (Grant No. 04KJD001)
Keywords
frequent itemsets
distributed association rule mining
data mining
Boolean association rules