期刊文献+

海量数据下的并行频繁项集挖掘算法 被引量:8

Parallel Frequent Itemset Mining Algorithm for Massive Data
在线阅读 下载PDF
导出
摘要 文章针对频繁项集挖掘中传统串行Eclat算法面对海量数据时挖掘效率不高的问题,提出一种海量数据下的并行频繁项集挖掘算法,即I-SPEclat算法。首先,对Eclat算法存在的缺陷进行改进,引入图的邻接矩阵作为数据的存储结构,避免了大量的交集运算;其次,利用先验性质对候选项集进行预剪枝和后剪枝,减少无用候选项集的数量,节约存储空间;再次,根据项集的前缀对数据进行划分,平衡每个计算节点的工作负载;最后,将改进的Eclat算法在Spark分布式计算框架上实现并行化。实验结果表明,I-SPEclat算法较已有的改进Eclat算法在时间消耗和内存消耗方面均有减少,且面对不同规模的数据集也有着良好的扩展性。 Aiming at the problem that the traditional serial Eclat algorithm in frequent itemset mining is not efficient when faced with mass data,this paper proposes a parallel frequent itemset mining algorithm under massive data,that is,I-SPEclat algorithm.The algorithm first improves the defects of Eclat algorithm,and introduces the adjacency matrix of graph as the storage structure of data,which avoids a large number of intersection operations.Then,the paper uses a priori nature to pre-cut and post-cut the candidate set,reduces the number of useless candidate sets and saves storage space.After that,this paper divides the data according to the prefix of the itemset,and balances the workload of each computing node.Finally,the paper parallelizes the improved Eclat algorithm on the Spark distributed computing framework.The experimental results show that the I-SPEclat algorithm is less time-consuming and memory-consuming than the existing improved Eclat algorithm,and also very scalable in the face of data sets with different sizes.
作者 敖孟飞 石鸿雁 Ao Mengfei;Shi Hongyan(School of Science,Shenyang University of Technology,Shenyang 110870,China)
出处 《统计与决策》 CSSCI 北大核心 2022年第18期48-53,共6页 Statistics & Decision
基金 国家自然科学基金资助项目(61074005)。
关键词 Eclat算法 Spark框架 邻接矩阵 剪枝优化 Eclat algorithm Spark framework adjacency matrix pruning optimization
  • 相关文献

参考文献9

二级参考文献40

  • 1易彤,徐宝文,吴方君.一种基于FP树的挖掘关联规则的增量更新算法[J].计算机学报,2004,27(5):703-710. 被引量:32
  • 2陈安龙,唐常杰,陶宏才,元昌安,谢方军.基于极大团和FP-Tree的挖掘关联规则的改进算法[J].软件学报,2004,15(8):1198-1207. 被引量:30
  • 3吉根林,杨明,宋余庆,孙志挥.最大频繁项目集的快速更新[J].计算机学报,2005,28(1):128-135. 被引量:47
  • 4李敏,李春平.频繁模式挖掘算法分析和比较[J].计算机应用,2005,25(B12):166-171. 被引量:11
  • 5AGRAWAL R, SRIKANT R. Fast Algorithms for min- ing association rules [C]// Proceedings of 20th Interna- tional Conference on Very Large Data Bases. Santiago, Chile: Morgankaufman, 1994:487 - 499.
  • 6HAN J, PEI J, YIN Y. Mining frequent patterns with- out candidate generation [C]/// Proeeedlngs of the 2000 ACM Data. Dallas, United States: ACM, 2000:1-12.
  • 7FENG Pei-en, ZHANG Hui, QIU Qing-ying, et al. PCAR: an efficient approach for mining association rules [C]/// Proceedings of the ICNC-FSKD 2008 Inter- national Conference on Fussy Systems and Knowledge Dis- covery. Jinan: IEEE, 2008:605-609.
  • 8ZAKI M J. Scalable algorithms for association mining[J]. IEEE Transactions on Knowledge and Data Engi- neering, 2000,12(3) : 372- 390.
  • 9ZAKI M J. Fast vertical mining using diffsets [R]. Technical Report 01-1, Troy, New York: Rensselaer Polytechnic Institute. 2001.
  • 10HAN J, KAMBE M. Data mining: concepts and Tech- niques [M]. San Francisco, United States: Morgan Kaufmann Publishers Inc, 2001 : 231.

共引文献196

同被引文献65

引证文献8

二级引证文献6

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部