Optimization and Implementation of the FP-Growth Algorithm Based on the Spark Platform (cited by: 3)
Abstract: To address the memory bottlenecks and mining failures that the serial FP-Growth algorithm runs into on massive datasets, this paper optimizes the Spark-based FP-Growth algorithm in two respects: the data grouping strategy and the header table structure. First, it proposes an S-shaped grouping scheme that balances load weights across groups. Second, it designs a new header table structure that incorporates a hash lookup table, which effectively reduces the time complexity of lookups. Experiments show that the optimized Spark-based FP-Growth algorithm (OptFP-Spark) achieves a higher parallel speedup, better parallel mining performance, and higher computational efficiency.
Author: HUANG Jie (黄婕) (Hunan Provincial Engineering Research Center for Aircraft Maintenance, Changsha 410124, China; Department of Aviation Electronic Equipment Maintenance, Changsha Aeronautical Vocational and Technical College, Changsha 410124, China; School of Software, Central South University, Changsha 410075, China)
Source: Journal of Hunan University of Technology (《湖南工业大学学报》), 2020, No. 1, pp. 77-84 (8 pages)
Funding: Scientific Research Fund of the Hunan Provincial Education Department (17C0009)
Keywords: Spark; association rule; frequent item set; FP-Growth
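
The two optimizations named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how load weights are computed or how the hash lookup table is laid out, so the weight values, the snake-order assignment, and the names `serpentine_groups` and `HeaderTable` are all assumptions made for illustration.

```python
from collections import Counter


def serpentine_groups(weights, num_groups):
    """Assign items to groups in snake order over items sorted by descending weight.

    ASSUMPTION: `weights` maps item -> estimated load weight. The abstract does
    not define the weight, so any numeric estimate is used as-is here.
    """
    ordered = sorted(weights, key=lambda item: (-weights[item], item))
    groups = [[] for _ in range(num_groups)]
    for index, item in enumerate(ordered):
        row, col = divmod(index, num_groups)
        # Even rows run left to right, odd rows right to left (the "S" shape).
        group_id = col if row % 2 == 0 else num_groups - 1 - col
        groups[group_id].append(item)
    return groups


class HeaderTable:
    """Header table in descending-frequency order, with a dict as the hash index.

    ASSUMPTION: a plain dict stands in for the paper's hash lookup table.
    """

    def __init__(self, transactions, min_support):
        # Count each item at most once per transaction.
        counts = Counter(item for t in transactions for item in set(t))
        frequent = [(i, c) for i, c in counts.items() if c >= min_support]
        frequent.sort(key=lambda pair: (-pair[1], pair[0]))
        self.entries = frequent  # ordered (item, count) rows
        self.index = {item: pos for pos, (item, _) in enumerate(frequent)}

    def rank(self, item):
        """Return the row position of `item`, or None if it is not frequent."""
        return self.index.get(item)


if __name__ == "__main__":
    demo_weights = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 3}
    print(serpentine_groups(demo_weights, 3))
    table = HeaderTable([["a", "b"], ["a", "c"], ["a", "b", "c"], ["d"]], 2)
    print(table.entries, table.rank("c"), table.rank("d"))
```

With the demo weights 8, 7, 6, 5, 4, 3 and three groups, the snake order pairs the heaviest item with the lightest, so every group carries a total weight of 11. The dictionary index gives average constant-time rank lookups, in place of a linear scan of the header table.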