
Design of an extensible big data analysis system based on Hadoop and Spark (cited by: 12)
Abstract: With social progress and the rapid development of information technology, the scale of network data has expanded greatly. To cope with large-scale network data environments, a scalable big data analysis system is designed based on Hadoop and Spark. In the system's Flume module, the Source component collects the data and the Sink component forwards it to Kafka. The analysis and detection module trains on scalable data offline with Spark and passes the trained model to Spark Streaming, which classifies general big data according to the features of the trained model to obtain the scalable big data. The system software uses the ALS algorithm and the PageRank algorithm to rank the scalable big data by validity and value, and recommends high-quality scalable big data to users on that basis. Experimental results show that the system's accuracy in analyzing scalable big data exceeds 90%, better than the comparison systems, and that the system offers low energy consumption and high stability, giving it high practical value.
Authors: LIU Xinlin (刘昕林); DENG Wei (邓巍); HUANG Ping (黄萍); LIU Ruizhen (刘睿臻) (Shenzhen Power Supply Bureau Limited, Shenzhen, Guangdong 518048, China; Central South University, Changsha 410083, China)
Source: Automation & Instrumentation (《自动化与仪器仪表》), 2020, No. 3, pp. 132-136 (5 pages)
Keywords: Hadoop; Spark; extensibility; ALS algorithm; big data; analysis system
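
The abstract names PageRank as one of the two algorithms used to rank data by value, but it does not say how the algorithm is configured or which Spark API is used. The following is only a minimal, single-machine sketch of the general power-iteration form of PageRank, not the authors' implementation. The function name pagerank, the dict-of-lists graph representation, the damping factor of 0.85, and the example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Hypothetical helper for illustration; the damping factor and the
    stopping rule are assumptions, not values taken from the paper.
    """
    # Collect every node that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node passes an equal share of its rank to its targets.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between iterations is small.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy graph: items "a".."d" with made-up links between them.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranks = pagerank(graph)
    for node in sorted(ranks, key=ranks.get, reverse=True):
        print(node, round(ranks[node], 4))
```

In a system like the one described, the resulting scores would presumably be combined with the ALS output before recommendations are made; how the two are combined is not stated in the abstract, so that step is left out here.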
