
Real-Time Data Acquisition and Processing Based on Spark
Cited by: 6
Abstract: The rapid growth of social platforms and networks has driven an explosion in data volume, with the amount of real-time data growing geometrically. Real-time data analysis is therefore increasingly important, yet existing real-time analysis systems suffer from problems such as insufficient computing power. To address this, a Spark-based method for real-time data acquisition and processing is proposed. Running in a distributed environment, Spark can process large volumes of data, compensating for the lack of computing power. Combined with Flume and Kafka, which can aggregate multiple data sources, Spark obtains the monitored data stream in real time even when the sources differ; the Spark Streaming module then processes the stream in real time, and the processed data can be transferred to other processing components or to a database. Experimental results show that the method can monitor, analyze, and store log files in real time, effectively solving the real-time data processing problem.
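The pipeline described in the abstract (Flume and Kafka aggregating log sources, Spark Streaming processing them in micro-batches) is not reproduced in this record. As a purely illustrative sketch of the micro-batch model that Spark Streaming applies to a log stream, the following standard-library Python code groups incoming log lines into fixed-size batches and runs a per-batch aggregation; the function names, batch size, and log format are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from typing import Iterable, Iterator, List

def micro_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming log stream into fixed-size micro-batches,
    mimicking the discretized-stream (DStream) model Spark Streaming uses."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def count_levels(batch: List[str]) -> Counter:
    """Per-batch aggregation: count log levels by the first token of each line,
    the kind of map/reduce step a streaming job would run on each batch."""
    return Counter(line.split()[0] for line in batch if line.strip())

if __name__ == "__main__":
    logs = [
        "INFO startup complete",
        "WARN disk usage high",
        "ERROR connection refused",
        "INFO request served",
    ]
    for batch in micro_batches(logs, 2):
        print(dict(count_levels(batch)))
```

In a real deployment the batches would arrive from a Kafka topic and the aggregation would run as a distributed Spark job; this sketch only shows the batching-and-aggregation control flow on a single machine.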
Authors: HUANG Tao; GAO Li-ting (Hebei University of Architecture, Zhangjiakou 075000)
Source: Journal of Hebei Institute of Architecture and Civil Engineering (《河北建筑工程学院学报》), CAS, 2022, Issue 4, pp. 176-179, 188 (5 pages)
Keywords: distributed processing; real-time data stream; log file; real-time monitoring