Abstract
With social progress and the rapid development of information technology, the scale of network data has grown substantially. To cope with large-scale network data environments, a scalable big data analysis system is designed based on Hadoop and Spark. In the system's Flume module, the Source component collects the data and the Sink component transfers it to Kafka. The analysis and detection module trains a model offline in Spark on scalable data and passes the trained model to Spark Streaming, where general big data is classified according to the features of the trained model in order to extract the scalable big data. The system software applies the ALS and PageRank algorithms to rank the scalable big data by validity and value, and recommends high-quality scalable big data to users on that basis. The experimental results show that the system analyzes scalable big data with an accuracy above 90%, outperforming the comparison systems, and that it offers low energy consumption, high stability, and high practical application value.
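The abstract names PageRank as one of the two algorithms behind the value ranking but does not give the formulation used in the paper. As a rough illustration only, the following is a minimal power-iteration sketch in plain Python rather than Spark. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the toy link graph are all assumptions made for this example and are not taken from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {node: [outgoing neighbours]}.

    Illustrative sketch only; the damping factor and tolerance are assumed values.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}

        # Each node passes its damped rank in equal shares to its out-neighbours.
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)

        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy graph: each key links to the items in its list.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a pipeline like the one the abstract describes, scores of this kind would presumably be computed on the cluster and combined with the ALS output before recommendation; how the paper combines the two is not stated in the abstract.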
Authors
刘昕林
邓巍
黄萍
刘睿臻
LIU Xinlin; DENG Wei; HUANG Ping; LIU Ruizhen (Shenzhen Power Supply Bureau Limited, Shenzhen, Guangdong 518048, China; Central South University, Changsha 410083, China)
Source
《自动化与仪器仪表》
2020, No. 3, pp. 132-136 (5 pages)
Automation & Instrumentation