Abstract
To meet the need to mine hot words from users' web access logs, this paper designs and implements a real-time web log analysis system based on Storm. In tests, the system performs well on streaming log data. It computes about 300 hot words in less than 10 s and requires no manual batch operations. In both performance and convenience, it clearly outperforms offline computation on Hadoop.
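The abstract does not describe the system's topology. As an illustration only, the core of real-time hot-word counting can be sketched in plain Python: a rolling word count over a sliding time window, followed by a top-N ranking. This is not the authors' implementation and does not use the Storm API. The names `RollingWordCounter` and `hot_words`, the `(timestamp, text)` record format, the 10-second window and whitespace tokenization are all hypothetical assumptions. In a Storm topology, the splitting, counting and ranking steps could each run as a separate bolt fed by a log-reading spout.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List, Tuple


class RollingWordCounter:
    """Hypothetical sliding-window word counter (illustrative, not Storm API)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.counts: Counter = Counter()
        # Events in arrival order, so the oldest one is always at the left.
        self.events: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, word: str) -> None:
        """Record one occurrence of `word`, then drop events outside the window."""
        self.events.append((timestamp, word))
        self.counts[word] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Remove events at least `window_seconds` older than `now`.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, old_word = self.events.popleft()
            self.counts[old_word] -= 1
            if self.counts[old_word] == 0:
                del self.counts[old_word]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """Return the n most frequent words currently inside the window."""
        return self.counts.most_common(n)


def hot_words(records: Iterable[Tuple[float, str]],
              window_seconds: float, n: int) -> List[Tuple[str, int]]:
    """Split each (timestamp, text) record into words and rank them."""
    counter = RollingWordCounter(window_seconds)
    for timestamp, text in records:
        # Assumed tokenization: lower-case, split on whitespace.
        for word in text.lower().split():
            counter.add(timestamp, word)
    return counter.top(n)


if __name__ == "__main__":
    logs = [(0.0, "storm hadoop storm"), (5.0, "storm kafka"), (20.0, "kafka kafka")]
    print(hot_words(logs, window_seconds=10.0, n=3))
```

In this sketch, old events are evicted only when a new word arrives. A long-running service would also need time-driven eviction and ranking, for example on a periodic timer such as Storm's tick tuples.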
Authors
He Yaqin (何雅琴)
Li Tao (李涛)
Department of Information Engineering, Changzhou Institute of Mechatronic Technology, Changzhou 213164, China
Source
Informatization Research (《信息化研究》)
2016, No. 4, pp. 23-27 (5 pages)
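Funding
2016 Jiangsu Province Qinglan Project (Su Jiao Shi [2016] No. 15)
Jiangsu Province Overseas Research and Training Program for Outstanding Young and Middle-aged Teachers and Presidents of Higher Education Institutions (Su Jiao Shi [2014] No. 22)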