Abstract
Addressing the current phenomenon of "rich informationization" on the Internet, this paper proposes a machine-learning-based approach to forecasting hot topics on the Internet. The approach first summarizes a set of features that describe hot topics as accurately as possible and uses them to obtain a feature vector for each news item. It then clusters the content of a large random sample of recent news items whose status (hot topic or not) is already known. Building on a robust and accurate classification algorithm, a support vector machine maps the vectors into a higher-dimensional space in order to classify them. During machine learning, the composition, measurement, and weights of the feature vectors are revised and refined through extensive trial and error, with the ultimate goal of forecasting hot topics accurately.
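The abstract does not specify the features, kernel, or solver, so the following is only a minimal sketch of the classification step under stated assumptions. Each news item is assumed to be already reduced to a hypothetical three-element feature vector (comment rate, repost rate, keyword burst, all scaled to [0, 1]); the SVM is assumed to use a Gaussian (RBF) kernel, which corresponds to an implicit map into a higher-dimensional space; and training uses kernelized Pegasos, a stochastic sub-gradient SVM solver swapped in here for brevity. The names `rbf_kernel`, `train_kernel_pegasos`, `predict`, the `weights` parameter, and the toy data are illustrative only, and the clustering step is not shown.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical features per news item, each scaled to [0, 1]:
# (comment rate, repost rate, keyword burst). Not taken from the paper.


def rbf_kernel(x: Sequence[float], z: Sequence[float],
               gamma: float, weights: Sequence[float]) -> float:
    """Gaussian (RBF) kernel on a feature-weighted squared distance.

    The kernel corresponds to an implicit map into a higher-dimensional
    space; `weights` (hypothetical) lets individual features count more or less.
    """
    dist2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, z))
    return math.exp(-gamma * dist2)


def _score(x, xs, ys, alpha, gamma, weights) -> float:
    """Unscaled SVM decision value: sum_j alpha_j * y_j * K(x_j, x)."""
    return sum(a * y * rbf_kernel(xj, x, gamma, weights)
               for a, y, xj in zip(alpha, ys, xs) if a)


def train_kernel_pegasos(xs, ys, lam=0.01, gamma=2.0,
                         weights: Optional[List[float]] = None,
                         iters=2000, seed=0) -> List[int]:
    """Kernelized Pegasos: stochastic sub-gradient training of an SVM.

    alpha[j] counts how often sample j violated the margin and acts as
    its (unnormalized) dual coefficient.
    """
    if weights is None:
        weights = [1.0] * len(xs[0])
    rng = random.Random(seed)  # fixed seed for reproducibility
    alpha = [0] * len(xs)
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        score = _score(xs[i], xs, ys, alpha, gamma, weights)
        if ys[i] * score / (lam * t) < 1:  # hinge loss active for sample i
            alpha[i] += 1
    return alpha


def predict(x, xs, ys, alpha, gamma=2.0,
            weights: Optional[List[float]] = None) -> int:
    """Return +1 (hot topic) or -1 (not hot); gamma/weights must match training."""
    if weights is None:
        weights = [1.0] * len(x)
    return 1 if _score(x, xs, ys, alpha, gamma, weights) > 0 else -1


# Toy labeled sample: items already known to be hot (+1) or not (-1).
hot = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.85, 0.7, 0.9], [0.95, 0.85, 0.8]]
cold = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.15, 0.3, 0.2], [0.05, 0.1, 0.15]]
xs = hot + cold
ys = [1] * len(hot) + [-1] * len(cold)

alpha = train_kernel_pegasos(xs, ys)
print(predict([0.9, 0.9, 0.9], xs, ys, alpha))  # expected: 1
print(predict([0.1, 0.1, 0.1], xs, ys, alpha))  # expected: -1
```

The per-feature `weights` and the kernel width `gamma` are the kind of settings that the trial-and-error refinement described in the abstract would adjust; a fair comparison of candidate settings would score them on held-out labeled news rather than on the training sample.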
Source
Microcomputer & Its Applications (《微型机与应用》)
2014, Issue 15, pp. 62-64 (3 pages)
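Funding
Project of the Beijing Research Base for Foreign Cultural Exchange and World Culture Studies (BWSK201303)
Center for Public Diplomacy Studies, Beijing Foreign Studies University
Beijing Federation of Social Science Circles Young Social Science Talent Funding Project (2013SKL030)
Beijing Higher Education Young Elite Teacher Project (YETP0847)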
Keywords
machine learning
network media
hot topic
feature vector
word segmentation
classification
forecasting