Abstract
Microblogging is a new form of social media characterized by fragmented content, rapid propagation, and strong interactivity. The traditional vector space model (VSM) cannot accurately measure the similarity between such texts. This paper uses a latent Dirichlet allocation (LDA) topic model to reduce the sparseness of short texts, and then discovers hot topics through k-means clustering.
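The sparsity problem and the clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The example posts and the two-topic document vectors are hand-written placeholders standing in for the output of a trained LDA model, and the helper names (`cosine`, `bag_of_words`, `kmeans`) are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bag_of_words(text):
    """Term-frequency vector of a whitespace-tokenized post (the VSM view)."""
    return Counter(text.lower().split())


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on dense vectors.

    Simplification: the first k points serve as initial centroids so the
    run is deterministic; real code would use a better initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical short posts: two about an earthquake, two about a sports final.
posts = [
    "earthquake hits coastal city",
    "tremor shakes seaside town",
    "team wins championship final",
    "club lifts trophy tonight",
]

# VSM view: the first two posts share no words, so their similarity is zero.
vsm_similarity = cosine(bag_of_words(posts[0]), bag_of_words(posts[1]))

# Assumed document-topic distributions (placeholders for LDA output, 2 topics).
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

# Topic view: the same two posts are now close to each other.
topic_similarity = cosine(dict(enumerate(doc_topics[0])),
                          dict(enumerate(doc_topics[1])))

# Cluster the posts in topic space; each cluster is a candidate hot topic.
labels, centroids = kmeans(doc_topics, k=2)

print(vsm_similarity, round(topic_similarity, 3), labels)
```

In practice the document-topic vectors would come from fitting an LDA model on the tokenized microblog corpus, and the largest clusters would be read off as candidate hot topics.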
Source
Network Security Technology & Application (《网络安全技术与应用》)
2014, Issue 4, pp. 5-6 (2 pages)
Keywords
LDA
clustering
hot topic detection