Abstract
Public opinion analysis based on short text is a current research focus in information mining and sentiment analysis. In view of the distinctive characteristics of the large volume of short texts found online, this paper departs from traditional word-based classification methods and proposes a clustering algorithm based on frequent-pattern discovery with suffix arrays. A keyword lexicon is obtained by applying an exact deduplication algorithm to the suffix-array frequent patterns. Position points are then clustered according to the principle of locality, and meaningful strings are mined from the result. Public opinion is analyzed on the basis of these strings, so that the sentiment orientation of online groups and the hot topics of public opinion can be tracked dynamically and in a timely manner.
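The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of finding repeated strings with a suffix array and then discarding redundant ones. It is not the authors' implementation. The function names, the `min_len` and `min_count` parameters, the separator character, and the rule "drop a pattern if a longer pattern containing it has the same count" are all assumptions made for illustration, and the position-clustering step is not covered.

```python
from collections import Counter

SEP = "\x00"  # assumption: this character never occurs in the input texts


def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for short texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[k] is the length of the common prefix of the suffixes
    # at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    def common(a, b):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n

    return [0] + [common(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def frequent_patterns(texts, min_len=2, min_count=2):
    # Hypothetical helper: returns {pattern: occurrence count}.
    text = SEP.join(texts)
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)

    # Suffixes sharing a prefix are adjacent in the suffix array, so a
    # string with c occurrences appears in exactly c - 1 adjacent pairs.
    pairs = Counter()
    for k in range(1, len(sa)):
        for length in range(min_len, lcp[k] + 1):
            pairs[text[sa[k]:sa[k] + length]] += 1

    counts = {
        s: c + 1
        for s, c in pairs.items()
        if SEP not in s and c + 1 >= min_count
    }

    # Assumed deduplication rule: drop a pattern when a longer pattern
    # contains it and occurs equally often.
    return {
        s: c
        for s, c in counts.items()
        if not any(s != t and s in t and c == counts[t] for t in counts)
    }


print(frequent_patterns(["banana"]))  # {'ana': 2}
```

In this sketch the shorter strings "an" and "na" are dropped because each occurs exactly as often as the longer string "ana" that contains it. Working on substrings rather than on segmented words is what lets such an approach avoid a word-segmentation step for Chinese short texts.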
Source
Journal of Beijing Electronic Science and Technology Institute (《北京电子科技学院学报》)
2010, No. 4, pp. 6-11 (6 pages)
Keywords
short text
public opinion analysis
suffix array
frequent pattern
clustering