Abstract
[Purpose/significance] Microblog posts in specialized domains typically have high-dimensional, sparse topic features. This paper looks for an efficient model for mining hot topics from microblogs under these conditions, so that the relevant administrative departments can quickly grasp the current situation in their domain and make decisions. [Method/process] We propose a microblog hot-topic mining model for high-dimensional sparse settings. A domain dictionary supervises the preprocessing of microblog text, a Naive Bayes classifier identifies domain-specific posts, and the clustering by fast search and find of density peaks ("density-distance") algorithm mines the domain's hot topics. The model is evaluated empirically on the land and resources domain. [Result/conclusion] The proposed model accurately identifies domain-specific information and extracts hot topics in high-dimensional sparse settings, which supports public-opinion analysis and early warning for specialized-domain microblogs.
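The density-peaks clustering step named above can be sketched briefly. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the text representation, distance measure, cutoff distance, or how cluster centres are chosen. The hypothetical `density_peaks` function therefore assumes Euclidean distance on small numeric vectors (standing in for vectorized microblog posts) and a hard cutoff `d_c` for the local density ρ. Rather than reading centres off a decision graph, it takes the `n_clusters` points with the largest γ = ρ·δ.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Sketch of clustering by fast search and find of density peaks.

    Assumptions (not from the paper): Euclidean distance, a hard cutoff
    kernel for local density, 1 <= n_clusters <= len(points), and centres
    chosen as the n_clusters points with the largest gamma = rho * delta.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density; ties are broken by index
    # so "denser than i" means "earlier in this order".
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest denser point (recorded in nearest[i]).
    delta = [0.0] * n
    nearest = [-1] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: its largest distance
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest[i] = j

    # Centres: points that are both dense and far from any denser point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centres = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    if order[0] not in centres:
        # The densest point has no denser neighbour, so it must lead a cluster.
        centres[-1] = order[0]

    labels = [-1] * n
    for cluster_id, c in enumerate(centres):
        labels[c] = cluster_id
    # Each remaining point takes the label of its nearest denser neighbour.
    # Walking in decreasing density guarantees that neighbour is labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest[i]]
    return labels, centres


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centres = density_peaks(pts, d_c=1.2, n_clusters=2)
    print(labels, centres)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] [4, 9]
```

In the toy data, the middle point of each group is the densest and lies far from the other group, so it becomes a centre. Every other point then joins the centre of its own group. Because labels are assigned in a single pass, the method needs no iterative centroid updates.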
Source
《情报理论与实践》
CSSCI
Peking University Core Journal (北大核心)
2020, No. 11, pp. 137-143 (7 pages)
Information Studies:Theory & Application
Funding
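This paper is an outcome of the project "Research on International Public Opinion Monitoring and Early-Warning Technology for China's Land and Resources" (我国国土资源的国际舆情监测与预警技术研究). The project was funded by the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (自然资源部城市国土资源监测与仿真重点实验室), project no. KF-2018-03-057.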
Keywords
high-dimensional sparsity
microblog hot topics
information recognition
topic mining
land and resources