Research of Micro-Blog Topics Mining Based on Sentence Granularity
Times cited: 7
Abstract: To address the shortcomings of existing topic mining methods, this paper proposes a micro-blog topic mining method that works at the granularity of single sentences. First, micro-blog texts are split into sentences at punctuation marks, and nouns and verbs are selected as the feature words that represent each sentence. Second, a word similarity matrix is built from the co-occurrence frequencies of high-frequency feature words across the micro-blog text collection; this matrix is used to compute sentence similarities, which form a sentence similarity matrix. Third, cluster analysis is performed on the sentence similarity matrix, and topics are discovered by analyzing the resulting clusters. Finally, an improved LexRank algorithm computes an importance value for each sentence within a topic, and the sentences with the highest importance values are combined into a topic summary that describes the topic. Experiments demonstrate the feasibility of the method.
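The abstract does not give the exact co-occurrence measure or the sentence-similarity formula, so the following Python is only a minimal sketch of the first two steps under stated assumptions. It assumes each post has already been word-segmented and part-of-speech tagged (`n` for nouns, `v` for verbs, `w` for punctuation), splits only at sentence-ending punctuation, scores word pairs with the Ochiai coefficient over sentence-level co-occurrence, and scores a sentence pair by averaging each word's best match in the other sentence in both directions. These measures, the `top_k` cutoff, the toy data, and the function names `split_sentences`, `word_similarity`, `sentence_similarity`, and `sentence_matrix` are hypothetical choices, not the authors' definitions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Assumption: posts are pre-segmented and POS-tagged as (word, tag) pairs,
# e.g. by a Chinese tagger; "n" = noun, "v" = verb, "w" = punctuation.
SENTENCE_END = {"。", "！", "？", "；", "!", "?", ";"}
FEATURE_TAGS = {"n", "v"}


def split_sentences(tagged_post):
    """Step 1: split a post at sentence-ending punctuation and keep only
    nouns and verbs as each sentence's feature words."""
    sentences, current = [], []
    for word, tag in tagged_post:
        if word in SENTENCE_END:
            if current:
                sentences.append(current)
            current = []
        elif tag in FEATURE_TAGS:
            current.append(word)
    if current:  # trailing sentence without closing punctuation
        sentences.append(current)
    return sentences


def word_similarity(sentences, top_k=50):
    """Step 2a (assumed measure): Ochiai coefficient co(a, b) / sqrt(df(a) * df(b))
    over sentence-level co-occurrence of the top_k most frequent feature words."""
    df = Counter(w for s in sentences for w in set(s))
    vocab = {w for w, _ in df.most_common(top_k)}
    co = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s) & vocab), 2):
            co[(a, b)] += 1
    sim = {}
    for (a, b), count in co.items():
        sim[(a, b)] = sim[(b, a)] = count / sqrt(df[a] * df[b])
    return sim


def sentence_similarity(s1, s2, wsim):
    """Step 2b (assumed scheme): average each word's best match in the other
    sentence, in both directions, so the score is symmetric."""
    if not s1 or not s2:
        return 0.0

    def best(word, other):
        return max(1.0 if word == o else wsim.get((word, o), 0.0) for o in other)

    left = sum(best(w, s2) for w in s1) / len(s1)
    right = sum(best(w, s1) for w in s2) / len(s2)
    return (left + right) / 2


def sentence_matrix(sentences, wsim):
    """Step 2c: full pairwise sentence similarity matrix."""
    return [[sentence_similarity(a, b, wsim) for b in sentences] for a in sentences]


# Toy example with hand-tagged posts (hypothetical data).
posts = [
    [("地震", "n"), ("发生", "v"), ("。", "w"), ("救援", "n"), ("展开", "v"), ("！", "w")],
    [("地震", "n"), ("救援", "n"), ("进行", "v"), ("。", "w")],
]
sentences = [s for post in posts for s in split_sentences(post)]
wsim = word_similarity(sentences)
matrix = sentence_matrix(sentences, wsim)
print(sentences)
print([[round(x, 3) for x in row] for row in matrix])
```

In the toy data, sentence 0 (地震, 发生) and sentence 1 (救援, 展开) share no feature word, yet they get a similarity of 0.25 because 地震 and 救援 co-occur elsewhere in the collection; a plain bag-of-words cosine would give them 0. Bridging short sentences through corpus co-occurrence is the motivation for building the word similarity matrix first.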
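For the clustering and ranking steps, the abstract names neither the clustering algorithm nor the modification made to LexRank. The Python below is therefore a hedged sketch that starts from an already-computed sentence similarity matrix: it uses single-link threshold clustering as an assumed stand-in, scores sentences inside each cluster with standard continuous LexRank (Erkan and Radev's method with a PageRank-style damping factor, not the authors' improved version), and keeps the highest-scoring sentence of each topic as its summary. The `threshold`, `damping`, and `top_n` values, the toy matrix, and the function names `cluster_sentences`, `lexrank`, and `summarize` are illustrative assumptions.

```python
def cluster_sentences(sim, threshold=0.3):
    """Clustering step (assumed algorithm): single-link clustering, i.e. connected
    components of the graph joining sentences with similarity >= threshold."""
    n = len(sim)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def lexrank(sim, damping=0.85, tol=1e-8, max_iter=100):
    """Ranking step (standard continuous LexRank, not the paper's improved variant):
    power iteration on the row-normalised similarity graph with damping."""
    n = len(sim)
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for j in range(n):
            incoming = sum(scores[i] * sim[i][j] / row_sums[i]
                           for i in range(n) if row_sums[i] > 0)
            new.append((1 - damping) / n + damping * incoming)
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(sentences, sim, threshold=0.3, top_n=1):
    """Cluster sentences into topics, then keep each topic's top_n
    sentences by LexRank score as its summary."""
    labels = cluster_sentences(sim, threshold)
    summaries = {}
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        sub = [[sim[i][j] for j in members] for i in members]
        ranked = sorted(zip(lexrank(sub), members), reverse=True)
        summaries[label] = [sentences[i] for _, i in ranked[:top_n]]
    return labels, summaries


# Toy sentence similarity matrix (hypothetical values, diagonal = 1).
sentences = ["s0", "s1", "s2", "s3"]
sim = [
    [1.0, 0.8, 0.6, 0.0],
    [0.8, 1.0, 0.1, 0.1],
    [0.6, 0.1, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]
labels, summaries = summarize(sentences, sim)
print(labels)     # → [0, 0, 0, 1]
print(summaries)  # → {0: ['s0'], 1: ['s3']}
```

Sentence s0 is strongly linked to both s1 and s2, so it is the most central member of the first cluster and becomes that topic's summary, while s3 forms a topic of its own. Running LexRank separately inside each cluster, rather than once over the whole matrix, keeps sentences from different topics from competing for the same summary slot.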
Authors: 唐晓波 (Tang Xiaobo), 肖璐 (Xiao Lu)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2014, no. 6, pp. 623-632 (10 pages); indexed in CSSCI and the Peking University core journal list
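Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)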
Keywords: sentence granularity; word similarity matrix; topic mining