A Method for Modeling and Analyzing Topic Evolution (一种话题演化建模与分析方法), cited by 26

Modeling and Analyzing Topic Evolution
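Abstract: The text stream is divided by temporal order into document sets in consecutive time slices, and the latent sub-topics in each time slice are extracted online. A model selection method dynamically determines the number of sub-topics in each time slice. Sub-topic information from historical time slices serves as prior knowledge for discovering the current sub-topics, the sub-topics in each time slice are extracted with the OLDA (online latent Dirichlet allocation) model, and the topic model parameters are estimated by Gibbs sampling. For correlation analysis, five evolution types are defined for sub-topics: emergence, extinction, inheritance, splitting, and merging. A sub-topic correlation analysis method based on relative entropy is proposed, which links sub-topics according to their semantic similarity and temporal order. A topic is composed of sub-topics that are temporally ordered and related in content, and topic evolution is described by changes in sub-topic content and strength. Experiments on real online news show that the proposed method can effectively detect the evolution of both the content and the strength of online news topics.

English abstract (as published): Topic evolution of network public opinions is investigated. By treating topics as a set of correlated sub-topics, a topic evolution model is proposed, consisting of sub-topic detection and correlation analysis. Furthermore, a sub-topic detection algorithm based on OLDA is presented, with Bayesian model selection for the appropriate number of topics and parameter estimation via Gibbs sampling. The correlations are further defined for analysis of topic evolution, including emergence, extinction, development, merge and split of sub-topics. The method is experimentally verified to be effective for detecting topic evolution of network public opinions.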
Source: Acta Automatica Sinica (《自动化学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2012, No. 10, pp. 1690-1697 (8 pages)
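Funding: Supported by the National Natural Science Foundation of China (60902094, 60903225, 41001260) and the Specialized Research Fund for the Doctoral Program of Higher Education (20114307110008)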
Keywords: topic evolution, online latent Dirichlet allocation (OLDA), model selection, Gibbs sampling, relative entropy, correlation analysis
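
The abstract describes linking sub-topics in adjacent time slices by relative entropy and then labelling each link pattern as emergence, extinction, inheritance, splitting, or merging. The sketch below illustrates that idea only. It is not the paper's algorithm: the symmetrised form of the relative entropy, the `threshold` parameter, and the degree-based labelling rule in `classify_evolution` are all assumptions made for illustration, since this record does not give the paper's exact definitions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) of two discrete word distributions.

    eps is a small smoothing constant (an assumption) to avoid log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrised relative entropy (assumed distance between sub-topics)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def link_subtopics(prev, curr, threshold):
    """Link sub-topic i of the previous slice to sub-topic j of the current
    slice when their distance falls below a hypothetical threshold."""
    return [(i, j)
            for i, p in enumerate(prev)
            for j, q in enumerate(curr)
            if symmetric_kl(p, q) < threshold]


def classify_evolution(links, n_prev, n_curr):
    """Label evolution events from link counts (illustrative rule only)."""
    out_deg = [0] * n_prev  # links leaving each previous sub-topic
    in_deg = [0] * n_curr   # links entering each current sub-topic
    for i, j in links:
        out_deg[i] += 1
        in_deg[j] += 1

    events = []
    for i in range(n_prev):
        if out_deg[i] == 0:
            events.append(("extinction", i))
        elif out_deg[i] > 1:
            events.append(("split", i))
    for j in range(n_curr):
        if in_deg[j] == 0:
            events.append(("emergence", j))
        elif in_deg[j] > 1:
            events.append(("merge", j))
    for i, j in links:
        if out_deg[i] == 1 and in_deg[j] == 1:
            events.append(("inheritance", (i, j)))
    return events


if __name__ == "__main__":
    # Toy topic-word distributions over a three-word vocabulary.
    prev_slice = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    curr_slice = [[0.68, 0.22, 0.10], [0.3, 0.4, 0.3]]
    links = link_subtopics(prev_slice, curr_slice, threshold=0.1)
    print(classify_evolution(links, len(prev_slice), len(curr_slice)))
```

In practice the topic-word distributions would come from the per-slice topic model, and the threshold would need tuning on real data. The toy vectors here exist only to exercise the functions.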