
Dynamic finding of authors' research interests in scientific literature (cited by: 13)
Abstract: To mine the relationships among authors, topics, and time in large-scale scientific literature, this paper proposes the Author-Topic over Time (AToT) model, which takes both the internal and external features of scientific literature into account. In AToT, each document is represented as a probabilistic mixture of topics, and each topic corresponds to a multinomial distribution over words and a beta distribution over time. The topic-word distribution is therefore influenced not only by word co-occurrence within documents but also by document timestamps. Each author likewise corresponds to a multinomial distribution over topics. The topic-word and author-topic distributions are used to describe how topics evolve over time and how authors' research interests change, respectively. The model parameters are learned from the document collection by Gibbs sampling. Experiments on a collection of 1,700 NIPS conference papers show that AToT can characterize latent topic evolution in the corpus, dynamically discover changes in authors' research interests, and predict the authors related to a given topic, while achieving lower perplexity than the author-topic model.
Source: Journal of Computer Applications (《计算机应用》), CSCD, PKU Core, 2013, No. 11, pp. 3080-3083 (4 pages)
Keywords: topic model; temporal analysis; unsupervised learning; text model; perplexity
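
The generative story summarized in the abstract (authors as multinomials over topics, topics as multinomials over words plus a beta distribution over time) can be illustrated with a minimal sketch. This is not the paper's implementation. The function name `generate_corpus`, the hyperparameter values, the uniform choice of one author per token, and the per-token timestamp draw are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def generate_corpus(doc_authors, doc_lengths, n_authors, n_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from an AToT-style generative process (illustrative only)."""
    rng = np.random.default_rng(seed)
    # One multinomial over topics per author (author-topic distribution).
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_authors)
    # One multinomial over the vocabulary per topic (topic-word distribution).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # One pair of Beta parameters per topic over normalized time in [0, 1].
    # The range 1..5 is a hypothetical choice, not taken from the paper.
    psi = rng.uniform(1.0, 5.0, size=(n_topics, 2))

    corpus = []
    for authors, length in zip(doc_authors, doc_lengths):
        tokens = []
        for _ in range(length):
            x = rng.choice(authors)                # assumed: author picked uniformly
            z = rng.choice(n_topics, p=theta[x])   # topic from that author's distribution
            w = rng.choice(vocab_size, p=phi[z])   # word from the topic's word distribution
            t = rng.beta(psi[z, 0], psi[z, 1])     # timestamp from the topic's Beta
            tokens.append((int(x), int(z), int(w), float(t)))
        corpus.append(tokens)
    return theta, phi, psi, corpus

# Two toy documents: the first has authors 0 and 1, the second has author 2.
theta, phi, psi, corpus = generate_corpus(
    doc_authors=[[0, 1], [2]], doc_lengths=[5, 3],
    n_authors=3, n_topics=4, vocab_size=20)
```

In a Gibbs sampler the author and topic assignments `x` and `z` would be latent and resampled from the observed words and timestamps; here they are drawn forward only to make the structure of the model concrete.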

