
An Adaptive Topic Tracking Model Based on 3-Dimension Document Vector (基于三维文档向量的自适应话题追踪器模型)

Cited by: 11
Abstract: Topic Tracking (TT), a subtask of Topic Detection and Tracking (TDT), is an intelligent information acquisition technique for automatically following the dynamic development of events. Its goal is to automatically find, in a stream of news reports, new stories related to a known topic. By analyzing the shortcomings of the traditional document vector space model and the characteristics of news reports, this paper presents a three-dimensional document vector model that stresses the theme and entities of news stories. On this basis, it builds a topic model suited to the characteristics of news reports, which learns and corrects itself during tracking as the event develops. Combined with this topic model, the paper also designs an adaptive KNN news topic tracker, yielding a complete Chinese topic tracking model. Experimental results indicate that the approach has some advantage in describing news topics and avoiding topic drift, and that it performs well on Chinese topic tracking.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2010, No. 5, pp. 70-76 (7 pages)
Funding: National Science and Technology Infrastructure Platform Construction Fund (2005DKA63901)
Keywords: topic tracking; topic model; 3-dimensional document vector model; adaptive KNN tracker
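
The abstract names two components, a three-part document vector and an adaptive KNN tracker, without giving their details. The following is only a minimal sketch of how such a pair might fit together, not the paper's method. The facet names (`theme`, `entity`, `content`), the facet weights, the similarity-weighted vote, and the `learn_threshold` rule for self-learning are all assumptions introduced for illustration.

```python
import math
from collections import Counter

# Hypothetical facets and weights: the abstract only says the model
# stresses theme and entities; the exact three dimensions are assumed here.
FACETS = ("theme", "entity", "content")
WEIGHTS = {"theme": 0.4, "entity": 0.4, "content": 0.2}


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_doc(theme, entity, content):
    """Build a three-part document vector from three token lists."""
    return {
        "theme": Counter(theme),
        "entity": Counter(entity),
        "content": Counter(content),
    }


def similarity(d1, d2):
    """Weighted sum of the per-facet cosine similarities."""
    return sum(WEIGHTS[f] * cosine(d1[f], d2[f]) for f in FACETS)


class AdaptiveKnnTracker:
    """kNN tracker that adds confidently on-topic stories to its sample set."""

    def __init__(self, on_topic, off_topic, k=3, learn_threshold=0.6):
        # Each sample is (document, is_on_topic).
        self.samples = [(d, True) for d in on_topic]
        self.samples += [(d, False) for d in off_topic]
        self.k = k
        self.learn_threshold = learn_threshold  # assumed self-learning rule

    def track(self, doc):
        """Return True if doc is judged on-topic; may update the samples."""
        ranked = sorted(
            ((similarity(doc, d), label) for d, label in self.samples),
            key=lambda pair: pair[0],
            reverse=True,
        )
        top = ranked[: self.k]
        pos = sum(s for s, label in top if label)
        neg = sum(s for s, label in top if not label)
        on_topic = pos > neg
        best_pos = max((s for s, label in top if label), default=0.0)
        # Learn only from confident matches, to limit topic drift.
        if on_topic and best_pos >= self.learn_threshold:
            self.samples.append((doc, True))
        return on_topic
```

In this sketch, `track(story)` returns a boolean decision and, when the match is confident, keeps the story as a new on-topic sample so that later decisions reflect how the event has developed. A stricter `learn_threshold` trades adaptability for resistance to drift.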
