Abstract
Topic Tracking (TT) is an intelligent information acquisition technique that automatically follows the dynamic development of events. It is a subtask of Topic Detection and Tracking (TDT), and its goal is to automatically find, in the information stream of news media, new stories related to a known topic. By analyzing the shortcomings of the traditional document vector space model and taking the characteristics of news reports into account, this paper presents a three-dimensional document vector model that stresses the theme and entities of news stories. On this basis, we propose a topic model consistent with the features of news reports, which learns and corrects itself during tracking as the event develops. Building on the topic model, we also design an adaptive KNN news topic tracker, yielding a complete model for Chinese topic tracking. The experimental results indicate that the proposed approach has advantages in describing news topics and avoiding topic drift, and achieves good performance in Chinese topic tracking.
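The abstract does not give the details of the three-dimensional vector model or of the adaptive KNN tracker, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses plain bag-of-words vectors with cosine similarity in place of the paper's model, and a simple self-learning rule: a story judged on-topic with high similarity is added to the labeled examples. The class name `AdaptiveKNNTracker` and the parameters `k` and `update_threshold` are hypothetical and do not come from the paper. Input stories are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AdaptiveKNNTracker:
    """Hypothetical adaptive KNN tracker (illustration only, not the paper's model)."""

    def __init__(self, on_topic, off_topic, k=3, update_threshold=0.5):
        # Labeled examples: (term-count vector, is_on_topic).
        self.examples = [(Counter(d), True) for d in on_topic]
        self.examples += [(Counter(d), False) for d in off_topic]
        self.k = k
        self.update_threshold = update_threshold  # assumed self-learning cutoff

    def track(self, tokens):
        """Return True if the story is judged on-topic; may update the examples."""
        vec = Counter(tokens)
        neighbours = sorted(
            ((cosine(vec, ex), label) for ex, label in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.k]
        # Similarity-weighted vote among the k nearest labeled stories.
        pos = sum(s for s, label in neighbours if label)
        neg = sum(s for s, label in neighbours if not label)
        on_topic = pos > neg
        if on_topic:
            best_pos = max(s for s, label in neighbours if label)
            # Adaptive step: only confident on-topic stories are learned,
            # which is one simple way to limit topic drift.
            if best_pos >= self.update_threshold:
                self.examples.append((vec, True))
        return on_topic
```

Keeping the update threshold stricter than the decision rule is a design choice made here for illustration: borderline stories can still be reported as on-topic without being absorbed into the topic's examples.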
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2010, Issue 5, pp. 70-76 (7 pages)
Journal of Chinese Information Processing
Funding
National Science and Technology Infrastructure Platform Construction Fund (2005DKA63901)