Abstract
This paper proposes a lexical-chain-based method for extracting the topic of Chinese short message (SMS) texts. The method first constructs multiple lexical chains to represent the narrative threads of an SMS text, then extracts the chains that are rich in topic information and uses them as the keyword sequences from which the topic statement of the text is built. Experiments show that the topics extracted in this way cover the information in the SMS text more completely and eliminate the redundancy that arises when several keyword sequences express the same topic information. The method clearly outperforms topic extraction based on statistical information.
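The pipeline described in the abstract (build several lexical chains, keep the topic-rich ones, read off a keyword sequence) can be illustrated with a minimal sketch. This is not the paper's implementation: the relatedness table `RELATED_GROUPS`, the greedy chain assignment, the chain score (number of word occurrences), and the "at or above the mean score" cutoff are all assumptions chosen for illustration, and the function names are hypothetical. English tokens are used for readability; a Chinese SMS would first need word segmentation.

```python
from statistics import mean

# Hypothetical relatedness resource: words with the same group id count as related.
# A real system would consult a thesaurus or semantic dictionary instead.
RELATED_GROUPS = {
    "meeting": 0, "conference": 0, "agenda": 0,
    "dinner": 1, "restaurant": 1, "lunch": 1,
    "taxi": 2,
}


def related(a, b):
    """Two words are related if identical or in the same (assumed) group."""
    if a == b:
        return True
    ga, gb = RELATED_GROUPS.get(a), RELATED_GROUPS.get(b)
    return ga is not None and ga == gb


def build_chains(words):
    """Greedily attach each word to the first chain holding a related word."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, member) for member in chain):
                chain.append(w)
                break
        else:
            # No existing chain is related to this word: start a new chain.
            chains.append([w])
    return chains


def strong_chains(chains):
    """Keep chains whose score (occurrence count, an assumption) is >= the mean."""
    if not chains:
        return []
    threshold = mean(len(c) for c in chains)
    return [c for c in chains if len(c) >= threshold]


def keyword_sequence(chain):
    """Distinct words of a chain in first-occurrence order."""
    return list(dict.fromkeys(chain))


words = ["meeting", "dinner", "agenda", "taxi", "restaurant", "meeting", "conference"]
chains = build_chains(words)
topics = [keyword_sequence(c) for c in strong_chains(chains)]
print(topics)
```

Because related words are collected into a single chain before selection, two keyword sequences that express the same topic cannot both be emitted, which is the redundancy-removal property the abstract claims for the chain-based approach.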
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2012, No. 7, pp. 132-134 (3 pages)
Funding
Huai'an Science and Technology Plan Project (No. HAG09061)
Huaiyin Institute of Technology Key Fund Project (No. HGA0907)
Keywords
short message text
lexical chain
theme statement
extraction method