Abstract
This paper studies the application of the bigram model, a statistical language model, to automatic text classification. To address a shortcoming of the traditional vector space model (VSM), which assumes that feature terms are mutually independent when computing text similarity, it proposes a method that uses word-pair and word-order information to improve classification results. Experimental results show that the method is feasible and effective.
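The contrast between term-independent features and word-pair features can be illustrated with a minimal sketch. It is not the paper's method, whose details the abstract does not give. It assumes plain cosine similarity over raw counts, and the helper names (`unigram_features`, `bigram_features`, `cosine`) and the two toy documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def unigram_features(tokens):
    """Bag-of-words counts: the term-independence view of a plain VSM."""
    return Counter(tokens)


def bigram_features(tokens):
    """Counts of adjacent ordered word pairs, which retain local word order."""
    return Counter(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents: same words, opposite order.
doc1 = "dog bites man".split()
doc2 = "man bites dog".split()

unigram_sim = cosine(unigram_features(doc1), unigram_features(doc2))
bigram_sim = cosine(bigram_features(doc1), bigram_features(doc2))
print(unigram_sim, bigram_sim)
```

Under these assumptions the two documents are indistinguishable with unigram counts (similarity of 1, up to floating-point rounding) but share no ordered word pair, so their bigram similarity is 0. This is the kind of word-order information that independent-term features discard.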
Source
Computer and Modernization (《计算机与现代化》)
2010, Issue 3, pp. 141-143 (3 pages)