Abstract
Because short texts suffer from feature sparseness, most classification algorithms that perform well on long texts do not achieve satisfactory results on short texts. Building on previous work on word embedding, this paper proposes an improved scheme that enriches the features of short texts at the word-vector level in order to alleviate the sparseness problem: topic vectors are introduced alongside the original word vectors of a short text, so that its features are extended at the semantic level. Because short texts carry little context, a convolutional neural network (CNN), which has strong feature extraction capability, is employed as the final classifier. Experiments on an open short text classification dataset compare the proposed framework with other baselines, and the results indicate improved classification performance over those baselines.
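The abstract does not say how the topic vector is combined with the word vectors or how the CNN is configured, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the topic vector is a document-level topic distribution (for example from a topic model such as LDA) that is concatenated onto every token's word vector, followed by a single convolution layer with ReLU, max-over-time pooling, and a softmax output. All names (`augment_with_topic`, `conv_max_pool`, `classify`), the toy vocabulary, the dimensions, and the random untrained weights are hypothetical.

```python
import math
import random


def augment_with_topic(tokens, word_vecs, topic_vec):
    """Concatenate each token's word vector with the document's topic vector.

    Assumption: concatenation is used; the abstract does not specify this.
    `tokens` must be non-empty.
    """
    dim = len(next(iter(word_vecs.values())))
    zero = [0.0] * dim  # out-of-vocabulary tokens map to a zero vector
    return [word_vecs.get(tok, zero) + topic_vec for tok in tokens]


def conv_max_pool(rows, filters, width):
    """Apply each filter to every window of `width` rows, then ReLU and max-pool."""
    # Pad very short inputs with zero rows so that at least one window exists.
    pad = [0.0] * len(rows[0])
    rows = rows + [pad] * max(0, width - len(rows))
    windows = [
        [x for row in rows[i:i + width] for x in row]
        for i in range(len(rows) - width + 1)
    ]
    pooled = []
    for weights, bias in filters:
        acts = [
            max(0.0, sum(w * x for w, x in zip(weights, win)) + bias)
            for win in windows
        ]
        pooled.append(max(acts))  # max-over-time pooling: one value per filter
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, word_vecs, topic_vec, filters, width, out_weights):
    """Return class probabilities for one short text (untrained weights here)."""
    rows = augment_with_topic(tokens, word_vecs, topic_vec)
    feats = conv_max_pool(rows, filters, width)
    scores = [sum(w * f for w, f in zip(ws, feats)) for ws in out_weights]
    return softmax(scores)


# Toy setup with random, untrained parameters (illustration only).
rng = random.Random(0)
word_dim, n_topics, width, n_filters, n_classes = 4, 3, 2, 5, 2
vocab = ["cheap", "flights", "paris"]
word_vecs = {w: [rng.uniform(-1, 1) for _ in range(word_dim)] for w in vocab}
topic_vec = [0.7, 0.2, 0.1]  # hypothetical topic distribution for this text
filters = [
    ([rng.uniform(-1, 1) for _ in range(width * (word_dim + n_topics))], 0.0)
    for _ in range(n_filters)
]
out_weights = [
    [rng.uniform(-1, 1) for _ in range(n_filters)] for _ in range(n_classes)
]
probs = classify(["cheap", "flights", "to", "paris"],
                 word_vecs, topic_vec, filters, width, out_weights)
print(probs)
```

In this sketch each token row grows from `word_dim` to `word_dim + n_topics` values, which is one simple way to give a sparse short text extra semantic features. A real system would learn the filters and output weights by training rather than sampling them at random.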
Source
Instrumentation (《仪器仪表用户》)
2017, No. 12, pp. 1-5 (5 pages)
Keywords
topic model
word embedding
short text classification
convolutional neural network