Abstract
To address the feature sparsity that limits feature extraction from Chinese short texts, this paper proposes a short text classification model based on multi-feature fusion (Multi-feature fusion model, MFFM). First, a new text representation is built by combining character vectors and word vectors. Second, BILSTM (Bi-directional Long Short-Term Memory), CNN (Convolutional Neural Networks), and CAPSNET (Capsule Network) models extract features of the short texts at different levels, and a Self-attention model dynamically adjusts the weight coefficient of each model's features in the final feature representation. In the experiments, MFFM is compared with four classic short text classification models (CNN, BILSTM, CAPSNET, and CNN-BILSTM) on three Chinese short text datasets. To further verify the effect of data fusion (merging the positive and negative samples of the three Chinese short text datasets) on MFFM, it is also evaluated on the fused data. The experimental results show that MFFM outperforms the comparison models on four evaluation metrics (F1, Recall, Precision, and Accuracy). In summary, the results indicate that MFFM is a useful framework for short text classification.
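The fusion step described in the abstract (three branch feature vectors weighted by self-attention) can be illustrated with a minimal sketch. It assumes each branch (BILSTM, CNN, CAPSNET) has already produced a feature vector of the same dimension, and it applies scaled dot-product self-attention with no learned projections followed by mean pooling. Any learned query/key/value matrices the paper may use are omitted, and the function name `self_attention_fusion` and the example vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fusion(branch_features: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical fusion of n branch vectors (each of dimension d).

    Assumption: Q = K = V = the stacked branch features (identity projections),
    and the attended rows are mean-pooled into one fused vector.
    Returns (fused_vector, branch_weights); the weights sum to 1.
    """
    n = len(branch_features)
    d = len(branch_features[0])
    scale = math.sqrt(d)
    # Row i: softmax-normalized attention of branch i over every branch j.
    attn = [softmax([dot(hi, hj) / scale for hj in branch_features])
            for hi in branch_features]
    # Mean-pooling the attended rows equals weighting branch j by column j's mean.
    weights = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    fused = [sum(weights[j] * branch_features[j][k] for j in range(n))
             for k in range(d)]
    return fused, weights


# Hypothetical 4-dimensional outputs standing in for the three branches.
bilstm = [0.2, 0.1, 0.0, 0.5]
cnn = [0.4, 0.3, 0.1, 0.0]
capsnet = [0.1, 0.0, 0.6, 0.2]
fused, weights = self_attention_fusion([bilstm, cnn, capsnet])
print("branch weights:", [round(w, 3) for w in weights])
print("fused vector:", [round(x, 3) for x in fused])
```

Because the fused vector is a convex combination of the branch vectors, the returned `weights` play the role of the per-model weight coefficients mentioned in the abstract. In a trained model these would depend on learned parameters rather than on raw dot products alone.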
Authors
YANG Zhao-qiang (杨朝强)
SHAO Dang-guo (邵党国)
YANG Zhi-hao (杨志豪)
XIANG Yan (相艳)
MA Lei (马磊)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 7, pp. 1421-1426 (6 pages)
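Funding
National Natural Science Foundation of China (61462054, 61732005, 61672271, 61741112)
Natural Science Foundation of Yunnan Province (2017FB098)
China Postdoctoral Science Foundation General Program (2016M592894XB)
Major Science and Technology Project of Yunnan Province (2018ZF017)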