Chinese Short Text Classification Model with Multi-feature Fusion

Original title: 多特征融合的中文短文本分类模型 (cited by: 14)
Abstract: To address the feature sparsity that limits feature extraction from Chinese short texts, this paper proposes a short text classification model based on multi-feature fusion (Multi-feature Fusion Model, MFFM). First, a new text representation is constructed by combining character vectors and word vectors. Second, BILSTM (Bi-directional Long Short-Term Memory), CNN (Convolutional Neural Networks) and CAPSNET (Capsule Network) models extract features from the short text at different levels, and a Self-attention model dynamically adjusts the weight coefficient of each model's features in the final feature construction. In the experiments, MFFM is compared with four classic short text classification models (CNN, BILSTM, CAPSNET and CNN-BILSTM) on three Chinese short text datasets. To further examine the effect of data fusion on MFFM, the positive and negative samples of the three datasets are also merged and evaluated. The results show that MFFM outperforms the comparison models on four evaluation metrics (F1, Recall, Precision, Accuracy). Overall, this indicates that MFFM is a useful framework for short text classification.
Authors: YANG Zhao-qiang (杨朝强); SHAO Dang-guo (邵党国); YANG Zhi-hao (杨志豪); XIANG Yan (相艳); MA Lei (马磊). Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 7, pp. 1421-1426 (6 pages)
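Funding: Supported by the National Natural Science Foundation of China (61462054, 61732005, 61672271, 61741112), the Natural Science Foundation of Yunnan Province (2017FB098), the China Postdoctoral Science Foundation General Program (2016M592894XB), and the Yunnan Province Major Science and Technology Project (2018ZF017).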
Keywords: Chinese short text classification; character and word vector combination; feature fusion; Self-attention model
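
The abstract describes weighting the features of several extractors with self-attention before building the final feature. The record gives no formulas, so the following is only a minimal sketch of one common way such a fusion could work, not the authors' implementation. It assumes each branch yields a vector of the same length and that a learned query vector scores each branch; the names `fuse_features` and `query` and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(branch_features, query):
    """Attention-weighted sum of same-length branch feature vectors.

    branch_features: dict mapping a branch name to its feature vector.
    query: hypothetical learned vector used to score each branch.
    Returns (weights per branch, fused feature vector).
    """
    names = list(branch_features)
    dim = len(query)
    # Scaled dot-product score between each branch vector and the query.
    scores = [
        sum(f * q for f, q in zip(branch_features[name], query)) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the branch vectors, one dimension at a time.
    fused = [
        sum(w * branch_features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy vectors standing in for BILSTM, CNN and CAPSNET outputs (made-up values).
features = {
    "bilstm": [0.2, 0.9, 0.1, 0.4],
    "cnn": [0.7, 0.1, 0.3, 0.5],
    "capsnet": [0.4, 0.4, 0.8, 0.2],
}
query = [0.5, 0.5, 0.5, 0.5]

weights, fused = fuse_features(features, query)
print(weights)
print(fused)
```

In a trained model the query (or an equivalent scoring layer) would be learned jointly with the three extractors, so the weights would change per input rather than being fixed in advance.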