Abstract
To address the inability of word-level long short-term memory (LSTM) network models to fully learn contextual semantic information, a text classification model based on convolution and a bidirectional simple recurrent unit (Conv-BSA) is proposed. Convolution and local pooling operations extract and filter n-gram information; a bidirectional simple recurrent unit structure extracts deep semantic features of the text; an attention mechanism weights the deep semantic features to obtain the final text representation; and a softmax function performs the classification, efficiently distinguishing text categories. Experimental results show that the Conv-BSA model achieves a classification accuracy of 96.09%, outperforming existing mainstream models. The simple recurrent unit (SRU) not only improves classification accuracy but also reduces training time.
Authors
陈天龙
喻国平
姚磊岳
CHEN Tian-long; YU Guo-ping; YAO Lei-yue (Information Engineering School, Nanchang University, Nanchang 330031, China; Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang 330098, China)
Source
《计算机工程与设计》
Peking University Core Journal
2020, No. 3, pp. 838-844 (7 pages)
Computer Engineering and Design
Funding
Science and Technology Plan Project of the Jiangxi Provincial Department of Science and Technology (20171BBE50060)
Science and Technology Plan Project of the Jiangxi Provincial Department of Education (GJJ180978)
Guiding Science and Technology Plan Project of the Nanchang Science and Technology Bureau (Hongkezi [2018] No. 39-73)
Keywords
convolutional layer
bidirectional simple recurrent unit
attention mechanism
text classification
text representation