Abstract
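The "中国农技推广" (China Agricultural Technology Extension) Q&A community receives nearly ten thousand new questions every day, and classifying these questions effectively is a key technical step toward intelligent question answering. The massive question data are characterized by highly sparse features, heavy noise, and poor normalization, which limits text classification performance. To improve short-text classification of agricultural Q&A questions, the BiGRU_MulCNN classification model is proposed: the TFIDF algorithm is used to expand the text features and to weight the word-vector representation of the text, a bidirectional gated recurrent unit network captures the contextual feature information of the input word vectors, and a multi-scale parallel convolutional neural network performs multi-granularity feature extraction. Experimental results show that the short-text classification model based on this hybrid neural network improves text representation and text feature extraction and classifies user questions automatically and accurately, reaching an accuracy of 95.9%. Compared with nine other text classification methods, it shows a clear advantage in classification performance.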
With the rapid development of the mobile internet, short-text data from apps has exploded. In agriculture, tens of thousands of questions about agricultural technology have been posted in the agro-technical extension community. Accurate classification is the basis of agricultural intelligent Q&A and the guarantee of precise information services. To improve classification performance, a short-text classification method based on the BiGRU_MulCNN model was proposed to overcome the limitations of the classification process, such as limited vocabulary, sparse features, large data volume, heavy noise, and poor normalization. In the model, the Jieba word segmentation tool and an agricultural dictionary were used for text segmentation; the TFIDF algorithm was then adopted to expand the text features and to weight the word vectors according to the key terms of the text; a bi-directional gated recurrent unit was applied to capture contextual feature information; and finally multi-scale parallel convolutional neural networks were built to extract local, multi-dimensional features of the text. Batch normalization, Dropout, global average pooling, and global max pooling were used to mitigate over-fitting. The results showed that the model could classify questions accurately, with an accuracy of 95.9%. Compared with other models, such as CNN, RNN, and combined CNN/RNN models, BiGRU_MulCNN had clear advantages in classification performance for intelligent agro-technical information services.
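The TFIDF weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, the function names `tfidf_weights` and `weight_word_vectors`, the toy pre-segmented questions, and the all-ones embeddings are all assumptions made for illustration. In the paper, segmentation is done with Jieba plus an agricultural dictionary, and the weighted vectors are passed on to the BiGRU and the multi-scale CNN, neither of which is shown here.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Term frequency times a smoothed IDF (assumed variant).
            tok: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def weight_word_vectors(doc, doc_weights, embeddings):
    """Scale each token's embedding by its TF-IDF weight, keeping token order."""
    return [[doc_weights[tok] * x for x in embeddings[tok]] for tok in doc]


# Toy, already-segmented questions (hypothetical data).
docs = [
    ["wheat", "leaf", "yellow", "cause"],
    ["corn", "leaf", "pest", "control"],
    ["wheat", "fertilizer", "amount"],
]
# Hypothetical 2-dimensional embeddings; all ones so the weights are visible.
embeddings = {tok: [1.0, 1.0] for doc in docs for tok in doc}

weights = tfidf_weights(docs)
weighted = weight_word_vectors(docs[0], weights[0], embeddings)
```

In this toy corpus, "yellow" occurs in only one question while "leaf" occurs in two, so the vector for "yellow" is scaled up more than the vector for "leaf". This is the sense in which the weighting emphasizes the more distinctive terms of a short question.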
Authors
金宁
赵春江
吴华瑞
缪祎晟
李思
杨宝祝
JIN Ning; ZHAO Chunjiang; WU Huarui; MIAO Yisheng; LI Si; YANG Baozhu (School of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China; Graduate School, Shenyang Jianzhu University, Shenyang 110168, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China; Organization Department, Shenyang Jianzhu University, Shenyang 110168, China)
Source
《农业机械学报》
EI
CAS
CSCD
Peking University Core Journal
2020, No. 5, pp. 199-206 (8 pages)
Transactions of the Chinese Society for Agricultural Machinery
Funding
National Natural Science Foundation of China (61871041, 61571051)
Beijing Natural Science Foundation (4172024, 4172026).
Keywords
农业信息分类
自然语言处理
双向门控循环单元神经网络
卷积神经网络
classification of agriculture information
natural language processing
bi-directional gated recurrent unit
convolutional neural network