Research on Cross-Type Text Classification Technology Based on Multi-Task Learning (cited by: 1)
Abstract [Objective] This study addresses the low classification accuracy of conventional text classification tasks caused by scarce domain-specific training data and large differences between text types. [Methods] We introduce a deep pyramid convolutional network (DPCNN) and a multi-gate control unit mechanism to build a classification model on the BERT-DPCNN-MMOE framework. Multi-task and transfer learning experiments benchmark the model against eight baseline models to validate its effectiveness. [Results] Using a self-constructed multi-task, cross-type dataset for training and testing, the proposed model outperformed all eight baseline models in both the multi-task and the transfer learning experiments, with F1 improvements exceeding 4.7 percentage points in each case. [Limitations] The model's adaptability to other domains requires further study. [Conclusions] The BERT-DPCNN-MMOE model performs better on multi-task, cross-type text classification tasks and is significant for future special-topic intelligence classification tasks.
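The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a multi-gate layer, assuming that MMOE here follows the common multi-gate mixture-of-experts pattern: several shared experts transform one shared feature vector, and each task has its own softmax gate that weights the expert outputs. The function names, the dimensions, and the random weights are all hypothetical, and the feature vector is a stand-in for whatever the BERT-DPCNN encoder would produce.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_gate_mix(features, expert_weights, gate_weights):
    """Return one mixed representation per task.

    features       : shared input vector (stand-in for an encoder output)
    expert_weights : one weight matrix per shared expert
    gate_weights   : one weight matrix per task, mapping the features
                     to one score per expert
    """
    expert_outputs = [linear(w, features) for w in expert_weights]
    hidden = len(expert_outputs[0])
    mixed_per_task = []
    for gate in gate_weights:
        # Task-specific weights over the shared experts (they sum to 1).
        mix = softmax(linear(gate, features))
        mixed = [
            sum(mix[e] * expert_outputs[e][d] for e in range(len(expert_outputs)))
            for d in range(hidden)
        ]
        mixed_per_task.append(mixed)
    return mixed_per_task


# Hypothetical sizes chosen only for illustration.
FEATURE_DIM, HIDDEN_DIM, NUM_EXPERTS, NUM_TASKS = 8, 4, 3, 2

rng = random.Random(0)
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
experts = [random_matrix(HIDDEN_DIM, FEATURE_DIM, rng) for _ in range(NUM_EXPERTS)]
gates = [random_matrix(NUM_EXPERTS, FEATURE_DIM, rng) for _ in range(NUM_TASKS)]

task_representations = multi_gate_mix(features, experts, gates)
```

In a full model, each entry of `task_representations` would feed a separate task-specific classification head, which is how the tasks share experts while keeping their own routing.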
Authors Song Donghuan; Hu Maodi; Ding Jielan; Qu Zihao; Chang Zhijun; Qian Li (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of New Publishing and Knowledge Services for Scholarly Journals, National Press and Publication Administration, Beijing 100190, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), Peking University Core Journal, 2025, No. 2, pp. 12-25 (14 pages)
Funding This work is one of the research outcomes of the National Key R&D Program of China (Grant No. 2022YFF0711900).
Keywords Multi-Task Learning; Cross-Type Text Classification; Transfer Learning; Ensemble Learning