
Research on semantic matching method based on data enhancement and SimBert
Abstract: Frequently asked questions (FAQ) systems rely on semantic similarity matching, but they suffer from insufficient training data and limited understanding of domain-specific semantics. To address this, the paper proposes a sentence paraphrase generation method that integrates syntactic information and edit vectors. By retrieving template sentences and constructing edit vectors, the method strengthens the pre-trained model's learning of the differences between sentences and is used to augment FAQ data. The paper also proposes a hybrid-feature semantic matching model based on SimBERT, which combines keyword and intent features to improve matching. Experimental results show that the proposed methods outperform the baseline models on datasets such as Quora, ParaNMT-small, LCQMC, and BQ Corpus, and effectively improve the performance of domain-specific FAQ question answering.
Authors: WANG Dongsheng; WANG Liming; LU Man; HAN Bin; WANG Shi; BO Qile; TANG Kun (School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Source: Journal of Jiangsu University of Science and Technology: Natural Science Edition, 2025, No. 4, pp. 48-54 (7 pages)
Funding: General Program of the National Natural Science Foundation of China (61702234).
Keywords: question and answer system; text semantic matching; natural language processing; data enhancement
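
The abstract describes matching that combines semantic similarity with keyword and intent features, but it does not give the model's architecture. The sketch below is only an illustration of that general idea, not the authors' method. It scores each FAQ candidate with a hand-weighted linear combination of three signals. The bag-of-words `embed` function is a stand-in for a real sentence encoder such as SimBERT, and the function names, dictionary fields, and weights are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(q_keywords, c_keywords):
    """Jaccard overlap between two keyword lists."""
    q, c = set(q_keywords), set(c_keywords)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def hybrid_score(query, candidate, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of semantic, keyword, and intent signals.

    The weights are illustrative values, not taken from the paper.
    """
    w_sem, w_kw, w_int = weights
    sem = cosine(embed(query["text"]), embed(candidate["text"]))
    kw = keyword_overlap(query["keywords"], candidate["keywords"])
    intent = 1.0 if query["intent"] == candidate["intent"] else 0.0
    return w_sem * sem + w_kw * kw + w_int * intent


def best_match(query, faq):
    """Return the FAQ entry with the highest hybrid score."""
    return max(faq, key=lambda c: hybrid_score(query, c))


# Toy data, invented for illustration only.
faq = [
    {"text": "how can i reset my password",
     "keywords": ["reset", "password"], "intent": "account"},
    {"text": "how do i track my order",
     "keywords": ["track", "order"], "intent": "shipping"},
]
query = {"text": "how do i reset my password",
         "keywords": ["reset", "password"], "intent": "account"}

match = best_match(query, faq)
```

In a real system the keyword and intent features would come from an extractor and a classifier, and a learned model would normally replace the fixed weights. The fixed weights here only make each signal's contribution easy to see.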