Research on Semantic Matching Method Based on Data Enhancement and SimBert

Abstract: Frequently asked questions (FAQ) systems rely on semantic similarity matching, but they suffer from insufficient training data and limited understanding of domain semantics. To address these problems, this paper proposes a sentence paraphrase generation method that integrates syntactic information and edit vectors. The method retrieves template sentences and constructs edit vectors so that the pre-trained model learns the differences between sentences more effectively, and the generated paraphrases are used to augment FAQ data. In addition, this paper proposes a hybrid-feature semantic matching model based on SimBERT, which combines keyword and intent features to improve matching. Experimental results show that the proposed methods outperform the baseline models on the Quora, ParaNMT-small, LCQMC, and BQ Corpus datasets. They also effectively improve FAQ question-answering performance in a specific domain.
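The abstract describes both components only at a high level. The Python sketch below is a toy illustration under stated assumptions, not the authors' implementation.

- **Template retrieval:** the template is picked by token-level Jaccard overlap.
- **Edit vector:** the summed embeddings of inserted tokens are concatenated with those of deleted tokens. This is one common construction for edit vectors and is assumed here.
- **Hybrid matching score:** a weighted sum of embedding cosine similarity, keyword overlap, and an intent-agreement indicator.

Deterministic pseudo-random word vectors stand in for SimBERT embeddings. All function names, the weights `(0.6, 0.3, 0.1)`, the keyword set, and the intent labels are hypothetical.

```python
import math
import random

DIM = 8  # toy embedding size (assumption; SimBERT vectors are far larger)


def word_vec(word: str) -> list[float]:
    """Deterministic pseudo-embedding standing in for pre-trained word vectors."""
    rng = random.Random(word)  # a str seed yields the same vector on every run
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def tokens(sentence: str) -> list[str]:
    """Lower-cased whitespace tokens (Chinese text would need a real tokenizer)."""
    return sentence.lower().split()


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sum_vecs(words: list[str]) -> list[float]:
    total = [0.0] * DIM
    for w in words:
        for i, x in enumerate(word_vec(w)):
            total[i] += x
    return total


def cosine(u: list[float], v: list[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def retrieve_template(query: str, corpus: list[str]) -> str:
    """Hypothetical retrieval: the corpus sentence with the most token overlap."""
    q = set(tokens(query))
    candidates = (s for s in corpus if s != query)
    return max(candidates, key=lambda s: jaccard(q, set(tokens(s))))


def edit_vector(template: str, target: str) -> list[float]:
    """Concatenate summed embeddings of tokens inserted into / deleted from the template."""
    t, g = set(tokens(template)), set(tokens(target))
    inserted = sorted(g - t)  # tokens the target adds
    deleted = sorted(t - g)   # tokens the target drops
    return sum_vecs(inserted) + sum_vecs(deleted)


def sentence_vec(sentence: str) -> list[float]:
    """Mean word vector: a placeholder for a SimBERT sentence embedding."""
    words = tokens(sentence)
    return [x / len(words) for x in sum_vecs(words)] if words else [0.0] * DIM


def hybrid_score(query: str, candidate: str, keywords: set[str],
                 query_intent: str, cand_intent: str,
                 weights: tuple[float, float, float] = (0.6, 0.3, 0.1)) -> float:
    """Weighted sum of embedding similarity, keyword overlap and intent agreement."""
    w_sem, w_kw, w_int = weights  # hypothetical weights, not taken from the paper
    sem = cosine(sentence_vec(query), sentence_vec(candidate))
    kw = jaccard(set(tokens(query)) & keywords, set(tokens(candidate)) & keywords)
    intent = 1.0 if query_intent == cand_intent else 0.0
    return w_sem * sem + w_kw * kw + w_int * intent


FAQ = ["how do i reset my password",
       "how do i change my email address",
       "where can i download the invoice"]
QUERY = "how can i reset the password"

print(retrieve_template(QUERY, FAQ))  # prints: how do i reset my password
print(hybrid_score(QUERY, FAQ[0], {"reset", "password"}, "account", "account"))
```

In the paper's setting, the mean-of-word-vectors placeholder would presumably be replaced by SimBERT sentence embeddings. The keyword sets and intent labels would presumably come from the domain FAQ data. The abstract does not say how the authors actually weight or fuse these features.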

Authors: WANG Dongsheng, WANG Liming, LU Man, HAN Bin, WANG Shi, BO Qile, TANG Kun

Affiliations: School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Source: Journal of Jiangsu University of Science and Technology (Natural Science Edition), 2025, No. 4, pp. 48-54 (7 pages)

Funding: General Program of the National Natural Science Foundation of China (61702234).

Keywords: question answering system; text semantic matching; natural language processing; data enhancement

CLC number: TP242.6 [Automation and Computer Technology / Detection Technology and Automatic Devices]
