Abstract
Word Sense Disambiguation (WSD) is a key technology for improving a computer's ability to understand natural language, and is widely used in machine translation, information retrieval, and other fields. To address the limited generalization and robustness of existing models, this paper proposes a WSD model that fuses a pre-trained model with Bidirectional Gated Recurrent Units (Bi-GRU), Cross-Attention (CA), and a Graph Convolutional Network (GCN), and introduces Adversarial Training (AT) to optimize it. The word forms, parts of speech, and semantic categories of the words to the left and right of the ambiguous word are taken as disambiguation features and fed into LERT to obtain dynamic word vectors. Cross-attention fuses the global semantic information that the Bi-GRU network extracts from the token sequence with the local semantic information of the CLS sequence, producing a more complete sentence-node representation for the disambiguation feature graph. The feature graph is fed into the GCN to update the feature information between nodes, and an interpolated prediction layer and a semantic classification layer then determine the true semantic category of the ambiguous word. The gradient of the input dynamic word vectors is computed to generate subtle continuous perturbations, which are added to the original word-vector matrix to produce adversarial samples. The loss of the fusion network is combined with the adversarial-training loss over these samples to optimize the disambiguation model. Experimental results show that the method not only strengthens the model's ability to handle complex lexical ambiguity, but also effectively improves its robustness and generalization, yielding better disambiguation performance.
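The cross-attention fusion described above (a CLS vector attending over Bi-GRU token outputs) can be sketched in a few lines. This is a minimal single-head NumPy illustration under assumed, hypothetical dimensions and names (`token_seq`, `cls_vec`); it shows the mechanism only, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Single-head cross-attention: `query` attends over `keys`/`values`."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)   # (1, T) attention logits
    weights = softmax(scores, axis=-1)     # normalized over the T tokens
    return weights @ values                # (1, d) fused representation

rng = np.random.default_rng(0)
d = 8                                      # hidden size (illustrative only)
token_seq = rng.normal(size=(5, d))        # global info: Bi-GRU token outputs
cls_vec = rng.normal(size=(1, d))          # local info: the CLS vector
fused = cross_attention(cls_vec, token_seq, token_seq)
print(fused.shape)                         # (1, 8)
```

The fused vector would then serve as the sentence-node representation in the disambiguation feature graph.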
Objective In Word Sense Disambiguation (WSD), the Linguistically-motivated bidirectional Encoder Representation from Transformer (LERT) is employed to capture rich semantic representations from large-scale corpora, enabling improved contextual understanding of word meanings. However, several challenges remain. Current WSD models are not sufficiently sensitive to temporal and spatial dependencies within sequences, and single-dimensional features are inadequate for representing the diversity of linguistic expressions. To address these limitations, a hybrid network is constructed by integrating LERT, Bidirectional Gated Recurrent Units (Bi-GRU), and a Graph Convolutional Network (GCN). This network enhances the modeling of structured text and contextual semantics. Nevertheless, generalization and robustness remain problematic. Therefore, an Adversarial Training (AT) algorithm is applied to improve the overall performance and resilience of the WSD model.

Methods An adversarial WSD method is proposed based on a pre-trained model, combining Bi-GRU and GCN. First, the word forms, parts of speech, and semantic categories of the words neighboring an ambiguous term are input into the LERT model to obtain the CLS sequence and token sequence. Second, cross-attention is applied to fuse the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. Sentences, word forms, parts of speech, and semantic categories are then used as nodes to construct a disambiguation feature graph, which is input into the GCN to update the feature information of the nodes. Third, the semantic category of the ambiguous word is determined through the interpolated prediction layer and the semantic classification layer. Fourth, subtle continuous perturbations are generated by computing the gradient of the dynamic word vectors in the input. These perturbations are added to the original word-vector matrix to create adversarial samples, which are used to optimize the LERT+Bi-GRU+CA+GCN (LBGCA-GCN) model. A cross-entropy loss function measures the performance of the LBGCA-GCN model on adversarial samples. Finally, the loss from the network is combined with the loss from AT to optimize the LBGCA-GCN model.

Results and Discussions When the Free Large-Batch (FreeLB) algorithm is applied, stronger adversarial perturbations are generated, and it achieves the best performance (Table 2). As the number of perturbation steps increases, the strength of AT improves; however, beyond a certain threshold the LBGCA-GCN+AT (LBGCA-GCN-AT) model begins to overfit. The FreeLB algorithm demonstrates strong robustness with three perturbation steps (Table 3). The cross-attention mechanism, which fuses the token sequence with the CLS sequence, yields significant performance gains in complex semantic scenarios (Fig. 3). By incorporating AT, the LBGCA-GCN-AT model achieves notable improvements across multiple evaluation metrics (Table 4).

Conclusions This study presents an adversarial WSD method based on a pre-trained model, integrating Bi-GRU and GCN to address the weak generalization and robustness of conventional WSD models. LERT transforms discriminative features into dynamic word vectors, while cross-attention fuses the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. This fusion generates more complete node representations for the disambiguation feature graph. A GCN is then applied to update the relationships among nodes within the feature graph. The interpolated prediction layer and the semantic classification layer determine the semantic category of ambiguous words. To further improve robustness, the gradient of the dynamic word vectors is computed and perturbed to generate adversarial samples, which are used to optimize the LBGCA-GCN model. The network loss is combined with the AT loss to refine the model. Experiments on the SemEval-2007 Task#05 and HealthWSD datasets examine multiple factors affecting model performance, including adversarial algorithms, perturbation steps, and sequence fusion methods. The results demonstrate that introducing AT improves the model's ability to handle real-world noise and perturbations. The proposed method not only enhances robustness and generalization, but also strengthens the capacity of WSD models to capture subtle semantic distinctions.
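The adversarial-sample step (perturb the input embedding along its loss gradient, then combine clean and adversarial losses) can be illustrated with a single-step, FGM-style NumPy sketch. The paper's FreeLB variant iterates this perturbation over several steps; here a toy convex linear classifier stands in for the LBGCA-GCN network, and every name and dimension is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label])

# Toy stand-in: one embedding x (the "dynamic word vectors"),
# a linear classifier W over n_classes semantic categories, true class y.
rng = np.random.default_rng(1)
d, n_classes = 8, 3
x = rng.normal(size=d)
W = rng.normal(size=(n_classes, d))
y = 0

# Clean forward pass; analytic gradient of softmax + cross-entropy w.r.t. x.
p = softmax(W @ x)
loss_clean = cross_entropy(p, y)
grad_x = W.T @ (p - np.eye(n_classes)[y])

# Gradient-based perturbation: a small step along the normalized gradient.
eps = 0.1
r_adv = eps * grad_x / (np.linalg.norm(grad_x) + 1e-12)

# Adversarial sample = original embedding + perturbation.
p_adv = softmax(W @ (x + r_adv))
loss_adv = cross_entropy(p_adv, y)

# Training objective: network loss on clean input plus the AT loss.
total_loss = loss_clean + loss_adv
print(total_loss > loss_clean)  # True: the adversarial term adds pressure
```

Because this toy loss is convex in `x`, stepping along the gradient is guaranteed to increase it, which is exactly why training against such samples forces the model toward flatter, more robust decision regions.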
Authors
张春祥
孙颖
高可心
高雪瑶
ZHANG Chunxiang; SUN Ying; GAO Kexin; GAO Xueyao (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source
《电子与信息学报》
Peking University Core Journal (北大核心)
2025, Issue 11, pp. 4549-4559 (11 pages)
Journal of Electronics & Information Technology
Funding
National Natural Science Foundation of China (61502124, 60903082)
China Postdoctoral Science Foundation (2014M560249)
Natural Science Foundation of Heilongjiang Province (LH2022F031, LH2022F030, F2015041, F201420)
Keywords
Word Sense Disambiguation(WSD)
Graph Convolutional Network(GCN)
Adversarial Training(AT)
Disambiguation features
Disambiguation feature graph