Abstract
Word Sense Disambiguation (WSD) is a key technology for improving a computer's ability to understand natural language, and is widely used in machine translation, information retrieval, and other fields. To address the limited generalization and robustness of existing models, this paper proposes a WSD model that fuses a pre-trained model with Bidirectional Gated Recurrent Units (Bi-GRU), Cross-Attention (CA), and a Graph Convolutional Network (GCN), and introduces Adversarial Training (AT) to optimize it. The word forms, parts of speech, and semantic categories of the words to the left and right of the ambiguous word are taken as disambiguation features and fed into LERT to obtain dynamic word vectors. Cross-attention fuses the global semantic information that the Bi-GRU network extracts from the token sequence with the local semantic information of the CLS sequence, producing a more complete sentence-node representation for the disambiguation feature graph. The feature graph is fed into the GCN to update the feature information between nodes, and an interpolated prediction layer and a semantic classification layer then determine the true semantic category of the ambiguous word. The gradient of the input dynamic word vectors is computed to generate subtle continuous perturbations, which are added to the original word-vector matrix to produce adversarial samples. The loss of the fusion network is combined with the adversarial-training loss over these samples to optimize the disambiguation model. Experimental results show that the method not only strengthens the model's ability to handle complex lexical ambiguity, but also effectively improves its robustness and generalization, yielding better disambiguation performance.
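The cross-attention fusion described above (a CLS vector attending over Bi-GRU token outputs) can be sketched in a few lines. This is a minimal single-head NumPy illustration under assumed, hypothetical dimensions and names (`token_seq`, `cls_vec`); it shows the mechanism only, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Single-head cross-attention: `query` attends over `keys`/`values`."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)   # (1, T) attention logits
    weights = softmax(scores, axis=-1)     # normalized over the T tokens
    return weights @ values                # (1, d) fused representation

rng = np.random.default_rng(0)
d = 8                                      # hidden size (illustrative only)
token_seq = rng.normal(size=(5, d))        # global info: Bi-GRU token outputs
cls_vec = rng.normal(size=(1, d))          # local info: the CLS vector
fused = cross_attention(cls_vec, token_seq, token_seq)
print(fused.shape)                         # (1, 8)
```

The fused vector would then serve as the sentence-node representation in the disambiguation feature graph.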
Objective In Word Sense Disambiguation (WSD), the Linguistically-motivated bidirectional Encoder Representation from Transformer (LERT) is employed to capture rich semantic representations from large-scale corpora, enabling improved contextual understanding of word meanings. However, several challenges remain. Current WSD models are not sufficiently sensitive to temporal and spatial dependencies within sequences, and single-dimensional features are inadequate for representing the diversity of linguistic expressions. To address these limitations, a hybrid network is constructed by integrating LERT, Bidirectional Gated Recurrent Units (Bi-GRU), and a Graph Convolutional Network (GCN). This network enhances the modeling of structured text and contextual semantics. Nevertheless, generalization and robustness remain problematic. Therefore, an Adversarial Training (AT) algorithm is applied to improve the overall performance and resilience of the WSD model.

Methods An adversarial WSD method is proposed based on a pre-trained model, combining Bi-GRU and GCN. First, the word forms, parts of speech, and semantic categories of the words neighboring an ambiguous term are input into the LERT model to obtain the CLS sequence and token sequence. Second, cross-attention is applied to fuse the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. Sentences, word forms, parts of speech, and semantic categories are then used as nodes to construct a disambiguation feature graph, which is input into the GCN to update the feature information of the nodes. Third, the semantic category of the ambiguous word is determined through the interpolated prediction layer and the semantic classification layer. Fourth, subtle continuous perturbations are generated by computing the gradient of the dynamic word vectors in the input. These perturbations are added to the original word-vector matrix to create adversarial samples, which are used to optimize the LERT+Bi-GRU+CA+GCN (LBGCA-GCN) model. A cross-entropy loss function measures the performance of the LBGCA-GCN model on adversarial samples. Finally, the loss from the network is combined with the loss from AT to optimize the LBGCA-GCN model.

Results and Discussions When the Free Large-Batch (FreeLB) algorithm is applied, stronger adversarial perturbations are generated, and it achieves the best performance (Table 2). As the number of perturbation steps increases, the strength of AT improves; however, beyond a certain threshold the LBGCA-GCN+AT (LBGCA-GCN-AT) model begins to overfit. The FreeLB algorithm demonstrates strong robustness with three perturbation steps (Table 3). The cross-attention mechanism, which fuses the token sequence with the CLS sequence, yields significant performance gains in complex semantic scenarios (Fig. 3). By incorporating AT, the LBGCA-GCN-AT model achieves notable improvements across multiple evaluation metrics (Table 4).

Conclusions This study presents an adversarial WSD method based on a pre-trained model, integrating Bi-GRU and GCN to address the weak generalization and robustness of conventional WSD models. LERT transforms discriminative features into dynamic word vectors, while cross-attention fuses the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. This fusion generates more complete node representations for the disambiguation feature graph. A GCN is then applied to update the relationships among nodes within the feature graph. The interpolated prediction layer and the semantic classification layer determine the semantic category of ambiguous words. To further improve robustness, the gradient of the dynamic word vectors is computed and perturbed to generate adversarial samples, which are used to optimize the LBGCA-GCN model. The network loss is combined with the AT loss to refine the model. Experiments on the SemEval-2007 Task#05 and HealthWSD datasets examine multiple factors affecting model performance, including adversarial algorithms, perturbation steps, and sequence fusion methods. The results demonstrate that introducing AT improves the model's ability to handle real-world noise and perturbations. The proposed method not only enhances robustness and generalization, but also strengthens the capacity of WSD models to capture subtle semantic distinctions.
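The adversarial-sample step (perturb the input embedding along its loss gradient, then combine clean and adversarial losses) can be illustrated with a single-step, FGM-style NumPy sketch. The paper's FreeLB variant iterates this perturbation over several steps; here a toy convex linear classifier stands in for the LBGCA-GCN network, and every name and dimension is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    return -np.log(probs[label])

# Toy stand-in: one embedding x (the "dynamic word vectors"),
# a linear classifier W over n_classes semantic categories, true class y.
rng = np.random.default_rng(1)
d, n_classes = 8, 3
x = rng.normal(size=d)
W = rng.normal(size=(n_classes, d))
y = 0

# Clean forward pass; analytic gradient of softmax + cross-entropy w.r.t. x.
p = softmax(W @ x)
loss_clean = cross_entropy(p, y)
grad_x = W.T @ (p - np.eye(n_classes)[y])

# Gradient-based perturbation: a small step along the normalized gradient.
eps = 0.1
r_adv = eps * grad_x / (np.linalg.norm(grad_x) + 1e-12)

# Adversarial sample = original embedding + perturbation.
p_adv = softmax(W @ (x + r_adv))
loss_adv = cross_entropy(p_adv, y)

# Training objective: network loss on clean input plus the AT loss.
total_loss = loss_clean + loss_adv
print(total_loss > loss_clean)  # True: the adversarial term adds pressure
```

Because this toy loss is convex in `x`, stepping along the gradient is guaranteed to increase it, which is exactly why training against such samples forces the model toward flatter, more robust decision regions.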
Authors
张春祥
孙颖
高可心
高雪瑶
ZHANG Chunxiang; SUN Ying; GAO Kexin; GAO Xueyao (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source
《电子与信息学报》
Peking University Core Journal (北大核心)
2025, Issue 11, pp. 4549-4559 (11 pages)
Journal of Electronics & Information Technology
Funding
National Natural Science Foundation of China (61502124, 60903082)
China Postdoctoral Science Foundation (2014M560249)
Natural Science Foundation of Heilongjiang Province (LH2022F031, LH2022F030, F2015041, F201420)
Keywords
Word Sense Disambiguation(WSD)
Graph Convolutional Network(GCN)
Adversarial Training(AT)
Disambiguation features
Disambiguation feature graph