Abstract
Words, as the basic semantic units in language models, are strongly related to their context words in the overall semantic space; likewise, in a language model, the meaning of the current word can be inferred from its context words. Word representation learning maps the relationship between words and their context words into a low-dimensional vector space via a class of shallow neural network models. However, existing word representation learning methods usually consider only the structural relation between a word and its context, while the intrinsic semantic information carried by the word itself is ignored. Therefore, this paper proposes DEWE, a word representation learning algorithm that, during training, not only accounts for the structural relation between a word and its context but also integrates the semantic information of the word itself into the model, so that the learned representations capture both structural and semantic commonalities. Experimental results show that DEWE is a practical and effective word representation learning method: compared with the baseline algorithms used in this paper, DEWE achieves excellent performance on six word-similarity evaluation datasets.
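The abstract's core idea — combining the structural (word-context) signal with the word's own semantic information — can be illustrated with a minimal sketch. The code below is NOT the paper's DEWE formulation (which is not given in this record); it is a hypothetical toy: skip-gram with negative sampling as the structural objective, plus a regularizer pulling each word vector toward the mean vector of words in an assumed semantic resource (e.g., definition words). The corpus, `definitions` dictionary, and hyperparameters are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

# Hypothetical semantic resource: words semantically tied to each word.
definitions = {"cat": ["dog"], "dog": ["cat"], "mat": ["rug"], "rug": ["mat"]}

W = rng.normal(scale=0.1, size=(V, D))  # input (word) vectors
C = rng.normal(scale=0.1, size=(V, D))  # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(lr=0.1, window=2, lam=0.1):
    """One epoch: skip-gram negative sampling + semantic pull term."""
    total = 0.0
    for i, w in enumerate(corpus):
        wi = w2i[w]
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j == i:
                continue
            ci = w2i[corpus[j]]
            # Structural part: push word and observed context together.
            s = sigmoid(W[wi] @ C[ci])
            total += -np.log(s + 1e-9)
            W[wi] -= lr * (s - 1.0) * C[ci]
            C[ci] -= lr * (s - 1.0) * W[wi]
            # One random negative sample: push apart.
            ni = int(rng.integers(V))
            s_n = sigmoid(W[wi] @ C[ni])
            total += -np.log(1.0 - s_n + 1e-9)
            W[wi] -= lr * s_n * C[ni]
            C[ni] -= lr * s_n * W[wi]
        # Semantic part: pull the word toward its definition centroid.
        if w in definitions:
            target = np.mean([W[w2i[d]] for d in definitions[w]], axis=0)
            W[wi] -= lr * lam * (W[wi] - target)
            total += 0.5 * lam * float(np.sum((W[wi] - target) ** 2))
    return total

losses = [train_step() for _ in range(20)]
```

The semantic term (`lam`) is what distinguishes this sketch from plain skip-gram: without it, "cat" and "dog" relate only through shared contexts; with it, the assumed semantic resource directly shapes the vectors, mirroring the abstract's goal of representations with both structural and semantic commonality.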
Authors
YE Zhonglin, ZHAO Haixing, ZHANG Ke, ZHU Yu
(College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062, China; College of Computer, Qinghai Normal University, Xining, Qinghai 810008, China; Provincial Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University, Xining, Qinghai 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining, Qinghai 810008, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD; Peking University Core Journal (北大核心)
2019, No. 4, pp. 29-36 (8 pages)
Funding
National Natural Science Foundation of China (61663041, 61763041)
Changjiang Scholars and Innovative Research Team Project (IRT_15R40)
Fundamental Research Funds for the Central Universities (2017TS045)
Key Laboratory of Tibetan Information Processing and Machine Translation (2013-Z-Y17)
Keywords
word representation learning
semantic embedding
word representation joint model
word embedding
word structure matrix