Abstract
Words, as the basic semantic units in language models, are strongly related to their context words in the overall semantic space; likewise, in a language model, the meaning of the current word can be inferred from its context words. Word representation learning maps the relationship between words and their context words into a low-dimensional vector space via a class of shallow neural network models. However, existing word representation learning methods usually consider only the structural relation between a word and its context, while the intrinsic semantic information carried by the word itself is ignored. Therefore, this paper proposes DEWE, a word representation learning algorithm that, during training, not only accounts for the structural relation between a word and its context but also integrates the semantic information of the word itself into the model, so that the learned representations capture both structural and semantic commonalities. Experimental results show that DEWE is a practical and effective word representation learning method: compared with the baseline algorithms used in this paper, DEWE achieves excellent performance on six word-similarity evaluation datasets.
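The abstract's core idea — combining the structural (word-context) signal with the word's own semantic information — can be illustrated with a minimal sketch. The code below is NOT the paper's DEWE formulation (which is not given in this record); it is a hypothetical toy: skip-gram with negative sampling as the structural objective, plus a regularizer pulling each word vector toward the mean vector of words in an assumed semantic resource (e.g., definition words). The corpus, `definitions` dictionary, and hyperparameters are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

# Hypothetical semantic resource: words semantically tied to each word.
definitions = {"cat": ["dog"], "dog": ["cat"], "mat": ["rug"], "rug": ["mat"]}

W = rng.normal(scale=0.1, size=(V, D))  # input (word) vectors
C = rng.normal(scale=0.1, size=(V, D))  # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(lr=0.1, window=2, lam=0.1):
    """One epoch: skip-gram negative sampling + semantic pull term."""
    total = 0.0
    for i, w in enumerate(corpus):
        wi = w2i[w]
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j == i:
                continue
            ci = w2i[corpus[j]]
            # Structural part: push word and observed context together.
            s = sigmoid(W[wi] @ C[ci])
            total += -np.log(s + 1e-9)
            W[wi] -= lr * (s - 1.0) * C[ci]
            C[ci] -= lr * (s - 1.0) * W[wi]
            # One random negative sample: push apart.
            ni = int(rng.integers(V))
            s_n = sigmoid(W[wi] @ C[ni])
            total += -np.log(1.0 - s_n + 1e-9)
            W[wi] -= lr * s_n * C[ni]
            C[ni] -= lr * s_n * W[wi]
        # Semantic part: pull the word toward its definition centroid.
        if w in definitions:
            target = np.mean([W[w2i[d]] for d in definitions[w]], axis=0)
            W[wi] -= lr * lam * (W[wi] - target)
            total += 0.5 * lam * float(np.sum((W[wi] - target) ** 2))
    return total

losses = [train_step() for _ in range(20)]
```

The semantic term (`lam`) is what distinguishes this sketch from plain skip-gram: without it, "cat" and "dog" relate only through shared contexts; with it, the assumed semantic resource directly shapes the vectors, mirroring the abstract's goal of representations with both structural and semantic commonality.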
Authors
YE Zhonglin, ZHAO Haixing, ZHANG Ke, ZHU Yu
(College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062, China; College of Computer, Qinghai Normal University, Xining, Qinghai 810008, China; Provincial Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University, Xining, Qinghai 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining, Qinghai 810008, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD; Peking University Core Journal (北大核心)
2019, No. 4, pp. 29-36 (8 pages)
Funding
National Natural Science Foundation of China (61663041, 61763041)
Changjiang Scholars and Innovative Research Team Project (IRT_15R40)
Fundamental Research Funds for the Central Universities (2017TS045)
Key Laboratory of Tibetan Information Processing and Machine Translation (2013-Z-Y17)
Keywords
word representation learning
semantic embedding
word representation joint model
word embedding
word structure matrix