Abstract
To address the data sparseness problem faced by traditional Word Sense Disambiguation (WSD) methods, a WSD method based on knowledge context is proposed. The method assumes that sentences within the same article share some common topics. First, a similarity algorithm is used to extract sentences from the same article that contain the same ambiguous word; these sentences serve as the knowledge context of the ambiguous sentence and supply disambiguation knowledge for it. Second, an unsupervised, graph-based ranking algorithm performs the disambiguation. Experiments on a real corpus show that, with two knowledge context sentences and a window size of 1, the disambiguation accuracy of the method is 3.26% higher than that of the baseline method (Orig Disam).
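The context-extraction step can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so token-set Jaccard similarity is assumed here purely for illustration; the function names `jaccard` and `select_context`, the pre-tokenized input format, and the tie-breaking rule are likewise hypothetical and are not taken from the paper.

```python
from typing import List


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (assumed stand-in for the unspecified similarity algorithm)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_context(article: List[List[str]], target_idx: int, word: str, k: int = 2) -> List[List[str]]:
    """Return up to k sentences from the same article that contain `word`,
    ranked by similarity to the ambiguous sentence at `target_idx`."""
    target = article[target_idx]
    candidates = [
        (jaccard(target, sentence), i)
        for i, sentence in enumerate(article)
        if i != target_idx and word in sentence
    ]
    # Highest similarity first; ties broken by earlier position in the article (an assumption).
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article[i] for _, i in candidates[:k]]


if __name__ == "__main__":
    article = [
        ["bank", "river", "water"],
        ["bank", "money", "loan"],
        ["bank", "river", "fish"],
        ["sun", "sky"],
        ["bank", "water", "river", "flow"],
    ]
    # k=2 mirrors the "two knowledge context sentences" setting in the abstract.
    for sentence in select_context(article, target_idx=0, word="bank", k=2):
        print(sentence)
```

The selected sentences would then be passed, together with the ambiguous sentence, to whatever disambiguation step follows; that step is not sketched here because the abstract gives no details of the graph construction or ranking.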
Source
《计算机应用》
CSCD
Peking University Core Journal
2015, Issue 4, pp. 1006-1008, 1012 (4 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (61403238)
Natural Science Foundation of Shanxi Province (2014021022-1)
Keywords
data sparseness
Word Sense Disambiguation (WSD)
knowledge context
graph-based model
parameter estimation