Abstract
Traditional morpheme-based methods depend heavily on the number of seed words, and traditional graph-based methods suffer from low recall. To address both problems, this paper proposes a semi-supervised method for building a Chinese polarity lexicon that integrates the morpheme relations between words into a graph model and combines them with the synonymy relations between words. First, a morpheme model is used to compute the morpheme similarity between words. Second, the Tongyici Cilin thesaurus and a bilingual dictionary are used to build synonymy relations between words. Finally, the two relations are combined, and the Label Propagation (LP) algorithm is run on the resulting graph to classify words as positive or negative. Experimental results show that the proposed method achieves high accuracy and recall, with a micro-averaged F1 of up to 92.8%. It also reduces the dependence on the number of seed words: with only 100 seed words, the micro-averaged F1 still reaches 84.1%. In addition, the proposed method converges quickly.
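The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-overlap function below is only a stand-in for the paper's morpheme model, the synonym pairs are toy data rather than entries from a real thesaurus or bilingual dictionary, and the mixing weight `alpha` and all function names are hypothetical. The propagation step is a plain iterative label propagation with clamped seeds.

```python
def char_overlap(a, b):
    """Jaccard overlap of characters: an assumed stand-in for morpheme similarity."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def build_weights(words, synonym_pairs, alpha=0.5):
    """Combine both relations into one symmetric edge-weight matrix.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    syn = {frozenset(p) for p in synonym_pairs}
    n = len(words)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = 1.0 if frozenset((words[i], words[j])) in syn else 0.0
            m = char_overlap(words[i], words[j])
            w[i][j] = w[j][i] = alpha * m + (1 - alpha) * s
    return w


def label_propagation(words, weights, seeds, max_iter=200, tol=1e-6):
    """Propagate polarity scores from seed words over the weighted graph.

    seeds maps a word to +1.0 (positive) or -1.0 (negative).
    Seed scores are clamped; every other word takes the weighted
    average of its neighbours' scores until the scores stop changing.
    """
    score = [seeds.get(w, 0.0) for w in words]
    for _ in range(max_iter):
        new = []
        for i, w in enumerate(words):
            if w in seeds:
                new.append(seeds[w])  # keep labelled seeds fixed
                continue
            total = sum(weights[i])
            if total == 0:
                new.append(0.0)  # isolated word: no evidence either way
            else:
                new.append(sum(wij * sj for wij, sj in zip(weights[i], score)) / total)
        delta = max(abs(x - y) for x, y in zip(new, score))
        score = new
        if delta < tol:
            break
    return dict(zip(words, score))


# Toy data (assumed for illustration only).
words = ["美好", "美丽", "漂亮", "丑陋", "丑恶", "难看"]
synonym_pairs = [("美丽", "漂亮"), ("丑陋", "难看")]
seeds = {"美好": 1.0, "丑恶": -1.0}

weights = build_weights(words, synonym_pairs)
scores = label_propagation(words, weights, seeds)
polarity = {w: ("positive" if s > 0 else "negative") for w, s in scores.items()}
```

In this toy graph, 漂亮 shares no character with the positive seed 美好 and is reached only through its synonym link to 美丽, which suggests how adding synonymy edges can improve recall over a morpheme-only graph.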
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2012, No. 7, pp. 2033-2037 (5 pages)
Keywords
polarity lexicon
morpheme model
synonymy relation
graph model
Label Propagation (LP)