Abstract
Hyponymy (the hypernym-hyponym relation) has many important applications in natural language processing (NLP), and automatically extracting hyponymy relations from massive text corpora is an important NLP task. Once candidate relations have been extracted by simple pattern matching, verifying and filtering them is the difficult part. To address this, this paper computes the context feature similarity (SimCF) and the Brown clustering similarity (SimBrown) between words and proposes a hyponymy validation method that clusters the set of candidate hyponyms using a similarity feature combining SimCF and SimBrown. The validation model and the weight coefficient for combining the two similarities are derived from the SimCF and SimBrown values computed on a small amount of labeled training data. The method can effectively filter coarse extraction results without relying on any existing lexical relation dictionary or knowledge base. Experiments on the CCF NLPCC 2012 word semantic relation evaluation corpus show that the method clearly improves the F-measure compared with approaches such as pattern matching and simple context comparison.
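The combination of SimCF and SimBrown described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: SimCF is taken as the cosine of sparse context vectors, SimBrown as the normalized shared-prefix length of Brown cluster bit strings, the weight `alpha` and `threshold` are hypothetical parameters, and the average-similarity filter is a simple stand-in for the clustering step, not the paper's validation model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (assumed SimCF)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def brown_similarity(path_a, path_b):
    """Shared-prefix length of two Brown cluster bit strings,
    divided by the longer length (assumed SimBrown)."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    longest = max(len(path_a), len(path_b))
    return common / longest if longest else 0.0


def combined_similarity(w1, w2, contexts, paths, alpha):
    """Weighted sum of the two similarities; alpha is a hypothetical weight."""
    sim_cf = cosine(contexts[w1], contexts[w2])
    sim_brown = brown_similarity(paths[w1], paths[w2])
    return alpha * sim_cf + (1 - alpha) * sim_brown


def filter_candidates(candidates, contexts, paths, alpha=0.5, threshold=0.5):
    """Keep candidates whose average combined similarity to the other
    candidates reaches the threshold (a stand-in for the clustering step)."""
    kept = []
    for cand in candidates:
        others = [o for o in candidates if o != cand]
        if not others:
            kept.append(cand)
            continue
        avg = sum(
            combined_similarity(cand, o, contexts, paths, alpha) for o in others
        ) / len(others)
        if avg >= threshold:
            kept.append(cand)
    return kept
```

In this sketch, `contexts` maps each word to a dict of context-feature weights (for example PMI values) and `paths` maps each word to its Brown cluster bit string; both would come from a corpus in practice, and `alpha` would be fitted on labeled data rather than fixed by hand.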
Source
《计算机工程》
CAS
CSCD
Peking University Core Journal
2015, No. 2, pp. 145-150 (6 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (61163039, 61163036, 61363058)
Northwest Normal University Young Teachers' Research Capability Promotion Program (NWNU-LKQN-10-2)
Keywords
hyponymy relation
context similarity
Brown clustering similarity
pointwise mutual information (PMI)
pattern matching
clustering validation