Abstract
This paper studies personal name disambiguation on the Internet. The proposed method extracts dependency features related to the target name entity in Web page text, together with auxiliary features such as named entities. A two-stage clustering algorithm then groups the high-confidence documents according to their dependency features and uses the auxiliary features to merge the remaining documents into the existing clusters, thereby disambiguating the names. Experimental results show that the proposed method outperforms other name disambiguation methods.
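The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Doc` type, the Jaccard similarity, the single-link grouping, both thresholds, and the toy feature strings are all assumptions chosen for illustration, since the abstract does not specify the similarity measure or the clustering criterion.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    dep: frozenset  # dependency features (hypothetical string encoding)
    aux: frozenset  # auxiliary features such as named entities (hypothetical)


def jaccard(a, b):
    """Jaccard overlap of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_stage_cluster(docs, dep_threshold=0.5, aux_threshold=0.2):
    """Assumed two-stage scheme: seed clusters from dependency features,
    then attach the remaining documents using auxiliary features."""
    # Stage 1: single-link grouping (union-find) of documents whose
    # dependency features overlap at or above dep_threshold.
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i].dep, docs[j].dep) >= dep_threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, doc in enumerate(docs):
        groups.setdefault(find(i), []).append(doc)

    # Groups with at least two documents are treated as reliable seeds.
    clusters = [g for g in groups.values() if len(g) > 1]
    leftovers = [g[0] for g in groups.values() if len(g) == 1]

    # Stage 2: attach each leftover to the most similar current cluster,
    # or open a new cluster when no similarity reaches aux_threshold.
    for doc in leftovers:
        best, best_sim = None, aux_threshold
        for cluster in clusters:
            cluster_aux = frozenset().union(*(d.aux for d in cluster))
            sim = jaccard(doc.aux, cluster_aux)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([doc])
        else:
            best.append(doc)
    return [sorted(d.doc_id for d in c) for c in clusters]


# Toy pages that all mention the same ambiguous name.
docs = [
    Doc("d1", frozenset({"play:basketball", "join:Bulls"}), frozenset({"Chicago", "NBA"})),
    Doc("d2", frozenset({"play:basketball", "join:Bulls", "win:title"}), frozenset({"NBA", "Chicago"})),
    Doc("d3", frozenset({"teach:statistics", "publish:paper"}), frozenset({"Berkeley", "UC"})),
    Doc("d4", frozenset({"teach:statistics", "publish:paper"}), frozenset({"Berkeley"})),
    Doc("d5", frozenset(), frozenset({"NBA", "Chicago"})),
    Doc("d6", frozenset(), frozenset({"Hollywood"})),
]
print(two_stage_cluster(docs))
```

In this toy run, d1/d2 and d3/d4 form the seed clusters from their dependency features, d5 joins the first cluster through its shared named entities, and d6 stays alone. Keeping the first-stage threshold strict reflects the idea in the abstract that only reliable evidence should create clusters, while weaker evidence is used only to place the remaining documents.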
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2012, No. 19, pp. 133-136 (4 pages)
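Funding
National Natural Science Foundation of China (60970056, 61070123, 61003155)
Natural Science Foundation of Jiangsu Province (BK2008160)
Specialized Research Fund for the Doctoral Program of Higher Education (20093201110006)
Open Project Fund of the National Laboratory of Pattern Recognition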
Keywords
name ambiguity
dependency feature
name disambiguation
named entity
clustering