Name Disambiguation Algorithm Based on Clustering Occupational Characteristics (cited by: 2)
Abstract: Occupation is a representative feature of a person entity and can effectively distinguish one person entity from another. Traditional name disambiguation algorithms treat occupation as just another ordinary feature and overlook its importance. To address this, the paper proposes a name disambiguation algorithm based on occupational characteristics. First, a basic occupation dictionary is built manually from Internet sources. Second, using all Chinese Wikipedia pages as the training corpus, the basic dictionary is extended with the word activation force model to obtain an occupation feature dictionary. Third, occupational features are extracted from the text, and person names and titles of works are extracted as supplementary features, which compensates for texts that lack occupational features and for persons who have multiple occupations. Finally, agglomerative hierarchical clustering is applied to disambiguate the names. Experiments on the CLP2010 name disambiguation training corpus show that the algorithm disambiguates names effectively.
Affiliation: Information Engineering University
Source: Journal of Information Engineering University (《信息工程大学学报》), 2016, No. 5, pp. 548-554 (7 pages)
Funding: National Social Science Fund of China (14BXW028)
Keywords: occupational characteristics; affinity; name disambiguation; word activation force; agglomerative hierarchical clustering
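
The final clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The record does not state how similarity between documents is computed or how the clustering is stopped, so the sketch assumes Jaccard similarity over per-document feature sets, average linkage, and a fixed stopping threshold. The function names, the threshold value, and the sample documents are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(c1, c2, features):
    """Mean pairwise similarity between the documents of two clusters."""
    total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_mentions(features, threshold=0.2):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until no pair reaches `threshold` (a hypothetical cutoff)."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = average_link(clusters[a], clusters[b], features)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical feature sets (occupation words plus supplementary terms)
# extracted from four documents that mention the same personal name.
docs = [
    {"professor", "university", "paper"},
    {"professor", "paper", "lecture"},
    {"singer", "album", "concert"},
    {"singer", "concert", "tour"},
]

print(cluster_mentions(docs))
```

Each resulting cluster groups the documents taken to refer to the same real-world person. In this sample, the two documents with academic features end up in one cluster and the two with music features in another.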