
Study on Approaches to Retrieval of Chinese Organization Name Based on its Abbreviated Name (cited by: 7)
Abstract: Automatic recognition of Chinese organization names and of their abbreviated forms has been studied widely and in depth, but matching an abbreviated name to its full name has received little attention. This paper addresses the retrieval of full Chinese organization names from their abbreviations. After studying the structural features of organization names, the authors summarize two types of rules, build a customized word segmentation tool based on keyword classes, and propose an algorithm for matching an abbreviated name against a full name. Combined with a multi-level indexing technique, these components form a retrieval system driven by abbreviated Chinese organization names. Experiments show that the approach is accurate: the top-choice accuracy is nearly 95%, and with 510,000 full organization names in the collection the average retrieval time is about 0.21 seconds per query, which meets practical requirements.
Authors: Zhong Liangwu (钟良伍), Zheng Fang (郑方)
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 1, pp. 38-42 (5 pages)
Keywords: computer application; Chinese information processing; multi-level indexing; fuzzy matching; word segmentation
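The abstract names the main ingredients (an index over full names and a procedure that matches an abbreviation against a full name) but gives no algorithmic detail. The following is only a minimal illustrative sketch of that general idea, not the authors' method. It assumes that an abbreviation keeps a subset of the full name's characters in their original order, and it uses a single-level character index rather than the multi-level index the paper describes. The function names, the ranking rule (shorter names first), and the sample names are all hypothetical.

```python
from collections import defaultdict


def is_subsequence(abbr: str, full: str) -> bool:
    """Return True if the characters of abbr occur in full in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters must appear after earlier ones.
    return all(ch in remaining for ch in abbr)


def build_index(full_names):
    """Map each character to the ids of the full names that contain it."""
    index = defaultdict(set)
    for name_id, name in enumerate(full_names):
        for ch in set(name):
            index[ch].add(name_id)
    return index


def retrieve(abbr, full_names, index):
    """Return full names that contain abbr as an ordered subsequence.

    Assumption: results are ranked by length, then lexicographically.
    This ranking is a placeholder, not the scoring used in the paper.
    """
    if not abbr:
        return []
    # Candidate filter: only names that contain every character of abbr.
    postings = [index.get(ch, set()) for ch in abbr]
    candidates = set.intersection(*postings)
    hits = [full_names[i] for i in candidates
            if is_subsequence(abbr, full_names[i])]
    return sorted(hits, key=lambda name: (len(name), name))


# Hypothetical sample data for illustration only.
names = ["清华大学", "北京大学", "北京清华长庚医院", "中国科学院"]
index = build_index(names)
print(retrieve("北大", names, index))
print(retrieve("清华", names, index))
```

The index narrows the search to names that share every character of the query before the ordered check runs, which is the usual reason to pair an index with a more expensive matching step. A production system along the lines of the abstract would also need word segmentation and rule-based scoring, which this sketch leaves out.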

