Ontology Concept Learning Method for Compound Terms (cited by: 10)
Abstract: Term extraction plays an important role in text-based ontology concept learning. Because Chinese text has no explicit boundaries between words, domain terms, and compound terms in particular, are difficult to extract. Traditional term extraction methods lack semantic support, require a large amount of computation, and have low accuracy. To address these shortcomings, this paper presents an ontology concept learning method suited to compound term extraction. First, natural language processing techniques are used to filter out components that are irrelevant to terms, and sentences are cut naturally at punctuation marks and at the removed components. This yields a complete candidate set for domain term extraction and ensures that candidate compound terms are not split incorrectly. On this basis, candidate domain terms are filtered with a multi-strategy approach that uses term frequency and information entropy, according to the domain statistics and distribution characteristics of terms. The domain concept set is then obtained by recognizing and merging synonymous terms. Experimental results show that the method extracts both single-word and compound domain terms from domain text with relatively high precision.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2013, No. 5, pp. 168-172 (5 pages)
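Funding: National "Twelfth Five-Year" Science and Technology Support Program (2011BAK08B04); Fundamental Research Funds for the Central Universities (FRF-TP-12-162A); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ12345)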
Keywords: Term extraction; Term filtering; Compound terms; Ontology concept learning
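
The pipeline outlined in the abstract (natural cutting, then filtering by term frequency and information entropy) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define its entropy measure, its thresholds, or its stop-word list. The sketch assumes that entropy means the Shannon entropy of a term's frequency distribution across domain documents, and that sentences are cut at punctuation and at a small hypothetical stop-word list. The names `PUNCTUATION`, `STOP_WORDS`, `split_candidates`, `term_statistics`, and `filter_terms`, and the thresholds `min_freq` and `min_entropy`, are all illustrative. Synonym recognition and merging are not covered.

```python
import math
import re
from collections import Counter

# Assumed delimiters: common punctuation plus a tiny, hypothetical stop-word list.
PUNCTUATION = ",。;:!?、,.;:!?"
STOP_WORDS = ["的", "是", "和", "在", "了"]


def split_candidates(text):
    """Cut text at punctuation and stop words; the pieces left over are candidates.

    Compound terms are never split internally, because cutting happens only
    at the delimiters, not inside the remaining character runs.
    """
    pattern = "|".join(
        [f"[{re.escape(PUNCTUATION)}\\s]"] + [re.escape(w) for w in STOP_WORDS]
    )
    return [piece for piece in re.split(pattern, text) if piece]


def term_statistics(documents):
    """Return {term: (total frequency, entropy of its distribution over documents)}."""
    per_doc = [Counter(split_candidates(doc)) for doc in documents]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)

    stats = {}
    for term, total in totals.items():
        entropy = 0.0
        for counts in per_doc:
            if counts[term]:
                p = counts[term] / total
                entropy -= p * math.log2(p)
        stats[term] = (total, entropy)
    return stats


def filter_terms(documents, min_freq=2, min_entropy=0.5):
    """Keep candidates that are frequent and evenly spread over the domain corpus.

    Both thresholds are arbitrary illustration values, not values from the paper.
    """
    stats = term_statistics(documents)
    return sorted(
        term
        for term, (freq, entropy) in stats.items()
        if freq >= min_freq and entropy >= min_entropy
    )


docs = ["本体概念学习的方法", "复合术语和本体概念学习", "复合术语是术语"]
print(filter_terms(docs))
```

On this toy corpus, "本体概念学习" and "复合术语" each occur in two documents and pass both thresholds, while "方法" and "术语" occur once and are dropped. A real system would need a proper stop-word resource and tuned thresholds.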

