Abbreviation Prediction Using Conditional Random Field and Web Data (基于条件随机场与Web数据的缩略语预测)

Cited by: 3
Abstract: Abbreviations are widely used in natural language. Because they are an important source of new (unknown) words, they pose a significant problem for natural language processing. This paper studies how to predict the abbreviation of a Chinese expression from its full form. First, a Conditional Random Field (CRF) model labels the full form as a sequence, producing a set of abbreviation candidates. Web data is then retrieved through a search engine, and each candidate is evaluated against that data under several strategies (different search conditions and statistical methods). The individual evaluation scores are combined to re-rank the candidates, and the highest-scoring candidate is selected as the abbreviation. Experiments show that adding Web information improves abbreviation prediction accuracy by about five percentage points over the CRF-only method.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2012, No. 2, pp. 62-68 (7 pages)
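Funding: National Natural Science Foundation of China (60973053, 91024009, 90920011); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (核高基) (2011ZX01042-001-001); Research Fund for the Doctoral Program of Higher Education (20090001110047)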
Keywords: abbreviation; CRF model; web data
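
The two-stage pipeline described in the abstract (generate candidates, then re-rank them with web evidence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-character keep probabilities stand in for a trained CRF (which would score the whole label sequence jointly), `hit_count` is a hypothetical lookup of how often a candidate co-occurs with its full form on the web, the counts are invented toy numbers, and the linear interpolation with `weight` is only one possible way to combine scores.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple


def generate_candidates(full_form: str, keep_prob: List[float],
                        top_k: int = 3) -> List[Tuple[str, float]]:
    """Enumerate keep/skip labelings and return the top_k candidates.

    Assumption: each character is kept independently with probability
    keep_prob[i]. This is a stand-in for a CRF, not a CRF itself.
    """
    assert len(full_form) == len(keep_prob)
    scored: Dict[str, float] = {}
    for labels in product([True, False], repeat=len(full_form)):
        candidate = "".join(ch for ch, keep in zip(full_form, labels) if keep)
        # Skip the empty string and the unabbreviated full form.
        if not candidate or candidate == full_form:
            continue
        prob = math.prod(p if keep else 1.0 - p
                         for p, keep in zip(keep_prob, labels))
        # Different labelings may yield the same string; sum their mass.
        scored[candidate] = scored.get(candidate, 0.0) + prob
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def rerank(candidates: List[Tuple[str, float]],
           hit_count: Callable[[str], int],
           weight: float = 0.5) -> List[Tuple[str, float]]:
    """Interpolate the model score with a normalized web hit-count score."""
    total_hits = sum(hit_count(cand) for cand, _ in candidates)
    rescored = []
    for cand, model_score in candidates:
        web_score = hit_count(cand) / total_hits if total_hits else 0.0
        rescored.append((cand, (1.0 - weight) * model_score + weight * web_score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy example: 北京大学 (Peking University), commonly abbreviated 北大.
full_form = "北京大学"
keep_prob = [0.9, 0.6, 0.7, 0.2]               # hypothetical model output
toy_hits = {"北京大": 5, "北大": 900, "北京": 95}  # invented counts

candidates = generate_candidates(full_form, keep_prob)
reranked = rerank(candidates, lambda cand: toy_hits.get(cand, 0))
best = reranked[0][0]
print(best)  # -> 北大
```

In this toy run the stand-in model alone ranks 北京大 first, and the web evidence moves 北大 to the top, which mirrors the kind of correction the abstract attributes to adding Web information.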