A Hierarchical Parsing Approach with Punctuation Processing for Long Chinese Sentences
Original title: 引入标点处理的层次化汉语长句句法分析方法
Cited by: 23
Abstract: Based on an analysis of the usage and syntactic functions of Chinese punctuation marks, this paper proposes a new hierarchical approach to parsing long Chinese sentences. It differs from traditional one-pass parsing, which does not treat punctuation specially, in two respects. First, punctuation marks with special functions are used to split a complex long sentence into a sequence of sub-sentences (called "units"), so that the parse of the whole sentence is carried out at two levels. This divide-and-conquer strategy greatly reduces a difficulty inherent in one-pass parsing: having to recognize the syntactic relations between sub-sentences or phrases and the relations among the constituents inside them at the same time. Second, grammar rules that include all punctuation marks, together with their probability distributions, are extracted from a large-scale treebank, which benefits both parsing and disambiguation. Experiments show that, compared with a traditional one-pass chart parsing algorithm, the approach substantially reduces both parsing time and the number of ambiguous edges, and improves precision and recall on complex long sentences by about 7%.
Authors: Li Xing (李幸), Zong Chengqing (宗成庆)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2006, No. 4, pp. 8-15 (8 pages)
Funding: National Natural Science Foundation of China (60375018, 60175012, 60121302); Chinese Academy of Sciences fund for overseas scholars (2003-1-1)
Keywords: artificial intelligence; natural language processing; parsing; Chinese punctuation; hierarchical parsing approach
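
The two-level strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The set of splitting marks in `SPLIT_MARKS`, the function names, and the `parse_unit` and `combine` callbacks are all assumptions made for illustration, since the abstract does not say which punctuation marks are used or how the unit parses are combined.

```python
from typing import Any, Callable, List, Tuple

# Assumption: an illustrative set of splitting marks. The abstract does not
# specify which punctuation marks the paper actually splits on.
SPLIT_MARKS = frozenset("，；：。！？")


def split_into_units(sentence: str, marks=SPLIT_MARKS) -> List[Tuple[str, str]]:
    """Split a sentence into (unit_text, closing_marks) pairs without losing text."""
    units: List[Tuple[str, str]] = []
    buf: List[str] = []
    for ch in sentence:
        if ch not in marks:
            buf.append(ch)
        elif buf:
            # A splitting mark closes the unit collected so far.
            units.append(("".join(buf), ch))
            buf = []
        elif units:
            # Consecutive marks are attached to the previous unit.
            text, closing = units[-1]
            units[-1] = (text, closing + ch)
        else:
            # A mark at the very start becomes an empty unit.
            units.append(("", ch))
    if buf:
        units.append(("".join(buf), ""))
    return units


def parse_hierarchically(
    sentence: str,
    parse_unit: Callable[[str], Any],
    combine: Callable[[List[Tuple[Any, str]]], Any],
) -> Any:
    """First level: parse each unit alone. Second level: relate the unit parses."""
    unit_trees = [(parse_unit(text), mark) for text, mark in split_into_units(sentence)]
    return combine(unit_trees)


if __name__ == "__main__":
    # Hypothetical stand-ins for a real unit parser and a real combiner.
    tree = parse_hierarchically(
        "我们提出了一种新方法，它把长句分割成子句；然后逐级分析。",
        parse_unit=lambda text: ("UNIT", text),
        combine=lambda parts: ("S", parts),
    )
    print(tree)
```

Because each unit is parsed on its own, a chart parser at the first level only builds edges within a unit. This is one plausible reading of why the abstract reports fewer ambiguous edges and less parsing time.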