
A Study on Rhetorical Structure Annotation of Chinese Discourse (汉语篇章修辞结构的标注研究)  Cited by: 28

Rhetorical Structure Annotation of Chinese News Commentaries
Abstract: The CJPL project annotates the rhetorical structure of Chinese discourse for the purpose of natural language processing, using financial news commentaries (Caijingpinglun, CJPL) from major mainland Chinese media as its corpus. Following Rhetorical Structure Theory (RST), the project defines the elementary discourse unit (EDU) as a string bounded by two selected punctuation marks, and defines a set of 47 Chinese rhetorical relations that distinguish nuclearity. An annotation manual of nearly 60 pages was drafted, with detailed rules for EDU segmentation, EDU combination, and relation and scheme tagging. Rhetorical structure annotation has so far been completed for 97 commentaries. Analysis of this first manually annotated set tests, on a fairly large body of data, whether RST and its formalization can be transferred to Chinese discourse analysis, and shows good cross-language transferability. The rhetorical relation information carried by the treebank, together with the discourse usage features of three types of discourse cue markers, can supply shallow linguistic-form data for discourse-level Chinese information processing.
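The EDU definition in the abstract (a string between two selected punctuation marks) can be illustrated with a minimal Python sketch. The set of boundary marks below is an assumption made for illustration only; this record does not list the marks the project actually selected, and the function name `segment_edus` is hypothetical.

```python
import re

# Hypothetical boundary set: the project's actual "selected" marks are not
# given in this record, so these six full-width marks are an assumption.
BOUNDARY_MARKS = "。！？；：，"


def segment_edus(text, marks=BOUNDARY_MARKS):
    """Split text into candidate EDUs at the given punctuation marks.

    Each unit keeps the boundary mark that closes it. Text after the last
    mark is returned as a final unit without a closing mark.
    """
    escaped = re.escape(marks)
    # One or more non-boundary characters, optionally followed by one mark.
    pattern = "[^" + escaped + "]+[" + escaped + "]?"
    units = (m.strip() for m in re.findall(pattern, text))
    return [u for u in units if u]


if __name__ == "__main__":
    for unit in segment_edus("股市下跌，投资者担忧；但是，政策稳定。"):
        print(unit)
```

Punctuation-based segmentation of this kind only yields candidate units; deciding how units combine and which rhetorical relation holds between them is the part governed by the annotation manual.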
Author: Yue Ming (乐明)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2008, No. 4, pp. 19-23, 42 (6 pages)
Funding: 2007 research project of the Zhejiang Federation of Social Sciences (07N02)
Keywords: computer application; Chinese information processing; Chinese corpus; discourse annotation; rhetorical structure theory

References (11)

  • 1. William Mann and Sandra Thompson. Rhetorical Structure Theory: A Theory of Text Organization [M]. ISI/RS-87-190. Information Sciences Institute, University of Southern California, 1987.
  • 2. William Mann and Sandra Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization [J]. Text, 1988, 8(3): 243-281.
  • 3. Lynn Carlson, Daniel Marcu, and Mary E. Okurowski. Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory [C]//Jan van Kuppevelt and Ronnie Smith, editors, Current Directions in Discourse and Dialogue. Kluwer Academic Publishers, 2003.
  • 4. Manfred Stede. The Potsdam Commentary Corpus [C]//Proceedings of the ACL 2004 Workshop on Discourse Annotation. Barcelona, 2004.
  • 5. R. Soricut and Daniel Marcu. Sentence level discourse parsing using syntactic and lexical information [C]//Proceedings of Human Language Technology and North American Association for Computational Linguistics Conference (HLT-NAACL 2003). Edmonton, Canada.
  • 6. J. Burstein and Daniel Marcu. A machine learning approach for identification of thesis and conclusion statements in student essays [J]. Computers and the Humanities, 2003, 37(4): 455-467.
  • 7. Benjamin K. T'sou, Lin H. L., Ho H. C., Lai T., and Chan T. Automated Chinese Full-text Abstraction Based on Rhetorical Structure Analysis [J]. Computer Processing of Oriental Languages, 1996, 10(2): 225-238.
  • 8. Zhang Yimin, Lu Ruzhan, and Shen Libin. A hybrid method for automatic analysis of Chinese discourse structure [J]. Journal of Software (软件学报), 2000, 11(11): 1527-1533. Cited by: 10
  • 9. Yue Ming. Discursive Usage of Six Chinese Punctuation Marks [C]//Proceedings of the COLING/ACL 2006 Student Research Workshop. Sydney, July 2006: 43-48.
  • 10. Xing Fuyi. Research on Chinese Complex Sentences (汉语复句研究) [M]. Beijing: The Commercial Press, 2002.

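Second-level references (2)

  • 1. Wu Lide. Large-Scale Chinese Text Processing (大规模中文文本处理), 1997.
  • 2. Wang Wei. Linguistics Abroad (国外语言学), 1994, Vol. 1, No. 4, p. 8.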

Co-citing documents: 26

Co-cited documents: 313

Citing documents: 28

Second-level citing documents: 129
