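Abstract
The Chinese discourse rhetorical structure annotation project CJPL takes financial commentary articles from major mainland Chinese media as its corpus. Following Rhetorical Structure Theory (RST), it defines punctuation-delimited elementary units for discourse rhetorical analysis and a set of 47 Chinese rhetorical relations that distinguish nuclear units, and it has drafted an annotation manual of nearly 60 pages. So far the project has annotated the rhetorical structure of 97 financial commentary articles, testing on a fairly large body of data how well RST and its formalization carry over to Chinese discourse analysis. The rhetorical relation information carried by the treebank, together with the discourse usage features of three classes of discourse cue markers, can supply data on shallow linguistic form markers for discourse-level Chinese information processing.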
This paper reports a rhetorical structure annotation project on a Chinese news commentary corpus, Caijingpinglun (CJPL), carried out for the purposes of natural language processing. The elementary discourse unit (EDU) in this project is defined as a string between two selected punctuation marks. In total, 47 Chinese rhetorical relations that mark nuclearity are defined according to classic Rhetorical Structure Theory (RST). A 60-page annotation manual has been compiled, with detailed rules for EDU segmentation and EDU combination and with relation and scheme tagging protocols. Analysis of the first manually annotated set of 97 texts shows that RST transfers well to Chinese, and the data obtained from this project may be further exploited in Chinese discourse processing.
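The punctuation-based EDU definition above can be illustrated with a minimal sketch. The abstract does not list which punctuation marks were selected, so the `BOUNDARY_MARKS` set and the `segment_edus` function below are hypothetical and are not taken from the CJPL annotation manual.

```python
import re

# Assumption: the abstract only says "selected punctuation marks";
# this particular set is illustrative, not the project's actual list.
BOUNDARY_MARKS = "。！？；，："


def segment_edus(text, marks=BOUNDARY_MARKS):
    """Split text into candidate EDUs delimited by the given marks.

    Each unit keeps the boundary mark(s) that close it.
    """
    cls = re.escape(marks)
    # A run of non-boundary characters followed by any closing boundary marks.
    pattern = "[^" + cls + "]+[" + cls + "]*"
    units = (m.group().strip() for m in re.finditer(pattern, text))
    return [u for u in units if u]


if __name__ == "__main__":
    sample = "股市上涨，投资者信心增强。但是风险仍在！"
    for i, edu in enumerate(segment_edus(sample), start=1):
        print(i, edu)
```

A real segmenter would also need the manual's rules for cases where a punctuation mark does not end a unit, which this sketch does not attempt.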
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2008, Issue 4, pp. 19-23, 42 (6 pages in total)
Journal of Chinese Information Processing
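Funding
Outcome of a 2007 research project of the Zhejiang Provincial Federation of Social Sciences (浙江省社会科学界联合会), project no. 07N02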
Keywords
computer application
Chinese information processing
Chinese corpus
discourse annotation
rhetorical structure theory