农业古籍断句标点模式研究 (cited by: 31)

On Sentence Segmentation and Punctuation Model for Ancient Books on Agriculture
Abstract: The collation of ancient books on agriculture has attracted the attention of many scholars and experts, but automatic sentence segmentation and punctuation for these books has remained largely unstudied. This study explores and summarizes a set of patterns for segmenting and punctuating ancient agricultural texts. The text is first segmented using syntactic feature words (such as function words, conjunctions, and modal words) and synonym indicator words. Antonym compounds, cited-book indicators, time sequences, quantifiers, reduplicated words, and verb-object structures are then used to further segment and punctuate the resulting clauses, with comparative sentence patterns serving as an auxiliary cue for identifying complex sentences and punctuating clauses. Finally, agricultural terms and a table of forbidden patterns are applied to improve the readability and accuracy of the segmented and punctuated text. In tests, the average precision of sentence segmentation and of punctuation reached 48% and 35%, respectively, which suggests that the method is sound and feasible to a certain degree.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 4, pp. 31-38 (8 pages)
Funding: National Social Science Fund of China (08ATQ002)
Keywords: computer application; Chinese information processing; ancient books on agriculture; agricultural treatises of ancient China; collation of ancient books; sentence segmentation; punctuation; pattern matching
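
The marker-word segmentation step and the forbidden-pattern check described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the particle set `FINAL_PARTICLES`, the blocked suffix list `FORBIDDEN_SUFFIXES`, and the function name `segment` are all hypothetical choices made for illustration, and the sample sentences are well-known classical lines rather than agricultural text.

```python
# Hypothetical marker sets, chosen only for illustration; these are NOT the
# lists used in the paper.
FINAL_PARTICLES = set("也矣乎哉")   # a candidate break falls AFTER these characters
FORBIDDEN_SUFFIXES = ("在乎",)       # cancel a break if the text so far ends with one of these


def segment(text):
    """Split unpunctuated text after sentence-final particles.

    A candidate break is dropped when the text up to that point ends with
    a forbidden pattern (for example, 乎 used as a preposition in 在乎).
    """
    clauses = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in FINAL_PARTICLES:
            continue
        end = i + 1
        # str.endswith accepts a tuple of alternatives.
        if text[:end].endswith(FORBIDDEN_SUFFIXES):
            continue
        clauses.append(text[start:end])
        start = end
    # Keep any trailing text that did not end in a particle.
    if start < len(text):
        clauses.append(text[start:])
    return clauses


if __name__ == "__main__":
    print(segment("学而时习之不亦说乎有朋自远方来不亦乐乎"))
    print(segment("醉翁之意不在酒在乎山水之间也"))
```

In the second sample, the forbidden pattern keeps 在乎 from being treated as a clause ending, so the line stays in one piece. A fuller system along the lines of the abstract would layer further cues (cited-book indicators, quantifiers, reduplicated words, and so on) on top of this first pass.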