Abstract
Text proofreading is an important research direction in Natural Language Processing (NLP) and has great application value in fields such as news publishing, book publishing, speech input, text input, and Chinese character recognition. This paper provides a systematic overview of automatic error correction technology for Chinese texts. Errors in Chinese texts are divided into spelling errors, grammatical errors, and semantic errors, and the correction methods for each of these three types are reviewed. The datasets and evaluation methods used for automatic error correction of Chinese texts are also summarized. Finally, future directions for automatic error correction of Chinese texts are discussed.
Authors
李云汉
施运梅
李宁
田英爱
Li Yunhan; Shi Yunmei; Li Ning; Tian Yingai (Beijing Information Science and Technology University, Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing 100101, China; School of Computer, Beijing University of Information Technology, Beijing 100101, China)
Source
Journal of Chinese Information Processing (中文信息学报)
CSCD
Peking University Core Journal (北大核心)
2022, Issue 9, pp. 1-18 and 27 (19 pages)
Funding
National Key R&D Program of China (2018YFB1004100).
Keywords
automatic correction
spelling errors
grammatical errors
semantic errors
datasets
evaluation metrics