Abstract
Improving the recognition rate for low-quality text images is an important direction in current character recognition research. This paper studies in some depth the segmentation of skewed text lines, the segmentation of broken, touching, and overlapping characters, and post-processing, and proposes several new algorithms. The system can recognize as many as 260 fonts, including Heiti (boldface) and italic fonts; its recognition rate on the training set reaches 98.5%, and it has performed well in practical applications.
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2006, Issue 12, pp. 183-186 (4 pages)
Computer Engineering and Applications
Keywords
Optical character recognition (OCR)
Text line segmentation
Character segmentation
Post-processing