Abstract
Segmentation is an essential step in document image analysis and processing. This paper reviews the features and approaches commonly used in current document image segmentation algorithms and proposes a segmentation method for Chinese document images. The method first uses the Sobel operator to coarsely detect text edge regions and expands these regions by morphological dilation. It then performs two passes of color clustering analysis and finally applies a series of heuristics based on the characteristics of Chinese characters, which segments the text regions well.
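As an illustration only, the first two steps named in the abstract (coarse Sobel edge detection followed by morphological dilation) might be sketched as below. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the |Gx| + |Gy| magnitude approximation, the threshold of 200, and the square structuring element are all hypothetical choices, since the abstract gives no parameters. The color clustering and heuristic steps are not sketched because the abstract does not describe them in enough detail.

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| using 3x3 Sobel kernels.

    `gray` is a list of rows of grayscale values. Border pixels are left at 0.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius + 1) x (2*radius + 1) square element."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out


def candidate_text_mask(gray, threshold=200, radius=1):
    """Threshold the Sobel response, then dilate to get candidate text regions.

    `threshold` and `radius` are assumed values, not taken from the paper.
    """
    edges = sobel_magnitude(gray)
    mask = [[value >= threshold for value in row] for row in edges]
    return dilate(mask, radius)


# Toy example: a white 7x7 page with a single dark pixel in the center.
page = [[255] * 7 for _ in range(7)]
page[3][3] = 0
mask = candidate_text_mask(page)
```

In this toy example the edge response appears on the eight neighbors of the dark pixel, and dilation merges them into one solid block that also covers the pixel itself, which is the effect the dilation step is meant to have on character strokes.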
Source
Journal of Communication University of China (Science and Technology) (《中国传媒大学学报(自然科学版)》)
2006, No. 4, pp. 62-67 (6 pages)
Keywords
document image analysis
image segmentation
text extraction