Abstract
Layout analysis segments a document page image into regions and labels each region by type (for example text, image, or table); it is as important as character recognition. This paper proposes a connected-component-based layout analysis method for Mongolian document images. All connected components in the document image are extracted by pixel labeling and then clustered by size, which separates text components from non-text components and thereby segments the page. Experiments show that the method analyzes the layouts of Mongolian books accurately.
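The pipeline summarized in the abstract (label connected components, then cluster them by size) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which size measure or clustering algorithm the paper uses, so the bounding-box area and the two-cluster 1-D k-means here are assumptions, as are the function names `connected_components` and `split_by_size`. Treating the small-size cluster as text candidates is likewise only an assumption.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the 8-connected
    foreground regions of a binary image given as rows of 0/1 values."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_area(box):
    """Area of a bounding box in pixels (assumed size measure)."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def split_by_size(boxes, iterations=20):
    """Split boxes into (small, large) with a two-cluster 1-D k-means on
    bounding-box area. The clustering rule is an assumption, not the paper's."""
    if not boxes:
        return [], []
    areas = [box_area(b) for b in boxes]
    lo, hi = float(min(areas)), float(max(areas))
    for _ in range(iterations):
        small = [a for a in areas if abs(a - lo) <= abs(a - hi)]
        large = [a for a in areas if abs(a - lo) > abs(a - hi)]
        if not large:
            break  # all components have the same size
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    small_boxes = [b for b, a in zip(boxes, areas) if abs(a - lo) <= abs(a - hi)]
    large_boxes = [b for b, a in zip(boxes, areas) if abs(a - lo) > abs(a - hi)]
    return small_boxes, large_boxes
```

On a real page, the small cluster would roughly correspond to glyph-sized components and the large cluster to pictures or table frames; a practical system would likely need further rules (for instance merging neighbouring text components into columns, since Mongolian script is written vertically), which this sketch does not attempt.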
Source
《内蒙古大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2007, No. 5, pp. 586-590 (5 pages)
Journal of Inner Mongolia University: Natural Science Edition
Funding
Supported by the National Natural Science Foundation of China (Grant No. 69965001)
Keywords
Mongolian document image
layout analysis
bottom-up approach
top-down approach
connected component