Abstract
Form analysis is the process of recognizing the basic structure and shape of a form, and it determines whether text can later be extracted correctly from the form's cells. Building on the characteristics of forms, this paper obtains the form's frame lines by combining line detection with line processing. During detection, a main-line length is defined to speed up scanning. During processing, the removal of spurious lines, the adjustment of the form lines, and the final recovery of the form structure are discussed systematically. Extensive experimental results show that the proposed method is feasible.
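The detection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes a binarized image stored as a list of rows of 0/1 pixels and treats the "main-line length" as a simple minimum run length. The names `find_runs`, `find_table_lines`, and `min_len` are hypothetical.

```python
def find_runs(rows, min_len):
    """Return (index, start, end) for every run of foreground pixels
    that is at least min_len long, scanning each sequence in rows."""
    runs = []
    for i, row in enumerate(rows):
        start = None
        for j, px in enumerate(row):
            if px and start is None:
                start = j                      # a run begins here
            elif not px and start is not None:
                if j - start >= min_len:       # keep only long runs
                    runs.append((i, start, j - 1))
                start = None
        # close a run that reaches the end of the sequence
        if start is not None and len(row) - start >= min_len:
            runs.append((i, start, len(row) - 1))
    return runs


def find_table_lines(image, min_len):
    """Assumed stand-in for line detection: long horizontal runs are
    found row by row, long vertical runs on the transposed image."""
    horizontal = find_runs(image, min_len)             # (row, col_start, col_end)
    vertical = find_runs(list(zip(*image)), min_len)   # (col, row_start, row_end)
    return horizontal, vertical


# A 6x8 toy form: an outer frame, one inner vertical line at column 5,
# and a short two-pixel stroke (row 2, columns 2-3) standing in for text.
image = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
horizontal, vertical = find_table_lines(image, min_len=5)
```

With `min_len=5` the short stroke falls below the threshold and is discarded, which is a simplified form of spurious-line removal. A real scanned form would also need tolerance for broken lines, skew, and line thickness, none of which this sketch handles.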
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals (北大核心)
2008, No. 19, pp. 5114-5116 (3 pages)
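Funding
Natural Science Foundation of Guangdong Province (032356, 07010869)
Open Project Fund of the State Key Laboratory of Visual and Auditory Information Processing, Peking University (0505)
Jiangmen Science and Technology Program ([2007] No. 28)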
Keywords
form analysis
form recognition
line extraction
line detection
form structure