Abstract
To develop a high-performance automatic recognition system for mathematical formulas, the characteristics of the symbols that occur in printed formulas (upper- and lower-case English letters, digits, operators, and relational symbols) were analyzed in detail. On this basis, nine features totaling 29 dimensions were selected for symbol recognition: the width-to-height ratio of the symbol image, mesh (grid) features, line-crossing features, straight-line features, number of holes, intersection points, end points, centroid position, and histogram peak values. A high-performance support vector machine classifier with a high recognition rate was then trained on these features. Experimental results show that these features distinguish the symbols appearing in mathematical formulas well and improve the recognition rate of the overall system.
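Three of the simpler features named in the abstract (width-to-height ratio, centroid position, and number of holes) can be illustrated with a short sketch. This is not the authors' implementation: the function names, the 0/1 list-of-lists image format, the bounding-box normalization of the centroid, and the use of 4-connectivity for background regions are all assumptions made for illustration. The remaining features and the support vector machine training step are not shown.

```python
from collections import deque


def bounding_box(img):
    """Return (top, left, bottom, right) of the foreground pixels (value 1)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], cols[0], rows[-1], cols[-1]


def width_height_ratio(img):
    """Width of the symbol's bounding box divided by its height."""
    top, left, bottom, right = bounding_box(img)
    return (right - left + 1) / (bottom - top + 1)


def centroid(img):
    """Centroid of the foreground pixels, scaled to [0, 1] inside the bounding box."""
    top, left, bottom, right = bounding_box(img)
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    n = len(pts)
    cy = (sum(r for r, _ in pts) / n - top + 0.5) / (bottom - top + 1)
    cx = (sum(c for _, c in pts) / n - left + 0.5) / (right - left + 1)
    return cx, cy


def count_holes(img):
    """Count background regions that are not connected to the image border."""
    h, w = len(img), len(img[0])
    # Pad with background so the outer background forms a single region.
    grid = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def flood(r0, c0):
        # Breadth-first fill over 4-connected background pixels.
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h + 2 and 0 <= nc < w + 2
                        and not seen[nr][nc] and grid[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    flood(0, 0)  # mark the outer background
    holes = 0
    for r in range(h + 2):
        for c in range(w + 2):
            if grid[r][c] == 0 and not seen[r][c]:
                holes += 1  # an enclosed background region
                flood(r, c)
    return holes


# Tiny hand-made glyphs (hypothetical test data, 1 = ink).
ZERO = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]]
EIGHT = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
MINUS = [[1, 1, 1, 1]]
```

On these toy glyphs the hole count alone separates "0", "8", and "-", while the width-to-height ratio distinguishes a flat operator such as the minus sign from tall digits, which suggests why such features are combined into a single vector before classification.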
Source
Journal of Lanzhou University of Technology (《兰州理工大学学报》)
Indexed in: CAS; PKU Core (北大核心)
2012, No. 5, pp. 98-101 (4 pages)
Funding
Supported by the Scientific Research Fund of Hunan University of Arts and Science (JJYB0914)
Keywords
mathematical formula
symbol
feature extraction
classification
recognition
support vector machine