Abstract
Automatic script identification in document images has important applications in document analysis. This paper proposes a script identification technique based on the dual-tree complex wavelet transform (DT-CWT). Compared with the traditional discrete (real) wavelet transform, the DT-CWT offers approximate shift invariance and good directional selectivity, so it can extract more stable texture features in more directions from text images. Text image samples in six languages (Chinese, English, Japanese, Korean, Russian, and Arabic), with different formats and fonts, were used in the experiments. The experimental results show that the proposed method identifies scripts more effectively than the traditional discrete wavelet transform method.
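The shift sensitivity of a real discrete wavelet transform, which the abstract contrasts with the dual-tree complex wavelet transform, can be illustrated with a small sketch. It assumes a one-level Haar filter and a sum-of-squares energy measure; both are choices made here for illustration, since the abstract does not state which filters or which energy definition the paper uses. The function name `haar_detail_energy` is hypothetical.

```python
import math


def haar_detail_energy(signal):
    """Sum of squared level-1 Haar detail coefficients of an even-length sequence.

    Illustrative assumption: the Haar filter and this energy definition are
    not taken from the paper.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    energy = 0.0
    # Each non-overlapping pair (x[2k], x[2k+1]) yields one detail coefficient.
    for k in range(0, len(signal), 2):
        d = (signal[k] - signal[k + 1]) / math.sqrt(2)
        energy += d * d
    return energy


# The same step edge, once aligned with the pair grid and once shifted by one sample.
step = [0, 0, 0, 0, 1, 1, 1, 1]
shifted = [0, 0, 0, 0, 0, 1, 1, 1]

print(haar_detail_energy(step), haar_detail_energy(shifted))
```

For the aligned edge the detail energy is zero, while the one-sample shift gives an energy of about 0.5. A texture feature built on such energies changes with the position of the text in the image, which is the instability that an approximately shift-invariant transform is meant to reduce.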
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journal
2008, Issue 6, pp. 766-771 (6 pages)
Keywords
script identification
dual-tree complex wavelet transform
texture feature
wavelet energy