Abstract
An LRAM (LSTM with ResNet and attention model) model was proposed for reading text in natural images. In this model, a residual network (ResNet) module was adopted to accelerate the convergence of the network and reduce the difficulty of training, and an attention mechanism (AM) was utilized to assign weights to different sequence positions for the current text recognition step, improving recognition accuracy. Extensive experiments on various benchmarks, including the Synth90K, Street View Text, and ICDAR datasets, show that the LRAM model substantially outperforms existing methods.
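The attention step summarized in the abstract assigns a normalized weight to each encoder time step and forms a weighted sum (context vector) for the current decoding step. A minimal sketch of this idea using dot-product attention in NumPy is shown below; the function name, scoring rule, and toy dimensions are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def attention_weights(hidden_states, query):
    """Dot-product attention sketch: score each encoder hidden state
    against the decoder query, normalize with softmax so the weights
    sum to 1, and return the weighted-sum context vector."""
    scores = hidden_states @ query            # shape (T,)
    scores = scores - scores.max()            # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states         # weighted sum over time steps
    return weights, context

# Toy example: 5 encoder time steps, 8-dimensional hidden states
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
q = rng.standard_normal(8)
w, c = attention_weights(H, q)
```

Here `w` holds one non-negative weight per time step (summing to 1), and `c` is the context vector fed to the recognition step; the paper's actual model applies this within an LSTM decoder over ResNet features.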
Authors
王茂森
蒋小森
牛少彰
WANG Mao-sen; JIANG Xiao-sen; NIU Shao-zhang (Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China; Video Algorithm Group, Youku Information Technology (Beijing) Co., Ltd, Beijing 100080, China)
Source
《北京理工大学学报》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2019, No. 3, pp. 269-275 (7 pages)
Transactions of Beijing Institute of Technology
Funding
Supported by the National Natural Science Foundation of China (61370195, U1536121)
Keywords
sequence text recognition
long short-term memory (LSTM)
deep residual network
attention model