Improved Chinese Street View Text Recognition Technology Based on CRNN (cited by: 2)
Abstract Text in real-world scenes is often irregular in shape because of image distortion, cluttered backgrounds, and curved or tilted layouts. Extracting this text enriches the semantic information of an image and helps analyze its context, which supports better understanding of the scene. To address these challenges, an end-to-end scene text recognition technique based on an improved CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional layers, features are extracted with an improved Inception structure based on GoogLeNet, which adds multi-branch convolutional layers to fuse multi-scale features. An attention mechanism is then incorporated to strengthen feature correlations along the channel and spatial dimensions, giving local features a global perspective. In the recurrent layers, a Bi-LSTM (Bidirectional Long Short-Term Memory network) strengthens the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is passed to a CTC (Connectionist Temporal Classification) layer, which transcribes it into the output sequence. Experiments on the IIIT5K dataset and the Baidu Chinese Street View dataset yield accuracies of 95.3% and 91.1%, respectively, demonstrating the reliability of the method.
Authors REN Rui; WANG Xiaoya; WEN Chengyu (College of Communication Engineering, Chengdu University of Information Technology, Chengdu 610225, China)
Source Journal of Chengdu University of Information Technology, 2025, No. 1, pp. 1-6 (6 pages)
Funding Sichuan Science and Technology Program (2023YFS0422).
Keywords text recognition; convolutional neural network; attention mechanism; bidirectional long short-term memory
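
The final CTC transcription step described in the abstract can be illustrated with a minimal sketch. The record does not say which decoding strategy the authors use, so the example assumes greedy (best-path) decoding: take the highest-scoring class in each frame, merge consecutive repeats, and drop blanks. The function name `ctc_greedy_decode`, the blank index 0, and the three-character charset are hypothetical choices made for illustration, not details from the paper.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank label


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Greedy (best-path) CTC decoding of per-frame class scores.

    frame_scores: list of per-frame score lists, one score per class.
    charset: string whose i-th character corresponds to class index i + 1.
    """
    # Step 1: pick the highest-scoring class index in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: drop blanks and map the remaining indices to characters.
    return "".join(charset[label - 1] for label in collapsed if label != blank)


# Hypothetical scores for 6 frames over 4 classes: blank, 街, 景, 文.
charset = "街景文"
frame_scores = [
    [0.10, 0.80, 0.05, 0.05],  # 街
    [0.20, 0.70, 0.05, 0.05],  # 街 (repeat, merged in step 2)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # 景
    [0.10, 0.10, 0.60, 0.20],  # 景 (repeat, merged in step 2)
    [0.10, 0.10, 0.10, 0.70],  # 文
]
print(ctc_greedy_decode(frame_scores, charset))  # 街景文
```

Repeats are merged before blanks are removed, so a blank between two identical labels keeps them as separate characters. This is how CTC can emit doubled characters.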