Recognizing irregular text in natural images is a challenging task in computer vision. Existing approaches still struggle with irregular text because of its diverse shapes. In this paper, we propose a simple yet powerful irregular text recognition framework based on an encoder-decoder architecture. The framework is divided into four main modules. First, in the image transformation module, a Thin Plate Spline (TPS) transformation converts the irregular text image into a readable text image. Second, we propose a novel Spatial Attention Module (SAM) that compels the model to concentrate on text regions and obtain enriched feature maps. Third, a deep bi-directional long short-term memory (Bi-LSTM) network builds a contextual feature map from the visual feature map generated by a Convolutional Neural Network (CNN). Finally, we propose a Dual Step Attention Mechanism (DSAM), integrated with the Connectionist Temporal Classification (CTC)-Attention decoder, that re-weights visual features and focuses on intra-sequence relationships to generate a more accurate character sequence. The effectiveness of the framework is verified through extensive experiments on benchmark datasets such as SVT, ICDAR, CUTE80, and IIIT5k, with performance measured by the accuracy metric. The results demonstrate that the proposed method outperforms existing approaches on both regular and irregular text. Additionally, the robustness of the approach is evaluated on grocery datasets that contain complex text images, namely GroZi-120, WebMarket, SKU-110K, and Freiburg Groceries. The framework also achieves superior performance on these grocery datasets.
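The CTC part of the decoder mentioned in the abstract above can be illustrated with a minimal greedy decoding sketch. This is not the authors' DSAM or their CTC-Attention decoder; it only shows the standard CTC collapse rule (take the best class per frame, merge consecutive repeats, drop blanks). The function name `ctc_greedy_decode`, the `alphabet` layout, and the blank index 0 are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding (illustrative sketch, not the paper's decoder).

    frame_scores: list of per-frame score lists, one score per class.
    alphabet:     string whose index matches the class index; the character
                  at position `blank` is a placeholder and is never emitted.
    """
    # Pick the highest-scoring class for every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    chars = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


# Hypothetical scores for six frames over the classes (blank, 'a', 'b').
example_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
]
decoded = ctc_greedy_decode(example_scores, alphabet="-ab")
```

The blank frame between the two runs of 'a' is what lets CTC output a doubled letter; without it the two runs would merge into a single character.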
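Existing object detection algorithms perform poorly on small traffic signs against complex backgrounds. To address this, a traffic sign detection algorithm based on YOLOv7 with a normalized channel attention mechanism (YOLOv7-NCAM) is proposed. To give the YOLOv7-NCAM model pixel-level modeling capability and improve its feature extraction for small traffic signs, the algorithm uses the FReLU activation function to build two kinds of convolutional layers, DBF and CBF, and uses them to construct the model's Backbone and Neck modules. A normalized channel attention mechanism (NCAM) is also proposed and added to the Head module. By training jointly with the whole network, batch normalization (BN) scaling factors are obtained, and these scaling factors are used to compute a weight factor for each channel. This improves the network's ability to represent traffic sign features, so that the YOLOv7-NCAM model can focus on the target traffic signs. Tests on the CCTSDB-2021 traffic sign detection dataset, compared against the YOLOv7 model, show that YOLOv7-NCAM clearly improves every metric for detecting small traffic signs against complex backgrounds: precision (P) reaches 91.5%, 9.5 percentage points higher than the original network; recall (R) reaches 85.9%, 5.7 percentage points higher; and mean average precision (mAP) reaches 91.4%, 4.7 percentage points higher. Compared with existing traffic sign detection algorithms, YOLOv7-NCAM also improves detection accuracy, and its detection speed of 48.3 FPS meets real-time requirements.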
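The abstract above says only that per-channel weights are computed from the trained BN scaling factors; it does not give the formula. The sketch below assumes one common choice, in which each channel's weight is the absolute scaling factor divided by the sum over all channels, followed by a sigmoid gate. The function names, the gating form, and the assumption that the input features are already batch-normalized are all hypothetical and may differ from the paper's NCAM.

```python
import math


def channel_weights(bn_scales):
    """Turn BN scaling factors into channel weights that sum to 1.

    Assumed rule (not confirmed by the abstract): w_i = |g_i| / sum_j |g_j|.
    """
    total = sum(abs(g) for g in bn_scales)
    if total == 0:
        raise ValueError("all scaling factors are zero")
    return [abs(g) / total for g in bn_scales]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply_channel_attention(features, bn_scales):
    """Gate each channel by sigmoid(weight * value).

    features:  list of channels, each a flat list of activations that are
               assumed to be batch-normalized already.
    bn_scales: one BN scaling factor per channel.
    """
    weights = channel_weights(bn_scales)
    return [
        [value * sigmoid(weight * value) for value in channel]
        for channel, weight in zip(features, weights)
    ]


# Hypothetical scaling factors for three channels.
weights = channel_weights([2.0, -1.0, 1.0])
gated = apply_channel_attention([[0.0, 1.0]], [1.0])
```

Under this assumed rule, channels whose BN scaling factor has a larger magnitude receive a larger share of the attention, which matches the abstract's description of emphasizing informative traffic sign features.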