Abstract
To address multilingual text, dense arrangements, and extreme aspect ratios in printed labels, this study proposes an attention-based method for locating and detecting text regions in printed packaging labels. The conventional convolution in the ResNet Bottleneck is replaced with a dilated convolution with learnable spacing to enlarge the network's receptive field. Global and local attention modules are added to strengthen the backbone's extraction of text features. A residual attention module is added to the feature pyramid network to guide the adaptive fusion of multi-scale features. Ablation experiments show that, compared with DBNet, the improved model raises the F1-score by 1.2, 2.1, and 1.7 percentage points, respectively. Comparative experiments on ICDAR2015, Total-Text, and a self-built dataset show that the model outperforms mainstream text detection models (EAST, PSENet, FCENet, DPText-DETR, and DBNet), with F1-scores of 88.3%, 86.1%, and 85.1%, respectively. The work supports intelligent online inspection of printed labels.
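The receptive-field argument in the abstract can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the function names `effective_kernel_size`, `sample_linear`, and `conv1d_with_offsets` are hypothetical, and the tap offsets are fixed numbers here, whereas in a dilated convolution with learnable spacing they would be trainable parameters updated during training. The sketch only shows that spreading the taps widens the span a kernel covers, and that real-valued tap positions can be read through linear interpolation.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a regularly dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def sample_linear(signal, pos):
    """Read `signal` at a real-valued position by linear interpolation.

    Positions outside the signal are treated as zero padding.
    """
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def conv1d_with_offsets(signal, weights, offsets):
    """1D correlation whose taps sit at real-valued offsets from the centre.

    Assumption: offsets are given as plain numbers; a learnable-spacing
    layer would optimise them together with the weights.
    """
    return [
        sum(w * sample_linear(signal, center + off)
            for w, off in zip(weights, offsets))
        for center in range(len(signal))
    ]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # Integer offsets reproduce an ordinary dilated kernel (dilation 2).
    print(conv1d_with_offsets(impulse, [1, 1, 1], [-2, 0, 2]))
    # Fractional offsets fall between samples and are interpolated.
    print(conv1d_with_offsets(impulse, [1, 1, 1], [-1.5, 0, 1.5]))
```

With three taps, a dilation of 4 already spans nine input positions, which is the sense in which dilation enlarges the receptive field without adding weights.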
Authors
ZHANG Pengtao (张鹏涛)
LI Wenfeng (李文峰)
SONG Qiang (宋强)
Affiliations: Institute of Technology, China University of Petroleum (Beijing) at Karamay, Karamay 834000, China; Zhongheng Yongchuang (Beijing) Technology Company, Beijing 102206, China
Source
Packaging and Food Machinery (《包装与食品机械》)
Peking University Core Journal
2025, No. 3, pp. 80-87 (8 pages)
Funding
General Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01A203).
Keywords
label detection
text localization
attention mechanism
dilated convolution with learnable spacing