Abstract
Automatically recognizing text on book covers is key to acquiring bibliographic metadata, but the angle at which a book is placed, complex cover designs, and lighting conditions significantly reduce recognition accuracy. To address this, the paper proposes a multi-stage cascaded framework that combines the DBNet detection network, an improved TILT pose correction algorithm, and a CRNN sequence model into a closed-loop "detection-correction-re-detection" pipeline. First, DBNet coarsely localizes the text regions. Next, a TILT algorithm with local low-rank optimization geometrically corrects all text regions in a single pass. A second DBNet detection then refines the text positions, and CRNN finally recognizes the mixed multilingual text. The double detection mechanism suppresses error propagation, while restricting the low-rank optimization to local text regions avoids the sensitivity to background content that affects global correction, improving recognition robustness in both regular and tilted scenes. Experiments show that the method is more accurate and more adaptable than traditional OCR and mainstream deep learning models on complex book covers, offering an effective technical route for text extraction in library digitization management.
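The control flow of the "detection-correction-re-detection" pipeline can be sketched as follows. This is only an illustration of the stage ordering described in the abstract, not the authors' implementation: `run_cascade`, `CascadeResult`, and the three callables `detect`, `rectify`, and `recognize` are hypothetical names standing in for DBNet, the locally optimized TILT step, and CRNN. The box format and the early return when nothing is detected are also assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Assumed box format: (x0, y0, x1, y1). The abstract does not specify one.
Box = Tuple[int, int, int, int]


@dataclass
class CascadeResult:
    coarse_boxes: List[Box]   # boxes from the first detection pass
    refined_boxes: List[Box]  # boxes from the second pass, on the rectified image
    texts: List[str]          # one recognized string per refined box


def run_cascade(
    image: Any,
    detect: Callable[[Any], List[Box]],        # stand-in for DBNet
    rectify: Callable[[Any, List[Box]], Any],  # stand-in for local low-rank TILT
    recognize: Callable[[Any, Box], str],      # stand-in for CRNN
) -> CascadeResult:
    """Run detect -> rectify -> re-detect -> recognize with placeholder stages."""
    # Stage 1: coarse localization on the original (possibly tilted) image.
    coarse = detect(image)
    if not coarse:
        # Assumption: with no text regions there is nothing to rectify.
        return CascadeResult([], [], [])

    # Stage 2: one geometric correction driven by all coarse text regions,
    # rather than by the whole cover including its background.
    rectified = rectify(image, coarse)

    # Stage 3: detect again on the rectified image to refine the boxes.
    refined = detect(rectified)

    # Stage 4: recognize the text inside each refined box.
    texts = [recognize(rectified, box) for box in refined]
    return CascadeResult(coarse, refined, texts)
```

Passing the stages in as callables keeps the sketch independent of any particular DBNet, TILT, or CRNN library, and makes the second detection pass explicit: recognition only ever sees boxes computed on the rectified image.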
Source
《图书情报导刊》
2025, No. 5, pp. 27-34 (8 pages)
Journal of Library and Information Science
Keywords
deep learning
optical character recognition
neural networks
library automation
bibliographic metadata management