A Book Cover Text Recognition Algorithm Based on TILT, DBNet and CRNN
Abstract: Automatically recognizing text on book covers is key to extracting bibliographic metadata, but the angle at which a book is placed, complex cover designs, and lighting conditions significantly degrade recognition accuracy. To address this, this paper proposes a multi-stage cascaded framework that combines the DBNet detection network, an improved TILT pose-correction algorithm, and the CRNN sequence model in a closed-loop "detection-correction-re-detection" pipeline. First, DBNet coarsely localizes text regions. Next, a TILT algorithm with local low-rank optimization geometrically corrects all detected text regions in a single pass. A second DBNet pass then localizes the text precisely, and CRNN finally recognizes mixed multilingual text efficiently. The double detection mechanism suppresses error propagation, and local low-rank optimization avoids the sensitivity of global correction to background content, improving recognition robustness in both regular and tilted scenes. Experiments show that, compared with traditional OCR and mainstream deep learning models, the method achieves better accuracy and adaptability on complex book covers, offering an effective technical approach to text extraction for digital library management.
Author: QIN Yan (秦燕)
Source: Journal of Library and Information Science (《图书情报导刊》), 2025, No. 5, pp. 27-34 (8 pages)
Keywords: deep learning; optical character recognition; neural networks; library automation; bibliographic metadata management
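The first and third stages rely on DBNet. Its central idea, taken from the original DB paper rather than from this article, is a differentiable step function B = 1 / (1 + e^(-k(P - T))) that turns a predicted probability map P and a learned threshold map T into an approximate binary map, with k = 50 in the original work. The minimal sketch below assumes both maps are already given (a constant T stands in for the learned one) and replaces DBNet's polygon shrink-and-expand post-processing with a plain connected-component bounding box. `approximate_binary_map` and `boxes_from_mask` are hypothetical helper names.

```python
import numpy as np


def approximate_binary_map(prob: np.ndarray, thresh: np.ndarray, k: float = 50.0) -> np.ndarray:
    """DB step function B = 1 / (1 + exp(-k * (P - T))), applied per pixel."""
    return 1.0 / (1.0 + np.exp(-k * (prob - thresh)))


def boxes_from_mask(mask: np.ndarray):
    """Bounding boxes (row0, col0, row1, col1) of 4-connected foreground blobs.

    Simplified post-processing for illustration only; DBNet itself builds
    polygons from shrunk text kernels and then expands them.
    """
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            stack = [(r, c)]
            r0, c0, r1, c1 = r, c, r, c
            while stack:  # iterative flood fill over one blob
                y, x = stack.pop()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy probability map P with two text blobs and a constant (assumed) threshold map T.
prob = np.zeros((6, 8))
prob[1:3, 1:5] = 0.9
prob[4:6, 6:8] = 0.8
thresh = np.full_like(prob, 0.3)
binary = approximate_binary_map(prob, thresh)
print(boxes_from_mask(binary > 0.5))  # [(1, 1, 2, 4), (4, 6, 5, 7)]
```

Because the step function is a steep sigmoid rather than a hard threshold, it is differentiable, which is what lets DBNet learn the threshold map jointly with the probability map during training.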
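The improved TILT stage is described only as "local low-rank optimization". For orientation, the original TILT problem (Transform Invariant Low-rank Textures, Zhang, Ganesh, Liang and Ma) is shown below, written per detected region D_i to reflect the local application mentioned in the abstract; the per-region indexing is our reading, not a formula taken from the paper.

```latex
\min_{A_i,\,E_i,\,\tau_i} \; \|A_i\|_* + \lambda \|E_i\|_1
\quad \text{s.t.} \quad D_i \circ \tau_i = A_i + E_i,
\qquad i = 1, \dots, m
```

Here \|\cdot\|_* is the nuclear norm (sum of singular values), \|\cdot\|_1 the entrywise l1 norm, \lambda > 0 a weight, A_i the low-rank rectified texture, E_i a sparse error, and \tau_i a planar transform (affine or homography). The original TILT handles the nonlinear constraint by repeatedly linearizing it around the current \tau_i and solving each convex subproblem with an augmented Lagrange multiplier method.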
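To make the low-rank intuition concrete, the following brute-force sketch keeps only TILT's objective: among candidate transforms, choose the one whose rectified patch has the smallest nuclear norm. It is deliberately simplified and is neither the TILT solver nor the paper's improved variant: the transform is a single integer-rounded horizontal shear, there is no sparse error term, and candidates are searched exhaustively. `shear_rows`, `nuclear_norm` and `rectify_by_low_rank` are hypothetical names.

```python
import numpy as np


def shear_rows(img: np.ndarray, s: float) -> np.ndarray:
    """Discrete horizontal shear: circularly shift row i by round(s * i) pixels."""
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        out[i] = np.roll(img[i], int(np.round(s * i)))
    return out


def nuclear_norm(m: np.ndarray) -> float:
    """Sum of singular values, the convex rank surrogate used in TILT."""
    return float(np.linalg.svd(m, compute_uv=False).sum())


def rectify_by_low_rank(patch: np.ndarray, candidates):
    """Pick the shear whose output has the smallest nuclear norm.

    Brute-force stand-in for TILT (one shear parameter, no sparse error term,
    no ALM solver). Circular shifts keep the Frobenius norm fixed, so the
    nuclear norms of different candidates are directly comparable.
    """
    best = min(candidates, key=lambda s: nuclear_norm(shear_rows(patch, s)))
    return best, shear_rows(patch, best)


# Toy region: vertical stripes form a rank-1 matrix when upright.
cols = np.arange(32)
upright = np.tile(((cols // 4) % 2).astype(float), (32, 1))
tilted = shear_rows(upright, 0.25)           # simulate a slanted text region
candidates = [k / 8 for k in range(-8, 9)]   # shears from -1.0 to 1.0
best, fixed = rectify_by_low_rank(tilted, candidates)
print(best, np.array_equal(fixed, upright))  # -0.25 True
```

Upright printed text tends to behave like a low-rank texture because strokes align with rows and columns; a slant mixes those directions and raises the rank, which is what the search exploits. Real images need interpolated warps and intensity normalization, which this toy sidesteps by using circular integer shifts.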
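CRNN transcribes the per-frame predictions of its recurrent layers with CTC. At inference, a common lexicon-free shortcut is best-path (greedy) decoding: take the most probable class in each frame, merge consecutive repeats, then remove blanks. The sketch assumes class 0 is the blank and that the alphabet string lists the remaining classes in order; putting Chinese and Latin characters in one alphabet is one way a single model can emit mixed multilingual text. `ctc_greedy_decode` is a hypothetical helper name.

```python
import numpy as np


def ctc_greedy_decode(logits: np.ndarray, alphabet: str) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks.

    logits has shape (T, C). Class 0 is the CTC blank and class j >= 1 maps
    to alphabet[j - 1]; this label layout is an assumption.
    """
    best = logits.argmax(axis=1)
    out = []
    prev = 0
    for idx in best:
        if idx != 0 and idx != prev:  # new non-blank symbol
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)


alphabet = "图书OCR"  # toy mixed Chinese/Latin label set
frames = [1, 1, 0, 2, 0, 3, 3, 4, 0, 5]          # per-frame best classes
logits = np.eye(len(alphabet) + 1)[frames]       # one-hot scores, shape (10, 6)
print(ctc_greedy_decode(logits, alphabet))       # 图书OCR
```

A blank between two identical classes is what allows doubled characters: frames `[1, 1, 0, 2, 2, 0, 2]` with alphabet `"ab"` decode to `"abb"`, not `"ab"`.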
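The closed-loop order itself can be summarized as a few lines of glue code. In this sketch `detect`, `rectify_regions` and `recognize` are hypothetical stand-ins for DBNet, the region-wise TILT correction and CRNN, injected as plain callables with stub implementations. It reflects one reading of the abstract (all coarse regions corrected in one call, then re-detection on the corrected image) and is not the author's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive; assumed format


def crop(image: np.ndarray, box: Box) -> np.ndarray:
    r0, c0, r1, c1 = box
    return image[r0:r1 + 1, c0:c1 + 1]


@dataclass
class CascadeOCR:
    """Hypothetical wiring of detect -> rectify -> re-detect -> recognize."""
    detect: Callable[[np.ndarray], List[Box]]
    rectify_regions: Callable[[np.ndarray, List[Box]], np.ndarray]
    recognize: Callable[[np.ndarray], str]

    def run(self, image: np.ndarray) -> List[str]:
        coarse = self.detect(image)                      # pass 1: rough text regions
        corrected = self.rectify_regions(image, coarse)  # one correction call for all regions
        refined = self.detect(corrected)                 # pass 2: re-detect on corrected image
        return [self.recognize(crop(corrected, b)) for b in refined]


# Toy usage with stub components (no real models involved).
calls = []

def stub_detect(im: np.ndarray) -> List[Box]:
    calls.append("detect")
    return [(1, 1, 2, 4)]

def stub_rectify(im: np.ndarray, boxes: List[Box]) -> np.ndarray:
    calls.append("rectify")
    return im  # identity here; a real rectifier would warp each region

def stub_recognize(patch: np.ndarray) -> str:
    calls.append("recognize")
    return f"{patch.shape[0]}x{patch.shape[1]}"

ocr = CascadeOCR(stub_detect, stub_rectify, stub_recognize)
result = ocr.run(np.zeros((4, 6)))
print(result, calls)  # ['2x4'] ['detect', 'rectify', 'detect', 'recognize']
```

Keeping the stages as injected callables makes it straightforward to swap one of them, for example to compare a single detection pass against the two-pass loop.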