Abstract
End-to-end scene text spotting, which jointly localizes and recognizes text in natural images, has advanced significantly for Chinese and English. However, Vietnamese text spotting remains challenging because of persistent diacritic recognition failures and missed detections. To bridge this gap, we propose a diacritic-focused Vietnamese text spotting framework that mitigates background interference. Specifically, we propose the DDCM to capture fine-grained diacritical features by adapting to the structural characteristics of Vietnamese characters. In the detection phase, we introduce the Global Feature Fusion Module, which helps the model relate local details to global context for each region of interest. In the recognition phase, we design the Cross Channel Attention Module to capture spatial relationships while discriminating subtle diacritic variations through channel-wise recalibration. Extensive experiments demonstrate that our framework improves recognition accuracy over several state-of-the-art methods on Vietnamese scene text benchmarks. The code is available at https://github.com/mlmmwym/FCVintextSpotter.
Funding
This work was partially supported by the National Natural Science Foundation of China (62366011), the Natural Science Foundation of Guangxi District (2024GXNSFDA010066), and the Key R&D Program of Guangxi under Grant AB21220023.