STDNet: Improved lip reading via short-term temporal dependency modeling
Authors: Xiaoer WU, Zhenhua TAN, (+1 more author), Ziwei CHENG, Yuran RU. Virtual Reality & Intelligent Hardware (《虚拟现实与智能硬件(中英文)》), 2025, Issue 2, pp. 173-187 (15 pages)
Background: Lip reading uses lip images for visual speech recognition. Deep-learning-based lip reading has greatly improved performance on current datasets; however, most existing research ignores the significance of short-term temporal dependencies of lip-shape variations between adjacent frames, which leaves room for further improvement in feature extraction.
Methods: This article presents a spatiotemporal feature fusion network (STDNet) that compensates for the deficiencies of current lip-reading approaches in short-term temporal dependency modeling. Specifically, to distinguish more similar and intricate content, STDNet adds a temporal feature extraction branch based on a 3D-CNN, which enhances the learning of dynamic lip movements in adjacent frames without affecting spatial feature extraction. In particular, we designed a Local-Temporal Block, which aggregates interframe differences, strengthening the relationship between various local lip regions through multiscale convolution. We incorporated the squeeze-and-excitation mechanism into the Global-Temporal Block, which processes a single frame as an independent unit to learn temporal variations across the entire lip region more effectively. Furthermore, attention pooling was introduced to highlight meaningful frames containing key semantic information for the target word.
Results: Experimental results demonstrated STDNet's superior performance on the LRW and LRW-1000 datasets, achieving word-level recognition accuracies of 90.2% and 53.56%, respectively. Extensive ablation experiments verified the rationality and effectiveness of its modules.
Conclusions: The proposed model effectively addresses short-term temporal dependency limitations in lip reading and improves the temporal robustness of the model against variable-length sequences. These advancements validate the importance of explicit short-term dynamics modeling for practical lip-reading systems.
Keywords: Lip reading; Spatio-temporal feature fusion; Short-term temporal dependency modeling
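Two ideas named in the abstract, aggregating interframe differences and attention pooling over frames, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the per-frame feature vectors, the function names, and the fixed `query` vector (standing in for a learned scoring function) are all assumptions made for illustration.

```python
import math
from typing import List

# Hypothetical representation: one flat feature vector per video frame.
Frame = List[float]


def interframe_differences(frames: List[Frame]) -> List[Frame]:
    """Element-wise difference between each pair of adjacent frames."""
    return [
        [later - earlier for earlier, later in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: List[Frame], query: Frame) -> Frame:
    """Weighted average of frames.

    Weights are the softmax of dot(frame, query). In a trained model the
    scoring would be learned; the fixed query here is an assumption.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]


# Three toy frames with two features each.
frames = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
diffs = interframe_differences(frames)  # motion between adjacent frames
pooled = attention_pool(diffs, query=[0.0, 1.0])
```

Because the pooling step is a weighted average over however many frames are supplied, the same function applies to sequences of any length, which is one plausible reading of why attention pooling suits variable-length inputs.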