基于平移不变字典的语音唇动一致性判决方法 (cited by: 3)

Lip motion and voice consistency analysis algorithm based on shift-invariant dictionary


Abstract: Traditional models for analyzing speech and lip motion tend to ignore the time-varying information between successive lip-motion frames, which degrades consistency decisions. To address this, a consistency decision method based on a learned shift-invariant dictionary is proposed. The method introduces shift-invariant sparse representation into speech and lip-motion consistency analysis. First, an audio-visual dictionary that is shift-invariant in time and space is trained with an audio-visual joint (coherence) dictionary learning algorithm, and the sparse coding stage of that algorithm is modified with a new data projection step. Second, the joint audio-visual atoms in the dictionary serve as templates that describe how audio and lip shape change synchronously when different syllables or words are pronounced. Finally, a scoring criterion for speech and lip-motion consistency is derived from these templates. Experiments on four types of inconsistent audio-visual data show that, compared with traditional statistical methods, the overall equal error rate (EER) drops on average from 23.6% to 11.3% for few-syllable material and from 22.1% to 15.9% for multi-syllable sentences.
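The abstract does not spell out the coding procedure. As a rough illustration of what sparse coding over a shift-invariant dictionary means, the sketch below runs plain matching pursuit on a 1-D signal, letting each atom slide to any position. The function name `shift_invariant_mp`, the 1-D setting, and the greedy selection rule are assumptions made for illustration only; this is not the authors' joint audio-visual algorithm, nor their modified sparse coding step.

```python
import numpy as np

def shift_invariant_mp(signal, atoms, n_iter=10):
    """Greedy matching pursuit over every time shift of every atom.

    signal : 1-D array
    atoms  : list of 1-D unit-norm arrays, each no longer than the signal
    Returns (code, residual), where code is a list of
    (atom_index, shift, coefficient) tuples.
    """
    residual = np.asarray(signal, dtype=float).copy()
    code = []
    for _ in range(n_iter):
        best = None
        for k, atom in enumerate(atoms):
            # Cross-correlation gives the inner product of the residual
            # with this atom at every valid shift.
            corr = np.correlate(residual, atom, mode="valid")
            shift = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[shift]) > abs(best[2]):
                best = (k, shift, float(corr[shift]))
        k, shift, coef = best
        if abs(coef) < 1e-12:
            break  # nothing left to explain
        # Remove the selected shifted atom from the residual.
        residual[shift:shift + len(atoms[k])] -= coef * atoms[k]
        code.append(best)
    return code, residual

# Hypothetical toy example: two unit-norm atoms placed at different positions.
atom_a = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0
atom_b = np.array([1.0, -1.0, 1.0, -1.0]) / 2.0
x = np.zeros(32)
x[5:9] += 2.0 * atom_a
x[20:24] += -1.5 * atom_b
code, residual = shift_invariant_mp(x, [atom_a, atom_b], n_iter=5)
```

Each entry of `code` records which atom fired, where, and with what weight. In a consistency-scoring setting, one could imagine comparing where audio and visual parts of a joint atom fire, but that step is beyond what this sketch covers.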

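Authors: 贺前华 (He Qianhua), 朱铮宇 (Zhu Zhengyu), 奉小慧 (Feng Xiaohui)

Affiliation: School of Electronic and Information Engineering, South China University of Technology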

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报（自然科学版）》), 2015, No. 10, pp. 69-74 (6 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).

Funding: National Natural Science Foundation of China (61401161, 61571192); Fundamental Research Funds for the Central Universities (D2154950)

Keywords: audio-visual processing; sparse representation; shift-invariance; consistency analysis; dictionary learning

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


