Bi-level Codebook Based Speech-driven Visual-speech Synthesis System (基于双层码本的语音驱动视觉语音合成系统)

Cited by: 2

Abstract: This paper proposes a speech-driven visual-speech synthesis system based on a bi-level codebook. Building on the idea of vector quantization, the system establishes a coarsely coupled mapping from the speech feature space to the visual-speech feature space. To strengthen the association between speech and visual speech, the system automatically clusters the sample data twice, once by the similarity of the speech features and once by the similarity of the visual-speech features, and constructs a bi-level mapping codebook that reflects similarity both among speech samples and among visual-speech samples. In the data preprocessing stage, the paper proposes a joint feature model that captures both the geometric shape of the visual speech and the visibility of the teeth, and it uses a genetic algorithm to extract a visual-speech-related speech feature model from the LPCC and MFCC speech features. A comparison between the image data of the synthesized video and that of the original video shows that the synthesized result approximates the original data to a certain degree, which the authors report as a good result. In future research, the constraints between visual-speech contexts should be considered in order to improve the smoothness of the synthesized results.
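The two-stage clustering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the codebook sizes, or how a second-level entry is selected at synthesis time. The sketch assumes plain k-means for both levels, hypothetical sizes `k_audio` and `k_visual`, and a nearest-neighbour lookup on the mean audio feature stored with each second-level entry. The function names are likewise illustrative.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    k = min(k, len(x))
    # Initialise centroids from k distinct samples.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Distance from every sample to every centroid, shape (n, k).
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels


def build_bilevel_codebook(audio, visual, k_audio=4, k_visual=2):
    """Level 1 clusters frames by audio features; level 2 clusters the
    frames of each audio cluster by their visual features."""
    audio = np.asarray(audio, dtype=float)
    visual = np.asarray(visual, dtype=float)
    audio_centroids, a_labels = kmeans(audio, k_audio)
    codebook = []
    for j in range(len(audio_centroids)):
        idx = np.flatnonzero(a_labels == j)
        entries = []
        if idx.size > 0:
            _, v_labels = kmeans(visual[idx], k_visual)
            for m in np.unique(v_labels):
                members = idx[v_labels == m]
                # Each entry pairs a mean audio feature with a mean visual feature.
                entries.append((audio[members].mean(axis=0),
                                visual[members].mean(axis=0)))
        codebook.append(entries)
    return audio_centroids, codebook


def synthesize(audio_frame, audio_centroids, codebook):
    """Map one audio feature vector to a visual feature vector."""
    audio_frame = np.asarray(audio_frame, dtype=float)
    # Visit first-level codewords from nearest to farthest, skipping empty ones.
    order = np.argsort(np.linalg.norm(audio_centroids - audio_frame, axis=1))
    for j in order:
        if codebook[j]:
            best = min(codebook[j],
                       key=lambda e: np.linalg.norm(e[0] - audio_frame))
            return best[1]
    raise ValueError("codebook is empty")
```

Given paired per-frame arrays `audio` (for example LPCC or MFCC vectors) and `visual` (for example lip-shape and teeth-visibility parameters), `build_bilevel_codebook(audio, visual)` produces the codebook and `synthesize(frame, centroids, codebook)` returns one visual feature vector per audio frame. Because each frame is mapped independently, the output sequence is not smoothed, which matches the limitation the abstract notes for future work.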

Authors: Jia Xibin (贾熹滨), Yin Baocai (尹宝才), Sun Yanfeng (孙艳丰)

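Affiliation: Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology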

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 1, pp. 100-104 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (61070117) and the Beijing Natural Science Foundation (4122004)

Keywords: Bi-level codebook; Visual speech synthesis; Visual speech feature; Speech feature

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


