
Minimum Generation Error Training Based on Perceptually Weighted Line Spectral Pair Distance for Statistical Parametric Speech Synthesis
Abstract: A Minimum Generation Error (MGE) training method based on perceptually weighted Line Spectral Pair (LSP) distance is proposed to improve the performance of Hidden Markov Model (HMM) based parametric speech synthesis systems. When the speech spectrum is represented by LSP parameters, the Euclidean-distance generation error used in traditional MGE training cannot accurately measure the true gap between the generated and natural spectra. Defining the generation error by Log Spectral Distortion (LSD), which is independent of the spectral parameterization, alleviates this problem, but the subjective improvement is small and the computational complexity is high. This paper first proposes an MGE training criterion based on weighted LSP distance, then compares different weighting methods, both subjectively and objectively, against LSD-based MGE training. A perceptual weighting method is obtained that not only achieves the best subjective performance but also incurs almost no extra computational complexity compared with traditional MGE training.
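The abstract describes replacing the Euclidean generation error of standard MGE training with a weighted LSP distance. The sketch below illustrates the idea under an assumption: it uses inverse-neighbour-distance weighting, a common perceptual weighting for LSPs (closely spaced LSPs mark spectral peaks, so errors there are weighted more heavily). The function names `lsp_weights` and `weighted_lsp_error` are illustrative, not from the paper, and the weighting is not necessarily the paper's exact scheme.

```python
import numpy as np

def lsp_weights(lsp):
    """Perceptual weights for one LSP frame (frequencies in (0, pi), ascending).
    Assumed weighting: each LSP is weighted by the inverse distances to its
    neighbours, so tightly clustered LSPs (spectral peaks) count more."""
    padded = np.concatenate(([0.0], lsp, [np.pi]))
    d_prev = padded[1:-1] - padded[:-2]   # gap to previous LSP (or to 0)
    d_next = padded[2:] - padded[1:-1]    # gap to next LSP (or to pi)
    return 1.0 / d_prev + 1.0 / d_next

def weighted_lsp_error(generated, natural):
    """Weighted squared LSP distance between a generated and a natural frame,
    standing in for the plain Euclidean generation error of standard MGE."""
    w = lsp_weights(natural)              # weights derived from natural spectrum
    return float(np.sum(w * (generated - natural) ** 2))

# Toy example: one 4th-order LSP frame.
nat = np.array([0.3, 0.8, 1.7, 2.5])
gen = np.array([0.32, 0.85, 1.65, 2.55])
err = weighted_lsp_error(gen, nat)
```

Because the weights are a fixed function of each frame's LSP values, this error costs essentially the same to evaluate as the Euclidean one, which is consistent with the paper's claim of almost no added complexity.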
Source: Pattern Recognition and Artificial Intelligence (EI, CSCD, Peking University Core Journal), 2010, No. 4, pp. 572-579 (8 pages).
Keywords: Speech Synthesis, Hidden Markov Model (HMM), Minimum Generation Error (MGE), Perceptual Weighting, Line Spectral Pair Parameter

References (16)

  • 1 Masuko T, Tokuda K, Kobayashi T, et al. Speech Synthesis Using HMMs with Dynamic Features // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Atlanta, USA, 1996, I: 389-392.
  • 2 Yoshimura T, Tokuda K, Masuko T, et al. Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Phoenix, USA, 1999, V: 2347-2350.
  • 3 Tokuda K, Kobayashi T, Imai S. Speech Parameter Generation from HMM Using Dynamic Features // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Detroit, USA, 1995, I: 660-663.
  • 4 Ling Zhenhua, Qin Long, Lu Heng, et al. The USTC and iFlytek Speech Synthesis Systems for Blizzard Challenge 2007 // Proc of the Blizzard Challenge Workshop. Bonn, Germany, 2007: 17-21.
  • 5 Zen H, Toda T. An Overview of Nitech HMM-Based Speech Synthesis System for Blizzard Challenge 2005 // Proc of the 9th European Conference on Speech Communication and Technology. Lisbon, Portugal, 2005: 93-96.
  • 6 Wu Yijian, Wang Renhua. Minimum Generation Error Training for HMM-Based Speech Synthesis // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Toulouse, France, 2006, I: 889-892.
  • 7 Wu Yijian, Guo Wu, Wang Renhua. Minimum Generation Error Criterion for Tree-Based Clustering of Context Dependent HMMs // Proc of the 9th International Conference on Spoken Language Processing. Pittsburgh, USA, 2006: 2046-2049.
  • 8 Qin Long, Wu Yijian, Ling Zhenhua, et al. Minimum Generation Error Linear Regression Based Model Adaptation for HMM-Based Speech Synthesis // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Las Vegas, USA, 2008: 3953-3956.
  • 9 McLoughlin I V. Line Spectral Pairs. Signal Processing, 2008, 88(3): 448-467.
  • 10 Wu Yijian, Wang Renhua. HMM-Based Trainable Chinese Speech Synthesis. Journal of Chinese Information Processing, 2006, 20(4): 75-81.
