Abstract
Selecting an optimal training text set is critical for open-domain text-to-speech (TTS) synthesis, especially when the synthesizer is expected to adapt its models to different speakers. Taking the duration model as a test case and assuming a linear model, this paper extends the structure of the design matrix and proposes a greedy selection algorithm based on merging multiple models. Experiments show that, by exploiting the large redundancy among the texts selected for the different sound sub-category models, the algorithm significantly reduces the number of training sentences required compared with the original single-model version. By further modifying the form of the cost function in the matroid cover problem, the algorithm is generalized from minimizing the number of sentences to minimizing the total number of phonemes in the training text, which achieves the goal of minimizing the selected text more accurately.
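The effect of changing the cost function can be illustrated with a minimal sketch. It is not the paper's design-matrix formulation: it assumes the simpler setting of greedy weighted set cover, where each sentence is reduced to a list of units that must be covered. The function name `greedy_select`, the `cost` parameter, and the toy corpus are all hypothetical.

```python
from typing import Dict, List


def greedy_select(corpus: Dict[str, List[str]], cost: str = "sentence") -> List[str]:
    """Greedy cover sketch (an assumption, not the paper's exact algorithm).

    corpus maps a sentence id to the list of units (e.g. phonemes) it contains.
    cost="sentence": every sentence costs 1, so few sentences are favoured.
    cost="phoneme":  a sentence costs its length, so few phonemes are favoured.
    """
    # Every unit that appears anywhere in the corpus must be covered.
    uncovered = {unit for units in corpus.values() for unit in units}
    chosen: List[str] = []

    def gain_per_cost(sid: str) -> float:
        units = corpus[sid]
        gain = len(uncovered.intersection(units))
        price = 1 if cost == "sentence" else max(len(units), 1)
        return gain / price

    while uncovered:
        candidates = [sid for sid in corpus if sid not in chosen]
        best = max(candidates, key=gain_per_cost)  # ties: first in dict order
        chosen.append(best)
        uncovered -= set(corpus[best])
    return chosen


# Toy corpus: s1 is long and covers everything; s2 and s3 are short.
corpus = {
    "s1": ["a", "b", "c", "d", "a", "b", "c", "d", "a", "b"],
    "s2": ["a", "b"],
    "s3": ["c", "d"],
}
by_sentence = greedy_select(corpus, cost="sentence")
by_phoneme = greedy_select(corpus, cost="phoneme")
print(by_sentence, sum(len(corpus[s]) for s in by_sentence))
print(by_phoneme, sum(len(corpus[s]) for s in by_phoneme))
```

On this toy corpus the sentence-count cost picks the single long sentence (10 phonemes), while the phoneme-count cost picks the two short ones (4 phonemes in total), which is the kind of difference the generalized cost function is meant to capture.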
Source
Journal of Shanghai Jiaotong University (《上海交通大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
1999, No. 1, pp. 96-100 (5 pages)
Funding
Supported by the Shanghai, China branch of Bell Laboratories (USA)
Keywords
text-to-speech synthesis
text selection
greedy algorithm
matroid cover
duration model