Abstract
This paper implements a speech-based man-machine interface for indexing in media asset management (MAM). The system is built on continuous Gaussian-mixture hidden Markov models (HMMs) and uses the level-building Viterbi algorithm for both training and recognition. To index in real time, computation runs concurrently with recording. To reduce computation, the state duration distribution is not incorporated into Viterbi decoding but is applied as a post-processing step. For connected-digit recognition, the tone (pitch) of the speech serves as an auxiliary decision cue. A vocabulary of sports-event terms was built for evaluation; tests show that the indexing system reaches a top-1 recognition rate of 93.5% and a top-5 recognition rate of 98%.
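The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of keeping state durations out of Viterbi decoding and applying them afterwards. It assumes a single left-to-right HMM with log-domain scores and a Gaussian duration model per state. The function names, the Gaussian form, and the `weight` parameter are illustrative assumptions, not the authors' method.

```python
import math

NEG_INF = float("-inf")

def viterbi(log_b, log_a):
    """Plain log-domain Viterbi with no duration modeling.

    log_b[t][j]: log emission score of frame t in state j.
    log_a[i][j]: log transition score from state i to state j.
    Assumes the path starts in state 0 and ends in the last state.
    Returns (best_log_score, best_state_path).
    """
    T, N = len(log_b), len(log_a)
    delta = [[NEG_INF] * N for _ in range(T)]
    psi = [[0] * N for _ in range(T)]
    delta[0][0] = log_b[0][0]
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] + log_a[i][j])
            delta[t][j] = delta[t - 1][best_i] + log_a[best_i][j] + log_b[t][j]
            psi[t][j] = best_i
    # Backtrack from the final state.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[T - 1][N - 1], path

def state_durations(path, n_states):
    """Count how many frames the decoded path spends in each state."""
    durations = [0] * n_states
    for s in path:
        durations[s] += 1
    return durations

def duration_log_score(durations, means, variances):
    """Sum of Gaussian log-densities of the observed state durations
    (assumed duration model; hypothetical parameters)."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (d - m) ** 2 / (2 * v)
        for d, m, v in zip(durations, means, variances)
    )

def rescore(viterbi_score, path, means, variances, weight=1.0):
    """Post-processing: add a weighted duration score to the Viterbi score."""
    durations = state_durations(path, len(means))
    return viterbi_score + weight * duration_log_score(durations, means, variances)
```

In a scheme like this, each candidate word from the decoder would be rescored with `rescore` and the candidates re-ranked, so the search itself stays as cheap as standard Viterbi.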
Source
《微计算机信息》
Peking University (PKU) Core Journal
2005, No. 4, pp. 232-233 (2 pages)
Control & Automation
Keywords
Man-machine interface
Speech recognition
Real-time algorithm