Abstract
This paper proposes a clustering-based method for fast quantized representation of words and derives from it a prediction confidence for the latent semantic analysis (LSA) language model. The resulting model is combined with a trigram model through a newly proposed static geometric weighted interpolation, yielding a new LSA language model that is applied to Mandarin speech recognition. Experiments show that it is both more efficient and more accurate than the traditional LSA language model based on singular value decomposition. Compared with the trigram model, it lowers the recognition error rate by about 3.6% to 7.1% in relative terms. The work also suggests a new route to further improving LSA language models through effective quantized representation of word pairs.
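The abstract names a geometric weighted interpolation between the LSA model and the trigram model but does not give its formula. As a rough illustration only, the sketch below uses the common weighted geometric mean, P(w) ∝ P_trigram(w)^λ · P_LSA(w)^(1−λ), renormalized over the vocabulary. The function name `geometric_interpolate`, the weight `lam`, and the toy distributions are assumptions made for this example; the paper's actual weighting, including any use of the prediction confidence, may differ.

```python
def geometric_interpolate(p_trigram, p_lsa, lam):
    """Combine two word distributions by a weighted geometric mean.

    Assumption: "static" means one fixed weight `lam` for all histories.
    p_trigram and p_lsa map each candidate word to its probability given
    the same history; words missing from p_lsa are treated as probability 0.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Unnormalized score: p_trigram^lam * p_lsa^(1 - lam)
    scores = {
        word: (p ** lam) * (p_lsa.get(word, 0.0) ** (1.0 - lam))
        for word, p in p_trigram.items()
    }

    # A geometric mean of distributions does not sum to 1, so renormalize.
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("combined distribution has zero mass")
    return {word: s / total for word, s in scores.items()}


# Toy example with a two-word vocabulary (hypothetical numbers).
trigram = {"a": 0.5, "b": 0.5}
lsa = {"a": 0.8, "b": 0.2}
combined = geometric_interpolate(trigram, lsa, lam=0.5)
```

With `lam=1.0` the result reduces to the trigram distribution, and with `lam=0.0` to the LSA distribution; values in between trade off local n-gram context against the longer-range semantic evidence.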
Source
《高技术通讯》
CAS
CSCD
PKU Core Journals
2005, No. 8, pp. 1-5 (5 pages)
Chinese High Technology Letters
Funding
National High Technology Research and Development Program of China (863 Program)