Funding: Supported by the National Natural Science Foundation of China (10774048, 51078326), the Science Foundation of Zhejiang Province, China (Y5090138), and the Science and Technology Planning Project of Guangdong Province, China (2011B061300066).
Abstract: To explore Mandarin Chinese speech identification in babble from spatially separated talkers, subjective word and sentence identification tests were conducted under diotic and dichotic listening. The results show that identification scores changed non-monotonically as the number of maskers N increased from 1 to infinity, first declining gradually to a minimum and then rising. A statistically significant difference was found between the diotic and dichotic scores. For all values of N tested, dichotic listening achieved higher scores than diotic listening, showing that dichotic presentation helps reduce babble masking. Scores on the sentence test were also significantly higher than those on the word test under both diotic and dichotic listening, indicating that the linguistic context within a sentence helps listeners perceive the target speech in babble.
Funding: Supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Abstract: The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed over the last two decades to attack this problem. We focus our discussion on speech separation, given its central role in the cocktail party environment, and describe conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using a more powerful model and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
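The abstract names permutation invariant training (PIT) without describing it. As a rough illustration of the general idea only, the following minimal Python sketch scores every assignment of estimated sources to reference sources and keeps the lowest error. The function name `pit_mse`, the use of mean squared error, and the toy signals are assumptions made for this example and are not taken from the paper.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Return (min_loss, best_perm) over all speaker permutations.

    estimates, targets: arrays of shape (num_speakers, num_samples).
    best_perm[i] is the index of the target assigned to estimate i.
    Hypothetical helper for illustration; MSE is an assumed loss.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_speakers = estimates.shape[0]

    best_loss, best_perm = float("inf"), None
    # Try every one-to-one assignment of targets to estimates.
    for perm in permutations(range(num_speakers)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the estimates are the targets in swapped order.
targets = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0]]
estimates = [[0.0, 2.0, 0.0, 2.0], [1.0, 0.0, 1.0, 0.0]]
loss, perm = pit_mse(estimates, targets)
print(loss, perm)
```

Because the minimum is taken over all assignments, the swapped output order in the toy example is not penalized. The exhaustive search grows factorially with the number of speakers, so a sketch like this is only practical for a small number of sources.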