Funding: Project (2010ZX03006-004) supported by the National Science and Technology Major Program of China; Project (YYYJ-1113) supported by the Knowledge Innovation Program of the Chinese Academy of Sciences; Project (2011CB302901) supported by the National Basic Research Program of China.
Abstract: The Khatri-Rao (KR) subspace method is a high-resolution method for direction-of-arrival (DOA) estimation. Combined with a 2q-level nested array, the KR subspace method can detect O(N^{2q}) sources with N sensors. However, the method is not applicable to Gaussian sources when q is equal to or greater than 2, since it relies on 2q-th order cumulants. In this work, a novel approach is presented that performs DOA estimation by constructing a fourth-order difference co-array. Unlike the existing DOA estimation method based on the KR product and the 2q-level nested array, the proposed method uses only second-order statistics, so it can be applied to Gaussian as well as non-Gaussian sources. By exploiting a four-level nested array with N elements, the method can also identify O(N^4) sources. The proposed method is further extended to wideband scenarios so that wideband signals can be handled. Simulation results demonstrate that, compared with state-of-the-art KR subspace-based methods, the new method achieves higher resolution.
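The source-counting claims above rest on the idea that a nested array yields far more distinct sensor-position differences than it has physical sensors. The sketch below illustrates this only for the ordinary second-order difference co-array of a two-level nested array, using the commonly cited layout (a dense inner subarray followed by a sparse outer one). It is not the fourth-order construction of the paper; the function names and the sizes `n1 = n2 = 3` are assumptions chosen for illustration.

```python
import numpy as np

def nested_array_positions(n1, n2):
    """Sensor positions (in units of the base spacing) of a two-level nested array.

    Assumed layout: a dense inner subarray at 1..n1 and a sparse outer
    subarray at multiples of (n1 + 1).
    """
    inner = np.arange(1, n1 + 1)
    outer = (n1 + 1) * np.arange(1, n2 + 1)
    return np.concatenate([inner, outer])

def difference_coarray(positions):
    """Sorted unique pairwise differences p_i - p_j (second-order co-array)."""
    diffs = positions[:, None] - positions[None, :]
    return np.unique(diffs)

positions = nested_array_positions(3, 3)   # N = 6 physical sensors
coarray = difference_coarray(positions)    # virtual sensor positions

print("physical sensors:", len(positions))
print("virtual sensors: ", len(coarray))
```

With six physical sensors at positions 1, 2, 3, 4, 8, 12, the co-array is the hole-free set of integers from -11 to 11, that is, 23 virtual positions. This quadratic growth is what allows more sources than sensors to be resolved.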
Funding: Supported by the National Natural Science Foundation of China (No. 61072123).
Abstract: This paper shows the importance of the optimal smoothing scheme in Microphone Array Post-Filtering (MAPF) under a combined Deterministic-Stochastic Hybrid Model (DSHM). We reveal that, without the optimal smoothing scheme, some well-known MAPF algorithms may cause serious speech distortion, which results from oversmoothing the raw periodogram over time. Using a minimum conditional mean square error criterion, we derive the optimal smoothing factor under the DSHM, whose value is determined by the Deterministic-to-Stochastic Ratio (DSR) and the stationarity. The optimal smoothing scheme is applied to the Transient-Beam-to-Reference-Ratio (TBRR)-based MAPF algorithm, and experimental results show that it performs better in terms of both the Log-Spectral Distance (LSD) and the Perceptual Evaluation of Speech Quality (PESQ).
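The oversmoothing effect mentioned above can be illustrated with a first-order recursive average of a periodogram over time. The abstract does not state the smoothing recursion, so the form used here, S[t] = alpha * S[t-1] + (1 - alpha) * P[t], is an assumption (it is the common choice), and the fixed values of `alpha` are arbitrary illustrations rather than the optimal factor derived in the paper.

```python
import numpy as np

def smooth_periodogram(periodogram, alpha):
    """First-order recursive smoothing over time (assumed form):
    S[t] = alpha * S[t-1] + (1 - alpha) * P[t]."""
    smoothed = np.empty_like(periodogram, dtype=float)
    smoothed[0] = periodogram[0]
    for t in range(1, len(periodogram)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * periodogram[t]
    return smoothed

# Hypothetical single-bin periodogram: flat noise floor with a short burst,
# standing in for a speech onset.
raw = np.ones(20)
raw[10:12] = 100.0

light = smooth_periodogram(raw, alpha=0.3)   # follows the burst closely
heavy = smooth_periodogram(raw, alpha=0.9)   # flattens the burst

print("peak with alpha=0.3:", light.max())
print("peak with alpha=0.9:", heavy.max())
```

With the larger smoothing factor the burst is strongly attenuated in the smoothed estimate. A post-filter driven by such an estimate would underestimate transient speech energy, which is consistent with the distortion the abstract attributes to oversmoothing.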
Funding: Supported by the National Natural Science Foundation of China (11204278).
Abstract: To investigate the influence of a dummy head on the measurement of speech intelligibility, objective and subjective speech intelligibility evaluation experiments were carried out for different spatial configurations of a target source and a noise source in the horizontal plane. The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared, and the correlation between subjective speech intelligibility and objective STIPA was analyzed. It is shown that the position of the sound source significantly affects both binaural STIPA and the subjective intelligibility measured with a dummy head or in a real-life scenario. The standard STIPA is closer to the lower of the two binaural STIPA values. Speech intelligibility is higher for a single ear that is on the same side as the target source or on the opposite side from the noise source. Binaural speech intelligibility is always lowest when the target and noise sources are at the same place, and it increases sharply once they are separated. It is also found that the subjective intelligibility measured with a dummy head or in a real-life scenario is uncorrelated with standard STIPA but highly correlated with STIPA measured with a dummy head. The subjective intelligibility of a single ear is highly correlated with STIPA measured at the same ear, and the binaural speech intelligibility is in good agreement with the higher of the two binaural STIPA values.
Abstract: This paper addresses the change in the JND (Just Noticeable Difference) of auditory perception under synchronous visual stimuli. Through psychoacoustic experiments, the loudness JND, subjective duration JND and pitch JND of a pure tone were measured in auditory-only mode and in visual-auditory mode with visual stimuli that differ in attributes such as color, illumination, quality and moving state. Statistical analyses of the experimental data indicate that, compared with the JND in auditory-only mode, the JND with visual stimuli is often larger. The average increments of the JND for subjective duration, pitch and loudness are 45.1%, 14.8% and 12.3%, respectively. The conclusion is that the ability of JND-based auditory perception often decreases with visual stimuli. The amount by which the JND increases is affected by the attributes of the visual stimuli. If the visual stimuli make subjects feel more comfortable, the JND of auditory perception changes less.
Funding: Supported by the Science and Engineering Project of Communication University of China (3132016XNG1625).
Abstract: By analyzing the differences between binaural recording and real listening, it was deduced that there are some unrevealed auditory localization cues, and that the sound pressure distribution pattern at the entrance of the ear canal is probably one of them. A listening test proved, by reductio ad absurdum, that the unrevealed auditory localization cues really exist, and the effective frequency bands of these cues were derived and summarized. Finite-element-based simulations showed that the pressure distribution at the entrance of the ear canal is non-uniform and that its pattern is related to the direction of the sound source. This shows that the sound pressure distribution pattern at the entrance of the ear canal carries information about the sound source direction and could be used as an unrevealed localization cue. The frequency bands in which the sound pressure distribution patterns differ significantly between front and back source directions roughly match the effective frequency bands of the unrevealed localization cues obtained from the listening tests. To some extent, this supports the hypothesis that the sound pressure distribution pattern could be a kind of unrevealed auditory localization cue.
Funding: This work was supported by the Research Fund from SARFT (BG0305).
Abstract: To investigate the group characteristics of Putonghua monophthong formants, tokens from 90 female students were surveyed. The formants were measured using the LPC method. The mean values and spread of the formant frequencies are given in statistical terms. The results differ from measurements made by other researchers decades ago. For all monophthongs, F4/F3 and F5/F4 are generally around 1.4. For discriminating monophthongs, F2/F1 and F3/F2 may serve as two new parameters in addition to the first three formants.
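The abstract states only that formants were measured "using the LPC method". As a rough illustration of how LPC yields a formant estimate, the sketch below fits an all-pole model with the autocorrelation method and reads resonance frequencies from the angles of the model's poles. The model order, sampling rate, and synthetic single-resonance test signal are all assumptions for illustration, not the settings of the study.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Autocorrelation-method LPC: solve the Yule-Walker equations R a = r."""
    n = len(signal)
    r = np.array([np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction-error polynomial A(z) = 1 - sum_k a_k z^{-k}
    return np.concatenate([[1.0], -a])

def formant_frequencies(signal, order, fs):
    """Resonance frequencies (Hz) from the angles of the complex LPC poles."""
    roots = np.roots(lpc_coefficients(signal, order))
    roots = roots[np.imag(roots) > 0]          # keep one of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# Hypothetical test signal: white noise shaped by one resonance at 700 Hz.
fs, f0, radius = 8000, 700.0, 0.97
theta = 2 * np.pi * f0 / fs
rng = np.random.default_rng(0)
excitation = rng.standard_normal(8000)
x = np.zeros_like(excitation)
for t in range(len(x)):
    x[t] = excitation[t]
    if t >= 1:
        x[t] += 2 * radius * np.cos(theta) * x[t - 1]
    if t >= 2:
        x[t] -= radius ** 2 * x[t - 2]

estimated = formant_frequencies(x, order=2, fs=fs)
print("estimated resonance (Hz):", estimated)
```

On real speech a higher model order would be needed to capture several formants at once, and ratios such as F2/F1 or F3/F2 would then be computed from the sorted estimates.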
Funding: Supported by the National Natural Science Foundation of China (10874155, 10574100).
Abstract: The just noticeable difference (JND) of reverberance was tested using the constant-stimulus method. Some factors that may influence the test results are analyzed to validate the experimental data. The test materials are Chinese instrumental music. Three subject groups were tested: a group of audio technicians, a group of students from an audio engineering department, and a group of postgraduates majoring in acoustics. It is found that the JND of reverberance is about 25%. The difference possibly caused by the professional training and experience of the different subject groups is noticeable, but the difference caused by different music motifs is insignificant.