Spectral subtraction is used in this research to remove noise from noisy speech signals in the frequency domain. The method computes the spectrum of the noisy speech using the Fast Fourier Transform (FFT) and subtracts the average magnitude of the noise spectrum from the noisy speech spectrum. We applied spectral subtraction to the speech signal “Real graph”. A digital audio recorder system embedded in a personal computer was used to sample the speech signal “Real graph”, to which we digitally added vacuum cleaner noise. The noise removal algorithm was implemented in Matlab by storing the noisy speech data in Hanning time-windowed, half-overlapped data buffers, computing the corresponding spectra using the FFT, removing the noise from the noisy speech, and reconstructing the speech in the time domain using the inverse Fast Fourier Transform (IFFT). The performance of the algorithm was evaluated by calculating the Speech to Noise Ratio (SNR). Frame averaging was introduced as an optional technique that could improve the SNR. Seventeen configurations with various Hanning window lengths, various degrees of data buffer overlap, and various numbers of averaged frames were investigated with a view to improving the SNR. Results showed that one-fourth overlapped data buffers with 128-point Hanning windows and no frame averaging gave the best performance in removing noise from the noisy speech.
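The processing chain in the abstract above can be sketched in Python with NumPy (the study itself used Matlab). This is a minimal sketch under stated assumptions: the function names `frames`, `spectral_subtract`, and `snr_db`, the use of a separate noise-only excerpt to estimate the average noise magnitude, the clipping of negative magnitudes to zero, and the reuse of the noisy phase are illustrative choices, not details taken from the paper.

```python
import numpy as np

def frames(x, n, hop):
    """Split x into overlapping frames of length n, hop samples apart."""
    count = 1 + (len(x) - n) // hop
    return np.stack([x[i * hop : i * hop + n] for i in range(count)])

def spectral_subtract(noisy, noise, n=128, hop=64):
    """Magnitude spectral subtraction with Hann windows and overlap-add.

    noisy: noisy speech samples; noise: noise-only excerpt (assumed available).
    """
    # Periodic Hann window: shifted copies sum to exactly 1 at 50 % overlap.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    # Average noise magnitude spectrum over all noise frames.
    noise_frames = frames(noise, n, hop) * window
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    out = np.zeros(len(noisy))
    for k, frame in enumerate(frames(noisy, n, hop)):
        spec = np.fft.rfft(frame * window)
        # Subtract the noise magnitude and clip negative results to zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Keep the noisy phase, return to the time domain, overlap-add.
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n)
        out[k * hop : k * hop + n] += clean
    return out

def snr_db(reference, estimate):
    """SNR in dB: reference power over the power of the residual error."""
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))
```

With `hop = n // 2` the periodic Hann windows sum to one, so overlap-add needs no extra normalization. Other overlaps, such as the one-fourth overlap the study found best, would need the output divided by the summed window. The first and last half-frames are covered by only one window and are attenuated, so SNR comparisons are best made on the interior samples.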
The acoustic characteristics of the Chinese vowels of 24 children with cleft palate and 10 normal control children were analyzed with a computerized speech signal processing system (CSSPS), and speech articulation was judged with the Glossary of Cleft Palate Speech (GCPS). The listening judgement showed that speech articulation differed significantly between the two groups (P<0.01). The objective quantitative measurement suggested that the formant pattern (FP) of vowels in children with cleft palate differed from that of normal control children, except for the vowel [a] (P<0.05). The acoustic vowel graph of the Chinese vowels, which directly demonstrates the relationship between vocal space and speech perception, was plotted using the first formant frequency (F1) and the second formant frequency (F2). The authors conclude that the values of F1 and F2 indicate an upward and backward tongue movement to close the cleft, which reflects the vocal transmission characteristics of cleft palate speech.
This paper proposes a multi-band speech enhancement algorithm that uses iterative processing to enhance single-channel speech. In the proposed algorithm, the output of the multi-band spectral subtraction (MBSS) algorithm is fed back as the input signal for the next iteration. Because the additive noise becomes remnant noise after the first MBSS processing step, the remnant noise must be re-estimated. The newly estimated remnant noise is then used in the next MBSS step, and repeating this operation further reduces the remnant musical noise. The procedure is iterated a small number of times. The algorithm estimates the noise in each iteration, and spectral over-subtraction is executed independently in each band. The performance of the proposed algorithm is evaluated for various types of noise at different SNR levels using 1) objective quality measures: signal-to-noise ratio (SNR), segmental SNR, and perceptual evaluation of speech quality (PESQ); and 2) a subjective quality measure: mean opinion score (MOS). The results are compared with those of the popular MBSS algorithm. The experimental results, together with the objective and subjective quality measurements, confirm that speech enhanced by the proposed algorithm is more pleasant to listeners than speech enhanced by the classical MBSS algorithm.
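The feedback loop described in the abstract above can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the function names `mbss_step` and `iterative_mbss`, the clipped linear rule for the over-subtraction factor, the spectral floor value, and the assumption that the first few frames contain no speech (used to re-estimate the remnant noise) are all hypothetical choices made for this sketch.

```python
import numpy as np

def mbss_step(power, noise_power, edges, floor=0.002):
    """One multi-band over-subtraction pass on power spectra.

    power: (frames, bins) power spectra; noise_power: (bins,) noise estimate;
    edges: band boundaries as bin indices covering every bin, e.g. [0, 16, 32, 65].
    """
    out = np.empty_like(power)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band, noise = power[:, lo:hi], noise_power[lo:hi]
        # Per-frame SNR of this band in dB (small constant avoids log(0)).
        snr = 10 * np.log10((band.sum(axis=1) + 1e-12) / (noise.sum() + 1e-12))
        # Illustrative rule: subtract more aggressively at low band SNR.
        alpha = np.clip(4.0 - 0.15 * snr, 1.0, 5.0)[:, None]
        cleaned = band - alpha * noise
        # Spectral floor: keep a small fraction of the input, never negatives.
        out[:, lo:hi] = np.maximum(cleaned, floor * band)
    return out

def iterative_mbss(power, edges, n_iter=3, noise_frames=5):
    """Feed each pass's output back in, re-estimating the remnant noise."""
    current = power
    for _ in range(n_iter):
        # Assumption: the first few frames contain no speech.
        noise_power = current[:noise_frames].mean(axis=0)
        current = mbss_step(current, noise_power, edges)
    return current
```

Each band gets its own over-subtraction factor per frame, and the noise estimate is recomputed from the current output before every pass, which mirrors the two properties the abstract emphasizes.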
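Because the training and test stages involve different noise or reverberation environments, and because the scarcity of real data degrades the classification accuracy of speech source direction of arrival (DOA), a machine-learning DOA classification algorithm based on kernel-function domain adaptation is proposed. By optimizing the structural risk function and reducing the conditional distribution difference between domains, the algorithm learns adaptively from the training data and thereby improves classification accuracy on the test data. Experimental results show that, on small and medium-sized datasets, the new algorithm clearly outperforms the deep learning algorithms used for comparison under various acoustic conditions.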
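With the rapid development of artificial intelligence technology, automated generation of digital media content has become an important means of improving the efficiency and quality of content production. The article examines in depth the specific applications of natural language processing (NLP), computer vision, and speech synthesis in generating text, image, and audio content, with a focus on the key deep learning techniques for content generation and on the model architecture and design methods of automated content generation systems. Through concrete application cases, the article presents the results these technologies achieve in real environments and highlights the key problems and challenges that must be overcome during implementation.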
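This study investigates methods for intelligent tag generation for media asset data. By building a modular back-end service system, it automatically extracts and structurally generates tags such as persons, places, times, and semantic information from multimodal data including video, audio, images, and text. The system is developed on the Windows platform and integrates key technologies such as face recognition, speech recognition, object detection, optical character recognition (OCR), and natural language processing (NLP). Middleware provides module coordination and unified scheduling, and services are exposed through a standard Hypertext Transfer Protocol (HTTP) interface. Experiments show that the proposed method has good generality and extensibility and can generate media asset tags accurately and quickly, providing efficient support for content management, retrieval, and intelligent analysis.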
Sound indexing and segmentation of digital documents, especially on the internet and in digital libraries, are very useful for simplifying and accelerating multimedia document retrieval. We can imagine extracting multimedia files not only by keywords but also by the semantic content of speech. The main difficulty of this operation is the parameterization and modelling of the sound track and the discrimination of speech, music, and noise segments. In this paper, we present a Speech/Music/Noise indexing interface designed for audio discrimination in multimedia documents. The program uses a statistical method based on ANN and HMM classifiers. After pre-emphasis and segmentation, the audio segments are analysed by the cepstral acoustic analysis method. The developed system was evaluated on a database consisting of songs with Arabic speech segments under several noisy environments.
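The front end mentioned in the abstract above (pre-emphasis, segmentation, cepstral analysis) might look like the sketch below. The abstract does not say which cepstral representation or parameters were used, so the real cepstrum, the Hamming window, the pre-emphasis coefficient 0.97, the frame sizes, and the function names are all assumptions for illustration. The ANN and HMM classifiers are outside the scope of this sketch.

```python
import numpy as np

def pre_emphasis(x, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    # Small constant avoids log(0) in empty bins.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.irfft(log_mag, len(frame))

def cepstral_features(x, frame_len=256, hop=128, n_coeff=13):
    """Pre-emphasize, segment into frames, keep the first cepstral coefficients."""
    y = pre_emphasis(x)
    starts = range(0, len(y) - frame_len + 1, hop)
    return np.array([real_cepstrum(y[s : s + frame_len])[:n_coeff] for s in starts])
```

Each row of the returned array is one frame's feature vector, which is the form a frame-level speech/music/noise classifier would typically consume.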
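To address the inaccuracy of speech estimation with time-frequency spectral models, this paper proposes using a model transformation to obtain the logarithmic probability density functions of noise and speech. Using the logarithmic relationship among noisy speech, clean speech, and noise, together with minimum mean square error (MMSE) estimation theory, a time-frequency mask for estimating the log spectrum of speech is derived. A soft mask is also derived from the logarithmic probability distributions of speech and noise; it weights the log subbands of the noisy speech to reduce noise and improve the accuracy of speech estimation. Simulation results show that, compared with the unprocessed noisy speech, the proposed method improves noise suppression by more than 3 dB; the MMSE-based time-frequency mask and the soft mask improve perceptual quality by 27.7% and 29.4% on average, respectively, and intelligibility by 12.7% and 14.3% on average, respectively.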
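The abstract above does not give the derivation. As a hedged illustration of the kind of logarithmic relationship it refers to, assume that speech and noise are additive and uncorrelated, so that their power spectra add in each time-frequency bin (a common approximation, not stated in the source). Writing $y$, $x$, and $n$ for the log power spectra of noisy speech, clean speech, and noise (symbols chosen here for illustration):

```latex
e^{y} \approx e^{x} + e^{n}
\quad\Longrightarrow\quad
y \approx \log\!\left(e^{x} + e^{n}\right) = x + \log\!\left(1 + e^{\,n - x}\right),
\qquad
\hat{x}_{\mathrm{MMSE}} = \mathbb{E}\left[\, x \mid y \,\right].
```

Under this assumption the correction term $\log(1 + e^{\,n - x})$ vanishes when speech dominates the bin and grows when noise dominates. The MMSE estimate of the log spectrum is the conditional mean of $x$ given $y$, which is why the log-domain densities of speech and noise are needed.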