This paper proposes an unequal error protection (UEP) coding method, based on expanding window fountain (EWF) codes, to improve the transmission performance of three-dimensional (3D) audio. Unlike other transmission schemes, which apply equal error protection (EEP) to all 3D audio objects, the method extracts the important audio object and gives it more protection, while normal audio objects receive comparatively less. Objective and subjective experiments show that the proposed UEP method outperforms EEP: the bit error rate (BER) of the important audio object decreases from 10^(-3) to 10^(-4), and the subjective quality of UEP is 14% better than that of EEP.
With the rapid expansion of multimedia data, protecting digital information has become increasingly critical. Reversible data hiding offers an effective solution by allowing sensitive information to be embedded in multimedia files while enabling full recovery of the original data after extraction. Audio, as a vital medium in communication, entertainment, and information sharing, demands the same level of security as images. However, embedding data in encrypted audio poses unique challenges because of the trade-offs between security, data integrity, and embedding capacity. This paper presents a novel interpolation-based reversible data hiding algorithm for encrypted audio that achieves scalable embedding capacity. Increasing the sample density through interpolation significantly expands the embedding opportunities while encryption is maintained throughout the process. The method further integrates multiple most significant bit (multi-MSB) prediction and Huffman coding to optimize compression and embedding efficiency. Experimental results on standard audio datasets show that the proposed algorithm can embed up to 12.47 bits per sample, with over 9.26 bits per sample available as pure embedding capacity, while preserving full reversibility. These results confirm the method's suitability for secure applications that demand high embedding capacity and perfect reconstruction of the original audio.
Object-based audio coding is the main technique of audio scene coding. It can effectively reconstruct each object trajectory and provides sufficient flexibility for personalized audio scene reconstruction, so it has attracted growing attention. However, existing object-based techniques have poor sound quality because the frequency-domain resolution of their parameters is low. To achieve high-quality audio object coding, we propose a new coding framework that introduces non-negative matrix factorization (NMF). We extract object parameters at high resolution to improve sound quality, and apply NMF to parameter coding to reduce the high bitrate that high resolution causes. Experimental results show that the proposed framework improves coding quality by 25%, providing a more flexible and higher-quality way to encode audio scenes.
Lattice vector quantization (LVQ) has been used in real-time speech and audio coding systems. Compared with conventional vector quantization, LVQ has two main advantages: it has a simple and fast encoding process, and it significantly reduces the amount of memory required. LVQ is therefore suitable for low-complexity speech and audio coding. In this paper, we describe the basic concepts of LVQ and its advantages over conventional vector quantization. We also describe some LVQ techniques that have been used in speech and audio coding standards of international standards developing organizations (SDOs).
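To illustrate why lattice encoding is fast and needs no stored codebook, here is a minimal sketch of nearest-point search in the D_n lattice (integer vectors whose coordinates sum to an even number), following the well-known rounding procedure of Conway and Sloane. The choice of D_n and the function names are assumptions made for illustration; the abstract does not say which lattices the paper covers.

```python
import math

def nearest_zn(x):
    """Nearest point of the integer lattice Z^n: round every coordinate."""
    return [math.floor(v + 0.5) for v in x]

def nearest_dn(x):
    """Nearest point of D_n (illustrative lattice, not taken from the paper)."""
    y = nearest_zn(x)
    if sum(y) % 2 == 0:
        return y
    # Odd coordinate sum: re-round the coordinate with the largest
    # rounding error in the opposite direction to restore even parity.
    k = max(range(len(x)), key=lambda i: abs(x[i] - y[i]))
    y[k] += 1 if x[k] > y[k] else -1
    return y

print(nearest_dn([0.6, 0.3, 0.1]))
print(nearest_dn([1.2, 0.9, -0.45, 2.7]))
```

The encoder only rounds and checks parity, so its cost grows linearly with the vector dimension and no codebook table has to be searched or stored.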
A Hi-Fi audio coding technology for ISDN and the Internet is introduced: the ISO/MPEG Audio Layer III digital audio compression scheme, coding at 64 kbit/s. The paper first implements a C-language simulation of the algorithm and obtains satisfactory quality for the reconstructed music signal. It then presents an estimate of the number of operation steps and a simulation of the decoder on a TMS320C548 simulator. The result is the same as that of the C-language simulation.
Audio Video Coding Standard (AVS) is a second-generation source coding standard and the first audio and video coding standard in China with independent intellectual property rights. Its performance has reached the level of international standards, and its coding efficiency is 2 to 3 times that of MPEG-2. The technical solution is simpler and can greatly save channel resources. After more than ten years of development, AVS has achieved great success. The latest version of the AVS audio coding standard is under development and mainly targets the growing demand for low-bitrate, high-quality audio services. The paper reviews the history and recent development of the AVS audio coding standard in terms of basic features, key techniques, and performance. Finally, the future development of the AVS audio coding standard is discussed.
A novel frame error concealment scheme is proposed to improve the decoded audio quality at the receiver for the transform coded excitation (TCX) audio codec. The scheme, a gain control approach based on the stability of the linear predictive coding (LPC) filter, predicts lost frames using the linear spectrum frequency and applies different continuous attenuation factors to different kinds of lost frames. Signal-to-noise ratio (SNR) tests and multiple stimuli with hidden reference and anchor (MUSHRA) tests are conducted to evaluate the approach in the adaptive multi-rate wideband plus (AMR-WB+) audio codec. Compared with the original frame error concealment scheme, the proposed scheme achieves better audio recovery quality in the AMR-WB+ audio codec.
A new three-dimensional (3D) audio coding approach is presented to improve the spatial perceptual quality of 3D audio. Unlike other audio coding approaches, it also quantizes the distance side information, and it proposes a non-uniform perceptual quantization based on the spatial perception features of the human auditory system, named the concentric spheres spatial quantization (CSSQ) method. Comparison results show that extracting and coding the distance side information improves the distance perceptual quality of 3D audio by 5.7% to 8.8% compared with directional audio coding, and that the bit rate of the proposed method is 8.07% lower than that of spatial squeeze surround audio coding.
A Bark-band residual noise model integrated with the human hearing mechanism is proposed to efficiently complement the sinusoidal model in parametric audio coding. The time-varying spectrum of the residual noise is represented by Bark-scale piecewise-constant magnitude estimates together with random phases. In the proposed noise model, Bark-band information is obtained by the short-time FFT, and windowed overlap-add is used to remove boundary discontinuities. SVQ is also incorporated into the parameter quantization process to meet the demand for low-bit-rate coding. Simulation results and informal listening tests show that combining the sinusoidal model with the Bark-band noise model achieves better synthesized audio quality than the original sinusoidal-modeling audio codec.
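As a rough illustration of "piecewise-constant magnitude per Bark band plus random phases", the sketch below groups FFT bins into bands, keeps one RMS level per band, and rebuilds a noise spectrum with random phases. It is not the paper's implementation: the band-edge table is the commonly tabulated critical-band list, and the function names, the RMS estimator, and the flat test input are assumptions made for this example. Overlap-add and SVQ are left out.

```python
import cmath
import math
import random

# Commonly tabulated critical-band (Bark) edges in Hz (assumed table).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]

def bark_band_levels(magnitudes, sample_rate):
    """Group FFT bins 0..N/2 into Bark bands; return (bin_indices, rms) pairs."""
    n_bins = len(magnitudes)
    hz_per_bin = (sample_rate / 2) / (n_bins - 1)
    bands = []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        idx = [k for k in range(n_bins) if lo <= k * hz_per_bin < hi]
        if idx:
            rms = math.sqrt(sum(magnitudes[k] ** 2 for k in idx) / len(idx))
            bands.append((idx, rms))
    return bands

def noise_spectrum(bands, n_bins, rng):
    """Rebuild a spectrum: constant magnitude in each band, random phase per bin."""
    spectrum = [0j] * n_bins
    for idx, level in bands:
        for k in idx:
            spectrum[k] = cmath.rect(level, rng.uniform(0.0, 2.0 * math.pi))
    return spectrum

# Hypothetical input: flat residual magnitude from a 512-point FFT at 16 kHz.
residual_mag = [1.0] * 257
bands = bark_band_levels(residual_mag, 16000)
spectrum = noise_spectrum(bands, len(residual_mag), random.Random(0))
print(len(bands), "bands,", len(spectrum), "bins")
```

Only one level per band needs to be transmitted, which is what makes this kind of representation attractive at low bit rates.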
This paper proposes improvements to a low-bit-rate parametric audio coder with a sinusoidal model as its kernel. First, we propose a new method to effectively order and select the perceptually most important sinusoids: the sinusoid that contributes most to reducing the overall noise-to-mask ratio (NMR) is chosen. Combined with our improved parametric psychoacoustic model and advanced peak riddling techniques, this greatly reduces the number of sinusoids required and greatly enhances coding efficiency. A lightweight version is also given that reduces the amount of computation with only a small sacrifice in performance. Second, we propose two enhancement techniques for sinusoid synthesis: bandwidth enhancement and line enhancement. With little overhead, the effective bandwidth can be extended by one more octave, and the timbre tends to sound much brighter, thicker, and more pleasing.
A method of quantization noise control for audio coding in the wavelet domain is proposed. Using the inverse discrete Fourier transform (DFT), it converts the masking threshold produced by the MPEG psychoacoustic model in the frequency domain into a time-domain signal. The discrete wavelet packet transform (DWPT) is then applied to that signal, and the energy in each subband is taken as the maximum allowed quantization noise energy. Experimental results show that the proposed method attains nearly transparent audio quality below 64 kbit/s for most of the test audio signals.
The performance of a speaker recognition system declines when the training and testing audio codecs are mismatched. Based on an analysis of how mismatched audio codecs affect linear prediction cepstrum coefficients, this paper proposes a MAP-based audio coding compensation method for speaker recognition. The method first sets a standard codec as the reference and trains the speaker models in that codec format. It then learns the deviation distributions between the standard codec format and the other formats, estimates the current bias from a small amount of adaptation data using the MAP-based adaptive technique, and adjusts the model parameters according to the incoming audio codec format and its associated bias. During testing, the features of the incoming speaker are matched against the adjusted model. Experimental results show that the accuracy reaches 82.4% with just one second of adaptation data, 5.5% higher than that of the baseline system.
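The bias estimation step can be pictured with the textbook MAP estimate of a Gaussian mean, which blends a prior mean learned offline with the average of a few observed deviations. This is a sketch under stated assumptions, not the paper's formula: the function name `map_bias`, the relevance factor `tau`, and the per-dimension independent treatment are all hypothetical.

```python
def map_bias(prior_mean, deviations, tau=10.0):
    """MAP estimate of a per-dimension codec bias.

    prior_mean : bias learned offline for this codec (one value per dimension)
    deviations : observed feature deviations from the adaptation data
    tau        : relevance factor (assumed); larger values trust the prior more
    """
    n = len(deviations)
    dim = len(prior_mean)
    sums = [sum(d[j] for d in deviations) for j in range(dim)]
    return [(tau * prior_mean[j] + sums[j]) / (tau + n) for j in range(dim)]

# Hypothetical numbers: two cepstral dimensions, two adaptation frames.
print(map_bias([1.0, -2.0], [[2.0, 0.0], [4.0, 0.0]], tau=2.0))
```

With little adaptation data the estimate stays close to the prior, and it moves toward the observed average as more frames arrive, which is why MAP estimation suits very short adaptation segments.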
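To meet the need for higher video compression efficiency in 4K ultra-high-definition (UHD) television broadcasting, with less storage space and network bandwidth, this paper proposes applying Audio Video Coding Standard 3 (AVS3) video codec technology, using the efficiency of the AVS3 standard to improve the coding quality and transmission efficiency of 4K UHD video content. First, starting from program production and broadcasting, extended intra angular prediction and dynamic source coding lower the coding bitrate while maintaining image quality, optimizing the encoding process and saving transmission bandwidth. Second, layered parallel video decoding and audio decoding and playback techniques improve the speed and quality of reception, decoding, and presentation. Finally, application experiments verify the practical performance of AVS3 video codec technology in 4K UHD television broadcasting.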
Funding (UEP coding for 3D audio transmission): Supported by the National High Technology Research and Development Program of China (863 Program, 2015AA016306); the National Natural Science Foundation of China (61662010, 61231015, 61471271); Science and Technology Plan Projects of Shenzhen (ZDSYS2014050916575763); and the Science and Technology Foundation of Guizhou Province (LKS[2011]1).
Funding (reversible data hiding in encrypted audio): Funded by the National Science and Technology Council of Taiwan under grant number NSTC 113-2221-E-035-058.
Funding (NMF-based object audio coding): Supported by the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306) and the National Natural Science Foundation of China (No. 61231015, No. 61671335).
Funding (frame error concealment for the TCX codec): Supported by the National High Technology Research and Development Program of China (863 Program, 2015AA016306); the Foundation of Outstanding Middle-aged and Young Scientific and Technological Innovation Team Program of the Department of Education of Hubei Province (T201516); and the Foundation of the Department of Education of Hubei Province (Q20132207).
Funding (3D audio coding with CSSQ): Supported by the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306) and the National Natural Science Foundation of China (No. 61662010, 61231015, 61471271, 61761044, 61762005).