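Abstract: An objective method for evaluating spatial perceptual quality is proposed, based on an auditory filter model and acoustic feature fusion, for binaural Ambisonics reproduction under low-reverberation conditions. First, the binaural input signals are processed with an auditory filter model to extract objective parameters related to spatial perception; these are combined with existing parameters related to spatial perception and sound quality to build an acoustic feature set. Then, a Gaussian process regression (GPR) model maps the feature set to subjective scores, which yields the objective evaluation model. To validate the method, a subjective evaluation experiment was conducted. Speech signals generated by different binaural Ambisonics reproduction algorithms in simulated anechoic/low-reverberation acoustic scenes served as test stimuli, subjective scores were collected, and the objective model was trained and evaluated with cross-validation. The experimental results show that the proposed model achieves a significant improvement in prediction accuracy over existing evaluation models. In addition, external validation on publicly available Ambisonics-format (Ambix) audio and its subjective scores further demonstrates the generalization ability and stability of the proposed model.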
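The mapping from a feature set to subjective scores with GPR and cross-validation can be sketched as follows. This is a minimal illustration only: the RBF kernel, the `length_scale` and `noise` values, the interleaved fold assignment, and all function names are assumptions, since the abstract does not specify them, and the features here are placeholders rather than the auditory-model parameters of the paper.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gpr_fit_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GPR posterior mean at X_test; `noise` is a variance added to the diagonal."""
    mean_y = sum(y_train) / len(y_train)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    alpha = solve(K, [y - mean_y for y in y_train])
    return [mean_y + sum(rbf(x, xt, length_scale) * a
                         for xt, a in zip(X_train, alpha))
            for x in X_test]

def kfold_rmse(X, y, k=5):
    """Cross-validated RMSE between predicted and subjective scores."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = gpr_fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
    return math.sqrt(sq_err / n)
```

Calling `kfold_rmse(features, scores, k=5)` returns one pooled error figure over all held-out folds. In practice a library implementation such as scikit-learn's `GaussianProcessRegressor` would normally replace the hand-written solver.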
Funding: Supported in part by the National Natural Science Foundation of China (62176059, 62101136).
Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is challenging for machines because describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, because traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because traditional methods are optimized to retain the perceptually more important part of the information at the cost of discarding another part. It is therefore desirable to develop techniques that achieve personalization and accuracy at low cost. To this end, we focus on the binaural rendering of Ambisonics and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function that quantifies interaural level differences to handle spatial information. To verify the proposed method, we collect and release the first paired Ambisonic-binaural dataset and introduce three metrics to evaluate the content-information and spatial-information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
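A loss term that quantifies interaural level differences (ILD) could look like the sketch below. The abstract does not give the actual formulation, so the broadband energy ratio in dB, the absolute-difference penalty, the `eps` guard, and the function names are all assumptions made for illustration.

```python
import math

def ild_db(left, right, eps=1e-12):
    """Broadband ILD in dB: energy ratio of the left and right ear signals."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    # eps (assumed) avoids log(0) and division by zero for silent frames.
    return 10.0 * math.log10((energy_left + eps) / (energy_right + eps))

def ild_loss(pred_left, pred_right, ref_left, ref_right):
    """Absolute ILD mismatch in dB between rendered and reference binaural signals."""
    return abs(ild_db(pred_left, pred_right) - ild_db(ref_left, ref_right))
```

A rendered pair with the same left/right energy ratio as the reference gives a loss of zero regardless of overall gain, which is why a term like this would normally be combined with a waveform or spectral loss that constrains the content.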
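Abstract: 0 Introduction. High Order Ambisonics (HOA) sound reproduction systems use a spatial spherical harmonic expansion to approximate the ideal sound field order by order, recording the spatial information of the original sound field and reproducing a three-dimensional sound field that is identical, or as similar as possible, to the original [1]. The accuracy of the reproduced sound field can be adjusted through the order of the spherical harmonic expansion, and many different loudspeaker layouts are possible for reproduction. The performance of an HOA reproduction system therefore depends on more than the order of the spherical harmonics.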
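Abstract: Among loudspeaker-array-based sound reproduction techniques, Ambisonics reproduction systems, which are grounded in spherical harmonic decomposition, offer advantages over wave field synthesis (WFS) and vector base amplitude panning, such as mutually independent encoding and decoding and scalability. A basic Ambisonics system reconstructs the sound field by capturing and reproducing first-order directional information: the omnidirectional (W) and bidirectional (X, Y, Z) components, the so-called B-format. HOA (Higher order Ambisonics) extends B-format to higher spatial resolution on the basis of the spherical harmonic decomposition of the sound field. Many researchers have studied the reproduction accuracy and usage limits of HOA, but work that jointly considers head scattering and the characteristics of human hearing remains scarce. This work evaluates the optimal listening area of Ambisonics systems of different orders in terms of the interaural time difference (ITD), an important cue for low-frequency localization. Plane waves incident from all directions in the horizontal plane are encoded into HOA components. On this basis, the ITD fluctuation in the two-dimensional horizontal plane is computed as a simulated head (rigid sphere) moves inside a circular array, and the boundary of the optimal listening area is determined with an ITD threshold. Simulation results show that the ITD-based objective metric reflects the sound field reproduction performance of Ambisonics systems of different orders well: a 4th-order Ambisonics system achieves an optimal listening area of 20 cm × 14 cm, whereas 1st- and 2nd-order systems cannot yet achieve accurate reproduction even in the central region. Higher-order Ambisonics systems therefore provide better sound source localization performance.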
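Encoding a horizontal plane wave into HOA components can be sketched with circular harmonics, as below. The channel ordering and the unnormalized cosine/sine convention are assumptions, because normalization schemes differ between Ambisonics formats and the abstract does not name one. The function name is hypothetical.

```python
import math

def encode_plane_wave_2d(azimuth_rad, order):
    """Horizontal (2-D) Ambisonics components of a unit-amplitude plane wave.

    Returns [1, cos(t), sin(t), ..., cos(M t), sin(M t)] for azimuth t and
    order M, that is, 2 * M + 1 components (assumed ordering and scaling).
    """
    components = [1.0]  # order 0: omnidirectional component
    for m in range(1, order + 1):
        components.append(math.cos(m * azimuth_rad))
        components.append(math.sin(m * azimuth_rad))
    return components
```

A 4th-order horizontal encoding therefore carries 9 components, compared with 3 for a 1st-order horizontal system, which is the extra spatial resolution that higher orders provide.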
Funding: This work was supported by the National Natural Science Foundation of China (11674105) and the State Key Lab of Subtropical Building Science, South China University of Technology.
Abstract: Ambisonics is a family of spatial sound reproduction systems based on spatial harmonic decomposition and order-by-order approximation of the sound field. Ambisonics signals were originally intended for loudspeaker reproduction. By using head-related transfer function (HRTF) filters, binaural Ambisonics converts the Ambisonics signals for static or dynamic headphone reproduction. In the present work, the performance of static and dynamic binaural Ambisonics reproduction is evaluated and compared. The mean binaural pressure errors across target source directions are first analyzed. A virtual source localization experiment is then conducted, and localization performance is evaluated by analyzing the percentages of front-back and up-down confusion, the mean angle error, and the discreteness of the localization results. The results indicate that binaural Ambisonics reproduction of insufficiently high order (for example, order 5-10) cannot recreate the correct high-frequency magnitude spectra in the binaural pressures, which degrades localization in static reproduction. Because dynamic localization cues are included, dynamic binaural Ambisonics reproduction yields clearly better localization performance than static reproduction of the same order. Even 3rd-order dynamic binaural Ambisonics reproduction exhibits appropriate localization performance.
Funding: Supported by the National Natural Science Foundation of China (12174118).
Abstract: The present work proposes a method for approximately implementing constant-power equalization in signal mixing for horizontal Ambisonics with a nonuniform loudspeaker configuration derived from simulation and microphone-array recording. Ambisonics signal mixing with approximately constant-power equalization is obtained by adding higher-order spatial harmonics to the original signal mixing. The power characteristics of the mixed signal are further analyzed, and the appropriate truncation order is selected based on the criterion that the fluctuation of the overall signal power across the target source azimuth is less than 3.0 dB after equalization. Moore’s modified binaural loudness model serves to analyze the proposed method, and a psychoacoustic experiment is conducted to validate it. The results indicate that the proposed method reduces the timbre coloration in the azimuthal range where the power fluctuation is obvious without equalization.
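The 3.0 dB criterion on overall signal power across source azimuth can be illustrated with the sketch below. It does not implement the equalization method of the abstract. As an assumption, it applies the basic decoding gains for an equally spaced circular array, g_i = (1/N)[1 + 2 Σ_m cos(m(θ_s − φ_i))], unchanged to a hypothetical nonuniform layout, only to show how the power fluctuation would be measured.

```python
import math

def basic_gains(source_az, speaker_az, order):
    """Basic horizontal Ambisonics gains; exact only for equally spaced arrays."""
    n = len(speaker_az)
    gains = []
    for phi in speaker_az:
        g = 1.0
        for m in range(1, order + 1):
            g += 2.0 * math.cos(m * (source_az - phi))
        gains.append(g / n)
    return gains

def power_fluctuation_db(speaker_az, order, steps=360):
    """Max-to-min ratio (dB) of the summed squared gains over source azimuth."""
    powers = []
    for k in range(steps):
        az = 2.0 * math.pi * k / steps
        powers.append(sum(g * g for g in basic_gains(az, speaker_az, order)))
    return 10.0 * math.log10(max(powers) / min(powers))

def meets_criterion(speaker_az, order, limit_db=3.0):
    """True when the power fluctuation stays below the stated 3.0 dB limit."""
    return power_fluctuation_db(speaker_az, order) < limit_db

# Equally spaced five-loudspeaker array versus a hypothetical nonuniform layout.
uniform = [2.0 * math.pi * i / 5 for i in range(5)]
nonuniform = [math.radians(a) for a in (0, 30, 110, 250, 330)]
```

With five equally spaced loudspeakers and order 2, the summed power is constant over azimuth, so `meets_criterion(uniform, 2)` holds. Applying the same gains naively to the nonuniform layout gives a fluctuation well above 3 dB, which is the kind of variation an equalization step is meant to reduce.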