
Novel full reference perceptual quality metric for audio-visual asynchrony (Cited by: 2)
Abstract: A full-reference model built on co-inertia analysis is proposed for evaluating the perceptual quality of audio-visual asynchrony. A standard synchronization procedure first determines the time offset between the audio and video under test. Audiovisual content is divided into three categories: clean speech, non-speech, and speech with background sound; the clean-speech category is further split into two subcategories according to whether a speaker appears in the video. Multidimensional features are selected separately for each category, and co-inertia analysis extracts from them the most correlated audio-video feature mapping together with its degree of correlation. A correlation curve computed on the reference sequence then yields the mapping from synchronization error to perceptual quality. Experiments show that the model's scores correlate well with subjective test results.
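The co-inertia step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the actual audio/video features, the per-category feature selection, and the final mapping to quality scores are omitted, and the latent-signal demo data is invented for the example.

```python
import numpy as np

def coinertia_axes(X, Y):
    """First pair of co-inertia axes for two time-aligned feature tables.

    X: (n, p) audio features, Y: (n, q) video features, one row per frame.
    Returns unit loading vectors (a, b) and the correlation r of the
    scores obtained by projecting each table onto its axis.
    """
    Xc = X - X.mean(axis=0)            # column-center both tables
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / X.shape[0]         # p x q cross-covariance matrix
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    a, b = U[:, 0], Vt[0, :]           # directions of maximal co-inertia
    r = np.corrcoef(Xc @ a, Yc @ b)[0, 1]
    return a, b, r

# Toy demo (invented data): two feature streams driven by a shared latent signal.
rng = np.random.default_rng(0)
t = rng.standard_normal(500)           # shared "event" signal over 500 frames
X = np.outer(t, rng.standard_normal(8)) + 0.1 * rng.standard_normal((500, 8))
Y = np.outer(t, rng.standard_normal(5)) + 0.1 * rng.standard_normal((500, 5))

_, _, r_sync = coinertia_axes(X, Y)                         # aligned streams
_, _, r_shift = coinertia_axes(X, np.roll(Y, 50, axis=0))   # desynchronized by 50 frames
```

When the streams are aligned, the score correlation `r` is high; shifting the video against the audio makes it drop. A correlation-versus-offset curve of this kind is what underlies the model's mapping from synchronization error to perceptual quality.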
Source: Journal on Communications (《通信学报》, EI, CSCD, Peking University Core Journals), 2012, No. 2, pp. 182-190 (9 pages).
Funding: Supported by the National Science and Technology Major Project (2010ZX03004-003).
Keywords: signal processing technique; audiovisual quality assessment; co-inertia analysis; synchrony

References (13)

  • 1RIMELL A, OWEN A. The effect of focused attention on audio-visual quality perception with applications in multi-modal codec design[A]. ICASSP 2000[C]. Istanbul, Turkey, 2000. 2377-2380.
  • 2HANDS D S. A basic multimedia quality model[J]. IEEE Transactions on Multimedia, 2004, 6(6): 806-816.
  • 3STEINMETZ R. Human perception of jitter and media synchronization[J]. IEEE Journal on Selected Areas in Communications, 1996, 14(1): 61-72.
  • 4NISHIBORI K, TAKEUCHI Y, MATSUMOTO T, et al. Finding the correspondence of audiovisual events by object manipulation[J]. Electronics and Communications, 2009, 92(5): 1-13.
  • 5BREDIN H, CHOLLET G. Audiovisual speech synchrony measure: application to biometrics[J]. EURASIP Journal on Advances in Signal Processing, 2007, (3): 1-11.
  • 6GILLET O, ESSID S, RICHARD G. On the correlation of automatic audio and visual segmentations of music videos[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17(3): 347-355.
  • 7LIU Y Y, SATO Y. Recovery of audio-to-video synchronization through analysis of cross-modality correlation[J]. Pattern Recognition Letters, 2010, 31(8): 696-701.
  • 8ENRIQUE A R, BREDIN H, GARCIA M C, et al. Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models[J]. Pattern Analysis & Applications, 2009, 9(12): 271-284.
  • 9EVENO N, BESACIER L. Co-inertia analysis for "liveness" test in audio-visual biometrics[A]. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis[C]. Zagreb, Croatia, 2005. 257-261.
  • 10KUMAR K, NAVRATIL J, MARCHERET E, et al. Audio-visual speech synchronization detection using a bimodal linear prediction model[A]. 2009 IEEE Conference on Computer Vision and Pattern Recognition[C]. 2009. 53-59.

Co-cited references (25)

  • 1吴张顺, 张珣. Research and implementation of video encoding and storage based on FFmpeg[J]. Journal of Hangzhou Dianzi University (Natural Sciences), 2006, 26(3): 30-34. (Cited by: 33)
  • 2马燕, 李存, 李晓勇, 刘海涛. Design and implementation of a multimedia player based on the ARM platform[J]. Computer Engineering, 2006, 32(24): 221-222. (Cited by: 7)
  • 3MI Faraj, J Bigun. Synergy of lip-motion and acoustic features in biometric speech and speaker recognition[J]. IEEE Transactions on Computers, 2007, 56(9): 1169-1175.
  • 4S Kumagai, K Doman, et al. Detection of inconsistency between subject and speaker based on the co-occurrence of lip motion and voice towards speech scene extraction from news videos[A]. IEEE International Symposium on Multimedia[C]. California: IEEE, 2011. 311-318.
  • 5M Slaney, M Covell. FaceSync: a linear operator for measuring synchronization of video facial images and audio tracks[A]. Neural Information Processing Systems[C]. Denver: NIPSF, 2000. 814-820.
  • 6N Eveno, L Besacier. A speaker independent "liveness" test for audio-visual biometrics[A]. Ninth European Conference on Speech Communication and Technology[C]. Lisbon: ISCA, 2005. 3081-3084.
  • 7G Chollet, R Landais, et al. Some experiments in audio-visual speech processing[A]. Non-Linear Speech Processing 2007[C]. Paris: ISCA, 2007. 28-56.
  • 8A Sayo, Y Kajikawa, et al. Biometrics authentication method using lip motion in utterance[A]. 8th International Conference on Information, Communications and Signal Processing[C]. Singapore: IEEE, 2011. 1-5.
  • 9AA EL-Sallam, AS Mian. Correlation based speech-video synchronization[J]. Pattern Recognition Letters, 2011, 32(6): 780-786.
  • 10B Goswami, C Chan, et al. Speaker authentication using video-based lip information[A]. IEEE International Conference on Acoustics, Speech, and Signal Processing[C]. Prague: IEEE, 2011. 1908-1910.

Citing documents (2)

Secondary citing documents (11)
