Fusion technology plays an important role in multi-modal viewport prediction. Recent attention-based fusion methods achieve good prediction accuracy. However, these methods fail to account for the differing information density among the three modalities involved in viewport prediction: trajectory, visual, and audio. The visual and audio modalities carry primitive signal information, while the trajectory modality carries higher-level time-series information. In this paper, a viewport prediction framework based on a Modality Diversity-Aware (MDA) fusion network is proposed to achieve multi-modal feature interaction. First, we design a fusion module that combines the visual and auditory modalities, strengthening them as higher-level complementary features. We then use cross-modal attention to integrate the fused visual-audio information with the trajectory features. Our method addresses the differing information densities among the three modalities, ensuring fair and effective interaction between them. To evaluate the proposed approach, we conducted experiments on a widely used public dataset. The experiments demonstrate that our approach predicts viewport areas accurately with significantly fewer model parameters.
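The two-stage interaction described in this abstract can be illustrated with a minimal NumPy sketch. It is not the MDA network itself: the abstract does not specify the fusion module, so stage 1 is assumed here to be a simple cross-attention with a residual connection, and all names (`cross_attention`, `rand_proj`), dimensions, and random weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `query` attends to `context`."""
    q = query @ w_q                            # (Tq, d)
    k = context @ w_k                          # (Tc, d)
    v = context @ w_v                          # (Tc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Tq, Tc)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16  # hypothetical feature size shared by all modalities

def rand_proj():
    # Hypothetical stand-in for a learned projection matrix.
    return rng.normal(size=(d, d)) / np.sqrt(d)

# Hypothetical per-modality features: 10 visual/audio steps, 5 trajectory steps.
visual = rng.normal(size=(10, d))
audio = rng.normal(size=(10, d))
trajectory = rng.normal(size=(5, d))

# Stage 1 (assumed form): fuse the two primitive-signal modalities first.
va_update, _ = cross_attention(visual, audio, rand_proj(), rand_proj(), rand_proj())
visual_audio = visual + va_update

# Stage 2: trajectory features attend to the fused visual-audio features.
traj_update, weights = cross_attention(
    trajectory, visual_audio, rand_proj(), rand_proj(), rand_proj()
)
fused = trajectory + traj_update
```

The ordering is the point of the sketch: the visual and audio streams are combined before they interact with the trajectory stream, so the trajectory features meet a single fused representation instead of two raw signals.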
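Tiled transmission, built on the ideas of divide-and-conquer and on-demand delivery, is an effective way to stream volumetric (3D holographic) video. However, existing tiling schemes either lack an adaptive mechanism or are unsuitable for mobile real-time communication scenarios. To address this, this paper proposes VVSTiler (Volumetric Video Streaming Tiling selector), an adaptive tiled transmission method for volumetric video communication that maximizes perceived video quality under dynamic and limited computing and bandwidth resources. Specifically, the paper presents a preliminary study of the effects of tiling schemes at different granularities and finds that fine-grained tiling improves the utilization of dynamic network resources, while coarse-grained tiling preserves video encoding/decoding efficiency and robustness. Based on this, the paper formulates a perceived-quality optimization problem that takes into account context such as the predicted viewport, available computing resources, and network bandwidth, and designs an efficient solver to support online tiling-granularity decisions. VVSTiler is compared with current mainstream tiled transmission methods on the 8iVFB (8i Voxelized Full Bodies) standard dataset. Experimental results show that VVSTiler improves perceived video quality by up to 60.4% when viewport prediction is biased, and saves an average of 27% of bandwidth per frame when viewport prediction is relatively accurate.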
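The granularity decision described in this abstract can be pictured as a small constrained selection. The sketch below is not VVSTiler's solver, which the abstract does not detail: it assumes each candidate granularity already comes with an estimated bandwidth need, decoding cost, and perceived quality, and it simply picks the best feasible candidate. The `TilingOption` type, the `select_tiling` function, and every number are hypothetical toy values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TilingOption:
    name: str
    bandwidth_mbps: float  # assumed bitrate for the tiles in the predicted viewport
    decode_cost: float     # assumed decoding cost, in arbitrary compute units
    quality: float         # assumed perceived-quality estimate, higher is better

def select_tiling(options, bandwidth_budget, compute_budget):
    """Return the highest-quality option that fits both budgets.

    If nothing fits, fall back to the option that needs the least bandwidth.
    """
    feasible = [
        o for o in options
        if o.bandwidth_mbps <= bandwidth_budget and o.decode_cost <= compute_budget
    ]
    if not feasible:
        return min(options, key=lambda o: o.bandwidth_mbps)
    return max(feasible, key=lambda o: o.quality)

# Toy candidates: finer tiling sends fewer unseen points (less bandwidth)
# but costs more to decode. The values are illustrative only.
options = [
    TilingOption("coarse", bandwidth_mbps=40.0, decode_cost=1.0, quality=0.70),
    TilingOption("medium", bandwidth_mbps=30.0, decode_cost=2.0, quality=0.80),
    TilingOption("fine", bandwidth_mbps=22.0, decode_cost=4.0, quality=0.85),
]

choice_tight_compute = select_tiling(options, bandwidth_budget=35.0, compute_budget=3.0)
choice_tight_network = select_tiling(options, bandwidth_budget=25.0, compute_budget=5.0)
```

With these toy values, a tight compute budget leads to the medium granularity and a tight network budget leads to the fine one, which mirrors the trade-off the abstract reports between network utilization and codec efficiency.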