Abstract: Video synopsis is an effective way to summarize long surveillance recordings. The omnidirectional view allows the observer to select a desired field of view (FoV) from the different FoVs available in a spherical surveillance video. By choosing to watch one portion, the observer misses the events occurring elsewhere in the spherical scene, causing fear of missing out (FOMO). Hence, a novel personalized video synopsis approach for generating non-spherical videos has been introduced to address this issue. It also includes an action recognition module that prioritizes the necessary actions so they are easy to display. This work minimizes the activity-loss, collision, temporal-inconsistency, length, and show costs while maximizing the important-action cost. The performance of the proposed framework is evaluated through extensive simulation and compared with state-of-the-art video synopsis optimization algorithms. Experimental results suggest that some of these constraints are better optimized by the latest metaheuristic optimization algorithms, yielding compact personalized synopsis videos from spherical surveillance videos.
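The abstract names the cost terms but not their form or the specific metaheuristic. The sketch below illustrates the general shape of such a multi-objective synopsis energy over activity tubes, optimized by a toy random-perturbation search; the tube representation, weights, and helper functions are all illustrative assumptions, not the authors' formulation, and the length/show terms are folded into a fixed synopsis window for brevity.

```python
import random

# Each activity tube: (orig_start, duration, importance), times in frames.
# These values are made up for illustration.
TUBES = [(0, 40, 0.9), (30, 25, 0.2), (70, 50, 0.7), (120, 30, 0.4)]
SYN_LEN = 80  # assumed target synopsis length in frames

def overlap(s1, d1, s2, d2):
    """Temporal overlap in frames between two shifted tubes."""
    return max(0, min(s1 + d1, s2 + d2) - max(s1, s2))

def energy(starts, w=(1.0, 0.5, 0.3, 2.0)):
    """Weighted sum in the spirit of the abstract's costs (weights assumed).
    Minimized: activity loss, collision, temporal inconsistency.
    Maximized: important-action coverage (entered with a minus sign)."""
    # activity loss: frames of a tube falling outside the synopsis window
    loss = sum(max(0, s + d - SYN_LEN) + max(0, -s)
               for s, (_, d, _) in zip(starts, TUBES))
    # collision: pairwise temporal overlap as a crude proxy for occlusion
    coll = sum(overlap(starts[i], TUBES[i][1], starts[j], TUBES[j][1])
               for i in range(len(TUBES)) for j in range(i + 1, len(TUBES)))
    # temporal consistency: count pairs whose original order is flipped
    order = sum(1 for i in range(len(TUBES)) for j in range(i + 1, len(TUBES))
                if (starts[i] - starts[j]) * (TUBES[i][0] - TUBES[j][0]) < 0)
    # important actions fully shown inside the synopsis window
    action = sum(imp for s, (_, d, imp) in zip(starts, TUBES)
                 if s >= 0 and s + d <= SYN_LEN)
    return w[0] * loss + w[1] * coll + w[2] * order - w[3] * action

# Toy metaheuristic: hill climbing with random perturbations, a stand-in for
# the (unnamed) metaheuristic optimizers the abstract refers to.
best = [t[0] for t in TUBES]
for _ in range(5000):
    cand = [max(0, s + random.randint(-5, 5)) for s in best]
    if energy(cand) < energy(best):
        best = cand
print("tube start frames in synopsis:", best, "energy:", round(energy(best), 2))
```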
Abstract: Fusion technology is of considerable importance in multi-modal viewport prediction. The latest attention-based fusion methods have been shown to perform well in prediction accuracy. However, these methods fail to account for the differing information densities of the three modalities involved in viewport prediction: trajectory, visual, and audio. The visual and audio modalities carry primitive signal information, while the trajectory modality carries high-level time-series information. In this paper, a viewport prediction framework based on a Modality Diversity-Aware (MDA) fusion network is proposed to achieve multi-modal feature interaction. First, we design a fusion module that combines the visual and auditory modalities, strengthening them as high-level complementary features. Subsequently, we use cross-modal attention to reinforce the integration of the fused visual-audio information with the trajectory features. Our method addresses the differing information densities of the three modalities, ensuring a fair and effective interaction among them. To evaluate the efficacy of the proposed approach, we conducted experiments on a widely used public dataset. The experiments demonstrate that our approach predicts viewport areas accurately with significantly fewer model parameters.
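The two-stage structure the abstract describes (fuse the primitive visual and audio signals first, then let cross-modal attention integrate that fused stream with the trajectory stream) can be sketched as below. All dimensions, layer choices, and names here are assumptions for illustration, not the actual MDA network.

```python
import torch
import torch.nn as nn

class TwoStageFusion(nn.Module):
    """Hypothetical sketch of visual-audio fusion followed by cross-modal
    attention against trajectory features, in the spirit of the abstract."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # stage 1: fuse the two "primitive" modalities (visual, audio)
        self.va_fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        # stage 2: trajectory queries attend to the fused visual-audio stream
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)  # e.g., (yaw, pitch) of the next viewport

    def forward(self, vis, aud, traj):
        # vis, aud, traj: (batch, time, dim) feature sequences
        va = self.va_fuse(torch.cat([vis, aud], dim=-1))        # stage-1 fusion
        out, _ = self.cross_attn(query=traj, key=va, value=va)  # stage-2 attention
        return self.head(out + traj)  # residual keeps the trajectory signal

# Toy forward pass with random features.
B, T, D = 2, 16, 128
model = TwoStageFusion(D)
pred = model(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
print(pred.shape)  # torch.Size([2, 16, 2])
```

Using the high-density trajectory features as the attention queries, rather than concatenating all three modalities symmetrically, is one plausible reading of how the framework keeps the interaction "fair" across modalities of unequal information density.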