1,037 articles found
Video Frame Prediction by Joint Optimization of Direct Frame Synthesis and Optical-Flow Estimation
1
Authors: Navin Ranjan, Sovit Bhandari, Yeong-Chan Kim, Hoon Kim 《Computers, Materials & Continua》 SCIE EI, 2023, No. 5, pp. 2615-2639 (25 pages)
Video prediction is the problem of generating future frames by exploiting the spatiotemporal correlation in a sequence of past frames. It is one of the crucial issues in computer vision and has many real-world applications, mainly focused on predicting future scenarios to avoid undesirable outcomes. However, modeling future image content and objects is challenging due to the dynamic evolution and complexity of the scene, such as occlusions, camera movements, delay, and illumination. Direct frame synthesis and optical-flow estimation are the common approaches, but researchers have mainly used only one of them. Both methods have limitations: direct frame synthesis usually produces blurry predictions due to complex pixel distributions in the scene, and optical-flow estimation usually produces artifacts due to large object displacements or obstructions in the clip. In this paper, we construct a deep neural network, Frame Prediction Network (FPNet-OF), with multiple-branch inputs (optical flow and original frame) to predict the future video frame by adaptively fusing the future object motion with the future frame generator. The key idea is to jointly optimize direct RGB frame synthesis and dense optical-flow estimation to generate a superior video prediction network. Using various real-world datasets, we experimentally verify that our proposed framework can produce high-level video frames compared to other state-of-the-art frameworks.
Keywords: video frame prediction; multi-step prediction; optical-flow prediction; delay; deep learning
Flow-aware synthesis: A generic motion model for video frame interpolation
2
Authors: Jinbo Xing, Wenbo Hu, Yuechen Zhang, Tien-Tsin Wong 《Computational Visual Media》 EI CSCD, 2021, No. 3, pp. 393-405 (13 pages)
A popular and challenging task in video research, frame interpolation aims to increase the frame rate of video. Most existing methods employ a fixed motion model, e.g., linear, quadratic, or cubic, to estimate the intermediate warping field. However, such fixed motion models cannot represent well the complicated non-linear motions in the real world or in rendered animations. Instead, we present an adaptive flow prediction module to better approximate the complex motions in video. Furthermore, interpolating just one intermediate frame between consecutive input frames may be insufficient for complicated non-linear motions. To enable multi-frame interpolation, we introduce time as a control variable when interpolating frames between original ones in our generic adaptive flow prediction module. Qualitative and quantitative experimental results show that our method can produce high-quality results and outperforms the existing state-of-the-art methods on popular public datasets.
Keywords: flow-aware; generic motion model; video frame interpolation
Rate-distortion optimized frame dropping and scheduling for multi-user conversational and streaming video (Cited by: 1)
3
Authors: CHAKARESKI Jacob, STEINBACH Eckehard 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 SCIE EI CAS CSCD, 2006, No. 5, pp. 864-872 (9 pages)
We propose a Rate-Distortion (RD) optimized strategy for frame dropping and scheduling of multi-user conversational and streaming videos. We consider a scenario where conversational and streaming videos share the forwarding resources at a network node. Two buffers are set up on the node to temporarily store the packets for these two types of video applications. For streaming video, a large buffer is used because the associated delay constraint of the application is moderate; for conversational video, a very small buffer is used to ensure that the forwarding delay of every packet is limited. A scheduler located behind these two buffers dynamically assigns transmission slots on the outgoing link to the two buffers. Rate-distortion side information is used to perform RD-optimized frame dropping in case of node overload. The data rate on the outgoing link is shared between the conversational and the streaming videos based either on the fullness of the two associated buffers or on the mean incoming rates of the respective videos. Simulation results show that our proposed RD-optimized frame dropping and scheduling approach provides significant improvements in performance over the popular priority-based random dropping (PRD) technique.
Keywords: Rate-distortion optimization; Video frame dropping; Conversational video; Streaming video; Distortion matrix; Hinttracks; Scheduling; Resource assignment
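A minimal sketch of what RD-optimized frame dropping could look like at an overloaded node, assuming each queued frame carries side information about its size and the distortion incurred if it is dropped. The greedy rule (drop the frame with the lowest distortion per saved bit first), the function name `frames_to_drop`, and the sample numbers are all assumptions for illustration; the abstract does not specify the authors' actual optimization.

```python
def frames_to_drop(frames, budget_bits):
    """Pick frames to drop until the queue fits the bit budget.

    frames: list of (frame_id, size_bits, distortion_if_dropped).
    Greedy assumption: drop the frame with the lowest distortion
    per saved bit first.
    """
    total = sum(size for _, size, _ in frames)
    dropped = []
    for frame_id, size, _ in sorted(frames, key=lambda f: f[2] / f[1]):
        if total <= budget_bits:
            break  # queue already fits the outgoing link
        dropped.append(frame_id)
        total -= size
    return dropped


# Hypothetical queue: one I frame, one P frame, two B frames.
queue = [
    ("I0", 8000, 900.0),
    ("P1", 3000, 300.0),
    ("B2", 1000, 20.0),
    ("B3", 1000, 30.0),
]
dropped = frames_to_drop(queue, budget_bits=11500)
```

With this toy queue the two B frames are dropped first, because they save bits at the lowest distortion cost, while the I and P frames are forwarded.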
Video Inter-Frame Forgery Identification Based on Consistency of Correlation Coefficients of Gray Values (Cited by: 4)
4
Authors: Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong Ma 《Journal of Computer and Communications》 2014, No. 4, pp. 51-57 (7 pages)
Identifying inter-frame forgery is a hot topic in video forensics. In this paper, we propose a method based on the assumption that the correlation coefficients of gray values are consistent in an original video, while in forgeries this consistency is destroyed. We first extract the consistency of correlation coefficients of gray values (CCCoGV for short) after normalization and quantization as the distinguishing feature for identifying inter-frame forgeries. We then test the CCCoGV on a large database with the help of a support vector machine (SVM). Experimental results show that the proposed method is efficient in classifying original videos and forgeries. Furthermore, the proposed method also performs well in classifying frame insertion and frame deletion forgeries.
Keywords: Inter-frame forgeries; Content consistency; Video forensics
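The consistency assumption in this abstract can be illustrated with a small sketch: compute the Pearson correlation coefficient between the gray values of adjacent frames and flag the frame pairs whose coefficient falls well below the typical value. The frames here are toy flattened lists of gray values, and the names `suspicious_pairs` and `max_drop` and the median-based rule are assumptions; the paper itself feeds normalized, quantized features to an SVM rather than applying a fixed threshold.

```python
from math import sqrt
from statistics import median


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length gray-value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat frame: correlation undefined, treated as 0 here
    return cov / sqrt(var_a * var_b)


def adjacent_correlations(frames):
    """Correlation between each frame and the one that follows it."""
    return [pearson(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def suspicious_pairs(frames, max_drop=0.2):
    """Indices i where corr(frame i, frame i+1) is far below the median."""
    corrs = adjacent_correlations(frames)
    typical = median(corrs)
    return [i for i, c in enumerate(corrs) if c < typical - max_drop]


# Toy clip: four smoothly changing frames, one inserted foreign frame, two more.
base = [10, 50, 90, 130, 170]
frames = [[v + k for v in base] for k in range(4)]
frames.append([170, 10, 130, 50, 90])
frames += [[v + k for v in base] for k in (4, 5)]
flagged = suspicious_pairs(frames)
```

In the toy clip the inserted frame breaks the correlation with both of its neighbours, so the pairs on either side of it are flagged.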
Real-Time Mosaic Method of Aerial Video Based on Two-Stage Key Frame Selection Method
5
Authors: Minwen Yuan, Yonghong Long, Xin Li 《Open Journal of Applied Sciences》 2024, No. 4, pp. 1008-1021 (14 pages)
A two-stage automatic key frame selection method is proposed to enhance stitching speed and quality for UAV aerial videos. In the first stage, to reduce redundancy, the overlapping rate of the UAV aerial video sequence within the sampling period is calculated. Lagrange interpolation is used to fit the overlapping rate curve of the sequence. An empirical threshold for the overlapping rate is then applied to filter candidate key frames from the sequence. In the second stage, the principle of minimizing remapping spots is used to dynamically adjust and determine the final key frame close to the candidate key frames. Comparative experiments show that the proposed method significantly improves stitching speed and accuracy by more than 40%.
Keywords: UAV aerial video; Image stitching; Key frame selection; Overlapping rate; Remap error
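The first stage described above can be sketched as follows, assuming the overlapping rate has been measured at a few sampled frames. The sketch fits those samples with a Lagrange interpolating polynomial and returns the first frame whose interpolated overlap falls to a threshold. The sample values, the 0.65 threshold, and the function names are hypothetical; the paper's own sampling period and empirical threshold are not given in the abstract.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def first_candidate(xs, ys, threshold):
    """First integer frame index whose interpolated overlap is <= threshold."""
    for frame in range(xs[0], xs[-1] + 1):
        if lagrange_eval(xs, ys, frame) <= threshold:
            return frame
    return None  # overlap never falls to the threshold in this window


# Hypothetical samples: overlap with the last key frame at frames 0, 10, 20, 30.
sample_frames = [0, 10, 20, 30]
sample_overlap = [1.00, 0.80, 0.60, 0.40]
candidate = first_candidate(sample_frames, sample_overlap, threshold=0.65)
```

Interpolating between samples means the overlap only has to be computed at a few frames, while the candidate key frame can still be located at single-frame resolution.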
Multiresolution Video Watermarking Algorithm Exploiting the Block-Based Motion Estimation (Cited by: 2)
6
Authors: Salwa A. K. Mostafa, Abdelrahman Ali 《Journal of Information Security》 2016, No. 4, pp. 260-268 (9 pages)
This paper presents a novel technique for embedding a digital watermark into video frames based on motion vectors and the discrete wavelet transform (DWT). In the proposed scheme, the binary image watermark is divided into blocks and each watermark block is embedded several times in each selected video frame at different locations. The block-based motion estimation algorithm is used to select the video frame blocks having the greatest motion vector magnitude. The DWT is applied to the selected frame blocks, and the watermark block is then hidden in these blocks by modifying the coefficients of the horizontal sub-bands (HL). Adding the watermark at different locations in the same video frame makes the scheme more robust against different types of attacks. The method was tested on different types of videos. The average peak signal-to-noise ratio (PSNR) and the normalized correlation (NC) are used to measure the performance of the proposed method. Experimental results show that the proposed algorithm does not affect the visual quality of video frames and that the scheme is robust against a variety of attacks.
Keywords: Digital video watermarking; Wavelet transform; Motion vector; Motion estimation; Video frame
Deepfake Video Detection Employing Human Facial Features
7
Authors: Daniel Schilling Weiss Nguyen, Desmond T. Ademiluyi 《Journal of Computer and Communications》 2023, No. 12, pp. 1-13 (13 pages)
Deepfake technology can be used to replace people’s faces in videos or pictures to show them saying or doing things they never said or did. Deepfake media are often used to extort, defame, and manipulate public opinion. However, despite deepfake technology’s risks, current deepfake detection methods lack generalization and are inconsistent when applied to unknown videos, i.e., videos on which they have not been trained. The purpose of this study is to develop a generalizable deepfake detection model by training convolutional neural networks (CNNs) to classify human facial features in videos. The study formulated the research question: “How effectively does the developed model provide reliable generalizations?” A CNN model was trained to distinguish between real and fake videos using the facial features of human subjects in videos. The model was trained, validated, and tested using the FaceForensics++ dataset, which contains more than 500,000 frames, and subsets of the DFDC dataset, totaling more than 22,000 videos. The study demonstrated high generalizability, as the accuracy on the unknown dataset was only marginally (about 1%) lower than that on the known dataset. The findings of this study indicate that detection systems can be more generalizable, lighter, and faster by focusing on just a small region (the human face) of an entire video.
Keywords: Artificial intelligence; Convolutional neural networks; Deepfake; GANs; Generalization; Deep learning; Facial features; Video frames
Side Information Generation Based on Hierarchical Motion Estimation in Distributed Video Coding (Cited by: 4)
8
Authors: 刘荣科, 岳志, 陈长汶 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD, 2009, No. 2, pp. 167-173 (7 pages)
The side information quality has an immense effect on the compression efficiency of the distributed video coding (DVC) system. This article, based on hierarchical motion estimation (HME), proposes a new side information generation algorithm which is integrated into the DVC system. First, forward motion estimation (FME) and bidirectional motion estimation (BME) on the basis of a variable block size HME algorithm are used to acquire relatively accurate motion vectors. Second, a motion vector filter (MVF) is i...
Keywords: communication technology; video signal processing; hierarchical motion estimation; side information; motion vector filter; frame interpolation
A Wireless Video Capsule Endoscopy System: Design and Realization (Cited by: 1)
9
Authors: 朱柄全, 颜国正, 刘刚, 徐文铭 《Journal of Shanghai Jiaotong University (Science)》 EI, 2015, No. 6, pp. 649-653 (5 pages)
Wireless capsule endoscopy (CE), an image inspection technique, has been an important advancement in the diagnosis of gastrointestinal (GI) tract diseases. A video capsule endoscopy (VCE) system is analyzed in this study. A complementary metal oxide semiconductor (CMOS) analog image sensor is adopted, and other illumination, communication, and energy modules are designed for functional realization. Measuring only φ11 mm × 25 mm, the VCE has a total power consumption of 52.5 mW, which enables it to work continuously for 8 h. The in vivo experiment on a living pig indicates that a clear video with a high frame rate of 30 f/s can be obtained.
Keywords: video capsule endoscopy (VCE); frame rate; working hours; in vivo experiment
Authentication of Video Evidence for Forensic Investigation: A Case of Nigeria (Cited by: 1)
10
Authors: Beatrice O. Akumba, Aamo Iorliam, Selumun Agber, Emmanuel Odeh Okube, Kenneth Dekera Kwaghtyo 《Journal of Information Security》 2021, No. 2, pp. 163-176 (14 pages)
Pieces of video evidence are usually admissible in courts of law all over the world. However, individuals manipulate these videos to either defame or incriminate innocent people. Others indulge in video tampering to falsely escape the wrath of the law against misconduct. One way impostors can forge these videos is through inter-frame video forgery. Thus, the integrity of such videos is under threat, because these digital forgeries seriously debase the credibility of video contents as definite records of events. This leads to an increasing concern about the trustworthiness of video contents. Hence, it continues to affect the social and legal system, forensic investigations, intelligence services, and security and surveillance systems, as the case may be. The problem of inter-frame video forgery is increasingly spontaneous as more video-editing software continues to emerge. These video editing tools can easily manipulate videos without leaving obvious traces, and the tampered videos become viral. Alarmingly, even beginner users of these editing tools can alter the contents of digital videos in a manner that renders them practically indistinguishable from the original content by mere observation. This paper leverages the concept of correlation coefficients to produce a more elaborate and reliable inter-frame video forgery detection method to aid forensic investigations, especially in Nigeria. The model uses a threshold to efficiently distinguish forged videos from authentic videos. A benchmark dataset and locally manipulated video datasets were used to evaluate the proposed model. Experimentally, our approach performed better than the existing methods. All evaluation metrics (accuracy, recall, precision, and F1-score) reached 100%. The proposed method, implemented in the MATLAB programming language, has proven to effectively detect inter-frame forgeries.
Keywords: Inter-frame video forgery; Correlation coefficients; Forensic investigation; Threshold
VideoSAR Scattering Key Frame Extraction Based on Low-Rank Plus Sparse Decomposition (Cited by: 1)
11
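Authors: 张营, 冀贞海, 魏阳杰, 刘志武, 吴昊 《空间电子技术》 2023, No. 1, pp. 93-98 (6 pages)
The ultra-long coherent aperture observation of video synthetic aperture radar (VideoSAR) makes rapid browsing of regional dynamic information extremely difficult. To automatically capture, in a machine-vision manner, the full key-frame sequence in which ground-object scattering disappears, persists transiently, disappears, persists transiently, and disappears again, a VideoSAR key frame extractor is proposed that combines the subaperture energy gradient (SEG) with low-rank plus sparse decomposition (LRSD). The extractor is a generic, family-level architecture applicable to any combination of SEG and LRSD methods. The proposed technique primarily targets conditions of limited information, namely a single channel, single band, and single track at the same time, and helps overcome the application limitation that multi-channel, multi-band, multi-track, or multi-sensor data are hard to acquire in emergency-response scenarios. Comparative validation was carried out on measured data with several advanced LRSD algorithms; the thorough extraction of representative scattering information can help regional dynamics be understood and condensed quickly in the future.
Keywords: video synthetic aperture radar; scattering key frame; low-rank plus sparse decomposition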
Algorithm Research on Moving Object Detection of Surveillance Video Sequence (Cited by: 2)
12
Authors: Kuihe Yang, Zhiming Cai, Lingling Zhao 《Optics and Photonics Journal》 2013, No. 2, pp. 308-312 (5 pages)
In video surveillance, moving object tracking is affected by many interference factors, such as target changes, complex scenes, and target deformation. To address this issue, and based on a comparative analysis of several common moving object detection methods, this paper presents a moving object detection and recognition algorithm that combines frame difference with background subtraction. In the algorithm, we first compute the average gray value over consecutive frames of the dynamic image to obtain the background image; that is, N consecutively captured frames are summed and averaged. In this way the weight of the object information increases while the static background is suppressed. The resulting motion detection image contains both the target contour and additional target information at the contour points obtained from the background image, so that the moving target can be separated from the image. The simulation results show the effectiveness of the proposed algorithm.
Keywords: Video surveillance; Moving object detection; Frame difference; Background subtraction
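The combination described in this abstract can be sketched in a few lines: the background is the per-pixel mean of N frames, and a pixel is marked as moving when it differs from both the background and the previous frame. Frames are toy 2-D lists of gray values; the logical AND used to combine the two masks, the threshold of 30, and the function names are assumptions, since the abstract does not state how the authors fuse the two cues.

```python
def mean_background(frames):
    """Per-pixel average of N gray-value frames (each a list of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def diff_mask(a, b, threshold):
    """True where two frames differ by more than the threshold."""
    return [[abs(x - y) > threshold for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def moving_mask(background, previous, current, threshold=30):
    """Assumed fusion: moving = differs from background AND from previous frame."""
    bg = diff_mask(current, background, threshold)
    fd = diff_mask(current, previous, threshold)
    return [[p and q for p, q in zip(row_bg, row_fd)]
            for row_bg, row_fd in zip(bg, fd)]


# Toy one-row scene: four static frames, then a bright object moving right.
static = [[[10, 10, 10, 10, 10]] for _ in range(4)]
background = mean_background(static)
previous = [[10, 200, 10, 10, 10]]
current = [[10, 10, 200, 10, 10]]
mask = moving_mask(background, previous, current)
```

Frame difference alone would also mark the position the object has just left; requiring agreement with background subtraction keeps only the object's current position.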
Intelligent Mobile Video Surveillance System with Multilevel Distillation
13
Authors: Yuan-Kai Wang, Hung-Yu Chen 《Journal of Electronic Science and Technology》 CAS CSCD, 2017, No. 2, pp. 133-140 (8 pages)
This paper proposes a mobile video surveillance system consisting of intelligent video analysis and mobile communication networking. This multilevel distillation approach helps mobile users monitor large volumes of surveillance video on demand through video streaming over mobile communication networks. The intelligent video analysis includes moving object detection/tracking and key frame selection, which make it possible to browse useful video clips. The communication networking services, comprising video transcoding, multimedia messaging, and mobile video streaming, transmit surveillance information to mobile appliances. Moving object detection is achieved by background subtraction and particle filter tracking. Key frame selection, which aims to deliver an alarm to a mobile client using a multimedia messaging service accompanied by an extracted clear frame, is achieved by devising a weighted importance criterion that considers object clarity and face appearance. In addition, a spatial-domain cascaded transcoder is developed to convert the filtered image sequence of detected objects into the mobile video streaming format. Experimental results show that the system can successfully detect all events of moving objects in a complex surveillance scene, choose very appropriate key frames for users, and transcode the images with a high peak signal-to-noise ratio (PSNR).
Keywords: Mobile video streaming; moving object detection; key frame extraction; video surveillance; video transcoding
Real-time detection of moving objects in video sequences
14
Authors: 宋红, 石峰 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2005, No. 3, pp. 687-691 (5 pages)
An approach to the detection of moving objects in video sequences, with application to video surveillance, is presented. The algorithm combines two kinds of change points, which are detected from the region-based frame difference and adjusted background subtraction. An adaptive threshold technique is employed to automatically choose the threshold value for segmenting the moving objects from the still background. Experimental results show that the algorithm is effective and efficient in practical situations. Furthermore, the algorithm is robust to changes in lighting conditions and can be applied in video surveillance systems.
Keywords: object detection; video surveillance; region-based frame difference; adjusted background subtraction
Digitalization of Underwater Video Image Using High Speed DSP Chip
15
Author: 许茹 《High Technology Letters》 EI CAS, 1999, No. 1, pp. 49-53 (5 pages)
This paper introduces a system based on TI's fifth-generation DSP (Digital Signal Processor) device, the TMS320C50, to construct the simplest system for digitalizing underwater video signals. The system collects image data at three different densities, selected in software. The system can expand its external data memory to 4 gigabytes by using a memory page extension technique. Two interface circuits between the C50 and peripheral devices of different speeds are also designed: one for a high-speed A/D converter, and the other for static memory with an access time of 70 ns. The system can digitalize an analog video signal and process the gathered data in limited time.
Keywords: Underwater video image; DSP; Software collection; Extended frame memory
A channel distortion model for video over lossy packet networks
16
Authors: CHENG Jian-xin, GAO Zhen-ming, ZHANG Zhi-chao 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 SCIE EI CAS CSCD, 2006, No. z1, pp. 48-53 (6 pages)
Error-resilient video communication over lossy packet networks is often designed and operated based on models for the effect of losses on the reconstructed video quality. This paper analyzes the channel distortion for video over lossy packet networks and proposes a new model that, compared to previous models, more accurately estimates the expected mean-squared error distortion for different packet loss patterns by accounting for inter-frame error propagation and the correlation between error frames. The accuracy of the proposed model is validated with JVT/H.264 encoded standard test sequences and previous frame concealment, where the proposed model provides an obvious accuracy gain over previous models.
Keywords: Channel distortion; Packet loss; Inter-frame error propagation; Correlation; Video communication
A Lightweight Detection Algorithm for Lossless Transcoding Recompression of Massive Digital Media Video
17
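Authors: 董华松, 连远锋 《吉林大学学报(工学版)》 PKU Core, 2025, No. 2, pp. 741-747 (7 pages)
To obtain video quality in a timely manner, a lightweight detection algorithm for lossless transcoding recompression of massive digital media video is proposed. Following high-efficiency video coding, the method divides each coded video frame into coding tree blocks and, while using rate-distortion optimization to preserve video quality during partitioning, keeps computation and storage overhead as low as possible, which makes the method lightweight to a certain degree. On this basis, pixel values, color space, and motion vectors are extracted from each image frame and compared with the pixel values, color features, and motion vectors of the original video to determine whether the quality of the recompressed video has been degraded, thereby achieving lightweight detection of recompressed video. Experimental results show that, when the proposed method is used to check compressed video quality, the extracted color components of the compressed video images agree with the actual color components, and that with 5,000 videos the number of false alarms is 3, which further indicates that the method has high detection performance and good results.
Keywords: digital media video; lossless transcoding recompression; lightweight detection; video frame partitioning; color feature extraction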
Deepfake Video Detection Based on Key-Frame Frequency-Domain Multi-Feature Fusion
18
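Authors: 王金伟, 张玫瑰, 张家伟, 罗向阳, 马宾 《应用科学学报》 PKU Core, 2025, No. 3, pp. 451-462 (12 pages)
To save computing resources and avoid data redundancy, most existing Deepfake video detection methods randomly select several frames or partial segments of a video as the detection target, which weakens the representational power of the target and limits detection performance. In addition, existing algorithms perform well on a single dataset, but their performance drops sharply in cross-dataset detection, so their generalization ability needs further improvement. To address this, a Deepfake video detection algorithm based on key-frame frequency-domain multi-feature fusion is proposed. Key frames are extracted as the detection target using the mean squared error in the frequency domain; the artifact features of the main frame learned in the frequency domain are fused with the temporal inconsistency between key frames and fed into a fully connected layer to obtain the final detection result. Experimental results show that the proposed algorithm outperforms existing algorithms in cross-dataset detection tasks and generalizes well.
Keywords: Deepfake video detection; key frame; frequency domain; multi-feature fusion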
Trace-Based Analysis of MPEG-4 AVC Video Streams
19
Authors: Kizouris Panagiotis, Papadakis Andreas 《Journal of Computer and Communications》 2019, No. 1, pp. 34-48 (15 pages)
MPEG-4 AVC encoded video streams have been analyzed using video traces, and statistical features have been extracted, in the context of supporting efficient deployment of networked and multimedia services. The statistical features include the number of scenes composing the video and the sizes of different types of frames, within the overall trace and within each scene. Statistical processing has been performed on the traces, followed by fitting to statistical distributions (Pareto and lognormal). Through the construction of a synthetic trace based on this analysis, our selections of statistical distributions have been verified. In addition, different types of content, in terms of level of activity (quantified as different scene change ratios), have been considered. Through modelling and fitting, the stability of the main statistical parameters has been verified, and observations have been made on the dependence of these parameters on the video activity level.
Keywords: Synthetic traffic; Video trace; MPEG-4 AVC; I and B frames
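A synthetic trace of the kind mentioned in this abstract could be generated along the following lines. The sketch assumes scene lengths follow a Pareto distribution and frame sizes follow a lognormal distribution; which quantity the authors fit with which distribution, and every parameter value below (`scene_alpha`, `size_mu`, `size_sigma`, the frame-count bounds), are assumptions chosen only for illustration.

```python
import random


def synthetic_trace(num_scenes, seed=0, scene_alpha=1.5,
                    min_scene_frames=12, max_scene_frames=3000,
                    size_mu=8.0, size_sigma=0.5):
    """Return a list of (scene_index, frame_size_bytes) pairs.

    Assumed model: Pareto-distributed scene lengths and
    lognormally distributed frame sizes.
    """
    rng = random.Random(seed)
    trace = []
    for scene in range(num_scenes):
        # paretovariate() returns values >= 1, so every scene has
        # at least min_scene_frames frames; the cap bounds the heavy tail.
        length = int(min_scene_frames * rng.paretovariate(scene_alpha))
        length = min(length, max_scene_frames)
        for _ in range(length):
            trace.append((scene, rng.lognormvariate(size_mu, size_sigma)))
    return trace


trace = synthetic_trace(num_scenes=5, seed=1)
mean_size = sum(size for _, size in trace) / len(trace)
```

Comparing summary statistics of such a trace (for example `mean_size` or per-scene frame counts) with those of the measured trace is one simple way to check whether the chosen distributions are plausible.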
Multi-Scale 3D Decoupled Convolution Video Prediction with Structure and Texture Decomposition
20
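Authors: 郑明魁, 吴孔贤, 邱鑫涛, 郑海峰, 赵铁松 《计算机学报》 PKU Core, 2025, No. 8, pp. 1832-1847 (16 pages)
Video prediction aims to predict future image frames from historical frames and is a pixel-wise dense prediction task. Current non-autoregressive models adopt a multi-frame-input, multi-frame-output architecture, which effectively avoids error accumulation. Existing methods downsample with strided convolution when reducing the dimensionality of video data, which loses local detail. To address this problem, this paper learns structure and texture separately in the feature domain: the low-frequency structural information that remains after details are removed has stronger temporal correlation, which benefits prediction of the spatiotemporal correlation of structural pixels in local regions, while the high-frequency detail features are learned by an independent enhancement module. On this basis, the paper designs a multi-scale 3D decoupled convolution module that decouples a 3D convolution into a 2D convolution and a 1D convolution in order to focus on learning the spatial and temporal characteristics of the low-frequency structure; this decoupling improves the prediction of object shape while reducing model parameters and memory consumption. Finally, a small-scale high-frequency detail enhancement module learns the decomposed high-frequency information and predicts image texture, improving the detail quality of video prediction. Experiments on synthetic and real-scene datasets show that the proposed algorithm balances spatiotemporal consistency and detail expressiveness and predicts the overall structure and local details of moving objects more accurately. On the Moving MNIST dataset its MSE is 15.7, which is 34.0%, 20.7%, 11.3%, and 4.8% lower than that of existing algorithms such as SimVP, TAU, SwinLSTM, and VMRNN, respectively; results on other datasets also show a certain superiority.
Keywords: video prediction; multi-frame input multi-frame output; structure and texture separation; 3D decoupled convolution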
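The parameter saving claimed for decoupling a 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) and assumes a cubic kernel of size k and an intermediate channel width `c_mid`; the channel counts are hypothetical and the paper's actual module layout may differ.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a full k x k x k 3D convolution (bias ignored)."""
    return k * k * k * c_in * c_out


def decoupled_params(c_in, c_mid, c_out, k):
    """Weights in a k x k 2D spatial conv followed by a k-tap 1D temporal conv."""
    spatial = k * k * c_in * c_mid
    temporal = k * c_mid * c_out
    return spatial + temporal


# Hypothetical layer: 64 channels throughout, kernel size 3.
full = conv3d_params(64, 64, 3)
split = decoupled_params(64, 64, 64, 3)
ratio = split / full
```

For equal channel widths the weight count drops from k³ to k² + k per channel pair, i.e. from 27 to 12 when k = 3, which is consistent with the abstract's statement that decoupling reduces model parameters.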