High-resolution video transmission requires a substantial amount of bandwidth. In this paper, we present a novel video processing methodology that integrates region of interest (ROI) identification and super-resolution enhancement. Our method begins with the accurate detection of ROIs within video sequences, followed by the application of advanced super-resolution techniques to these areas, thereby preserving visual quality while economizing on data transmission. To validate and benchmark our approach, we have curated a new gaming dataset tailored to evaluate the effectiveness of ROI-based super-resolution in practical applications. The proposed model architecture leverages the transformer network framework, guided by a carefully designed multi-task loss function, which facilitates concurrent learning and execution of both the ROI identification and resolution enhancement tasks. This unified deep learning model exhibits remarkable performance in achieving super-resolution on our custom dataset. The implications of this research extend to optimizing low-bitrate video streaming scenarios. By selectively enhancing the resolution of critical regions in videos, our solution enables high-quality video delivery under constrained bandwidth conditions. Empirical results demonstrate a 15% reduction in transmission bandwidth compared to traditional super-resolution based compression methods, without any perceptible decline in visual quality. This work thus contributes to the advancement of video compression and enhancement technologies, offering an effective strategy for improving digital media delivery efficiency and user experience, especially in bandwidth-limited environments. The integration of ROI identification and super-resolution presents promising avenues for future research and development in adaptive and intelligent video communication systems.
In an increasing number of area inspection applications, such as powerline inspection and sewage disposal monitoring, Unmanned Aerial Vehicles (UAVs) are used for capturing and transmitting on-site videos. Existing UAV video compression methods employ Advanced Video Coding (AVC) or High Efficiency Video Coding (HEVC) encoders to eliminate intra-frame and short-term inter-frame redundancy, but these methods still face challenges in achieving high compression efficiency due to the high captured video bitrate and limited transmission capacity. In this paper, we further consider that UAVs revisit the same area and capture videos from different viewpoints, hence Long-term Historical Background Redundancy (LHBR) exists among revisited video clips. We therefore leverage the LHBR caused by UAV revisits and propose a high-efficiency aerial video compression method for UAVs. Our method comprises three steps. First, we propose a lightweight method based on a spatial correlation model to select the most correlated reference frames from a historical video database. Then, we design a Historical Reference Background Frame (HBRF) generation algorithm that alternately uses keypoint-based and telemetry-assisted alignments to align the selected frames with the current frame. Finally, we use the generated HBRF as a reference frame to eliminate the LHBR within I-frames. Our proposed method has been shown experimentally to reduce the Bjøntegaard-Delta bitrate (BD-bitrate) by 42.83% or enhance the Bjøntegaard-Delta Peak Signal-to-Noise Ratio (BD-PSNR) by 2.98 dB over original HEVC, and to take 29.3% of the encoding time needed for existing LHBR-based compression methods.
Multimedia semantic communication has been receiving increasing attention due to its significant enhancement of communication efficiency. Semantic coding, which is oriented towards extracting and encoding the key semantics of video for transmission, is a key aspect in the framework of multimedia semantic communication. In this paper, we propose a low-bitrate facial video semantic coding method based on the temporal continuity of video semantics. At the sender's end, we selectively transmit facial keypoints and deformation information, allocating distinct bitrates to different keypoints across frames. Compressive techniques involving sampling and quantization are employed to reduce the bitrate while retaining key facial semantic information. At the receiver's end, a GAN-based generative network is utilized for reconstruction, effectively mitigating the block artifacts and buffering problems present in traditional codec algorithms under low bitrates. The performance of the proposed approach is validated on multiple datasets, such as VoxCeleb and TalkingHead-1kH, employing metrics such as LPIPS, DISTS, and AKD for assessment. Experimental results demonstrate significant advantages over traditional codec methods, achieving up to approximately 10-fold bitrate reduction in prolonged, stable head pose scenarios across diverse conversational video settings.
It has been over a decade since the first coded aperture video compressive sensing (CS) system was reported. The underlying principle of this technology is to employ a high-frequency modulator in the optical path to modulate a recorded high-speed scene within one integration time. The superimposed image captured in this manner is modulated and compressed, since multiple modulation patterns are imposed. Following this, reconstruction algorithms are utilized to recover the desired high-speed scene. One leading advantage of video CS is that a single captured measurement can be used to reconstruct a multi-frame video, thereby enabling a low-speed camera to capture high-speed scenes. Inspired by this, a number of variants of video CS systems have been built, mainly using different modulation devices. Meanwhile, in order to obtain high-quality reconstruction videos, many algorithms have been developed, from optimization-based iterative algorithms to deep-learning-based ones. Recently, emerging deep learning methods have been dominant due to their high-speed inference and high-quality reconstruction, highlighting the possibility of deploying video CS in practical applications. Toward this end, this paper reviews the progress that has been achieved in video CS during the past decade. We further analyze the efforts that need to be made, in terms of both hardware and algorithms, to enable real applications. Research gaps are put forward and future directions are summarized to help researchers and engineers working on this topic.
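The measurement principle described in the abstract above (several modulation patterns applied within one integration time, then summed on the sensor) can be illustrated with a small NumPy sketch. It assumes binary masks and a noise-free sensor; the names `sci_measure` and `naive_estimate` are hypothetical and come from no particular video CS system, and the second function is only a crude normalised back-projection, not one of the reconstruction algorithms the review surveys.

```python
import numpy as np

def sci_measure(frames, masks):
    """Collapse T modulated frames into one snapshot measurement.

    frames, masks: arrays of shape (T, H, W). Each frame is multiplied
    element-wise by its mask and the products are summed over time,
    mimicking one sensor integration period.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (T, H, W)")
    return (frames * masks).sum(axis=0)

def naive_estimate(measurement, masks):
    """Crude starting point (assumption: not a real reconstruction).

    Divides the measurement by the per-pixel mask count and re-applies
    each mask, which is consistent with the measurement for binary masks.
    """
    masks = np.asarray(masks, dtype=np.float64)
    mask_sum = masks.sum(axis=0)
    mask_sum[mask_sum == 0] = 1.0  # avoid division by zero where no mask is open
    return masks * (measurement / mask_sum)

# Toy example: 8 high-speed frames compressed into a single 4x4 measurement.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W)).astype(np.float64)
y = sci_measure(frames, masks)
x0 = naive_estimate(y, masks)
```

Here a single 4×4 array `y` stands in for eight frames, which is the compression the abstract refers to; recovering the frames from `y` and the known masks is the job of the iterative or learned reconstruction stage.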
Many important developments in video compression technologies have occurred during the past two decades. The block-based discrete cosine transform with motion compensation hybrid coding scheme has been widely employed by most available video coding standards, notably the ITU-T H.26x and ISO/IEC MPEG-x families and the video part of the China audio video coding standard (AVS). The objective of this paper is to provide a review of the developments of the four basic building blocks of the hybrid coding scheme, namely predictive coding, transform coding, quantization and entropy coding, and to give theoretical analyses and summaries of the technological advancements. We further analyze the development trends and perspectives of video compression, highlighting problems and research directions.
The evolution of social network and multimedia technologies encourages more and more people to generate and upload visual information, which leads to the generation of large-scale video data. Therefore, highly efficient compression technologies are needed to facilitate the storage and transmission of this tremendous volume of video data for a wide variety of applications. In this paper, a systematic review of the recent advances in large-scale video compression (LSVC) is presented. Specifically, fast video coding algorithms and effective models to improve video compression efficiency are introduced in detail, since coding complexity and compression efficiency are two important factors in evaluating video coding approaches. Finally, the challenges and future research trends for LSVC are discussed.
In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video that is delivered over a channel with limited bandwidth. We present the state of the art in 3D visual attention models, 3D just-noticeable difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
To improve the performance of video compression for machine vision analysis tasks, a video coding for machines (VCM) standard working group was established to promote standardization procedures. In this paper, recent advances in the video coding for machines standard are presented, and comprehensive introductions to the use cases, requirements, evaluation frameworks and corresponding metrics of the VCM standard are given. Then the existing methods are presented, introducing the existing proposals by category and the research progress of the latest VCM conference. Finally, we give conclusions.
The exponential growth of cultural heritage documentation videos calls for new compression methods that preserve critical details while reducing storage. For static scenes, traditional frame-based compression methods struggle with the trade-off between semantic redundancy and detail preservation. To improve compression efficiency, a novel dual-mode semantic compression framework for static object videos based on neural radiance fields (NeRF) is proposed in this paper. By integrating semantic segmentation with COLMAP technology, the proposed system decouples the video stream into two semantic layers: the central object, which contains critical details, and the dynamic background, which is rich in semantic redundancy. In the proposed dual-mode framework, the focus-priority (FP) mode is designed for scenarios with high-efficiency demands, where only the NeRF-based neural representation of the primary object is preserved and compressed. For scenarios that require additional environmental context, the panorama-compatible (PC) mode synchronously compresses the H.264-encoded background streams and the primary object streams to reconstruct the full scene. Experimental results on single-artifact video data demonstrate that the proposed framework achieves a storage reduction of 20% compared with conventional methods, thus providing a flexible and controllable solution for the compression of cultural heritage documentation videos.
Video compression in medical video streaming is one of the key technologies associated with mobile healthcare. Seamless delivery of medical video streams over a resource-constrained network emphasizes the need for a video codec that requires minimum bitrates and maintains high perceptual quality. This paper presents a comparative study between High Efficiency Video Coding (HEVC) and its potential successor Versatile Video Coding (VVC) in the context of healthcare. A large-scale subjective experiment comprising twenty-four non-expert participants is presented for eight different test conditions in Full High Definition (FHD) videos. The presented analysis highlights the impact of compression artefacts on the perceptual quality of HEVC and VVC processed videos. Our results and findings show that VVC clearly outperforms HEVC in terms of achieving higher compression while maintaining high quality in FHD videos. VVC requires up to 40% less bitrate for encoding an FHD video at excellent perceptual quality. We have provided rate-quality curves for both encoders and a degree of overlap across both codecs in terms of perceptual quality. Overall, there is a 71% degree of overlap in terms of quality between VVC and HEVC compressed videos for eight different test conditions.
Video snapshot compressive imaging (Video SCI) modulates scenes using various encoding masks and captures compressed measurements with a low-speed camera during a single exposure. Subsequently, reconstruction algorithms restore image sequences of dynamic scenes, offering advantages such as reduced bandwidth and storage space requirements. The temporal correlation in video data is crucial for Video SCI, as it leverages the temporal relationships among frames to enhance the efficiency and quality of reconstruction algorithms, particularly for fast-moving objects. This paper discretizes video frames to create image datasets with the same data volume but differing temporal correlations. We utilized the state-of-the-art (SOTA) reconstruction framework, EfficientSCI++, to train various compressed reconstruction models with these differing temporal correlations. Evaluating the reconstruction results from these models, our simulation experiments confirm that a reduction in temporal correlation leads to decreased reconstruction accuracy. Additionally, we simulated the reconstruction outcomes of datasets devoid of temporal correlation, illustrating that models trained on non-temporal data affect the temporal feature extraction capabilities of transformers, resulting in negligible impacts on the evaluation of reconstruction results for non-temporal correlation test datasets.
Two video coding schemes based on the wavelet transform that achieve very low bit rates are presented in this paper. The first is a hybrid motion-compensated wavelet transform (MC-WT) system, which behaves better at very low bit rates than the block DCT residual coder. The second is a new efficient coding system based on a simple frame-differencing wavelet transform (FD-WT), which performs well in both PSNR and visual quality with substantially reduced complexity.
A new, improved version of Goh's 3-D wavelet transform (WT) coding scheme is presented in this paper. Compared with other existing 3-D WT coding methods and 2-D WT based coding methods, the new scheme offers a simple code structure, low computation cost, and good performance in PSNR, compression ratio and visual quality of reconstructions. The new 3-D WT coding scheme is suitable for very low bit rate video coding.
The high-efficiency video coder (HEVC) is one of the most advanced techniques used in growing real-time multimedia applications today. However, it requires large bandwidth for transmission, and the required bandwidth varies with different video sequences/formats. This paper proposes an adaptive information-based variable quantization matrix (AIVQM) developed for different video formats having variable energy levels. The quantization method is adapted to the video sequence using statistical analysis, improving bit budget, quality and complexity reduction. Further, to have precise control over bit rate and quality, a multi-constraint prune algorithm is proposed in the second stage of the AIVQM technique for pre-calculating K paths. This allows the encoder to self-adapt and choose one of the K paths automatically, as required, under dynamically changing bandwidth availability. After extensive testing of the proposed algorithm in the multi-constraint environment for multiple paths, and evaluating the performance based on peak signal-to-noise ratio (PSNR), bit budget and time complexity for different videos, a noticeable improvement in rate-distortion (RD) performance is achieved. Using the proposed AIVQM technique, more feasible and efficient video sequences are achieved with less loss in PSNR than the variable quantization method (VQM) algorithm, with a rise of approximately 10%–20% depending on the video sequence/format.
High Efficiency Video Coding (HEVC) is the latest international video coding standard, which can provide similar quality with about half the bandwidth compared with its predecessor, H.264/MPEG-4 AVC. To meet the requirement of higher bit depth coding and more chroma sampling formats, range extensions of HEVC were developed. This paper introduces the coding tools in HEVC range extensions and provides experimental results to compare HEVC range extensions with previous video coding standards. Experimental results show that HEVC range extensions improve coding efficiency considerably over the H.264/MPEG-4 AVC High Predictive profile, especially for 4K sequences.
Discrete Cosine Transform (DCT) is the most widely used technique in image and video compression. In this paper, the structure of the DCT and Inverse DCT (IDCT) algorithm is split in the form of COordinate Rotation DIgital Computer (CORDIC) rotation matrices. Two-dimensional (2-D) 8×8 DCT/IDCT units based on the improved rotation CORDIC algorithm are proposed. The shift and addition operations of the CORDIC algorithm are used to replace the cosine multiplication operations in the algorithm. The design does not contain any multiplier unit, which reduces the complexity of the hardware unit. The row-column transform unit, composed of register arrays, connects two 1-D 8-point DCT units to complete the calculation of the 2-D 8×8 DCT. The pipeline latency of the proposed architecture is 28 clock cycles. The proposed efficient two-dimensional DCT architecture has been synthesized on Xilinx's Kintex-7 FPGA. The resource utilization is 17.36% for Slice LUTs and 3.49% for Slice Registers, and the maximum operating frequency is 172 MHz. It takes only 0.161 μs to process a block of 8×8 samples. A frame of an image is processed by the designed DCT unit and then reconstructed by the IDCT unit to verify the function. The Peak Signal to Noise Ratio (PSNR) can reach 51.99 dB.
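The idea of replacing cosine multiplications with shifts and additions, as described in the abstract above, can be sketched with the textbook rotation-mode CORDIC iteration. This is a generic floating-point illustration, not the paper's improved algorithm or its hardware design: the function name `cordic_rotate` and the iteration count are assumptions, and multiplication by `2 ** -i` stands in for the arithmetic right shift a fixed-point datapath would use.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians with CORDIC micro-rotations.

    Valid for |angle| up to roughly 1.74 rad, the sum of the elementary angles.
    """
    # Elementary angles atan(2^-i); in hardware these come from a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate towards zero residual
        shift = 2.0 ** -i             # stands in for a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    # The gain is a constant, so a real design can fold it into other constants.
    return x / gain, y / gain

# Example: cos and sin of pi/8, an angle of the kind that appears in 8-point DCT kernels.
c, s = cordic_rotate(1.0, 0.0, math.pi / 8)
```

Inside the loop only additions, sign choices and power-of-two scalings are needed, which is why a design built on this iteration can avoid multiplier units.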
The Super-Resolution (SR) technique reconstructs High-Resolution (HR) images from a sequence of Low-Resolution (LR) observations, and has been a major focus for compressed video. Based on the theory of Projection Onto Convex Set (POCS), this paper constructs a Quantization Constraint Set (QCS) using the quantization information extracted from the video bit stream. By combining the statistical properties of the image and the Human Visual System (HVS), a novel Adaptive Quantization Constraint Set (AQCS) is proposed. Simulation results show that the AQCS-based SR algorithm converges at a fast rate and obtains better performance in both objective and subjective quality, and is applicable to compressed video.
In this paper, a video compressed sensing reconstruction algorithm based on multidimensional reference frames is proposed, using the sparse characteristics of video signals in different sparse representation domains. First, the overall structure of the proposed video compressed sensing algorithm is introduced. The paper adopts a multi-reference frame bidirectional prediction hypothesis optimization algorithm. Then, the paper proposes a reconstruction method for CS frames at the re-decoding end. In addition to using key frames of each GOP reconstructed in the time domain as reference frames for reconstructing CS frames, half-pixel reference frames and scaled reference frames in the pixel domain are also used as reference frames of CS frames to obtain higher quality hypotheses. The method of obtaining reference frames in the pixel domain is also discussed in detail. Finally, the proposed reconstruction algorithm is compared with video compressed sensing algorithms in the literature that have better reconstruction results. Experiments show that the algorithm performs better than the best multi-reference frame video compressed sensing algorithm and can effectively improve the quality of slow-motion video reconstruction.
This letter proposes a novel method of compressed video super-resolution reconstruction based on MAP-POCS (Maximum Posterior Probability-Projection Onto Convex Set). First, the high-resolution model is assumed to follow a Poisson-Markov distribution, and the projection convex sets are then constructed based on MAP. According to the characteristics of compressed video, two different convex sets are constructed by integrating the inter-frame and intra-frame information in the wavelet domain. The experimental results demonstrate that the new method not only outperforms the traditional algorithms in terms of PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error) and the visual quality of the reconstruction, but also has the advantages of rapid convergence and easy extension.
Extraction of traffic information from images or video sequences is a hot research topic in intelligent transportation systems and computer vision. A real-time traffic information extraction method based on compressed video with interframe motion vectors for speed, density and flow detection has been proposed for the extraction of traffic information under a fixed camera setting and a well-defined environment. The motion vectors are first separated from the compressed video streams, and then filtered to eliminate incorrect and noisy vectors using the well-defined environmental knowledge. By applying the projective transform and using the filtered motion vectors, speed can be calculated from motion vector statistics, density can be estimated using the motion vector occupancy, and flow can be detected using the combination of speed and density. A prototype system for sky camera traffic monitoring using MPEG video has been implemented, and experimental results proved the effectiveness of the proposed method.
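The three quantities named in the abstract above (speed from motion vector statistics, density from motion vector occupancy, flow from their combination) can be sketched as follows. Everything specific here is an assumption for illustration: the function name, the magnitude threshold used as a stand-in for the paper's vector filtering, the use of a plain mean as the "statistic", and the product `speed * density` as the "combination" (borrowed from the standard traffic-flow relation). The motion vectors are assumed to be already extracted from the bitstream and projected onto the road plane.

```python
import numpy as np

def traffic_from_motion_vectors(mv, metres_per_pixel, fps, min_mag=0.5):
    """Estimate speed, density and flow from per-block motion vectors.

    mv: (N, 2) array of motion vectors in pixels per frame, one per block
        inside the road region (assumed already projected and filtered).
    Returns (speed in m/s, density as an occupancy fraction, flow).
    """
    mv = np.asarray(mv, dtype=np.float64).reshape(-1, 2)
    mag = np.hypot(mv[:, 0], mv[:, 1])
    moving = mag >= min_mag  # blocks treated as covered by moving vehicles

    # Density: fraction of road blocks carrying a significant motion vector.
    density = float(moving.mean()) if mag.size else 0.0
    # Speed: mean magnitude of the moving blocks, converted to metres per second.
    speed = float(mag[moving].mean() * metres_per_pixel * fps) if moving.any() else 0.0
    # Flow: assumed here to be the product of speed and density.
    flow = speed * density
    return speed, density, flow

# Toy example: four road blocks, two of them moving at 2 pixels per frame.
speed, density, flow = traffic_from_motion_vectors(
    [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
    metres_per_pixel=0.5, fps=25.0,
)
```

Because the motion vectors are read directly from the compressed stream, no full decoding or pixel-domain tracking is required, which is what makes a real-time implementation plausible.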
Funding for the ROI-based super-resolution work: funded by the National Key Research and Development Program of China (No. 2022YFC3302103).
Funding for the UAV aerial video compression work: supported by the National Natural Science Foundation of China (No. 62025110).
Funding for the facial video semantic coding work: supported by the National Natural Science Foundation of China (Nos. NSFC 61925105, 62322109, 62171257 and U22B2001), the Xplorer Prize in Information and Electronics Technologies, and the Tsinghua University (Department of Electronic Engineering)-Nantong Research Institute for Advanced Communication Technologies Joint Research Center for Space, Air, Ground and Sea Cooperative Communication Network Technology.
Funding for the video compressive sensing review: supported by the National Natural Science Foundation of China (61931012, 62171258, 62088102, and 62271414), the Zhejiang Provincial Outstanding Youth Science Foundation (LR23F010001), and the Key Project of Westlake Institute for Optoelectronics (2023GD007).
Funding: Project (No. 2009CB320903) supported by the National Basic Research Program (973) of China.
Abstract: Many important developments in video compression technologies have occurred during the past two decades. The hybrid coding scheme that combines the block-based discrete cosine transform with motion compensation has been employed by most available video coding standards, notably the ITU-T H.26x and ISO/IEC MPEG-x families and the video part of the China audio video coding standard (AVS). The objective of this paper is to provide a review of the developments of the four basic building blocks of the hybrid coding scheme, namely predictive coding, transform coding, quantization, and entropy coding, and to give theoretical analyses and summaries of the technological advancements. We further analyze the development trends and perspectives of video compression, highlighting problems and research directions.
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61622115 and 61472281), the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (GZ2015005), and the Shanghai Engineering Research Center of Industrial Vision Perception & Intelligent Computing (17DZ2251600).
Abstract: The evolution of social network and multimedia technologies encourages more and more people to generate and upload visual information, which leads to the generation of large-scale video data. Therefore, highly capable compression technologies are needed to facilitate the storage and transmission of this tremendous volume of video data for a wide variety of applications. In this paper, a systematic review of recent advances in large-scale video compression (LSVC) is presented. Specifically, fast video coding algorithms and effective models to improve video compression efficiency are introduced in detail, since coding complexity and compression efficiency are two important factors in evaluating video coding approaches. Finally, the challenges and future research trends for LSVC are discussed.
Abstract: In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video that is delivered over a channel with limited bandwidth. We present state-of-the-art 3D visual attention models, 3D just-noticeable difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
Funding: Supported by ZTE Industry-University-Institute Cooperation Funds.
Abstract: To improve the performance of video compression for machine vision analysis tasks, a video coding for machines (VCM) standard working group was established to promote standardization procedures. In this paper, recent advances in the VCM standard are presented, and comprehensive introductions to its use cases, requirements, evaluation frameworks, and corresponding metrics are given. The existing methods are then presented, introducing the existing proposals by category and the research progress of the latest VCM conference. Finally, we give conclusions.
Funding: Supported by the National Key Research and Development Program of China (2022YFB2902100).
Abstract: The exponential growth of cultural heritage documentation videos calls for new compression methods that preserve critical details while reducing storage. For static scenes, traditional frame-based compression methods struggle with the trade-off between semantic redundancy and detail preservation. To improve compression efficiency, this paper proposes a novel dual-mode semantic compression framework for static object videos based on neural radiance fields (NeRF). By integrating semantic segmentation with COLMAP technology, the proposed system decouples the video stream into two semantic layers: the central object containing critical details, and the dynamic background rich in semantic redundancy. In the proposed dual-mode framework, the focus-priority (FP) mode is designed for scenarios with high-efficiency demands, where only the NeRF-based neural representation of the primary object is preserved and compressed. For scenarios that require additional environmental context, the panorama-compatible (PC) mode synchronously compresses the H.264-encoded background streams and the primary object streams to reconstruct the full scene. Experimental results on single-artifact video data demonstrate that the proposed framework achieves a storage reduction of 20% compared with conventional methods, thus providing a flexible and controllable solution for the compression of cultural heritage documentation videos.
Funding: Supported by Innovate UK, which is a part of UK Research & Innovation, and Pangea Connected Ltd., under the Knowledge Transfer Partnership (KTP) program (Project No. 11433).
Abstract: Video compression in medical video streaming is one of the key technologies associated with mobile healthcare. Seamless delivery of medical video streams over a resource-constrained network emphasizes the need for a video codec that requires minimum bitrates and maintains high perceptual quality. This paper presents a comparative study between High Efficiency Video Coding (HEVC) and its potential successor Versatile Video Coding (VVC) in the context of healthcare. A large-scale subjective experiment comprising twenty-four non-expert participants is presented for eight different test conditions in Full High Definition (FHD) videos. The presented analysis highlights the impact of compression artefacts on the perceptual quality of HEVC- and VVC-processed videos. Our results and findings show that VVC clearly outperforms HEVC in terms of achieving higher compression while maintaining high quality in FHD videos. VVC requires up to 40% less bitrate for encoding an FHD video at excellent perceptual quality. We have provided rate-quality curves for both encoders and a degree of overlap across both codecs in terms of perceptual quality. Overall, there is a 71% degree of overlap in terms of quality between VVC- and HEVC-compressed videos for eight different test conditions.
Funding: Supported in part by the National Natural Science Foundation of China (No. U23B2011).
Abstract: Video snapshot compressive imaging (Video SCI) modulates scenes using various encoding masks and captures compressed measurements with a low-speed camera during a single exposure. Subsequently, reconstruction algorithms restore image sequences of dynamic scenes, offering advantages such as reduced bandwidth and storage space requirements. The temporal correlation in video data is crucial for Video SCI, as it leverages the temporal relationships among frames to enhance the efficiency and quality of reconstruction algorithms, particularly for fast-moving objects. This paper discretizes video frames to create image datasets with the same data volume but differing temporal correlations. We used the state-of-the-art (SOTA) reconstruction framework, EfficientSCI++, to train various compressed reconstruction models with these differing temporal correlations. Evaluating the reconstruction results from these models, our simulation experiments confirm that a reduction in temporal correlation leads to decreased reconstruction accuracy. Additionally, we simulated the reconstruction outcomes of datasets devoid of temporal correlation, illustrating that models trained on non-temporal data affect the temporal feature extraction capabilities of transformers, resulting in negligible impacts on the evaluation of reconstruction results for non-temporal correlation test datasets.
Abstract: Two video coding schemes based on the wavelet transform that achieve very low bit rates are presented in this paper. The first is a hybrid motion-compensated wavelet transform (MC WT) system, which behaves better at very low bit rates than the block DCT residual coder. The second is a new efficient coding system based on a simple frame-differencing wavelet transform (FD WT), which performs well in both PSNR and visual quality with substantially reduced complexity.
Abstract: An improved version of Goh's 3D wavelet transform (WT) coding scheme is presented in this paper. Compared with other existing 3D WT coding methods and with 2D WT-based coding methods, the new scheme has significant advantages, including a simple code structure, low computation cost, and good performance in PSNR, compression ratio, and visual quality of reconstructions. The new 3D WT coding scheme is suitable for very low bit rate video coding.
Abstract: The high-efficiency video coder (HEVC) is one of the most advanced techniques used in today's growing real-time multimedia applications. However, these applications require large bandwidth for transmission, and the required bandwidth varies with different video sequences/formats. This paper proposes an adaptive information-based variable quantization matrix (AIVQM) developed for different video formats having variable energy levels. The quantization method is adapted to the video sequence using statistical analysis, improving the bit budget and quality and reducing complexity. Further, to have precise control over bit rate and quality, a multi-constraint prune algorithm is proposed in the second stage of the AIVQM technique for pre-calculating K paths. This allows the encoder to self-adapt and automatically choose one of the K paths as bandwidth availability changes dynamically. After extensive testing of the proposed algorithm in a multi-constraint environment with multiple paths, and evaluation of its performance based on peak signal-to-noise ratio (PSNR), bit budget, and time complexity for different videos, a noticeable improvement in rate-distortion (RD) performance is achieved. Using the proposed AIVQM technique, more feasible and efficient video sequences are achieved with less loss in PSNR than the variable quantization method (VQM) algorithm, with a rise of approximately 10%–20% depending on the video sequence/format.
Abstract: High Efficiency Video Coding (HEVC) is the latest international video coding standard, which can provide similar quality with about half the bandwidth compared with its predecessor, H.264/MPEG-4 AVC. To meet the requirements of higher bit depth coding and more chroma sampling formats, range extensions of HEVC were developed. This paper introduces the coding tools in the HEVC range extensions and provides experimental results to compare the HEVC range extensions with previous video coding standards. Experimental results show that the HEVC range extensions substantially improve coding efficiency over the H.264/MPEG-4 AVC High Predictive profile, especially for 4K sequences.
Abstract: The Discrete Cosine Transform (DCT) is the most widely used technique in image and video compression. In this paper, the structure of the DCT and Inverse DCT (IDCT) algorithms is split into the form of COordinate Rotation DIgital Computer (CORDIC) rotation matrices. Two-dimensional (2-D) 8×8 DCT/IDCT units based on the improved rotation CORDIC algorithm are proposed. The shift and addition operations of the CORDIC algorithm are used to replace the cosine multiplication operations in the algorithm. The design does not contain any multiplier unit, which reduces the complexity of the hardware unit. The row-column transform unit, composed of register arrays, connects two 1-D 8-point DCT units to complete the calculation of the 2-D 8×8 DCT. The pipeline latency of the proposed architecture is 28 clock cycles. The proposed efficient two-dimensional DCT architecture has been synthesized on Xilinx's Kintex-7 FPGA. The resource utilization is 17.36% for Slice LUTs and 3.49% for Slice Registers, and the maximum operating frequency is 172 MHz. It takes only 0.161 μs to process one block of 8×8 samples. A frame of an image is processed by the designed DCT unit and then reconstructed by the IDCT unit to verify the function. The Peak Signal to Noise Ratio (PSNR) can reach 51.99 dB.
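To illustrate how shift and addition operations can stand in for cosine multiplications, the following is a minimal sketch of a generic rotation-mode CORDIC in fixed-point arithmetic. It is not the improved CORDIC or the DCT architecture of the paper; the word length `FRAC_BITS`, the iteration count, and the function name `cordic_cos_sin` are assumptions chosen for illustration.

```python
import math

FRAC_BITS = 20    # assumed fixed-point fractional bits
ITERATIONS = 16   # assumed number of micro-rotations

# Precomputed constants; in hardware these would be a small lookup table.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
INV_GAIN = round((1 << FRAC_BITS) / GAIN)  # pre-scales away the CORDIC gain


def cordic_cos_sin(theta):
    """Approximate (cos, sin) of theta (radians, roughly |theta| <= 1.74).

    The loop uses only integer shifts, additions, and subtractions.
    """
    x, y = INV_GAIN, 0
    z = round(theta * (1 << FRAC_BITS))
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:        # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    scale = float(1 << FRAC_BITS)
    return x / scale, y / scale


cos_val, sin_val = cordic_cos_sin(math.pi / 8)
```

Each micro-rotation costs two shifts and three add/subtract operations, which is why a CORDIC-based datapath can avoid multiplier units entirely.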
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004151).
Abstract: The Super-Resolution (SR) technique reconstructs High-Resolution (HR) images from a sequence of Low-Resolution (LR) observations and has been a major focus for compressed video. Based on the theory of Projection Onto Convex Set (POCS), this paper constructs a Quantization Constraint Set (QCS) using the quantization information extracted from the video bit stream. By combining the statistical properties of the image and the Human Visual System (HVS), a novel Adaptive Quantization Constraint Set (AQCS) is proposed. Simulation results show that the AQCS-based SR algorithm converges quickly and obtains better performance in both objective and subjective quality, making it applicable to compressed video.
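The idea of a quantization constraint set can be sketched as follows: each transform coefficient of the estimate is clipped back into the interval that would have produced the quantization level found in the bit stream. This is a minimal sketch of the basic (non-adaptive) projection, assuming a uniform quantizer with rounding and no dead zone; the function name `project_onto_qcs` and the example values are hypothetical, and the adaptive set of the paper is not reproduced here.

```python
def project_onto_qcs(coeff, level, step):
    """Project a coefficient onto its quantization constraint interval.

    Assumes level = round(original / step), so the original coefficient
    lies in [(level - 0.5) * step, (level + 0.5) * step].
    """
    lower = (level - 0.5) * step
    upper = (level + 0.5) * step
    return min(max(coeff, lower), upper)


# A coefficient estimate of 37.0 is inconsistent with level 3 at step 10.0,
# so it is pulled back to the nearest admissible value.
projected = project_onto_qcs(37.0, 3, 10.0)
unchanged = project_onto_qcs(28.0, 3, 10.0)
```

Because the interval is convex, this clipping is a valid POCS projection and can be alternated with other constraint projections during reconstruction.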
Abstract: In this paper, a video compressed sensing reconstruction algorithm based on multidimensional reference frames is proposed, using the sparse characteristics of video signals in different sparse representation domains. First, the overall structure of the proposed video compressed sensing algorithm is introduced. The paper adopts a multi-reference-frame bidirectional prediction hypothesis optimization algorithm. Then, the paper proposes a reconstruction method for CS frames at the re-decoding end. In addition to using the key frames of each GOP reconstructed in the time domain as reference frames for reconstructing CS frames, half-pixel reference frames and scaled reference frames in the pixel domain are also used as reference frames of CS frames to obtain higher-quality hypotheses. The method of obtaining reference frames in the pixel domain is also discussed in detail. Finally, the proposed reconstruction algorithm is compared with video compressed sensing algorithms in the literature that have better reconstruction results. Experiments show that the algorithm performs better than the best multi-reference-frame video compressed sensing algorithm and can effectively improve the quality of slow-motion video reconstruction.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004151).
Abstract: This letter proposes a novel method of compressed video super-resolution reconstruction based on MAP-POCS (Maximum Posterior Probability-Projection Onto Convex Set). First, the high-resolution model is assumed to follow a Poisson-Markov distribution, and the projection convex set is then constructed based on MAP. According to the characteristics of compressed video, two different convex sets are constructed by integrating the inter-frame and intra-frame information in the wavelet domain. The experimental results demonstrate that the new method not only outperforms traditional algorithms in terms of PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error), and the visual quality of the reconstruction, but also has the advantages of rapid convergence and easy extension.
Abstract: Extraction of traffic information from images or video sequences is a hot research topic in intelligent transportation systems and computer vision. A real-time traffic information extraction method based on compressed video with interframe motion vectors is proposed for speed, density, and flow detection under a fixed camera setting and a well-defined environment. The motion vectors are first separated from the compressed video streams and then filtered to eliminate incorrect and noisy vectors using knowledge of the well-defined environment. By applying the projective transform and using the filtered motion vectors, speed can be calculated from motion vector statistics, density can be estimated using the motion vector occupancy, and flow can be detected using the combination of speed and density. A prototype system for sky camera traffic monitoring using MPEG video has been implemented, and experimental results proved the effectiveness of the proposed method.
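The relationship between motion vectors and the three traffic quantities can be sketched in a few lines. This is a simplified illustration, not the method of the paper: it assumes the motion vectors have already been filtered and mapped to the road plane by the projective transform, and the function name, the `metres_per_pixel` scale, and the frame rate are hypothetical parameters.

```python
import math


def traffic_from_motion_vectors(vectors, total_blocks, metres_per_pixel, fps):
    """Estimate (speed, density, flow) from per-block motion vectors.

    vectors: list of (dx, dy) displacements in pixels per frame, assumed to
    be already filtered and rectified to the road plane.
    """
    moving = [(dx, dy) for dx, dy in vectors if (dx, dy) != (0, 0)]
    if not moving:
        return 0.0, 0.0, 0.0
    # Speed: mean displacement per frame converted to metres per second.
    mean_disp = sum(math.hypot(dx, dy) for dx, dy in moving) / len(moving)
    speed = mean_disp * metres_per_pixel * fps
    # Density: fraction of road blocks that carry a non-zero motion vector.
    density = len(moving) / total_blocks
    # Flow: combination of speed and density.
    flow = speed * density
    return speed, density, flow


speed, density, flow = traffic_from_motion_vectors(
    [(3, 4), (0, 0), (6, 8), (0, 0)], total_blocks=4,
    metres_per_pixel=0.1, fps=25)
```

Working directly on the motion vectors avoids full decoding of the video, which is what makes a real-time implementation plausible.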