It has been over a decade since the first coded aperture video compressive sensing (CS) system was reported. The underlying principle of this technology is to employ a high-frequency modulator in the optical path to modulate a recorded high-speed scene within one integration time. The superimposed image captured in this manner is modulated and compressed, since multiple modulation patterns are imposed. Following this, reconstruction algorithms are utilized to recover the desired high-speed scene. One leading advantage of video CS is that a single captured measurement can be used to reconstruct a multi-frame video, thereby enabling a low-speed camera to capture high-speed scenes. Inspired by this, a number of variants of video CS systems have been built, mainly using different modulation devices. Meanwhile, in order to obtain high-quality reconstruction videos, many algorithms have been developed, from optimization-based iterative algorithms to deep-learning-based ones. Recently, emerging deep learning methods have been dominant due to their high-speed inference and high-quality reconstruction, highlighting the possibility of deploying video CS in practical applications. Toward this end, this paper reviews the progress that has been achieved in video CS during the past decade. We further analyze the efforts that need to be made, in terms of both hardware and algorithms, to enable real applications. Research gaps are put forward and future directions are summarized to help researchers and engineers working on this topic.
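The measurement principle described above (several modulated high-speed frames summed into one snapshot during a single integration time) can be sketched as follows. This is a minimal illustration, not the optical model of any particular system: the binary random masks, the tiny frame size, and the names `sci_measure` and `random_binary_masks` are assumptions made for the example, and plain Python lists are used to keep it dependency-free.

```python
import random


def sci_measure(frames, masks):
    """Sum mask-modulated frames into a single snapshot measurement.

    frames, masks: lists of T items, each an H x W list of lists.
    Returns one H x W measurement (the compressed snapshot).
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Each pixel is modulated by its mask value, then accumulated.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical modulation patterns: independent random 0/1 masks."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# Eight constant 4x4 frames (frame t has intensity t + 1) compressed into one snapshot.
frames = [[[float(t + 1)] * 4 for _ in range(4)] for t in range(8)]
masks = random_binary_masks(8, 4, 4)
snapshot = sci_measure(frames, masks)
```

Reconstruction is the inverse problem: recovering all eight frames from `snapshot` and the known `masks`, which is where the iterative and deep-learning algorithms mentioned in the abstract come in.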
In this paper, a video compressed sensing reconstruction algorithm based on multidimensional reference frames is proposed using the sparse characteristics of video signals in different sparse representation domains. First, the overall structure of the proposed video compressed sensing algorithm is introduced. The paper adopts a multi-reference frame bidirectional prediction hypothesis optimization algorithm. Then, the paper proposes a reconstruction method for CS frames at the re-decoding end. In addition to using key frames of each GOP reconstructed in the time domain as reference frames for reconstructing CS frames, half-pixel reference frames and scaled reference frames in the pixel domain are also used as reference frames of CS frames to obtain higher-quality hypotheses. The method of obtaining reference frames in the pixel domain is also discussed in detail. Finally, the proposed reconstruction algorithm is compared with video compression algorithms in the literature that have better reconstruction results. Experiments show that the algorithm performs better than the best multi-reference frame video compressed sensing algorithm and can effectively improve the quality of slow-motion video reconstruction.
High-resolution video transmission requires a substantial amount of bandwidth. In this paper, we present a novel video processing methodology that innovatively integrates region of interest (ROI) identification and super-resolution enhancement. Our method commences with the accurate detection of ROIs within video sequences, followed by the application of advanced super-resolution techniques to these areas, thereby preserving visual quality while economizing on data transmission. To validate and benchmark our approach, we have curated a new gaming dataset tailored to evaluate the effectiveness of ROI-based super-resolution in practical applications. The proposed model architecture leverages the transformer network framework, guided by a carefully designed multi-task loss function, which facilitates concurrent learning and execution of both ROI identification and resolution enhancement tasks. This unified deep learning model exhibits remarkable performance in achieving super-resolution on our custom dataset. The implications of this research extend to optimizing low-bitrate video streaming scenarios. By selectively enhancing the resolution of critical regions in videos, our solution enables high-quality video delivery under constrained bandwidth conditions. Empirical results demonstrate a 15% reduction in transmission bandwidth compared to traditional super-resolution based compression methods, without any perceivable decline in visual quality. This work thus contributes to the advancement of video compression and enhancement technologies, offering an effective strategy for improving digital media delivery efficiency and user experience, especially in bandwidth-limited environments. The innovative integration of ROI identification and super-resolution presents promising avenues for future research and development in adaptive and intelligent video communication systems.
In this paper, a new mesh-based algorithm is applied for motion estimation and compensation in the wavelet domain. The first major contribution of this work is the introduction of a new active mesh-based method for motion estimation and compensation. The proposed algorithm is based on mesh energy minimization with novel sets of energy functions. The proposed energy functions have appropriate features, which improve the accuracy of the motion estimation and compensation algorithm. We employ the proposed motion estimation algorithm in two different manners for video compression. In the first approach, the proposed algorithm is employed for motion estimation of consecutive frames. In the second approach, the algorithm is applied for motion estimation and compensation in the wavelet sub-bands. The experimental results reveal that the incorporation of active mesh-based motion-compensated temporal filtering into wavelet sub-bands significantly improves the rate-distortion performance of the video compression. We also use a new wavelet coder for the coding of the 3D volume of coefficients based on the retained energy criteria. This coder gives the maximum retained energy in all sub-bands. The proposed algorithm was tested with some video sequences, and the results showed that the use of the proposed active mesh method for motion compensation and its implementation in sub-bands yields significant improvement in PSNR performance.
The evolution of social network and multimedia technologies encourages more and more people to generate and upload visual information, which leads to the generation of large-scale video data. Therefore, preeminent compression technologies are highly desired to facilitate the storage and transmission of this tremendous volume of video data for a wide variety of applications. In this paper, a systematic review of the recent advances for large-scale video compression (LSVC) is presented. Specifically, fast video coding algorithms and effective models to improve video compression efficiency are introduced in detail, since coding complexity and compression efficiency are two important factors to evaluate video coding approaches. Finally, the challenges and future research trends for LSVC are discussed.
Many important developments in video compression technologies have occurred during the past two decades. The block-based discrete cosine transform with motion compensation hybrid coding scheme has been widely employed by most available video coding standards, notably the ITU-T H.26x and ISO/IEC MPEG-x families and the video part of the China audio video coding standard (AVS). The objective of this paper is to provide a review of the developments of the four basic building blocks of the hybrid coding scheme, namely predictive coding, transform coding, quantization and entropy coding, and give theoretical analyses and summaries of the technological advancements. We further analyze the development trends and perspectives of video compression, highlighting problems and research directions.
In this paper, we summarize 3D perception-oriented algorithms for perceptually driven 3D video coding. Several perceptual effects have been exploited for 2D video viewing; however, this is not yet the case for 3D video viewing. 3D video requires depth perception, which implies binocular effects such as conflicts, fusion, and rivalry. A better understanding of these effects is necessary for 3D perceptual compression, which provides users with a more comfortable visual experience for video that is delivered over a channel with limited bandwidth. We present state-of-the-art 3D visual attention models, 3D just-noticeable difference models, and 3D texture-synthesis models that address 3D human vision issues in 3D video coding and transmission.
Video reconstruction quality largely depends on the ability of the employed sparse domain to adequately represent the underlying video in Distributed Compressed Video Sensing (DCVS). In this paper, we propose a novel dynamic global-Principal Component Analysis (PCA) sparse representation algorithm for video based on the sparse-land model and nonlocal similarity. First, grouping by matching is realized at the decoder from key frames that are previously recovered. Second, we apply PCA to each group (sub-dataset) to compute the principal components from which the sub-dictionary is constructed. Finally, the non-key frames are reconstructed from random measurement data using a Compressed Sensing (CS) reconstruction algorithm with sparse regularization. Experimental results show that our algorithm performs better than the DCT and K-SVD dictionaries.
Multimedia semantic communication has been receiving increasing attention due to its significant enhancement of communication efficiency. Semantic coding, which is oriented towards extracting and encoding the key semantics of video for transmission, is a key aspect in the framework of multimedia semantic communication. In this paper, we propose a facial video semantic coding method with low bitrate based on the temporal continuity of video semantics. At the sender's end, we selectively transmit facial keypoints and deformation information, allocating distinct bitrates to different keypoints across frames. Compressive techniques involving sampling and quantization are employed to reduce the bitrate while retaining facial key semantic information. At the receiver's end, a GAN-based generative network is utilized for reconstruction, effectively mitigating block artifacts and buffering problems present in traditional codec algorithms under low bitrates. The performance of the proposed approach is validated on multiple datasets, such as VoxCeleb and TalkingHead-1kH, employing metrics such as LPIPS, DISTS, and AKD for assessment. Experimental results demonstrate significant advantages over traditional codec methods, achieving up to approximately 10-fold bitrate reduction in prolonged, stable head pose scenarios across diverse conversational video settings.
Compressed sensing (CS) is a novel technology to acquire and reconstruct sparse signals below the Nyquist rate. It has great potential in image and video acquisition and processing. To effectively improve the sparsity of the signal being measured and the reconstruction efficiency, an encoding and decoding model of residual distributed compressive video sensing based on double side information (RDCVS-DSI) is proposed in this paper. Exploiting the characteristics of the image itself in the frequency domain and the correlation between successive frames, the model regards the low-quality video frame as the first side information in the coding process, and generates the second side information for the non-key frames using motion estimation and compensation technology at the decoding end. Performance analysis and simulation experiments show that the RDCVS-DSI model can rebuild the video sequence with high fidelity at quite low complexity. A gain of about 1 to 5 dB in the average peak signal-to-noise ratio of the reconstructed frames is observed, and the speed is close to that of the least complex DCVS, when compared with prior works on compressive video sensing.
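The gain above is reported in terms of average peak signal-to-noise ratio (PSNR). As a reminder of how that figure is usually computed, here is a small sketch using the standard definition PSNR = 10 log10(peak^2 / MSE). The function names and the 8-bit peak value of 255 are assumptions for the example; the abstract does not state the exact evaluation settings.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_px, rec_px in zip(ref_row, rec_row):
            squared_error += (ref_px - rec_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def average_psnr(reference_frames, reconstructed_frames, peak=255.0):
    """Mean PSNR over a sequence of frames."""
    values = [
        psnr(ref, rec, peak)
        for ref, rec in zip(reference_frames, reconstructed_frames)
    ]
    return sum(values) / len(values)
```

Because PSNR is logarithmic, a 3 dB gain corresponds to roughly halving the mean squared error.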
The Super-Resolution (SR) technique reconstructs High-Resolution (HR) images from a sequence of Low-Resolution (LR) observations and has been a major focus for compressed video. Based on the theory of Projection Onto Convex Sets (POCS), this paper constructs a Quantization Constraint Set (QCS) using the quantization information extracted from the video bit stream. By combining the statistical properties of the image and the Human Visual System (HVS), a novel Adaptive Quantization Constraint Set (AQCS) is proposed. Simulation results show that the AQCS-based SR algorithm converges at a fast rate and obtains better performance in both objective and subjective quality, which makes it applicable to compressed video.
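To make the idea of a quantization constraint set concrete, the sketch below projects estimated transform coefficients back onto the interval implied by the quantized levels in the bit stream. It assumes a uniform quantizer with round-to-nearest reconstruction, so that a level q with step size s implies the true coefficient lies in [(q - 0.5)s, (q + 0.5)s]. That quantizer model and the function name `project_onto_qcs` are assumptions for illustration; the adaptive set (AQCS) proposed in the paper is not reproduced here.

```python
def project_onto_qcs(coeffs, levels, step):
    """Clip each estimated coefficient to its quantization interval.

    coeffs: estimated transform coefficients of the current HR estimate
            (after re-applying the codec's transform).
    levels: quantized integer levels read from the bit stream.
    step:   quantizer step size (assumed uniform for this sketch).
    """
    projected = []
    for coeff, level in zip(coeffs, levels):
        lower = (level - 0.5) * step
        upper = (level + 0.5) * step
        # Projection onto an interval is a simple clamp.
        projected.append(min(max(coeff, lower), upper))
    return projected
```

In a POCS loop this projection would alternate with the other constraint projections until the estimate satisfies all of them.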
This paper proposes a thorough scheme that uses a camera zooming descriptor with a two-level threshold to automatically retrieve close-ups directly from Moving Picture Experts Group (MPEG) compressed videos based on camera motion analysis. A new algorithm for fast camera motion estimation in the compressed domain is presented. In the retrieval process, camera-motion-based semantic retrieval is built. To improve the coverage of the proposed scheme, close-up retrieval in all kinds of videos is investigated. Extensive experiments illustrate that the proposed scheme provides promising retrieval results in real-time, automatic application scenarios.
This letter proposes a novel method of compressed video super-resolution reconstruction based on MAP-POCS (Maximum Posterior Probability-Projection Onto Convex Sets). First, the high-resolution model is assumed to follow a Poisson-Markov distribution, and the projection convex set is then constructed based on MAP. According to the characteristics of compressed video, two different convex sets are constructed by integrating the inter-frame and intra-frame information in the wavelet domain. The experimental results demonstrate that the new method not only outperforms the traditional algorithms in terms of PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error) and visual reconstruction quality, but also has the advantages of rapid convergence and easy extension.
Extraction of traffic information from image or video sequences is a hot research topic in intelligent transportation systems and computer vision. A real-time traffic information extraction method based on compressed video with interframe motion vectors, for speed, density and flow detection, has been proposed for extraction of traffic information under a fixed camera setting and a well-defined environment. The motion vectors are first separated from the compressed video streams and then filtered to eliminate incorrect and noisy vectors using the well-defined environmental knowledge. By applying the projective transform and using the filtered motion vectors, speed can be calculated from motion vector statistics, density can be estimated using the motion vector occupancy, and flow can be detected using the combination of speed and density. A prototype system for sky camera traffic monitoring using MPEG video has been implemented, and experimental results proved the effectiveness of the proposed method.
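The three quantities named above can be sketched from a list of per-block motion vectors. This is only an illustration under stated assumptions: the vectors are taken to be already filtered and mapped to the road plane by the projective transform, speed is the mean magnitude of the moving vectors, density is the fraction of blocks that carry a moving vector, and the flow indicator is their product. The function name and the parameters `metres_per_pixel`, `frames_per_second` and `min_magnitude` are hypothetical and do not come from the paper.

```python
import math


def traffic_stats(motion_vectors, metres_per_pixel, frames_per_second,
                  min_magnitude=0.5):
    """Estimate (speed, occupancy, flow indicator) from block motion vectors.

    motion_vectors: list of (dx, dy) displacements in pixels per frame,
                    one entry per macroblock in the monitored road region.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    moving = [m for m in magnitudes if m >= min_magnitude]

    # Density proxy: share of blocks covered by moving vehicles.
    occupancy = len(moving) / len(magnitudes) if magnitudes else 0.0

    # Speed: mean displacement of moving blocks, converted to metres/second.
    mean_pixels = sum(moving) / len(moving) if moving else 0.0
    speed = mean_pixels * metres_per_pixel * frames_per_second

    # Flow indicator: combination of speed and density.
    return speed, occupancy, speed * occupancy


speed, occupancy, flow = traffic_stats(
    [(3, 4), (0, 0), (6, 8), (0, 0)], metres_per_pixel=0.1, frames_per_second=25
)
```

Working directly on motion vectors avoids decoding full frames, which is what makes a real-time implementation plausible.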
A layered compression algorithm is presented which delivers spatially scalable encoded bit streams for remote video monitoring systems. The complexity of the algorithm is modest, and it is well suited to real-time implementation. Based on the layered compression algorithm, a codec system model is established. High-speed video compression can be realized with parallel data compression in this codec system. For image reconstruction, a prediction method using the two nearest pixels is presented.
In this paper, we present a method using video codec technology to compress ECG signals. This method exploits both intra-beat and inter-beat correlations of the ECG signals to achieve high compression ratios (CR) and a low percent root mean square difference (PRD). Since ECG signals have both intra-beat and inter-beat redundancies, just as video signals have both intra-frame and inter-frame correlation, video codec technology can be used for ECG compression. To do this, some preprocessing is needed. The ECG signals should first be segmented and normalized into a sequence of beat cycles of the same length; these beat cycles can then be treated as picture frames and compressed with video codec technology. We have used records from the MIT-BIH arrhythmia database to evaluate our algorithm. Results show that, besides compressing efficiently, this algorithm offers adjustable resolution, random access, and flexibility for irregular periods and QRS false detection.
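The preprocessing step described above (segmenting the signal into beats and normalizing them to a common length) and the PRD quality measure can be sketched as follows. The beat boundaries are assumed to be given as sample indices of detected R peaks, the length normalization uses plain linear interpolation, and the function names are hypothetical; the paper's actual segmentation and normalization may differ. PRD follows its usual definition, 100 * sqrt(sum((x - y)^2) / sum(x^2)).

```python
import math


def resample_beat(beat, length):
    """Linearly interpolate one beat cycle to a fixed number of samples."""
    if length < 2:
        raise ValueError("length must be at least 2")
    last = len(beat) - 1
    out = []
    for k in range(length):
        pos = k * last / (length - 1)      # fractional index into the beat
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(beat[lo] * (1.0 - frac) + beat[hi] * frac)
    return out


def beats_to_frame(signal, r_peaks, length):
    """Cut the signal at the R-peak indices and stack equal-length beats.

    Each row of the result is one normalized beat, so consecutive rows can be
    handed to an image or video codec as lines of a picture.
    """
    beats = [signal[a:b] for a, b in zip(r_peaks, r_peaks[1:])]
    return [resample_beat(beat, length) for beat in beats]


def prd(original, reconstructed):
    """Percent root mean square difference between two signals."""
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x * x for x in original)
    return 100.0 * math.sqrt(num / den)
```

The original beat lengths would have to be stored as side information so that the decoder can undo the normalization.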
Video snapshot compressive imaging (Video SCI) modulates scenes using various encoding masks and captures compressed measurements with a low-speed camera during a single exposure. Subsequently, reconstruction algorithms restore image sequences of dynamic scenes, offering advantages such as reduced bandwidth and storage space requirements. The temporal correlation in video data is crucial for Video SCI, as it leverages the temporal relationships among frames to enhance the efficiency and quality of reconstruction algorithms, particularly for fast-moving objects. This paper discretizes video frames to create image datasets with the same data volume but differing temporal correlations. We utilized the state-of-the-art (SOTA) reconstruction framework, EfficientSCI++, to train various compressed reconstruction models with these differing temporal correlations. Evaluating the reconstruction results from these models, our simulation experiments confirm that a reduction in temporal correlation leads to decreased reconstruction accuracy. Additionally, we simulated the reconstruction outcomes of datasets devoid of temporal correlation, illustrating that models trained on non-temporal data affect the temporal feature extraction capabilities of transformers, resulting in negligible impacts on the evaluation of reconstruction results for non-temporal correlation test datasets.
In the framework of compressed sensing distributed video coding, the design of the quantization matrix directly affects the reconstruction quality of the video at the receiving terminal. In this article, we present a new method for designing a Gaussian quantization matrix adapted to compressed sensing coding, motivated by the fact that the image parameters are approximately normally distributed after compressive sensing measurement. In this way, the Gaussian quantization matrix adapts to the parameters of a certain number of image frames, depending on the video sequence. Compared with the traditional quantization scheme, the quantization matrix presented in this article improves the reconstruction quality of the video.
Funding (coded aperture video CS review): supported by the National Natural Science Foundation of China (61931012, 62171258, 62088102, and 62271414), the Zhejiang Provincial Outstanding Youth Science Foundation (LR23F010001), and the Key Project of Westlake Institute for Optoelectronics (2023GD007).
Funding (ROI-based super-resolution paper): funded by the National Key Research and Development Program of China (No. 2022YFC3302103).
Funding (large-scale video compression review): This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61622115 and 61472281), the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (GZ2015005), and the Shanghai Engineering Research Center of Industrial Vision Perception & Intelligent Computing (17DZ2251600).
Funding (hybrid video coding review): Project (No. 2009CB320903) supported by the National Basic Research Program (973) of China.
Funding: Supported by the Innovation Project of Graduate Students of Jiangsu Province, China under Grants No. CXZZ12_0466, No. CXZZ11_0390; the National Natural Science Foundation of China under Grants No. 61071091, No. 61271240, No. 61201160, No. 61172118; the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China under Grant No. 12KJB510019; the Science and Technology Research Program of Hubei Provincial Department of Education under Grants No. D20121408, No. D20121402; the Program for Research Innovation of Nanjing Institute of Technology Project under Grant No. CKJ20110006
Abstract: In Distributed Compressed Video Sensing (DCVS), video reconstruction quality largely depends on the ability of the employed sparse domain to adequately represent the underlying video. In this paper, we propose a novel dynamic global-Principal Component Analysis (PCA) sparse representation algorithm for video based on the sparse-land model and nonlocal similarity. First, grouping by matching is realized at the decoder from key frames that are previously recovered. Second, we apply PCA to each group (sub-dataset) to compute the principal components from which the sub-dictionary is constructed. Finally, the non-key frames are reconstructed from random measurement data using a Compressed Sensing (CS) reconstruction algorithm with sparse regularization. Experimental results show that our algorithm performs better than the DCT and K-SVD dictionaries.
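The second step of the abstract above (PCA on each group of matched patches) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pca_subdictionary`, the patch size, and the random test group are assumptions made for illustration, and the grouping-by-matching step is taken as already done.

```python
import numpy as np

def pca_subdictionary(group):
    """Build a PCA sub-dictionary from one group of similar patches.

    group: array of shape (n_patches, patch_dim), one vectorized patch per row.
    Returns (atoms, singular_values); the columns of `atoms` are the
    principal directions, ordered by decreasing variance.
    """
    # Remove the group mean so the SVD captures variance, not the offset.
    centered = group - group.mean(axis=0, keepdims=True)
    # The right singular vectors of the centered data are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt.T, singular_values

# Hypothetical group: 40 vectorized 4x4 patches (random stand-ins for matched patches).
rng = np.random.default_rng(0)
group = rng.standard_normal((40, 16))
atoms, singular_values = pca_subdictionary(group)
```

Because the atoms are orthonormal, the sparse coefficients of a patch in this sub-dictionary are simply `atoms.T @ patch`, which is one reason PCA dictionaries are cheap to use at the decoder.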
Funding: Supported by the National Natural Science Foundation of China (Nos. NSFC 61925105, 62322109, 62171257 and U22B2001); the Xplorer Prize in Information and Electronics Technologies; the Tsinghua University (Department of Electronic Engineering)-Nantong Research Institute for Advanced Communication Technologies Joint Research Center for Space, Air, Ground and Sea Cooperative Communication Network Technology
Abstract: Multimedia semantic communication has been receiving increasing attention due to its significant enhancement of communication efficiency. Semantic coding, which is oriented towards extracting and encoding the key semantics of video for transmission, is a key aspect in the framework of multimedia semantic communication. In this paper, we propose a low-bitrate facial video semantic coding method based on the temporal continuity of video semantics. At the sender's end, we selectively transmit facial keypoints and deformation information, allocating distinct bitrates to different keypoints across frames. Compressive techniques involving sampling and quantization are employed to reduce the bitrate while retaining facial key semantic information. At the receiver's end, a GAN-based generative network is utilized for reconstruction, effectively mitigating the block artifacts and buffering problems present in traditional codec algorithms under low bitrates. The performance of the proposed approach is validated on multiple datasets, such as VoxCeleb and TalkingHead-1kH, employing metrics such as LPIPS, DISTS, and AKD for assessment. Experimental results demonstrate significant advantages over traditional codec methods, achieving up to approximately 10-fold bitrate reduction in prolonged, stable head pose scenarios across diverse conversational video settings.
Funding: Supported by the National Natural Science Foundation of China (61170147); the Major Cooperation Project of Production and College in Fujian Province (2012H61010016); the Natural Science Foundation of Fujian Province (2013J01234)
Abstract: Compressed sensing (CS) is a novel technology to acquire and reconstruct sparse signals below the Nyquist rate. It has great potential in image and video acquisition and processing. To effectively improve the sparsity of the signal being measured and the reconstruction efficiency, an encoding and decoding model of residual distributed compressive video sensing based on double side information (RDCVS-DSI) is proposed in this paper. Exploiting the characteristics of the image itself in the frequency domain and the correlation between successive frames, the model regards the low-quality video frame as the first side information in the coding process, and generates the second side information for the non-key frames using motion estimation and compensation at the decoding end. Performance analysis and simulation experiments show that the RDCVS-DSI model can rebuild the video sequence with high fidelity at quite low complexity. Compared with prior works on compressive video sensing, a gain of about 1 to 5 dB in the average peak signal-to-noise ratio of the reconstructed frames is observed, and the speed is close to that of the least complex DCVS.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004151).
Abstract: The Super-Resolution (SR) technique reconstructs High-Resolution (HR) images from a sequence of Low-Resolution (LR) observations and has been a major focus for compressed video. Based on the theory of Projection Onto Convex Sets (POCS), this paper constructs a Quantization Constraint Set (QCS) using the quantization information extracted from the video bit stream. By combining the statistical properties of the image and the Human Visual System (HVS), a novel Adaptive Quantization Constraint Set (AQCS) is proposed. Simulation results show that the AQCS-based SR algorithm converges quickly and obtains better performance in both objective and subjective quality, making it applicable to compressed video.
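To make the idea of a quantization constraint set concrete, here is a minimal sketch of the projection step that a POCS iteration would apply. It assumes a plain uniform quantizer with rounding to the nearest level, so that each decoded level constrains the true coefficient to an interval of width one step; the adaptive bounds of AQCS are not reproduced, and the name `project_onto_qcs` and the sample values are hypothetical.

```python
import numpy as np

def project_onto_qcs(coeffs, levels, step):
    """Project estimated transform coefficients onto the quantization constraint set.

    coeffs: current estimate of the transform coefficients.
    levels: integer quantization levels read from the bit stream.
    step:   quantizer step size (scalar or array broadcastable to coeffs).
    Assumes uniform quantization with rounding, so the original coefficient
    lies in [(level - 0.5) * step, (level + 0.5) * step].
    """
    lower = (levels - 0.5) * step
    upper = (levels + 0.5) * step
    # Clipping is the Euclidean projection onto this box-shaped convex set.
    return np.clip(coeffs, lower, upper)

levels = np.array([2, -1, 0])
step = 8.0
estimate = np.array([25.0, -7.0, 1.0])
projected = project_onto_qcs(estimate, levels, step)
```

In this example only the first coefficient violates its interval ([12, 20]) and is pulled back to the boundary; the other two already satisfy the constraint and are left unchanged.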
Funding: This work was supported by the European IST FP6 Research Programme, as funded for the Integrated Project LIVE (No. IST-4-027312).
Abstract: This paper proposes a complete scheme that uses a camera zooming descriptor with a two-level threshold to automatically retrieve close-ups directly from Moving Picture Experts Group (MPEG) compressed videos based on camera motion analysis. A new algorithm for fast camera motion estimation in the compressed domain is presented. In the retrieval process, camera-motion-based semantic retrieval is built. To improve the coverage of the proposed scheme, close-up retrieval in all kinds of videos is investigated. Extensive experiments illustrate that the proposed scheme provides promising retrieval results in real-time and automatic application scenarios.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (No. BK2004151).
Abstract: This letter proposes a novel method of compressed video super-resolution reconstruction based on MAP-POCS (Maximum A Posteriori-Projection Onto Convex Sets). First, the high-resolution model is assumed to follow a Poisson-Markov distribution, and the projection convex set is constructed based on MAP. Then, according to the characteristics of compressed video, two different convex sets are constructed by integrating the inter-frame and intra-frame information in the wavelet domain. The experimental results demonstrate that the new method not only outperforms the traditional algorithms in terms of PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error) and visual reconstruction quality, but also has the advantages of rapid convergence and easy extension.
Abstract: Extraction of traffic information from image or video sequences is a hot research topic in intelligent transportation systems and computer vision. A real-time traffic information extraction method based on compressed video with interframe motion vectors, for speed, density and flow detection, is proposed for extraction of traffic information under a fixed camera setting and a well-defined environment. The motion vectors are first separated from the compressed video streams, and then filtered to eliminate incorrect and noisy vectors using the well-defined environmental knowledge. By applying the projective transform and using the filtered motion vectors, speed can be calculated from motion vector statistics, density can be estimated using the motion vector occupancy, and flow can be detected using the combination of speed and density. A prototype system for sky camera traffic monitoring using MPEG video has been implemented, and experimental results prove the effectiveness of the proposed method.
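The three quantities named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the motion vectors are taken as already filtered and already rectified by the projective transform, speed is the mean vector magnitude, density is the fraction of occupied macroblocks, and flow is taken as speed times density (the standard fundamental relation of traffic flow, which the abstract describes only as a "combination"). All names and sample values are hypothetical.

```python
import numpy as np

def traffic_estimates(vectors, occupied, metres_per_pixel, fps):
    """Estimate speed, density and flow from filtered motion vectors.

    vectors:  (n, 2) motion vectors in pixels per frame, assumed already
              filtered and mapped to the road plane.
    occupied: boolean array, True for each macroblock in the road region
              that carries a valid motion vector.
    Returns (speed in m/s, density as an occupancy fraction, flow = speed * density).
    """
    vectors = np.asarray(vectors, dtype=float)
    if vectors.size == 0:
        speed = 0.0  # no moving blocks: report zero speed
    else:
        magnitudes = np.linalg.norm(vectors, axis=1)
        speed = float(magnitudes.mean() * metres_per_pixel * fps)
    density = float(np.mean(occupied))
    return speed, density, speed * density

vectors = [[3.0, 4.0], [6.0, 8.0]]          # magnitudes 5 and 10 pixels/frame
occupied = np.array([True, False, True, False])
speed, density, flow = traffic_estimates(vectors, occupied, metres_per_pixel=0.1, fps=25)
```

With these hypothetical inputs the mean displacement is 7.5 pixels per frame, which at 0.1 m per pixel and 25 frames per second corresponds to 18.75 m/s.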
Abstract: A layered compression algorithm is presented which delivers spatially scalable encoded bit streams for remote video monitoring systems. The complexity of the algorithm is modest, and it is well suited to real-time implementation. Based on the layered compression algorithm, a codec system model is established. High-speed video compression can be realized with parallel data compression in this codec system. For image reconstruction, a prediction method using the two nearest pixels is presented.
Abstract: In this paper, we present a method using video codec technology to compress ECG signals. This method exploits both intra-beat and inter-beat correlations of the ECG signals to achieve high compression ratios (CR) and a low percent root mean square difference (PRD). Since ECG signals have both intra-beat and inter-beat redundancies, just as video signals have both intra-frame and inter-frame correlation, video codec technology can be used for ECG compression. To do this, some preprocessing is needed. The ECG signals should first be segmented and normalized to a sequence of beat cycles with the same length; these beat cycles can then be treated as picture frames and compressed with video codec technology. We have used records from the MIT-BIH arrhythmia database to evaluate our algorithm. Results show that, besides compressing efficiently, this algorithm offers adjustable resolution, random access, and flexibility for irregular periods and QRS false detections.
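The preprocessing and the two figures of merit mentioned above can be sketched briefly. The code below is an illustration only: PRD is computed with the common non-mean-subtracted definition (other variants exist, and the abstract does not say which is used), the beat boundaries are assumed to be given as R-peak indices, and linear interpolation is an assumed resampling choice. How the equal-length beats are packed into picture frames is not specified in the abstract, so the sketch stops at a 2D array with one beat per row.

```python
import numpy as np

def prd(original, reconstructed):
    """Percent root mean square difference between a signal and its reconstruction."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    error_energy = np.sum((original - reconstructed) ** 2)
    return 100.0 * np.sqrt(error_energy / np.sum(original ** 2))

def compression_ratio(original_bits, compressed_bits):
    """Compression ratio as original size divided by compressed size."""
    return original_bits / compressed_bits

def normalize_beats(signal, r_peaks, length):
    """Cut the signal at the given R-peak indices and resample each beat to `length` samples."""
    beats = []
    for start, end in zip(r_peaks[:-1], r_peaks[1:]):
        beat = np.asarray(signal[start:end], dtype=float)
        old_axis = np.linspace(0.0, 1.0, num=len(beat))
        new_axis = np.linspace(0.0, 1.0, num=length)
        beats.append(np.interp(new_axis, old_axis, beat))
    return np.stack(beats)  # shape: (number of beats, length)

# Hypothetical signal with beats of unequal duration (30, 40 and 30 samples).
signal = np.sin(np.linspace(0.0, 6.0 * np.pi, 100))
beat_matrix = normalize_beats(signal, r_peaks=[0, 30, 70, 100], length=32)
```

The original beat lengths would have to be stored as side information so that each beat can be resampled back to its true duration after decoding.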
Funding: Supported in part by the National Natural Science Foundation of China (No. U23B2011).
Abstract: Video snapshot compressive imaging (Video SCI) modulates scenes using various encoding masks and captures compressed measurements with a low-speed camera during a single exposure. Subsequently, reconstruction algorithms restore image sequences of dynamic scenes, offering advantages such as reduced bandwidth and storage space requirements. The temporal correlation in video data is crucial for Video SCI, as it leverages the temporal relationships among frames to enhance the efficiency and quality of reconstruction algorithms, particularly for fast-moving objects. This paper discretizes video frames to create image datasets with the same data volume but differing temporal correlations. We utilized the state-of-the-art (SOTA) reconstruction framework, EfficientSCI++, to train various compressed reconstruction models with these differing temporal correlations. Evaluating the reconstruction results from these models, our simulation experiments confirm that a reduction in temporal correlation leads to decreased reconstruction accuracy. Additionally, we simulated the reconstruction outcomes of datasets devoid of temporal correlation, illustrating that models trained on non-temporal data affect the temporal feature extraction capabilities of transformers, resulting in negligible impacts on the evaluation of reconstruction results for non-temporal correlation test datasets.
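The acquisition step described above (mask modulation followed by integration over one exposure) corresponds to the usual video SCI forward model, in which the snapshot is the sum over frames of each frame multiplied element-wise by its mask. The sketch below shows only this forward model, not the EfficientSCI++ reconstruction; the frame count, image size, binary random masks, and the noise-free assumption are illustrative choices.

```python
import numpy as np

def sci_measure(frames, masks):
    """Simulate one noise-free snapshot measurement.

    frames: (T, H, W) high-speed frames within a single exposure.
    masks:  (T, H, W) modulation masks, one per frame.
    Returns the (H, W) compressed measurement: the sum of masked frames.
    """
    return np.sum(frames * masks, axis=0)

rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))                              # 8 frames compressed into 1
masks = rng.integers(0, 2, size=(8, 4, 4)).astype(float)    # binary random masks
measurement = sci_measure(frames, masks)
```

Here eight frames are folded into a single 4x4 measurement, so the reconstruction algorithm must recover eight unknowns per pixel from one observation, which is why priors such as temporal correlation matter.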
Abstract: In the framework of compressed sensing distributed video coding, the design of the quantization matrix directly affects the reconstruction quality at the receiving terminal. In this article, we present a new method for designing a Gaussian quantization matrix adapted to compressed sensing coding, motivated by the fact that the image parameters are approximately normally distributed after compressive sensing measurement. In this way, the parameters generated by the Gaussian quantization matrix for a certain number of image frames, which depend on the video sequence, possess a certain adaptive capacity. Compared with the traditional quantization scheme, the quantization matrix presented in this article improves the reconstruction quality of the video.