Data compression plays a key role in optimizing the use of memory storage space and in reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their performance is also exploited by lossy image and video compression techniques, which generally use a hybrid approach. To achieve our objective, which is to study the performance of lossless compression methods, we first carried out a literature review, whose summary enabled us to select the most relevant techniques, namely arithmetic coding, LZW, Tunstall's algorithm, RLE, BWT, Huffman coding, and Shannon-Fano. Secondly, we designed a purpose-built text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the corresponding programs (scripts) in Matlab in order to test their performance. Finally, following the tests conducted on this deliberately constructed data, the results show that the following methods, listed in order of performance, are very satisfactory: LZW, arithmetic coding, the Tunstall algorithm, and BWT + RLE. Likewise, it appears that the performance of certain techniques relative to others is strongly linked, on the one hand, to the sequencing and/or recurrence of the symbols that make up the message and, on the other hand, to the cumulative encoding and decoding time.
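For illustration, here is a minimal Python sketch (the paper's own experiments used Matlab scripts) that runs two of the listed techniques, RLE and LZW, on a deliberately repetitive message and reports compression ratios; the 16-bit RLE token and fixed 12-bit LZW code sizes are our own simplifying assumptions.

```python
# Illustrative sketch (not the paper's Matlab scripts): compare RLE and LZW on
# a deliberately repetitive text message and report compression ratios.

def rle_encode(text: str) -> list:
    """Run-length encoding: collapse runs of identical symbols into (symbol, count)."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def lzw_encode(text: str) -> list:
    """Basic LZW: grow a dictionary of phrases seen so far, emit dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for ch in text:
        wc = w + ch
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes

message = "ABABABAB" * 64                    # repeating pattern, as in the test dataset
original_bits = 8 * len(message)
rle_bits = 16 * len(rle_encode(message))     # assume 8-bit symbol + 8-bit run length
lzw_bits = 12 * len(lzw_encode(message))     # assume fixed 12-bit codes
print(f"RLE ratio: {original_bits / rle_bits:.2f}  LZW ratio: {original_bits / lzw_bits:.2f}")
```

On this alternating pattern RLE actually expands the message (every run has length one) while LZW compresses it strongly, which illustrates the abstract's point that the relative performance of these techniques hinges on how the symbols of the message recur.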
Light-field fluorescence microscopy (LFM) is a powerful, elegant, and compact method for long-term high-speed imaging of complex biological systems, such as neuronal activity and rapid movements of organelles. LFM experiments typically generate terabytes of image data and require a substantial amount of storage space. Some lossy compression algorithms with good compression performance have been proposed recently. However, since the specimen usually tolerates only low-power-density illumination for long-term imaging with low phototoxicity, the image signal-to-noise ratio (SNR) is relatively low, and such lossy compression algorithms can therefore discard useful position or intensity information. Here, we propose a phase-space continuity-enhanced bzip2 (PC-bzip2) lossless compression method for LFM data as a high-efficiency, open-source tool that combines graphics processing unit (GPU)-based fast entropy judgment with multicore-CPU-based high-speed lossless compression. Compared with the original bzip2, our method achieves almost 10% improvement in compression ratio while retaining high-speed compression. We evaluated our method on fluorescence bead data and fluorescence-stained cell data with different SNRs. Moreover, by introducing temporal continuity, our method shows a superior compression ratio on time-series data of zebrafish blood vessels.
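The following is a minimal sketch of the "entropy judgment, then lossless compression" idea, using Python's bz2 module as a stand-in; the block size and entropy threshold are assumed values, and the GPU and multicore parts of PC-bzip2 are not shown.

```python
# Minimal sketch: estimate each block's empirical entropy first and only spend
# bzip2 effort on blocks that look compressible (threshold and block size are
# assumptions, not PC-bzip2's actual parameters).
import bz2
import math
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Empirical entropy in bits per byte."""
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())

def compress_blocks(data: bytes, block_size: int = 1 << 20, threshold: float = 7.5):
    """Return a list of (compressed_flag, payload) pairs, one per block."""
    out = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        if shannon_entropy(block) < threshold:        # likely compressible
            out.append((True, bz2.compress(block, 9)))
        else:                                         # near-random noise: store raw
            out.append((False, block))
    return out
```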
A simple and adaptive lossless compression algorithm is proposed for remote sensing image compression, comprising an integer wavelet transform and a Rice entropy coder. By analyzing the probability distribution of the integer wavelet transform coefficients and the characteristics of the Rice entropy coder, a divide-and-rule strategy is applied to the high-frequency sub-bands and the low-frequency sub-band: high-frequency sub-bands are coded directly by the Rice entropy coder, while low-frequency coefficients are predicted before coding. The role of the predictor is to map the low-frequency coefficients into symbols suitable for entropy coding. Experimental results show that the average Compression Ratio (CR) of our approach is about two, which is close to that of JPEG 2000. The algorithm is simple and easy to implement in hardware. Moreover, it has the merits of adaptability and independent data packets, so it is well suited to lossless compression applications in space.
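As an illustration of the entropy-coding stage, here is a Golomb-Rice coder sketch for prediction or wavelet residuals; the paper's coefficient mapping and per-band parameter selection are not reproduced, and the parameter k below is chosen arbitrarily.

```python
# Illustrative Golomb-Rice encoder for residuals (a sketch, not the paper's coder).

def zigzag(v: int) -> int:
    """Map signed residuals to non-negative integers: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1

def rice_encode(value: int, k: int) -> str:
    """Rice code with parameter k: unary-coded quotient, then k-bit binary remainder."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

residuals = [0, -1, 3, 2, -2, 0, 1]
k = 1                                        # assumed; adapted per sub-band in practice
bitstream = "".join(rice_encode(zigzag(v), k) for v in residuals)
print(bitstream, f"({len(bitstream)} bits for {len(residuals)} residuals)")
```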
In this paper, a new predictive model adapted to QTM (Quaternary Triangular Mesh) pixel compression is introduced. Our approach starts with the principles of the proposed predictive models based on available QTM neighbor pixels, and an algorithm for ascertaining the available QTM neighbors is also proposed. Then, a method for reducing the space complexity of predicting QTM pixel values is presented, followed by a structure for storing the compressed QTM pixels. Finally, an experiment comparing the compression ratio of this method with that of other methods is carried out using three wave bands of 1 km resolution NOAA imagery of China. The results indicate that: 1) the compression method performs better than the alternatives, such as Run-Length Coding, Arithmetic Coding, and Huffman Coding; 2) the average size of the compressed three-wave-band data based on the neighbor QTM pixel predictive model is 31.58% of the original space requirement and 67.5% of that achieved by Arithmetic Coding without a predictive model.
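For the general flavor of neighbor-based prediction before entropy coding, here is a sketch that, for simplicity, uses ordinary raster-grid neighbors rather than the QTM triangular-mesh neighbors the paper actually defines.

```python
# Sketch of neighbor-based prediction prior to entropy coding (raster neighbors
# used here for illustration; the paper predicts from available QTM neighbors).
import numpy as np

def prediction_residuals(img: np.ndarray) -> np.ndarray:
    """Predict each pixel as the mean of its already-decoded left/upper neighbors
    and return the residual image, which is cheaper to entropy-code."""
    img = img.astype(np.int32)
    pred = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            neighbors = []
            if x > 0:
                neighbors.append(img[y, x - 1])
            if y > 0:
                neighbors.append(img[y - 1, x])
            if neighbors:
                pred[y, x] = int(np.mean(neighbors))
    return img - pred
```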
We propose a novel lossless compression algorithm, based on the 2D Discrete Fast Fourier Transform, to approximate the Algorithmic (Kolmogorov) Complexity of Elementary Cellular Automata. Fast Fourier transforms are widely used in image compression, but their lossy nature excludes them as viable candidates for Kolmogorov Complexity approximations. For the first time, we present a way to adapt Fourier transforms for lossless image compression. The proposed method has a very strong Pearson correlation to existing complexity metrics, and we further establish its consistency as a complexity metric by confirming that its measurements never fall below the complexity of nothingness nor exceed that of randomness (the lower and upper limits of complexity, respectively). Surprisingly, many of the other methods tested fail this simple sanity check. A final symmetry-based test also demonstrates our method's superiority over existing lossless compression metrics. All complexity metrics tested, as well as the code used to generate and augment the original dataset, can be found in our GitHub repository: ECA complexity metrics.
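The sanity check itself is easy to reproduce with any lossless compressor as the complexity proxy; the sketch below uses zlib rather than the paper's lossless 2D-FFT method, purely to illustrate that the complexities of nothingness and randomness should bound any pattern's measured complexity.

```python
# Sanity-check sketch with a generic compressor (zlib) as the complexity proxy;
# this is NOT the paper's FFT-based method, only an illustration of the bounds test.
import os
import zlib

def complexity_proxy(data: bytes) -> int:
    return len(zlib.compress(data, 9))

n = 4096
nothingness = bytes(n)                            # all zeros: lower limit of complexity
randomness = os.urandom(n)                        # incompressible noise: upper limit
pattern = bytes((i * i) % 7 for i in range(n))    # a structured, automaton-like pattern

c_low, c_pat, c_high = map(complexity_proxy, (nothingness, pattern, randomness))
assert c_low <= c_pat <= c_high, "a consistent complexity metric must respect these bounds"
print(c_low, c_pat, c_high)
```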
This article presents a coding method for the lossless compression of color video. In the proposed method, a four-dimensional matrix Walsh transform (4D-M-Walsh-T) is used for color video coding. The whole sequence of n frames of a color video is divided into '3D-blocks' spanning image width (row component), image height (column component), and the adjacency (depth component) of the n frames (Y, U, or V) of the video sequence. Similar to the 2D Walsh transform, the 4D-M-Walsh-T consists of 4D sub-matrices, each of size n. The method can fully exploit correlations, such as those between adjacent pixels within one frame or across different frames at the same time, to reduce the redundancy of the color video for lossless encoding. Experimental results show that the proposed method achieves a higher lossless compression ratio (CR) for color video sequences.
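As a building block, the sketch below shows a reversible 2D Walsh-Hadamard transform of one n x n block; the paper extends this idea to 4D sub-matrices across rows, columns, and adjacent frames, and that extension is not reproduced here.

```python
# 2D Walsh-Hadamard transform sketch (exact round trip on integer blocks).
import numpy as np
from scipy.linalg import hadamard

def walsh_2d(block: np.ndarray) -> np.ndarray:
    n = block.shape[0]
    H = hadamard(n)                      # n must be a power of two
    return H @ block @ H                 # separable 2D transform

def walsh_2d_inverse(coeffs: np.ndarray) -> np.ndarray:
    n = coeffs.shape[0]
    return (hadamard(n) @ coeffs @ hadamard(n)) // (n * n)   # exact for integer input

block = np.arange(64, dtype=np.int64).reshape(8, 8)
assert np.array_equal(walsh_2d_inverse(walsh_2d(block)), block)   # lossless round trip
```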
The technique of lossless image compression plays an important role in high-quality image transmission and storage. At present, both the compression ratio and the processing speed must be considered in a real-time multimedia system. A novel lossless compression algorithm is investigated: a low-complexity predictive model is proposed that uses the correlation between pixels and between color components, while a perceptron in a neural network is used to adaptively rectify the prediction values. This makes the prediction residuals smaller and confines them to a small dynamic range. A color-space transform is also used, yielding good decorrelation. Comparative experimental results show that our algorithm performs noticeably better than traditional algorithms. Compared with the newer JPEG-LS standard, this predictive model reduces computational complexity, and it runs faster than JPEG-LS with a negligible sacrifice in compression performance.
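For reference, the sketch below implements the median edge detector (MED) predictor used by JPEG-LS, the standard this kind of low-complexity predictive model is compared against; the paper's own color-correlation predictor and perceptron-based correction are not reproduced here.

```python
# Sketch of the JPEG-LS MED predictor (reference baseline, not the paper's model).
import numpy as np

def med_residuals(img: np.ndarray) -> np.ndarray:
    """MED prediction from the left (a), upper (b), and upper-left (c) pixels."""
    img = img.astype(np.int32)
    pred = np.zeros_like(img)
    h, w = img.shape
    for y in range(1, h):
        for x in range(1, w):
            a, b, c = img[y, x - 1], img[y - 1, x], img[y - 1, x - 1]
            if c >= max(a, b):
                pred[y, x] = min(a, b)
            elif c <= min(a, b):
                pred[y, x] = max(a, b)
            else:
                pred[y, x] = a + b - c
    return img - pred                    # small residuals are then entropy-coded
```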
To improve on the low efficiency of classical lossless compression, a high-efficiency method for lossless image compression is presented. Its theory and the algorithm implementation are introduced, and the basic approach to medical image lossless compression is briefly described. After analyzing and implementing differential pulse code modulation (DPCM) in lossless compression, a new method combining an integer wavelet transform with DPCM to compress medical images is discussed. The analysis and simulation results show that this new method is simple and useful; moreover, it achieves a high compression ratio in medical image lossless compression.
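A minimal DPCM sketch (first-order, previous-sample predictor) shows the reversible encode/decode pair that such a method combines with an integer wavelet transform; a lifting-based integer wavelet is sketched further below.

```python
# First-order DPCM sketch: lossless encode/decode of one image row.
import numpy as np

def dpcm_encode(row: np.ndarray) -> np.ndarray:
    row = row.astype(np.int32)
    return np.concatenate(([row[0]], np.diff(row)))   # first sample, then differences

def dpcm_decode(codes: np.ndarray) -> np.ndarray:
    return np.cumsum(codes)                           # exact, lossless inverse

row = np.array([100, 102, 101, 105, 110, 110], dtype=np.uint8)
assert np.array_equal(dpcm_decode(dpcm_encode(row)), row.astype(np.int32))
```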
Along with the proliferating research interest in semantic communication (SemCom), joint source-channel coding (JSCC) has dominated attention due to its widely assumed advantage in efficiently delivering information semantics. Nevertheless, this paper challenges the conventional JSCC paradigm and advocates adopting separate source-channel coding (SSCC) to enjoy a greater underlying degree of freedom for optimization. We demonstrate that SSCC, after leveraging the strengths of a Large Language Model (LLM) for source coding complemented by the Error Correction Code Transformer (ECCT) for channel coding, offers superior performance over JSCC. Our proposed framework also effectively highlights the compatibility challenges between SemCom approaches and digital communication systems, particularly concerning the resource costs associated with the transmission of high-precision floating-point numbers. Through comprehensive evaluations, we establish that, assisted by LLM-based compression and ECCT-enhanced error correction, SSCC remains a viable and effective solution for modern communication systems. In other words, separate source-channel coding is still what we need.
Due to the particular nature of seismic data, a lossless compression algorithm must be used in some cases. In this paper, a lossless compression algorithm based on the integer wavelet transform is studied; compared with traditional algorithms, it achieves a better compression ratio. The CDF(2,n) biorthogonal wavelet family leads to a better compression ratio than other CDF families, SWE, and CRF, owing to its capability of canceling data redundancies and focusing data characteristics. The CDF(2,n) family is therefore suitable as the wavelet function for lossless compression of seismic data.
We study an approach to the integer wavelet transform for lossless compression of medical images in a medical picture archiving and communication system (PACS). Using the lifting scheme, a reversible integer wavelet transform is generated that has features similar to the corresponding biorthogonal wavelet transform. Experimental results of the method based on the integer wavelet transform are given to show its better performance and great application potential in medical image compression.
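The sketch below shows a lifting-scheme construction of the reversible integer CDF(2,2) (LeGall 5/3) wavelet, the kind of integer transform referred to here and in the seismic-data abstract above; boundary handling uses simple mirroring and the input length is assumed even, so this is an illustration rather than the papers' exact transform.

```python
# Reversible integer CDF(2,2) (LeGall 5/3) lifting sketch: predict + update steps.

def cdf22_forward(x):
    n, half = len(x), len(x) // 2
    xe = lambda j: x[j] if j < n else x[n - 2]        # mirror even grid at right edge
    d = [x[2*i + 1] - (x[2*i] + xe(2*i + 2)) // 2     # predict step -> detail coeffs
         for i in range(half)]
    dl = lambda i: d[i] if i >= 0 else d[0]           # mirror details at left edge
    s = [x[2*i] + (dl(i - 1) + d[i] + 2) // 4         # update step -> approximation
         for i in range(half)]
    return s, d

def cdf22_inverse(s, d):
    half = len(s)
    dl = lambda i: d[i] if i >= 0 else d[0]
    even = [s[i] - (dl(i - 1) + d[i] + 2) // 4 for i in range(half)]   # undo update
    ee = lambda j: even[j] if j < half else even[half - 1]
    odd = [d[i] + (even[i] + ee(i + 1)) // 2 for i in range(half)]     # undo predict
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out

x = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = cdf22_forward(x)
assert cdf22_inverse(s, d) == x                       # perfectly reversible
```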
In this document, we present new techniques for near-lossless and lossy compression of SAR imagery saved in PNG and binary formats of magnitude and phase data, based on the application of transforms, dimensionality-reduction methods, and lossless compression. In particular, we discuss the use of blockwise integer-to-integer transforms, the subsequent application of a dimensionality-reduction method, and Burrows-Wheeler-based lossless compression for the PNG data, as well as high-correlation-based modeling of sorted transform coefficients for the raw floating-point magnitude and phase data. The gains exhibited are substantial compared with applying different lossless methods directly to the data, and competitive with existing lossy approaches. The methods presented are effective for large-scale processing of similar data formats, as they rely heavily on techniques that scale well on parallel architectures.
Purpose: The rapid growth in image data generated by high-energy photon sources poses significant challenges for storage and analysis, with conventional compression methods offering compression ratios often below 1.5. Methods: This study introduces a novel, fast lossless compression method that combines deep learning with a hybrid computing architecture to overcome existing compression limitations, employing a spatiotemporal learning network for predictive pixel-value estimation and a residual quantization algorithm for efficient encoding. Results: When benchmarked against the DeepZip algorithm, our approach demonstrates a 40% reduction in compression time while maintaining comparable compression ratios using identical computational resources. The implementation of a GPU+CPU+FPGA hybrid architecture further accelerates compression, reducing time by an additional 38%. Conclusions: This study presents an innovative solution for efficiently storing and managing large-scale image data from synchrotron radiation facilities, harnessing the power of deep learning and advanced computing architectures.
LiDAR devices are capable of acquiring clouds of 3D points reflecting any object around them and of adding additional attributes to each point, such as color, position, and time. LiDAR datasets are usually large, and compressed data formats (e.g. LAZ) have been proposed over the years. These formats can transparently decompress portions of the data, but they are not designed to solve general queries over the data. In contrast to that traditional approach, a recent research line focuses on designing data structures that combine compression and indexation, allowing the compressed data to be queried directly. Compression is used to keep the data structure in main memory at all times, thus getting rid of disk accesses, and indexation is used to query the compressed data as fast as the uncompressed data. In this paper, we present the first data structure capable of losslessly compressing point clouds that have attributes while jointly indexing all three dimensions of space and the attribute values. Our method is able to run range queries and attribute queries up to 100 times faster than previous methods.
In GPU and AI chip design, the frequent read and write operations on color buffer data (ARGB), which are intensive in graphics and image access, significantly impact performance. There is also a need to support applications that require random access and read small images only once. To address this situation, this paper proposes an algorithm with lower modeling complexity that nevertheless achieves results close to those of more complex implementations, along with its FPGA implementation method. In tests on multiple images, the average lossless compression rate reached 40.3%. With hardware acceleration, the execution efficiency of the algorithm was further improved, ensuring both compression rate and speed and thus confirming the effectiveness of the algorithm.
As various types of data grow explosively, large-scale data storage, backup, and transmission become challenging, which motivates many researchers to propose efficient universal compression algorithms for multi-source data. In recent years, with the emergence of hardware acceleration devices such as GPUs, TPUs, DPUs, and FPGAs, the performance bottleneck of neural networks (NN) has been overcome, making NN-based compression algorithms increasingly practical and popular. However, no survey of NN-based universal lossless compressors has been conducted yet, and there is also a lack of unified evaluation metrics. To address these problems, we present a holistic survey as well as benchmark evaluations. Specifically, i) we thoroughly investigate NN-based lossless universal compression algorithms for multi-source data and classify them into three types: static pre-training, adaptive, and semi-adaptive; ii) we unify 19 evaluation metrics to comprehensively assess the compression effect, resource consumption, and model performance of compressors; iii) we conduct more than 4,600 CPU/GPU hours of experiments to evaluate 17 state-of-the-art compressors on 28 real-world datasets across data types including text, images, videos, and audio; and iv) we summarize the strengths and drawbacks of NN-based lossless data compressors and discuss promising research directions. We publish the results as the NN-based Lossless Compressors Benchmark (NNLCB, see the fahaihi.github.io/NNLCB website), which will be updated and maintained continuously in the future.
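For context, the sketch below computes a few commonly used evaluation quantities for a lossless compressor (compression ratio, bits per byte, throughput); these are generic definitions, not necessarily NNLCB's exact 19-metric set.

```python
# Sketch of generic compressor evaluation metrics (illustrative definitions only).
import time
import zlib

def evaluate(compress, data: bytes) -> dict:
    t0 = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - t0
    return {
        "compression_ratio": len(data) / len(compressed),
        "bits_per_byte": 8 * len(compressed) / len(data),
        "throughput_MB_s": len(data) / 1e6 / elapsed,
    }

print(evaluate(zlib.compress, b"lossless compression benchmark " * 10000))
```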
To protect the copyright of a text and recover its original content harmlessly, this paper proposes a novel reversible natural-language watermarking method that combines arithmetic coding and synonym substitution operations. By analyzing the relative frequencies of synonymous words, the synonyms employed for carrying the payload are quantized into an unbalanced and redundant binary sequence. The quantized binary sequence is losslessly compressed by adaptive binary arithmetic coding to provide spare room for accommodating additional data. Then, the compressed data appended with the watermark are embedded into the cover text via synonym substitutions in an invertible manner. On the receiver side, the watermark and compressed data can be extracted by decoding the values of the synonyms in the watermarked text, after which the original content can be perfectly recovered by decompressing the extracted compressed data and substituting the replaced synonyms with their original synonyms. Experimental results demonstrate that the proposed method can extract the watermark successfully and achieve lossless recovery of the original text. Additionally, it achieves a high embedding capacity.
The capacity and scale of smart substations are expanding constantly, with information becoming digitized and automated, leading to rapidly growing volumes of data. To address existing shortcomings in the big-data processing, querying, and analysis of smart substations, a data compression processing method based on Hive is proposed. Experimental results show that the compression ratio and query time of the RCFile storage format are better than those of TextFile and SequenceFile, and that query efficiency is improved for data compressed with the Deflate, Gzip, and Lzo compression formats. The results verify the correctness of adjacent speedup, defined as an index of cluster efficiency, and show that the method has significant theoretical and practical value for big-data processing in smart substations.
We describe practical improvements to parallel BWT-based lossless compressors frequently used in modern big-data applications. We propose a clustering-based data permutation approach that improves the compression ratio for data with significant alphabet variation, along with a faster string-sorting approach based on applying the O(n)-complexity counting sort with permutation reindexing.
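The sketch below shows an O(n) counting sort that produces a stable permutation of indices, the kind of linear-time primitive such a string-sorting improvement builds on; the clustering-based permutation and the full BWT pipeline are not reproduced here.

```python
# Counting sort sketch: stable index permutation over byte symbols in O(n + 256).

def counting_sort_perm(data: bytes) -> list:
    """Return indices of `data` stably ordered by symbol value."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    start, total = [0] * 256, 0
    for sym in range(256):                 # prefix sums give each bucket's start
        start[sym], total = total, total + counts[sym]
    perm = [0] * len(data)
    for i, b in enumerate(data):           # stable placement into buckets
        perm[start[b]] = i
        start[b] += 1
    return perm

assert counting_sort_perm(b"banana") == [1, 3, 5, 0, 2, 4]
```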