Deep learning-based semantic communication has achieved remarkable progress with CNNs and Transformers. However, CNNs exhibit constrained performance in high-resolution image transmission, while Transformers incur high computational cost due to quadratic complexity. Recently, VMamba, a novel state space model with linear complexity and exceptional long-range dependency modeling capabilities, has shown great potential in computer vision tasks. Inspired by this, we propose MNTSCC, an efficient VMamba-based nonlinear joint source-channel coding (JSCC) model for wireless image transmission. Specifically, MNTSCC comprises a VMamba-based nonlinear transform module, an MCAM entropy model, and a JSCC module. In the encoding stage, the input image is first encoded into a latent representation via the nonlinear transform module, which is then processed by the MCAM for source distribution modeling. The JSCC module then optimizes transmission efficiency by adaptively assigning transmission rates to the latent representation according to the estimated entropy values. The proposed MCAM enhances the channel-wise autoregressive entropy model with attention mechanisms, enabling the entropy model to effectively capture both global and local information within latent features, thereby yielding more accurate entropy estimation and improved rate-distortion performance. Additionally, to further enhance the robustness of the system under varying signal-to-noise ratio (SNR) conditions, we incorporate an SNR-adaptive net (SAnet) into the JSCC module, which dynamically adjusts the encoding strategy by integrating SNR information with latent features, thereby improving SNR adaptability. Experimental results across datasets of diverse resolutions demonstrate that the proposed method achieves superior image transmission performance compared to existing CNN- and Transformer-based semantic communication models, while maintaining competitive computational efficiency. In particular, under an Additive White Gaussian Noise (AWGN) channel with SNR = 10 dB and a channel bandwidth ratio (CBR) of 1/16, MNTSCC consistently outperforms NTSCC, achieving a Peak Signal-to-Noise Ratio (PSNR) gain of 1.72 dB on the Kodak24 dataset, 0.79 dB on CLIC2022, and 2.54 dB on CIFAR-10, while reducing computational cost by 32.23%. The code is available at https://github.com/WanChen10/MNTSCC (accessed on 9 July 2025).
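The entropy-driven rate assignment described above can be illustrated with a minimal sketch. Note that in MNTSCC the allocation is performed by a learned JSCC module; the function below, its name, and the largest-remainder rounding scheme are illustrative assumptions showing only the core idea of splitting a channel-symbol budget in proportion to estimated entropy.

```python
import numpy as np

def allocate_rates(entropy: np.ndarray, total_symbols: int) -> np.ndarray:
    """Split a channel-symbol budget across latent groups in proportion
    to their estimated entropies (largest-remainder rounding).

    Hypothetical sketch: the actual rate allocator in the paper is a
    trained network, not this closed-form rule."""
    raw = entropy / entropy.sum() * total_symbols
    alloc = np.floor(raw).astype(int)
    # Hand the rounding leftover to the groups with the largest remainders,
    # so the allocation sums exactly to the budget.
    leftover = total_symbols - alloc.sum()
    for i in np.argsort(raw - alloc)[::-1][:leftover]:
        alloc[i] += 1
    return alloc

# High-entropy latent groups receive proportionally more channel symbols.
print(allocate_rates(np.array([4.0, 1.0, 2.5, 0.5]), 64))  # → [32  8 20  4]
```

Groups whose latent features are estimated to carry more information thus consume more of the fixed bandwidth budget, which is the mechanism the abstract credits for the improved rate-distortion trade-off.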