Image dehazing aims to generate clear images that are critical for subsequent visual tasks. CNNs have made significant progress in the field of image dehazing. However, due to the inherent limitations of convolution operations, it is challenging to effectively model global context and long-range spatial dependencies. Although Transformers can address this issue, they suffer from excessive computational requirements. Therefore, we propose the FS-MSFormer network, an asymmetric encoder-decoder architecture that combines the advantages of CNNs and Transformers to improve dehazing performance. Specifically, the encoding process employs two branches for multi-scale feature extraction. One branch integrates an improved Transformer to enrich local and global contextual information while achieving linear complexity, and the other branch dynamically selects the most suitable frequency components in the frequency domain for enhancement. A single decoding branch is used for feature recovery in the decoding process. The enhanced local and global features are then fused with the encoded features, which reduces information loss and improves the model's robustness. A perceptual consistency loss function is also designed to minimize color distortion. We conducted experiments on the synthetic SOTS-Indoor and Foggy Cityscapes datasets and the real-world Dense-Haze dataset, showing improved dehazing results. Compared with FSNet, our method improves PSNR by 0.95 dB and SSIM by 0.007 on the SOTS-Indoor dataset, and by 1.89 dB in PSNR and 0.0579 in SSIM on the Dense-Haze dataset, demonstrating its effectiveness.
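The frequency-domain branch can be illustrated with a minimal sketch. The PyTorch module below is a hypothetical simplification (the paper's actual branch design is not reproduced here): it re-weights the real FFT spectrum of a feature map with a learned per-channel mask, which is one simple way to select and enhance particular frequency components.

```python
import torch
import torch.nn as nn

class FrequencySelection(nn.Module):
    """Hypothetical sketch: enhance a feature map by re-weighting its
    frequency components with a learned mask (not the paper's exact design)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # One learnable weight per channel and per rFFT frequency bin.
        self.weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x):
        # x: (B, C, H, W) feature map.
        spec = torch.fft.rfft2(x, norm="ortho")      # complex spectrum (B, C, H, W//2+1)
        spec = spec * self.weight                    # emphasize selected frequency components
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
```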
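A perceptual consistency loss can likewise be sketched as a pixel-wise term plus a frozen VGG feature term. The exact formulation used by FS-MSFormer may differ; the layer cutoff and weighting below are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class PerceptualConsistencyLoss(nn.Module):
    """Hypothetical sketch: L1 pixel term plus a frozen-VGG feature term,
    intended to discourage color and texture drift in the dehazed output."""
    def __init__(self, feature_layer=16, perceptual_weight=0.1):
        super().__init__()
        # Frozen VGG-16 features up to an intermediate layer (assumed cutoff).
        vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:feature_layer].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.w = perceptual_weight

    def forward(self, dehazed, clear):
        # Inputs assumed in [0, 1]; ImageNet normalization omitted for brevity.
        pixel = self.l1(dehazed, clear)
        perceptual = self.l1(self.vgg(dehazed), self.vgg(clear))
        return pixel + self.w * perceptual
```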