Images with complementary spectral information can be recorded using image sensors that can identify visible and near-infrared spectrum.The fusion of visible and nearinfrared(NIR)aims to enhance the quality of images ...Images with complementary spectral information can be recorded using image sensors that can identify visible and near-infrared spectrum.The fusion of visible and nearinfrared(NIR)aims to enhance the quality of images acquired by video monitoring systems for the ease of user observation and data processing.Unfortunately,current fusion algorithms produce artefacts and colour distortion since they cannot make use of spectrum properties and are lacking in information complementarity.Therefore,an information complementarity fusion(ICF)model is designed based on physical signals.In order to separate high-frequency noise from important information in distinct frequency layers,the authors first extracted texture-scale and edge-scale layers using a two-scale filter.Second,the difference map between visible and near-infrared was filtered using the extended-DoG filter to produce the initial visible-NIR complementary weight map.Then,to generate a guide map,the near-infrared image with night adjustment was processed as well.The final complementarity weight map was subsequently derived via an arctanI function mapping using the guide map and the initial weight maps.Finally,fusion images were generated with the complementarity weight maps.The experimental results demonstrate that the proposed approach outperforms the state-of-the-art in both avoiding artificial colours as well as effectively utilising information complementarity.展开更多
Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information.However,in low-light scenarios,the illuminati...Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information.However,in low-light scenarios,the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail information from the scene.At this time,relying solely on the target saliency information provided by infrared images is far from sufficient.To address this challenge,this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement,named LLE-Fuse.The method is based on the improvement of the MobileOne Block,using the Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images.The intermediate features at different scales obtained are then fused by a cross-modal attention fusion module.In addition,the Contrast Limited Adaptive Histogram Equalization(CLAHE)algorithm is used for image enhancement of both infrared and visible light images,guiding the network model to learn low-light enhancement capabilities through enhancement loss.Upon completion of network training,the Edge-MobileOne Block is optimized into a direct connection structure similar to MobileNetV1 through structural reparameterization,effectively reducing computational resource consumption.Finally,after extensive experimental comparisons,our method achieved improvements of 4.6%,40.5%,156.9%,9.2%,and 98.6%in the evaluation metrics Standard Deviation(SD),Visual Information Fidelity(VIF),Entropy(EN),and Spatial Frequency(SF),respectively,compared to the best results of the compared algorithms,while only being 1.5 ms/it slower in computation speed than the fastest method.展开更多
High-intensive underground mining has caused severe ground fissures,resulting in environmental degradation.Consequently,prompt detection is crucial to mitigate their environmental impact.However,the accurate segmentat...High-intensive underground mining has caused severe ground fissures,resulting in environmental degradation.Consequently,prompt detection is crucial to mitigate their environmental impact.However,the accurate segmentation of fissuresin complex and variable scenes of visible imagery is a challenging issue.Our method,DeepFissureNets-Infrared-Visible(DFN-IV),highlights the potential of incorporating visible images with infrared information for improved ground fissuresegmentation.DFNIV adopts a two-step process.First,a fusion network is trained with the dual adversarial learning strategy fuses infrared and visible imaging,providing an integrated representation of fissuretargets that combines the structural information with the textual details.Second,the fused images are processed by a fine-tunedsegmentation network,which lever-ages knowledge injection to learn the distinctive characteristics of fissuretargets effectively.Furthermore,an infrared-visible ground fissuredataset(IVGF)is built from an aerial investigation of the Daliuta Coal Mine.Extensive experiments reveal that our approach provides superior accuracy over single-modality image strategies employed in fivesegmentation models.Notably,DeeplabV3+tested with DFN-IV improves by 9.7%and 11.13%in pixel accuracy and Intersection over Union(IoU),respectively,compared to solely visible images.Moreover,our method surpasses six state-of-the-art image fusion methods,achieving a 5.28%improvement in pixel accuracy and a 1.57%increase in IoU,respectively,compared to the second-best effective method.In addition,ablation studies further validate the significanceof the dual adversarial learning module and the integrated knowledge injection strategy.By leveraging DFN-IV,we aim to quantify the impacts of mining-induced ground fissures,facilitating the implementation of intelligent safety measures.展开更多
Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases,but existing fusion methods have problems such as blurred texture details,low contrast,and inability...Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases,but existing fusion methods have problems such as blurred texture details,low contrast,and inability to fully extract fused image information.Therefore,a multimodal medical image fusion method based on mask optimization and parallel attention mechanism was proposed to address the aforementioned issues.Firstly,it converted the entire image into a binary mask,and constructed a contour feature map to maximize the contour feature information of the image and a triple path network for image texture detail feature extraction and optimization.Secondly,a contrast enhancement module and a detail preservation module were proposed to enhance the overall brightness and texture details of the image.Afterwards,a parallel attention mechanism was constructed using channel features and spatial feature changes to fuse images and enhance the salient information of the fused images.Finally,a decoupling network composed of residual networks was set up to optimize the information between the fused image and the source image so as to reduce information loss in the fused image.Compared with nine high-level methods proposed in recent years,the seven objective evaluation indicators of our method have improved by 6%−31%,indicating that this method can obtain fusion results with clearer texture details,higher contrast,and smaller pixel differences between the fused image and the source image.It is superior to other comparison algorithms in both subjective and objective indicators.展开更多
Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused inform...Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused information in a single image.One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues.This paper proposes a multimodal medical image fusion network(MMIF-Net)based on multiscale hybrid attention.The method first decomposes the original image to obtain the low-rank and significant parts.Then,to utilize the features at different scales,we add amultiscalemechanism that uses three filters of different sizes to extract the features in the encoded network.Also,a hybrid attention module is introduced to obtain more image details.Finally,the fused images are reconstructed by decoding the network.We conducted experiments with clinical images from brain computed tomography/magnetic resonance.The experimental results show that the multimodal medical image fusion network method based on multiscale hybrid attention works better than other advanced fusion methods.展开更多
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of vis...Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images.However,the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging.Furthermore,constrained by the physical characteristics of sensors and thermal diffusion effects,infrared images generally suffer from blurred object contours and missing details,making it difficult to extract object features effectively.To address these issues,we propose an infrared-visible image fusion network that realizesmultimodal information fusion of infrared and visible images through a carefully designedmultiscale fusion strategy.First,we design an adaptive gray-radiance enhancement(AGRE)module to strengthen the detail representation in infrared images,improving their usability in complex lighting scenarios.Next,we introduce a channelspatial feature interaction(CSFI)module,which achieves efficient complementarity between the RGB and infrared(IR)modalities via dynamic channel switching and a spatial attention mechanism.Finally,we propose a multi-scale enhanced cross-attention fusion(MSECA)module,which optimizes the fusion ofmulti-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale,thereby enhancing the expressiveness of the fused features.Experiments on the KAIST,M3FD,and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios.On the KAIST dataset,the miss rate drops to 5.99%,and further to 4.26% in night scenes.On the FLIR and M3FD datasets,it achieves AP50 scores of 79.4% and 88.9%,respectively.展开更多
The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively han...The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively handle modal disparities,resulting in visual degradation of the details and prominent targets of the fused images.To address these challenges,we introduce Prompt Fusion,a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts.Firstly,to better characterize the features of different modalities,a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities,thereby improving the extraction of fine details and textures.We also introduce a prompt learning mechanism using positive and negative prompts,leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images,leading to improved performance in downstream tasks.Furthermore,we employ bi-level asymptotic convergence optimization.This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent.Our approach advances the state-of-the-art,delivering superior fusion quality and boosting the performance of related downstream tasks.Project page:https://github.com/hey-it-s-me/PromptFusion.展开更多
Infrared and visible image fusion technology integrates the thermal radiation information of infrared images with the texture details of visible images to generate more informative fused images.However,existing method...Infrared and visible image fusion technology integrates the thermal radiation information of infrared images with the texture details of visible images to generate more informative fused images.However,existing methods often fail to distinguish salient objects from background regions,leading to detail suppression in salient regions due to global fusion strategies.This study presents a mask-guided latent low-rank representation fusion method to address this issue.First,the GrabCut algorithm is employed to extract a saliency mask,distinguishing salient regions from background regions.Then,latent low-rank representation(LatLRR)is applied to extract deep image features,enhancing key information extraction.In the fusion stage,a weighted fusion strategy strengthens infrared thermal information and visible texture details in salient regions,while an average fusion strategy improves background smoothness and stability.Experimental results on the TNO dataset demonstrate that the proposed method achieves superior performance in SPI,MI,Qabf,PSNR,and EN metrics,effectively preserving salient target details while maintaining balanced background information.Compared to state-of-the-art fusion methods,our approach achieves more stable and visually consistent fusion results.The fusion code is available on GitHub at:https://github.com/joyzhen1/Image(accessed on 15 January 2025).展开更多
A novel image fusion network framework with an autonomous encoder and decoder is suggested to increase thevisual impression of fused images by improving the quality of infrared and visible light picture fusion. The ne...A novel image fusion network framework with an autonomous encoder and decoder is suggested to increase thevisual impression of fused images by improving the quality of infrared and visible light picture fusion. The networkcomprises an encoder module, fusion layer, decoder module, and edge improvementmodule. The encoder moduleutilizes an enhanced Inception module for shallow feature extraction, then combines Res2Net and Transformerto achieve deep-level co-extraction of local and global features from the original picture. An edge enhancementmodule (EEM) is created to extract significant edge features. A modal maximum difference fusion strategy isintroduced to enhance the adaptive representation of information in various regions of the source image, therebyenhancing the contrast of the fused image. The encoder and the EEM module extract features, which are thencombined in the fusion layer to create a fused picture using the decoder. Three datasets were chosen to test thealgorithmproposed in this paper. The results of the experiments demonstrate that the network effectively preservesbackground and detail information in both infrared and visible images, yielding superior outcomes in subjectiveand objective evaluations.展开更多
To address the issues of incomplete information,blurred details,loss of details,and insufficient contrast in infrared and visible image fusion,an image fusion algorithm based on a convolutional autoencoder is proposed...To address the issues of incomplete information,blurred details,loss of details,and insufficient contrast in infrared and visible image fusion,an image fusion algorithm based on a convolutional autoencoder is proposed.The region attention module is meant to extract the background feature map based on the distinct properties of the background feature map and the detail feature map.A multi-scale convolution attention module is suggested to enhance the communication of feature information.At the same time,the feature transformation module is introduced to learn more robust feature representations,aiming to preserve the integrity of image information.This study uses three available datasets from TNO,FLIR,and NIR to perform thorough quantitative and qualitative trials with five additional algorithms.The methods are assessed based on four indicators:information entropy(EN),standard deviation(SD),spatial frequency(SF),and average gradient(AG).Object detection experiments were done on the M3FD dataset to further verify the algorithm’s performance in comparison with five other algorithms.The algorithm’s accuracy was evaluated using the mean average precision at a threshold of 0.5(mAP@0.5)index.Comprehensive experimental findings show that CAEFusion performs well in subjective visual and objective evaluation criteria and has promising potential in downstream object detection tasks.展开更多
Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical ...Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical image fusion solutions to protect image details and significant information, a new multimodality medical image fusion method(NSST-PAPCNNLatLRR) is proposed in this paper. Firstly, the high and low-frequency sub-band coefficients are obtained by decomposing the source image using NSST. Then, the latent low-rank representation algorithm is used to process the low-frequency sub-band coefficients;An improved PAPCNN algorithm is also proposed for the fusion of high-frequency sub-band coefficients. The improved PAPCNN model was based on the automatic setting of the parameters, and the optimal method was configured for the time decay factor αe. The experimental results show that, in comparison with the five mainstream fusion algorithms, the new algorithm has significantly improved the visual effect over the comparison algorithm,enhanced the ability to characterize important information in images, and further improved the ability to protect the detailed information;the new algorithm has achieved at least four firsts in six objective indexes.展开更多
Mangroves are indispensable to coastlines,maintaining biodiversity,and mitigating climate change.Therefore,improving the accuracy of mangrove information identification is crucial for their ecological protection.Aimin...Mangroves are indispensable to coastlines,maintaining biodiversity,and mitigating climate change.Therefore,improving the accuracy of mangrove information identification is crucial for their ecological protection.Aiming at the limited morphological information of synthetic aperture radar(SAR)images,which is greatly interfered by noise,and the susceptibility of optical images to weather and lighting conditions,this paper proposes a pixel-level weighted fusion method for SAR and optical images.Image fusion enhanced the target features and made mangrove monitoring more comprehensive and accurate.To address the problem of high similarity between mangrove forests and other forests,this paper is based on the U-Net convolutional neural network,and an attention mechanism is added in the feature extraction stage to make the model pay more attention to the mangrove vegetation area in the image.In order to accelerate the convergence and normalize the input,batch normalization(BN)layer and Dropout layer are added after each convolutional layer.Since mangroves are a minority class in the image,an improved cross-entropy loss function is introduced in this paper to improve the model’s ability to recognize mangroves.The AttU-Net model for mangrove recognition in high similarity environments is thus constructed based on the fused images.Through comparison experiments,the overall accuracy of the improved U-Net model trained from the fused images to recognize the predicted regions is significantly improved.Based on the fused images,the recognition results of the AttU-Net model proposed in this paper are compared with its benchmark model,U-Net,and the Dense-Net,Res-Net,and Seg-Net methods.The AttU-Net model captured mangroves’complex structures and textural features in images more effectively.The average OA,F1-score,and Kappa coefficient in the four tested regions were 94.406%,90.006%,and 84.045%,which were significantly higher than several other methods.This method can provide some technical support for the monitoring and protection of mangrove ecosystems.展开更多
Multimodal medical image fusion has attained immense popularity in recent years due to its robust technology for clinical diagnosis.It fuses multiple images into a single image to improve the quality of images by reta...Multimodal medical image fusion has attained immense popularity in recent years due to its robust technology for clinical diagnosis.It fuses multiple images into a single image to improve the quality of images by retaining significant information and aiding diagnostic practitioners in diagnosing and treating many diseases.However,recent image fusion techniques have encountered several challenges,including fusion artifacts,algorithm complexity,and high computing costs.To solve these problems,this study presents a novel medical image fusion strategy by combining the benefits of pixel significance with edge-preserving processing to achieve the best fusion performance.First,the method employs a cross-bilateral filter(CBF)that utilizes one image to determine the kernel and the other for filtering,and vice versa,by considering both geometric closeness and the gray-level similarities of neighboring pixels of the images without smoothing edges.The outputs of CBF are then subtracted from the original images to obtain detailed images.It further proposes to use edge-preserving processing that combines linear lowpass filtering with a non-linear technique that enables the selection of relevant regions in detailed images while maintaining structural properties.These regions are selected using morphologically processed linear filter residuals to identify the significant regions with high-amplitude edges and adequate size.The outputs of low-pass filtering are fused with meaningfully restored regions to reconstruct the original shape of the edges.In addition,weight computations are performed using these reconstructed images,and these weights are then fused with the original input images to produce a final fusion result by estimating the strength of horizontal and vertical details.Numerous standard quality evaluation metrics with complementary properties are used for comparison with existing,well-known algorithms objectively to validate the fusion results.Experimental results from the proposed research article exhibit superior performance compared to other competing techniques in the case of both qualitative and quantitative evaluation.In addition,the proposed method advocates less computational complexity and execution time while improving diagnostic computing accuracy.Nevertheless,due to the lower complexity of the fusion algorithm,the efficiency of fusion methods is high in practical applications.The results reveal that the proposed method exceeds the latest state-of-the-art methods in terms of providing detailed information,edge contour,and overall contrast.展开更多
Recently,there have been several uses for digital image processing.Image fusion has become a prominent application in the domain of imaging processing.To create one final image that provesmore informative and helpful ...Recently,there have been several uses for digital image processing.Image fusion has become a prominent application in the domain of imaging processing.To create one final image that provesmore informative and helpful compared to the original input images,image fusion merges two or more initial images of the same item.Image fusion aims to produce,enhance,and transform significant elements of the source images into combined images for the sake of human visual perception.Image fusion is commonly employed for feature extraction in smart robots,clinical imaging,audiovisual camera integration,manufacturing process monitoring,electronic circuit design,advanced device diagnostics,and intelligent assembly line robots,with image quality varying depending on application.The research paper presents various methods for merging images in spatial and frequency domains,including a blend of stable and curvelet transformations,everageMax-Min,weighted principal component analysis(PCA),HIS(Hue,Intensity,Saturation),wavelet transform,discrete cosine transform(DCT),dual-tree Complex Wavelet Transform(CWT),and multiple wavelet transform.Image fusion methods integrate data from several source images of an identical target,thereby enhancing information in an extremely efficient manner.More precisely,in imaging techniques,the depth of field constraint precludes images from focusing on every object,leading to the exclusion of certain characteristics.To tackle thess challanges,a very efficient multi-focus wavelet decomposition and recompositionmethod is proposed.The use of these wavelet decomposition and recomposition techniques enables this method to make use of existing optimized wavelet code and filter choice.The simulated outcomes provide evidence that the suggested approach initially extracts particular characteristics from images in order to accurately reflect the level of clarity portrayed in the original images.This study enhances the performance of the eXtreme Gradient Boosting(XGBoost)algorithm in detecting brain malignancies with greater precision through the integration of computational image analysis and feature selection.The performance of images is improved by segmenting them employing the K-Means algorithm.The segmentation method aids in identifying specific regions of interest,using Particle Swarm Optimization(PCA)for trait selection and XGBoost for data classification.Extensive trials confirm the model’s exceptional visual performance,achieving an accuracy of up to 97.067%and providing good objective indicators.展开更多
The growth optimizer(GO)is an innovative and robust metaheuristic optimization algorithm designed to simulate the learning and reflective processes experienced by individuals as they mature within the social environme...The growth optimizer(GO)is an innovative and robust metaheuristic optimization algorithm designed to simulate the learning and reflective processes experienced by individuals as they mature within the social environment.However,the original GO algorithm is constrained by two significant limitations:slow convergence and high mem-ory requirements.This restricts its application to large-scale and complex problems.To address these problems,this paper proposes an innovative enhanced growth optimizer(eGO).In contrast to conventional population-based optimization algorithms,the eGO algorithm utilizes a probabilistic model,designated as the virtual population,which is capable of accurately replicating the behavior of actual populations while simultaneously reducing memory consumption.Furthermore,this paper introduces the Lévy flight mechanism,which enhances the diversity and flexibility of the search process,thus further improving the algorithm’s global search capability and convergence speed.To verify the effectiveness of the eGO algorithm,a series of experiments were conducted using the CEC2014 and CEC2017 test sets.The results demonstrate that the eGO algorithm outperforms the original GO algorithm and other compact algorithms regarding memory usage and convergence speed,thus exhibiting powerful optimization capabilities.Finally,the eGO algorithm was applied to image fusion.Through a comparative analysis with the existing PSO and GO algorithms and other compact algorithms,the eGO algorithm demonstrates superior performance in image fusion.展开更多
In view of the problem that current mainstream fusion method of infrared polarization image—Multiscale Geometry Analysis method only focuses on a certain characteristic to image representation.And spatial domain fusi...In view of the problem that current mainstream fusion method of infrared polarization image—Multiscale Geometry Analysis method only focuses on a certain characteristic to image representation.And spatial domain fusion method,Principal Component Analysis(PCA)method has the shortcoming of losing small target,this paper presents a new fusion method of infrared polarization images based on combination of Nonsubsampled Shearlet Transformation(NSST)and improved PCA.This method can make full use of the effectiveness to image details expressed by NSST and the characteristics that PCA can highlight the main features of images.The combination of the two methods can integrate the complementary features of themselves to retain features of targets and image details fully.Firstly,intensity and polarization images are decomposed into low frequency and high frequency components with different directions by NSST.Secondly,the low frequency components are fused with improved PCA,while the high frequency components are fused by joint decision making rule with local energy and local variance.Finally,the fused image is reconstructed with the inverse NSST to obtain the final fused image of infrared polarization.The experiment results show that the method proposed has higher advantages than other methods in terms of detail preservation and visual effect.展开更多
Aim To fuse the fluorescence image and transmission image of a cell into a single image containing more information than any of the individual image. Methods Image fusion technology was applied to biological cell imag...Aim To fuse the fluorescence image and transmission image of a cell into a single image containing more information than any of the individual image. Methods Image fusion technology was applied to biological cell imaging processing. It could match the images and improve the confidence and spatial resolution of the images. Using two algorithms, double thresholds algorithm and denoising algorithm based on wavelet transform,the fluorescence image and transmission image of a Cell were merged into a composite image. Results and Conclusion The position of fluorescence and the structure of cell can be displyed in the composite image. The signal-to-noise ratio of the exultant image is improved to a large extent. The algorithms are not only useful to investigate the fluorescence and transmission images, but also suitable to observing two or more fluoascent label proes in a single cell.展开更多
This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer,termed as SwinFusion.On the one hand,an attention-guided cross-domain module is devised to achi...This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer,termed as SwinFusion.On the one hand,an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction.More specifically,the proposed method involves an intra-domain fusion unit based on self-attention and an interdomain fusion unit based on cross-attention,which mine and integrate long dependencies within the same domain and across domains.Through long-range dependency modeling,the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration as well as maintaining the appropriate apparent intensity from a global perspective.In particular,we introduce the shifted windows mechanism into the self-attention and cross-attention,which allows our model to receive images with arbitrary sizes.On the other hand,the multi-scene image fusion problems are generalized to a unified framework with structure maintenance,detail preservation,and proper intensity control.Moreover,an elaborate loss function,consisting of SSIM loss,texture loss,and intensity loss,drives the network to preserve abundant texture details and structural information,as well as presenting optimal apparent intensity.Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion compared to the state-of-theart unified image fusion algorithms and task-specific alternatives.Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.展开更多
In order to enhance the contrast of the fused image and reduce the loss of fine details in the process of image fusion,a novel fusion algorithm of infrared and visible images is proposed.First of all,regions of intere...In order to enhance the contrast of the fused image and reduce the loss of fine details in the process of image fusion,a novel fusion algorithm of infrared and visible images is proposed.First of all,regions of interest(RoIs)are detected in two original images by using saliency map.Then,nonsubsampled contourlet transform(NSCT)on both the infrared image and the visible image is performed to get a low-frequency sub-band and a certain amount of high-frequency sub-bands.Subsequently,the coefcients of all sub-bands are classified into four categories based on the result of RoI detection:the region of interest in the low-frequency sub-band(LSRoI),the region of interest in the high-frequency sub-band(HSRoI),the region of non-interest in the low-frequency sub-band(LSNRoI)and the region of non-interest in the high-frequency sub-band(HSNRoI).Fusion rules are customized for each kind of coefcients and fused image is achieved by performing the inverse NSCT to the fused coefcients.Experimental results show that the fusion scheme proposed in this paper achieves better efect than the other fusion algorithms both in visual efect and quantitative metrics.展开更多
A new method for image fusion based on Contourlet transform and cycle spinning is proposed. Contourlet transform is a flexible multiresolution, local and directional image expansion, also provids a sparse representati...A new method for image fusion based on Contourlet transform and cycle spinning is proposed. Contourlet transform is a flexible multiresolution, local and directional image expansion, also provids a sparse representation for two-dimensional piece-wise smooth signals resembling images. Due to lack of translation invariance property in Contourlet transform, the conventional image fusion algorithm based on Contourlet transform introduces many artifacts. According to the theory of cycle spinning applied to image denoising, an invariance transform can reduce the artifacts through a series of processing efficiently. So the technology of cycle spinning is introduced to develop the translation invariant Contourlet fusion algorithm. This method can effectively eliminate the Gibbs-like phenomenon, extract the characteristics of original images, and preserve more important information. Experimental results show the simplicity and effectiveness of the method and its advantages over the conventional approaches.展开更多
基金supports in part by the Natural Science Foundation of China(NSFC)under contract No.62171253the Young Elite Scientists Sponsorship Program by CAST under program No.2022QNRC001,as well as the Fundamental Research Funds for the Central Universities.
文摘Images with complementary spectral information can be recorded using image sensors that can identify visible and near-infrared spectrum.The fusion of visible and nearinfrared(NIR)aims to enhance the quality of images acquired by video monitoring systems for the ease of user observation and data processing.Unfortunately,current fusion algorithms produce artefacts and colour distortion since they cannot make use of spectrum properties and are lacking in information complementarity.Therefore,an information complementarity fusion(ICF)model is designed based on physical signals.In order to separate high-frequency noise from important information in distinct frequency layers,the authors first extracted texture-scale and edge-scale layers using a two-scale filter.Second,the difference map between visible and near-infrared was filtered using the extended-DoG filter to produce the initial visible-NIR complementary weight map.Then,to generate a guide map,the near-infrared image with night adjustment was processed as well.The final complementarity weight map was subsequently derived via an arctanI function mapping using the guide map and the initial weight maps.Finally,fusion images were generated with the complementarity weight maps.The experimental results demonstrate that the proposed approach outperforms the state-of-the-art in both avoiding artificial colours as well as effectively utilising information complementarity.
基金This researchwas Sponsored by Xinjiang Uygur Autonomous Region Tianshan Talent Programme Project(2023TCLJ02)Natural Science Foundation of Xinjiang Uygur Autonomous Region(2022D01C349).
文摘Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information.However,in low-light scenarios,the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail information from the scene.At this time,relying solely on the target saliency information provided by infrared images is far from sufficient.To address this challenge,this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement,named LLE-Fuse.The method is based on the improvement of the MobileOne Block,using the Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images.The intermediate features at different scales obtained are then fused by a cross-modal attention fusion module.In addition,the Contrast Limited Adaptive Histogram Equalization(CLAHE)algorithm is used for image enhancement of both infrared and visible light images,guiding the network model to learn low-light enhancement capabilities through enhancement loss.Upon completion of network training,the Edge-MobileOne Block is optimized into a direct connection structure similar to MobileNetV1 through structural reparameterization,effectively reducing computational resource consumption.Finally,after extensive experimental comparisons,our method achieved improvements of 4.6%,40.5%,156.9%,9.2%,and 98.6%in the evaluation metrics Standard Deviation(SD),Visual Information Fidelity(VIF),Entropy(EN),and Spatial Frequency(SF),respectively,compared to the best results of the compared algorithms,while only being 1.5 ms/it slower in computation speed than the fastest method.
基金supported by the National Science Fund of China(Grant No.52225402)Fund of Inner Mongolia Research Institute,China University of Mining and Technology(Beijing)(Grant No.IMRI23003).
文摘High-intensive underground mining has caused severe ground fissures,resulting in environmental degradation.Consequently,prompt detection is crucial to mitigate their environmental impact.However,the accurate segmentation of fissuresin complex and variable scenes of visible imagery is a challenging issue.Our method,DeepFissureNets-Infrared-Visible(DFN-IV),highlights the potential of incorporating visible images with infrared information for improved ground fissuresegmentation.DFNIV adopts a two-step process.First,a fusion network is trained with the dual adversarial learning strategy fuses infrared and visible imaging,providing an integrated representation of fissuretargets that combines the structural information with the textual details.Second,the fused images are processed by a fine-tunedsegmentation network,which lever-ages knowledge injection to learn the distinctive characteristics of fissuretargets effectively.Furthermore,an infrared-visible ground fissuredataset(IVGF)is built from an aerial investigation of the Daliuta Coal Mine.Extensive experiments reveal that our approach provides superior accuracy over single-modality image strategies employed in fivesegmentation models.Notably,DeeplabV3+tested with DFN-IV improves by 9.7%and 11.13%in pixel accuracy and Intersection over Union(IoU),respectively,compared to solely visible images.Moreover,our method surpasses six state-of-the-art image fusion methods,achieving a 5.28%improvement in pixel accuracy and a 1.57%increase in IoU,respectively,compared to the second-best effective method.In addition,ablation studies further validate the significanceof the dual adversarial learning module and the integrated knowledge injection strategy.By leveraging DFN-IV,we aim to quantify the impacts of mining-induced ground fissures,facilitating the implementation of intelligent safety measures.
基金supported by Gansu Natural Science Foundation Programme(No.24JRRA231)National Natural Science Foundation of China(No.62061023)Gansu Provincial Education,Science and Technology Innovation and Industry(No.2021CYZC-04)。
文摘Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases,but existing fusion methods have problems such as blurred texture details,low contrast,and inability to fully extract fused image information.Therefore,a multimodal medical image fusion method based on mask optimization and parallel attention mechanism was proposed to address the aforementioned issues.Firstly,it converted the entire image into a binary mask,and constructed a contour feature map to maximize the contour feature information of the image and a triple path network for image texture detail feature extraction and optimization.Secondly,a contrast enhancement module and a detail preservation module were proposed to enhance the overall brightness and texture details of the image.Afterwards,a parallel attention mechanism was constructed using channel features and spatial feature changes to fuse images and enhance the salient information of the fused images.Finally,a decoupling network composed of residual networks was set up to optimize the information between the fused image and the source image so as to reduce information loss in the fused image.Compared with nine high-level methods proposed in recent years,the seven objective evaluation indicators of our method have improved by 6%−31%,indicating that this method can obtain fusion results with clearer texture details,higher contrast,and smaller pixel differences between the fused image and the source image.It is superior to other comparison algorithms in both subjective and objective indicators.
基金supported by Qingdao Huanghai University School-Level ScientificResearch Project(2023KJ14)Undergraduate Teaching Reform Research Project of Shandong Provincial Department of Education(M2022328)+1 种基金National Natural Science Foundation of China under Grant(42472324)Qingdao Postdoctoral Foundation under Grant(QDBSH202402049).
文摘Multimodal image fusion plays an important role in image analysis and applications.Multimodal medical image fusion helps to combine contrast features from two or more input imaging modalities to represent fused information in a single image.One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues.This paper proposes a multimodal medical image fusion network(MMIF-Net)based on multiscale hybrid attention.The method first decomposes the original image to obtain the low-rank and significant parts.Then,to utilize the features at different scales,we add amultiscalemechanism that uses three filters of different sizes to extract the features in the encoded network.Also,a hybrid attention module is introduced to obtain more image details.Finally,the fused images are reconstructed by decoding the network.We conducted experiments with clinical images from brain computed tomography/magnetic resonance.The experimental results show that the multimodal medical image fusion network method based on multiscale hybrid attention works better than other advanced fusion methods.
基金supported by the National Natural Science Foundation of China(Grant No.62302086)the Natural Science Foundation of Liaoning Province(Grant No.2023-MSBA-070)the Fundamental Research Funds for the Central Universities(Grant No.N2317005).
文摘Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images.However,the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging.Furthermore,constrained by the physical characteristics of sensors and thermal diffusion effects,infrared images generally suffer from blurred object contours and missing details,making it difficult to extract object features effectively.To address these issues,we propose an infrared-visible image fusion network that realizesmultimodal information fusion of infrared and visible images through a carefully designedmultiscale fusion strategy.First,we design an adaptive gray-radiance enhancement(AGRE)module to strengthen the detail representation in infrared images,improving their usability in complex lighting scenarios.Next,we introduce a channelspatial feature interaction(CSFI)module,which achieves efficient complementarity between the RGB and infrared(IR)modalities via dynamic channel switching and a spatial attention mechanism.Finally,we propose a multi-scale enhanced cross-attention fusion(MSECA)module,which optimizes the fusion ofmulti-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale,thereby enhancing the expressiveness of the fused features.Experiments on the KAIST,M3FD,and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios.On the KAIST dataset,the miss rate drops to 5.99%,and further to 4.26% in night scenes.On the FLIR and M3FD datasets,it achieves AP50 scores of 79.4% and 88.9%,respectively.
基金partially supported by China Postdoctoral Science Foundation(2023M730741)the National Natural Science Foundation of China(U22B2052,52102432,52202452,62372080,62302078)
文摘The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively handle modal disparities,resulting in visual degradation of the details and prominent targets of the fused images.To address these challenges,we introduce Prompt Fusion,a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts.Firstly,to better characterize the features of different modalities,a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities,thereby improving the extraction of fine details and textures.We also introduce a prompt learning mechanism using positive and negative prompts,leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images,leading to improved performance in downstream tasks.Furthermore,we employ bi-level asymptotic convergence optimization.This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent.Our approach advances the state-of-the-art,delivering superior fusion quality and boosting the performance of related downstream tasks.Project page:https://github.com/hey-it-s-me/PromptFusion.
基金supported by Universiti Teknologi MARA through UiTM MyRA Research Grant,600-RMC 5/3/GPM(053/2022).
文摘Infrared and visible image fusion technology integrates the thermal radiation information of infrared images with the texture details of visible images to generate more informative fused images.However,existing methods often fail to distinguish salient objects from background regions,leading to detail suppression in salient regions due to global fusion strategies.This study presents a mask-guided latent low-rank representation fusion method to address this issue.First,the GrabCut algorithm is employed to extract a saliency mask,distinguishing salient regions from background regions.Then,latent low-rank representation(LatLRR)is applied to extract deep image features,enhancing key information extraction.In the fusion stage,a weighted fusion strategy strengthens infrared thermal information and visible texture details in salient regions,while an average fusion strategy improves background smoothness and stability.Experimental results on the TNO dataset demonstrate that the proposed method achieves superior performance in SPI,MI,Qabf,PSNR,and EN metrics,effectively preserving salient target details while maintaining balanced background information.Compared to state-of-the-art fusion methods,our approach achieves more stable and visually consistent fusion results.The fusion code is available on GitHub at:https://github.com/joyzhen1/Image(accessed on 15 January 2025).
文摘A novel image fusion network framework with an autonomous encoder and decoder is suggested to increase thevisual impression of fused images by improving the quality of infrared and visible light picture fusion. The networkcomprises an encoder module, fusion layer, decoder module, and edge improvementmodule. The encoder moduleutilizes an enhanced Inception module for shallow feature extraction, then combines Res2Net and Transformerto achieve deep-level co-extraction of local and global features from the original picture. An edge enhancementmodule (EEM) is created to extract significant edge features. A modal maximum difference fusion strategy isintroduced to enhance the adaptive representation of information in various regions of the source image, therebyenhancing the contrast of the fused image. The encoder and the EEM module extract features, which are thencombined in the fusion layer to create a fused picture using the decoder. Three datasets were chosen to test thealgorithmproposed in this paper. The results of the experiments demonstrate that the network effectively preservesbackground and detail information in both infrared and visible images, yielding superior outcomes in subjectiveand objective evaluations.
文摘To address the issues of incomplete information,blurred details,loss of details,and insufficient contrast in infrared and visible image fusion,an image fusion algorithm based on a convolutional autoencoder is proposed.The region attention module is meant to extract the background feature map based on the distinct properties of the background feature map and the detail feature map.A multi-scale convolution attention module is suggested to enhance the communication of feature information.At the same time,the feature transformation module is introduced to learn more robust feature representations,aiming to preserve the integrity of image information.This study uses three available datasets from TNO,FLIR,and NIR to perform thorough quantitative and qualitative trials with five additional algorithms.The methods are assessed based on four indicators:information entropy(EN),standard deviation(SD),spatial frequency(SF),and average gradient(AG).Object detection experiments were done on the M3FD dataset to further verify the algorithm’s performance in comparison with five other algorithms.The algorithm’s accuracy was evaluated using the mean average precision at a threshold of 0.5(mAP@0.5)index.Comprehensive experimental findings show that CAEFusion performs well in subjective visual and objective evaluation criteria and has promising potential in downstream object detection tasks.
基金funded by the National Natural Science Foundation of China,grant number 61302188.
文摘Multimodal medical image fusion can help physicians provide more accurate treatment plans for patients, as unimodal images provide limited valid information. To address the insufficient ability of traditional medical image fusion solutions to protect image details and significant information, a new multimodality medical image fusion method(NSST-PAPCNNLatLRR) is proposed in this paper. Firstly, the high and low-frequency sub-band coefficients are obtained by decomposing the source image using NSST. Then, the latent low-rank representation algorithm is used to process the low-frequency sub-band coefficients;An improved PAPCNN algorithm is also proposed for the fusion of high-frequency sub-band coefficients. The improved PAPCNN model was based on the automatic setting of the parameters, and the optimal method was configured for the time decay factor αe. The experimental results show that, in comparison with the five mainstream fusion algorithms, the new algorithm has significantly improved the visual effect over the comparison algorithm,enhanced the ability to characterize important information in images, and further improved the ability to protect the detailed information;the new algorithm has achieved at least four firsts in six objective indexes.
基金The Key R&D Project of Hainan Province under contract No.ZDYF2023SHFZ097the National Natural Science Foundation of China under contract No.42376180。
文摘Mangroves are indispensable to coastlines,maintaining biodiversity,and mitigating climate change.Therefore,improving the accuracy of mangrove information identification is crucial for their ecological protection.Aiming at the limited morphological information of synthetic aperture radar(SAR)images,which is greatly interfered by noise,and the susceptibility of optical images to weather and lighting conditions,this paper proposes a pixel-level weighted fusion method for SAR and optical images.Image fusion enhanced the target features and made mangrove monitoring more comprehensive and accurate.To address the problem of high similarity between mangrove forests and other forests,this paper is based on the U-Net convolutional neural network,and an attention mechanism is added in the feature extraction stage to make the model pay more attention to the mangrove vegetation area in the image.In order to accelerate the convergence and normalize the input,batch normalization(BN)layer and Dropout layer are added after each convolutional layer.Since mangroves are a minority class in the image,an improved cross-entropy loss function is introduced in this paper to improve the model’s ability to recognize mangroves.The AttU-Net model for mangrove recognition in high similarity environments is thus constructed based on the fused images.Through comparison experiments,the overall accuracy of the improved U-Net model trained from the fused images to recognize the predicted regions is significantly improved.Based on the fused images,the recognition results of the AttU-Net model proposed in this paper are compared with its benchmark model,U-Net,and the Dense-Net,Res-Net,and Seg-Net methods.The AttU-Net model captured mangroves’complex structures and textural features in images more effectively.The average OA,F1-score,and Kappa coefficient in the four tested regions were 94.406%,90.006%,and 84.045%,which were significantly higher than several other methods.This method can provide some technical support for the monitoring and protection of mangrove ecosystems.
文摘Multimodal medical image fusion has attained immense popularity in recent years due to its robust technology for clinical diagnosis.It fuses multiple images into a single image to improve the quality of images by retaining significant information and aiding diagnostic practitioners in diagnosing and treating many diseases.However,recent image fusion techniques have encountered several challenges,including fusion artifacts,algorithm complexity,and high computing costs.To solve these problems,this study presents a novel medical image fusion strategy by combining the benefits of pixel significance with edge-preserving processing to achieve the best fusion performance.First,the method employs a cross-bilateral filter(CBF)that utilizes one image to determine the kernel and the other for filtering,and vice versa,by considering both geometric closeness and the gray-level similarities of neighboring pixels of the images without smoothing edges.The outputs of CBF are then subtracted from the original images to obtain detailed images.It further proposes to use edge-preserving processing that combines linear lowpass filtering with a non-linear technique that enables the selection of relevant regions in detailed images while maintaining structural properties.These regions are selected using morphologically processed linear filter residuals to identify the significant regions with high-amplitude edges and adequate size.The outputs of low-pass filtering are fused with meaningfully restored regions to reconstruct the original shape of the edges.In addition,weight computations are performed using these reconstructed images,and these weights are then fused with the original input images to produce a final fusion result by estimating the strength of horizontal and vertical details.Numerous standard quality evaluation metrics with complementary properties are used for comparison with existing,well-known algorithms objectively to validate the fusion results.Experimental results from the proposed research article exhibit superior performance compared to other competing techniques in the case of both qualitative and quantitative evaluation.In addition,the proposed method advocates less computational complexity and execution time while improving diagnostic computing accuracy.Nevertheless,due to the lower complexity of the fusion algorithm,the efficiency of fusion methods is high in practical applications.The results reveal that the proposed method exceeds the latest state-of-the-art methods in terms of providing detailed information,edge contour,and overall contrast.
基金Princess Nourah bint Abdulrahman University and Researchers Supporting Project Number(PNURSP2024R346)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Recently,there have been several uses for digital image processing.Image fusion has become a prominent application in the domain of imaging processing.To create one final image that provesmore informative and helpful compared to the original input images,image fusion merges two or more initial images of the same item.Image fusion aims to produce,enhance,and transform significant elements of the source images into combined images for the sake of human visual perception.Image fusion is commonly employed for feature extraction in smart robots,clinical imaging,audiovisual camera integration,manufacturing process monitoring,electronic circuit design,advanced device diagnostics,and intelligent assembly line robots,with image quality varying depending on application.The research paper presents various methods for merging images in spatial and frequency domains,including a blend of stable and curvelet transformations,everageMax-Min,weighted principal component analysis(PCA),HIS(Hue,Intensity,Saturation),wavelet transform,discrete cosine transform(DCT),dual-tree Complex Wavelet Transform(CWT),and multiple wavelet transform.Image fusion methods integrate data from several source images of an identical target,thereby enhancing information in an extremely efficient manner.More precisely,in imaging techniques,the depth of field constraint precludes images from focusing on every object,leading to the exclusion of certain characteristics.To tackle thess challanges,a very efficient multi-focus wavelet decomposition and recompositionmethod is proposed.The use of these wavelet decomposition and recomposition techniques enables this method to make use of existing optimized wavelet code and filter choice.The simulated outcomes provide evidence that the suggested approach initially extracts particular characteristics from images in order to accurately reflect the level of clarity portrayed in the original images.This study enhances the performance of the eXtreme Gradient Boosting(XGBoost)algorithm in detecting brain malignancies with greater precision through the integration of computational image analysis and feature selection.The performance of images is improved by segmenting them employing the K-Means algorithm.The segmentation method aids in identifying specific regions of interest,using Particle Swarm Optimization(PCA)for trait selection and XGBoost for data classification.Extensive trials confirm the model’s exceptional visual performance,achieving an accuracy of up to 97.067%and providing good objective indicators.
文摘The growth optimizer(GO)is an innovative and robust metaheuristic optimization algorithm designed to simulate the learning and reflective processes experienced by individuals as they mature within the social environment.However,the original GO algorithm is constrained by two significant limitations:slow convergence and high mem-ory requirements.This restricts its application to large-scale and complex problems.To address these problems,this paper proposes an innovative enhanced growth optimizer(eGO).In contrast to conventional population-based optimization algorithms,the eGO algorithm utilizes a probabilistic model,designated as the virtual population,which is capable of accurately replicating the behavior of actual populations while simultaneously reducing memory consumption.Furthermore,this paper introduces the Lévy flight mechanism,which enhances the diversity and flexibility of the search process,thus further improving the algorithm’s global search capability and convergence speed.To verify the effectiveness of the eGO algorithm,a series of experiments were conducted using the CEC2014 and CEC2017 test sets.The results demonstrate that the eGO algorithm outperforms the original GO algorithm and other compact algorithms regarding memory usage and convergence speed,thus exhibiting powerful optimization capabilities.Finally,the eGO algorithm was applied to image fusion.Through a comparative analysis with the existing PSO and GO algorithms and other compact algorithms,the eGO algorithm demonstrates superior performance in image fusion.
基金Open Fund Project of Key Laboratory of Instrumentation Science&Dynamic Measurement(No.2DSYSJ2015005)Specialized Research Fund for the Doctoral Program of Ministry of Education Colleges(No.20121420110004)
文摘In view of the problem that current mainstream fusion method of infrared polarization image—Multiscale Geometry Analysis method only focuses on a certain characteristic to image representation.And spatial domain fusion method,Principal Component Analysis(PCA)method has the shortcoming of losing small target,this paper presents a new fusion method of infrared polarization images based on combination of Nonsubsampled Shearlet Transformation(NSST)and improved PCA.This method can make full use of the effectiveness to image details expressed by NSST and the characteristics that PCA can highlight the main features of images.The combination of the two methods can integrate the complementary features of themselves to retain features of targets and image details fully.Firstly,intensity and polarization images are decomposed into low frequency and high frequency components with different directions by NSST.Secondly,the low frequency components are fused with improved PCA,while the high frequency components are fused by joint decision making rule with local energy and local variance.Finally,the fused image is reconstructed with the inverse NSST to obtain the final fused image of infrared polarization.The experiment results show that the method proposed has higher advantages than other methods in terms of detail preservation and visual effect.
文摘Aim To fuse the fluorescence image and transmission image of a cell into a single image containing more information than any of the individual image. Methods Image fusion technology was applied to biological cell imaging processing. It could match the images and improve the confidence and spatial resolution of the images. Using two algorithms, double thresholds algorithm and denoising algorithm based on wavelet transform,the fluorescence image and transmission image of a Cell were merged into a composite image. Results and Conclusion The position of fluorescence and the structure of cell can be displyed in the composite image. The signal-to-noise ratio of the exultant image is improved to a large extent. The algorithms are not only useful to investigate the fluorescence and transmission images, but also suitable to observing two or more fluoascent label proes in a single cell.
基金This work was supported by the National Natural Science Foundation of China(62075169,62003247,62061160370)the Key Research and Development Program of Hubei Province(2020BAB113).
文摘This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer,termed as SwinFusion.On the one hand,an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction.More specifically,the proposed method involves an intra-domain fusion unit based on self-attention and an interdomain fusion unit based on cross-attention,which mine and integrate long dependencies within the same domain and across domains.Through long-range dependency modeling,the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration as well as maintaining the appropriate apparent intensity from a global perspective.In particular,we introduce the shifted windows mechanism into the self-attention and cross-attention,which allows our model to receive images with arbitrary sizes.On the other hand,the multi-scene image fusion problems are generalized to a unified framework with structure maintenance,detail preservation,and proper intensity control.Moreover,an elaborate loss function,consisting of SSIM loss,texture loss,and intensity loss,drives the network to preserve abundant texture details and structural information,as well as presenting optimal apparent intensity.Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of our SwinFusion compared to the state-of-theart unified image fusion algorithms and task-specific alternatives.Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
基金the National Natural Science Foundation of China(No.61105022)the Research Fund for the Doctoral Program of Higher Education of China(No.20110073120028)the Jiangsu Provincial Natural Science Foundation(No.BK2012296)
文摘In order to enhance the contrast of the fused image and reduce the loss of fine details in the process of image fusion,a novel fusion algorithm of infrared and visible images is proposed.First of all,regions of interest(RoIs)are detected in two original images by using saliency map.Then,nonsubsampled contourlet transform(NSCT)on both the infrared image and the visible image is performed to get a low-frequency sub-band and a certain amount of high-frequency sub-bands.Subsequently,the coefcients of all sub-bands are classified into four categories based on the result of RoI detection:the region of interest in the low-frequency sub-band(LSRoI),the region of interest in the high-frequency sub-band(HSRoI),the region of non-interest in the low-frequency sub-band(LSNRoI)and the region of non-interest in the high-frequency sub-band(HSNRoI).Fusion rules are customized for each kind of coefcients and fused image is achieved by performing the inverse NSCT to the fused coefcients.Experimental results show that the fusion scheme proposed in this paper achieves better efect than the other fusion algorithms both in visual efect and quantitative metrics.
基金supported by the National Natural Science Foundation of China (60802084)
文摘A new method for image fusion based on Contourlet transform and cycle spinning is proposed. Contourlet transform is a flexible multiresolution, local and directional image expansion, also provids a sparse representation for two-dimensional piece-wise smooth signals resembling images. Due to lack of translation invariance property in Contourlet transform, the conventional image fusion algorithm based on Contourlet transform introduces many artifacts. According to the theory of cycle spinning applied to image denoising, an invariance transform can reduce the artifacts through a series of processing efficiently. So the technology of cycle spinning is introduced to develop the translation invariant Contourlet fusion algorithm. This method can effectively eliminate the Gibbs-like phenomenon, extract the characteristics of original images, and preserve more important information. Experimental results show the simplicity and effectiveness of the method and its advantages over the conventional approaches.