Image fusion has been developing into an important area of research. In remote sensing, the use of the same image sensor in different working modes, or of different image sensors, can provide reinforcing or complementary information. It is therefore highly valuable to fuse the outputs of multiple sensors (or of the same sensor in different working modes) to improve the overall quality of remote sensing images, which is useful both for human visual perception and for image processing tasks. Accordingly, this paper first provides a comprehensive survey of the state of the art in multi-sensor image fusion methods from three perspectives: pixel-level fusion, feature-level fusion, and decision-level fusion. An overview of existing fusion strategies is then introduced, after which the existing fusion quality measures are summarized. Finally, the review analyzes development trends in fusion algorithms that may attract researchers to further explore this field.
In order to enhance the image information from multiple sensors and to improve information analysis and feature extraction, this letter proposes a new pixel-level fusion approach based on the Wavelet Packet Transform (WPT). The WPT can decompose an image into low-frequency and high-frequency bands at higher scales, offering a more precise tool for image analysis than the Wavelet Transform (WT). Firstly, the proposed approach employs the HIS (Hue, Intensity, Saturation) transform to obtain the intensity component of a CBERS (China-Brazil Earth Resources Satellite) multi-spectral image. The WPT is then employed to decompose the intensity component and a SPOT (Système Pour l'Observation de la Terre) image into low-frequency and high-frequency bands over three levels. Next, the high-frequency and low-frequency coefficients of the two images are combined by linear weighting strategies. Finally, the fused image is obtained with the inverse WPT and the inverse HIS transform. The results show that the new approach fuses the details of the input images successfully and thereby obtains a more satisfactory result than the HM (Histogram Matched)-based and WT-based fusion approaches.
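As a rough sketch of this pipeline — assuming equally sized, co-registered inputs, using an HSV conversion as a stand-in for the HIS transform, and with hypothetical band weights w_ms/w_pan rather than the letter's values — the WPT fusion step might look as follows with PyWavelets and OpenCV:

```python
import cv2
import numpy as np
import pywt

def wpt_fuse(intensity, pan, wavelet="db2", levels=3, w_ms=0.5, w_pan=0.5):
    """Linearly weight every wavelet-packet sub-band of two same-size images."""
    wp_ms = pywt.WaveletPacket2D(intensity, wavelet, maxlevel=levels)
    wp_pan = pywt.WaveletPacket2D(pan, wavelet, maxlevel=levels)
    wp_out = pywt.WaveletPacket2D(np.zeros_like(intensity), wavelet, maxlevel=levels)
    for node in wp_ms.get_level(levels, order="natural"):  # low- and high-frequency bands
        wp_out[node.path] = w_ms * node.data + w_pan * wp_pan[node.path].data
    return wp_out.reconstruct(update=False)

# Hypothetical usage with placeholder file names; the V channel of HSV
# stands in for the intensity component of the HIS transform.
ms = cv2.imread("cbers_ms.png")
pan = cv2.imread("spot_pan.png", cv2.IMREAD_GRAYSCALE).astype(float)
hsv = cv2.cvtColor(ms, cv2.COLOR_BGR2HSV).astype(float)
fused = wpt_fuse(hsv[..., 2], pan)[: ms.shape[0], : ms.shape[1]]  # crop any padding
hsv[..., 2] = np.clip(fused, 0, 255)
out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```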
Background: Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed, radiation dose, and image quality, forcing clinicians to work with noisy, incomplete data. Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches. Methods: We present LDM-PIR, a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging (MRI), CT, and low-photon imaging. Unlike traditional iterative methods, which are computationally expensive, or task-specific deep learning approaches lacking generalizability, LDM-PIR integrates three innovations: a physics-conditioned diffusion framework that embeds acquisition operators (Fourier/Radon transforms) and noise models directly into the reconstruction process; a multi-model architecture that unifies denoising, inpainting, and super-resolution via shared weight conditioning; and a lightweight design (2.1M parameters) enabling rapid inference (0.8 s/image on GPU). Through self-supervised fine-tuning with measurement-consistency losses, the model adapts to new imaging modalities using fewer annotated samples. Results: LDM-PIR achieves state-of-the-art performance on fastMRI (peak signal-to-noise ratio (PSNR): 34.04 for single-coil / 31.50 for multi-coil) and on the Lung Image Database Consortium and Image Database Resource Initiative dataset (28.83 PSNR under Poisson noise). Clinical evaluations demonstrate superior preservation of anatomical structures, with SSIM improvements of 8.8% for single-coil and 4.36% for multi-coil MRI over uDPIR. Conclusion: LDM-PIR offers a flexible, efficient, and scalable solution for medical image reconstruction, addressing the challenges of noise, undersampling, and modality generalization. Its lightweight design allows rapid inference, while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets, making it suitable for real-world clinical applications.
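The measurement-consistency idea can be made concrete with a small sketch. The PyTorch fragment below is a generic illustration, not LDM-PIR's code: it implements a self-supervised consistency loss for undersampled single-coil MRI, where the acquisition operator A is a masked 2-D Fourier transform; the mask density and image size are invented for the example.

```python
import torch

def forward_op(x, mask):
    """A(x): orthonormal 2-D FFT followed by k-space undersampling."""
    return torch.fft.fft2(x, norm="ortho") * mask

def consistency_loss(x_hat, y, mask):
    """||A(x_hat) - y||^2, penalizing disagreement at sampled k-space points."""
    return (forward_op(x_hat, mask) - y).abs().pow(2).mean()

x_hat = torch.randn(1, 1, 128, 128, requires_grad=True)  # reconstruction estimate
mask = (torch.rand(1, 1, 128, 128) < 0.3).float()         # ~30% of k-space sampled
y = forward_op(torch.randn(1, 1, 128, 128), mask)         # measured k-space data
loss = consistency_loss(x_hat, y, mask)
loss.backward()                                           # gradients for fine-tuning
```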
Deep learning-based wind turbine blade fault diagnosis has been widely applied due to its advantages in end-to-end feature extraction. However, several challenges remain. First, signal noise collected during blade operation masks fault features, severely impairing the fault diagnosis performance of deep learning models. Second, current blade fault diagnosis often relies on single-sensor data, resulting in limited monitoring dimensions and a limited ability to comprehensively capture complex fault states. To address these issues, a multi-sensor fusion-based wind turbine blade fault diagnosis method is proposed. Specifically, a CNN-Transformer Coupled Feature Learning Architecture is constructed to enhance the ability to learn complex features under noisy conditions, while a Weight-Aligned Data Fusion Module is designed to comprehensively and effectively utilize multi-sensor fault information. Experimental results for wind turbine blade fault diagnosis under different noise interferences show that the proposed method achieves higher accuracy than models with single-source data input, enabling comprehensive and effective fault diagnosis. Funding: supported by the China Three Gorges Corporation (No. NBZZ202300860), the National Natural Science Foundation of China (No. 52275104), and the Science and Technology Innovation Program of Hunan Province (No. 2023RC3097).
Roadbed disease detection is essential for maintaining road functionality. Ground penetrating radar (GPR) enables non-destructive detection without drilling. However, current identification often relies on manual inspection, which requires extensive experience, suffers from low efficiency, and is highly subjective. As the results are presented as radar images, image processing methods can be applied for fast and objective identification, and deep learning-based approaches now offer a robust solution for automated roadbed disease detection. This study proposes an enhanced Faster Region-based Convolutional Neural Network (R-CNN) framework integrating ResNet-50 as the backbone and two-dimensional discrete Fourier spectrum transformation (2D-DFT) for frequency-domain feature fusion. A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering, histogram equalization, and binarization. The proposed model segments defect regions, applies binary masking, and fuses frequency-domain features to improve small-target detection under noisy backgrounds. Experimental results show that the improved Faster R-CNN achieves a mean Average Precision (mAP) of 0.92, a 0.22 increase over the baseline. Precision improved by 26% while recall remained stable at 87%. The model was further validated on real urban road data, demonstrating robust detection capability even under interference. These findings highlight the potential of combining GPR with deep learning for efficient, non-destructive roadbed health monitoring. Funding: supported by the Second Batch of Key Textbook Construction Projects of the "14th Five-Year Plan" of Zhejiang Vocational Colleges (SZDJC-2412).
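As a generic illustration of the frequency-domain branch (not the paper's implementation), the snippet below derives a log-magnitude 2D-DFT map from a GPR patch and stacks it with the grayscale input; the file name is a placeholder.

```python
import cv2
import numpy as np

def dft_feature(patch):
    """Centered log-magnitude spectrum of a patch, scaled to [0, 1]."""
    spectrum = np.fft.fftshift(np.fft.fft2(patch.astype(float)))
    mag = np.log1p(np.abs(spectrum))  # compress the dynamic range
    return (mag - mag.min()) / (mag.max() - mag.min() + 1e-8)

patch = cv2.imread("gpr_patch.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
freq = dft_feature(patch)
# Stack a frequency channel alongside the spatial one before the detection backbone.
stacked = np.dstack([patch.astype(float) / 255.0, freq])
```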
The initial noise present in depth images obtained with RGB-D sensors is a combination of hardware limitations and environmental factors; the limited capabilities of the sensors in turn degrade computer vision results. Common image denoising techniques, being based on spatial and frequency filtering, tend to remove significant image details along with the noise. The framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection with a Long Short-Term Memory Autoencoder (LSTMAE). The Boruta algorithm identifies the most useful depth features, maximizing spatial structural integrity while reducing redundancy. An LSTMAE then processes these selected features, modeling depth pixel sequences to generate robust, noise-resistant representations. The encoder compresses the input data into a latent space, which is then decoded to retrieve the clean image. Experiments on a benchmark dataset show that the proposed technique attains a PSNR of 45 dB and an SSIM of 0.90, which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models. Moreover, the feature selection step decreases the input dimensionality by 40%, resulting in a 37.5% reduction in training time and a real-time inference rate of 200 FPS. The Boruta-LSTMAE framework therefore offers a highly efficient and scalable system for depth image denoising, with high potential for close-range 3D systems such as robotic manipulation and gesture-based interfaces.
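To ground the LSTM autoencoder idea, here is a minimal PyTorch sketch that treats each row of a depth image as a time step, compresses the sequence into a latent vector, and decodes it back. Layer sizes are illustrative guesses, and the Boruta selection step is abstracted away as a pre-computed column subset rather than an actual run of the algorithm.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Encode row-sequences of depth pixels into a latent vector, then decode."""
    def __init__(self, n_features, latent_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        self.head = nn.Linear(latent_dim, n_features)

    def forward(self, x):                                  # x: (batch, rows, n_features)
        _, (h, _) = self.encoder(x)                        # h: (1, batch, latent_dim)
        rep = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)   # repeat latent code per row
        out, _ = self.decoder(rep)
        return self.head(out)                              # reconstructed rows

# Hypothetical usage: 128x128 depth images, Boruta having kept 77 of 128 columns (~40% cut).
selected_cols = torch.arange(77)                           # placeholder for Boruta's selection
noisy = torch.rand(8, 128, 128)[:, :, selected_cols]
model = LSTMAutoencoder(n_features=noisy.size(-1))
recon = model(noisy)
loss = nn.functional.mse_loss(recon, noisy)                # train against clean targets in practice
```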
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder, designed according to the optimization objective, consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Since this extraction may leave some information uncaptured, a compensation encoder is proposed to supplement the missing information. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines the low-frequency, high-frequency, and supplementary information to obtain multi-scale features. Subsequently, an attention strategy and a fusion module are introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception. Funding: supported by the Henan Province Key Research and Development Project (231111211300), the Central Government of Henan Province Guides Local Science and Technology Development Funds (Z20231811005), the Henan Province Key Research and Development Project (231111110100), the Henan Provincial Outstanding Foreign Scientist Studio (GZS2024006), and the Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan (Application and Overcoming Technical Barriers) (242103810028).
The purpose of infrared and visible image fusion is to create a single image containing the texture details and significant object information of the source images, particularly in challenging environments. However, existing image fusion algorithms are generally suited to normal scenes. In hazy scenes, much of the texture information in the visible image is obscured, so the results of existing methods are dominated by infrared information, lacking texture detail and giving a poor visual effect. To address these difficulties, we propose a haze-free infrared and visible fusion method, termed HaIVFusion, which can eliminate the influence of haze and obtain richer texture information in the fused image. Specifically, we first design a scene information restoration network (SIRNet) to mine the masked texture information in visible images. Then, a denoising fusion network (DFNet) is designed to integrate the features extracted from infrared and visible images and remove the influence of residual noise as far as possible. In addition, we use a color consistency loss to reduce the color distortion caused by haze. Furthermore, we publish a dataset of hazy scenes for infrared and visible image fusion to promote research on extreme scenes. Extensive experiments show that HaIVFusion produces fused images with richer texture details and higher contrast in hazy scenes, and achieves better quantitative results than state-of-the-art image fusion methods, even when those are combined with state-of-the-art dehazing methods. Funding: supported by the Natural Science Foundation of Shandong Province, China (ZR2022MF237), the National Natural Science Foundation of China Youth Fund (62406155), and the Major Innovation Project (2023JBZ02) of Qilu University of Technology (Shandong Academy of Sciences).
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of the visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features at a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively. Funding: supported by the National Natural Science Foundation of China (Grant No. 62302086), the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070), and the Fundamental Research Funds for the Central Universities (Grant No. N2317005).
Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information. However, in low-light scenarios, the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail from the scene, and relying solely on the target saliency information provided by infrared images is far from sufficient. To address this challenge, this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement, named LLE-Fuse. The method improves on the MobileOne Block, using an Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images. The intermediate features obtained at different scales are then fused by a cross-modal attention fusion module. In addition, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is used to enhance both the infrared and visible light images, guiding the network to learn low-light enhancement capabilities through an enhancement loss. Upon completion of training, the Edge-MobileOne Block is converted through structural reparameterization into a direct-connection structure similar to MobileNetV1, effectively reducing computational resource consumption. Finally, in extensive experimental comparisons, our method achieved improvements of 4.6%, 40.5%, 156.9%, 9.2%, and 98.6% across the evaluation metrics, including Standard Deviation (SD), Visual Information Fidelity (VIF), Entropy (EN), and Spatial Frequency (SF), over the best results of the compared algorithms, while being only 1.5 ms/it slower than the fastest method. Funding: sponsored by the Xinjiang Uygur Autonomous Region Tianshan Talent Programme Project (2023TCLJ02) and the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C349).
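The CLAHE enhancement used to supervise the low-light learning is a standard OpenCV operation. A minimal sketch of applying it to both modalities is shown below, with the clip limit and tile grid set to commonly used defaults rather than the paper's settings, and placeholder file names.

```python
import cv2

def clahe_enhance(gray, clip_limit=2.0, grid=(8, 8)):
    """Contrast Limited Adaptive Histogram Equalization on an 8-bit image."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    return clahe.apply(gray)

ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)    # placeholder paths
vis = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)
ir_enh, vis_enh = clahe_enhance(ir), clahe_enhance(vis)
# In LLE-Fuse, enhanced images like these drive an enhancement loss that
# teaches the network low-light correction; here they are simply computed.
```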
There is still a dearth of systematic study on image stitching techniques for the natural tubular structure of the intestine, and traditional stitching techniques transfer poorly to endoscopic images with deep scenes. A method is therefore developed to recreate the intestinal wall in two dimensions. The normalized Laplacian algorithm is used to enhance each image, which is then transformed into polar coordinates, exploiting the fact that intestinal features are faint and typically arranged in a ring, in order to extract the new image segment of the current image relative to the previous one. An improved weighted fusion algorithm is then used to splice the segment images sequentially. The experimental results demonstrate that the suggested approach can improve image clarity and minimize noise while maintaining the information content of intestinal images. In addition, the seamless transition between the final portions of the panoramic image demonstrates that the stitching trace has been removed. Funding: the Special Research Fund for the Natural Science Foundation of Chongqing (No. cstc2019jcyjmsxm1351) and the Science and Technology Research Project of Chongqing Education Commission (No. KJQN2020006300).
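A rough sketch of the polar unwrapping and weighted splicing steps follows, using OpenCV's warpPolar and a linear blending ramp. The image center, radius, strip size, and overlap width are illustrative assumptions, as are the frame file names.

```python
import cv2
import numpy as np

def unwrap(frame, radius=240, size=(512, 256)):
    """Map a circular endoscopic view to a rectangular strip (polar coordinates)."""
    center = (frame.shape[1] / 2, frame.shape[0] / 2)
    return cv2.warpPolar(frame, size, center, radius, cv2.WARP_POLAR_LINEAR)

def splice(panorama, new_strip, overlap=32):
    """Weighted fusion over the overlap region, then append the genuinely new rows."""
    alpha = np.linspace(0.0, 1.0, overlap)[:, None, None]  # blending ramp
    blended = (1 - alpha) * panorama[-overlap:] + alpha * new_strip[:overlap]
    return np.vstack([panorama[:-overlap], blended, new_strip[overlap:]]).astype(np.uint8)

frames = [cv2.imread(f"frame_{i}.png") for i in range(3)]  # placeholder paths
pano = unwrap(frames[0])
for f in frames[1:]:
    pano = splice(pano, unwrap(f))
```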
The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to handle modal disparities effectively, resulting in visual degradation of the details and prominent targets of the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. Firstly, to better characterize the features of different modalities, a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of each modality, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism using positive and negative prompts, leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization, which simplifies the intricate non-singleton, non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent. Our approach advances the state of the art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion. Funding: partially supported by the China Postdoctoral Science Foundation (2023M730741) and the National Natural Science Foundation of China (U22B2052, 52102432, 52202452, 62372080, 62302078).
Images with complementary spectral information can be recorded by image sensors that sense both the visible and near-infrared spectrum. The fusion of visible and near-infrared (NIR) images aims to enhance the quality of images acquired by video monitoring systems for the ease of user observation and data processing. Unfortunately, current fusion algorithms produce artefacts and colour distortion because they cannot exploit spectral properties and lack information complementarity. Therefore, an information complementarity fusion (ICF) model is designed based on physical signals. In order to separate high-frequency noise from important information in distinct frequency layers, the authors first extract texture-scale and edge-scale layers using a two-scale filter. Second, the difference map between the visible and near-infrared images is filtered using the extended-DoG filter to produce the initial visible-NIR complementary weight map. Then, the near-infrared image with night adjustment is processed to generate a guide map. The final complementarity weight map is subsequently derived via an arctan function mapping using the guide map and the initial weight map. Finally, fused images are generated with the complementarity weight maps. The experimental results demonstrate that the proposed approach outperforms the state of the art both in avoiding artificial colours and in effectively utilising information complementarity. Funding: supported in part by the Natural Science Foundation of China (NSFC) under contract No. 62171253, the Young Elite Scientists Sponsorship Program by CAST under program No. 2022QNRC001, and the Fundamental Research Funds for the Central Universities.
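The weight-map construction can be sketched compactly. The code below follows the described steps in spirit — a two-scale split, a difference-of-Gaussians filter on the visible-NIR difference map, and an arctan mapping into (0, 1) — but the box-filter size, the two sigmas, and the steepness k are hypothetical, and the guide-map refinement is omitted.

```python
import cv2
import numpy as np

def two_scale(img, ksize=31):
    """Split an image into a base (edge-scale) layer and a detail (texture-scale) layer."""
    base = cv2.blur(img, (ksize, ksize))
    return base, img - base

def complementarity_weight(vis, nir, sigma1=1.0, sigma2=4.0, k=4.0):
    """Arctan-mapped weight from a DoG-filtered visible-NIR difference map."""
    diff = vis - nir
    dog = cv2.GaussianBlur(diff, (0, 0), sigma1) - cv2.GaussianBlur(diff, (0, 0), sigma2)
    return 0.5 + np.arctan(k * dog) / np.pi  # values in (0, 1)

vis = cv2.imread("vis.png", 0).astype(np.float32) / 255  # placeholder paths
nir = cv2.imread("nir.png", 0).astype(np.float32) / 255
w = complementarity_weight(vis, nir)
fused = w * vis + (1 - w) * nir                          # weighted fusion
```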
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining class-specific precise representations at different scales is a key aspect of feature representation. However, existing methods often rely on single-scale deep features, neglecting shallow and deeper layer features, which poses challenges when predicting objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of class-specific precise feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of the proposed method, which achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification. Funding: supported by the National Natural Science Foundation of China (62302167, 62477013), the Natural Science Foundation of Shanghai (No. 24ZR1456100), the Science and Technology Commission of Shanghai Municipality (No. 24DZ2305900), and the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries (2211106).
Infrared and visible image fusion technology integrates the thermal radiation information of infrared images with the texture details of visible images to generate more informative fused images. However, existing methods often fail to distinguish salient objects from background regions, leading to detail suppression in salient regions under global fusion strategies. This study presents a mask-guided latent low-rank representation fusion method to address this issue. First, the GrabCut algorithm is employed to extract a saliency mask that separates salient regions from background regions. Then, latent low-rank representation (LatLRR) is applied to extract deep image features, enhancing key information extraction. In the fusion stage, a weighted fusion strategy strengthens infrared thermal information and visible texture details in the salient regions, while an average fusion strategy improves background smoothness and stability. Experimental results on the TNO dataset demonstrate that the proposed method achieves superior performance in the SPI, MI, Qabf, PSNR, and EN metrics, effectively preserving salient target details while maintaining balanced background information. Compared to state-of-the-art fusion methods, our approach achieves more stable and visually consistent fusion results. The fusion code is available on GitHub at https://github.com/joyzhen1/Image (accessed on 15 January 2025). Funding: supported by Universiti Teknologi MARA through UiTM MyRA Research Grant 600-RMC 5/3/GPM (053/2022).
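The saliency-mask step builds on OpenCV's GrabCut, and the region-wise strategy then reduces to masked blending. A minimal sketch follows; the initialization rectangle, the fusion weights, and the file names are illustrative assumptions rather than the paper's choices, and the LatLRR feature step is omitted.

```python
import cv2
import numpy as np

def saliency_mask(bgr, rect):
    """Binary foreground mask from GrabCut initialized with a bounding rectangle."""
    mask = np.zeros(bgr.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.float32)

ir = cv2.imread("ir.png", 0).astype(np.float32)               # placeholder paths
vis = cv2.imread("vis.png", 0).astype(np.float32)
m = saliency_mask(cv2.imread("ir.png"), rect=(50, 40, 200, 160))  # hypothetical box
# Weighted fusion inside the salient region, plain averaging in the background.
salient = 0.7 * ir + 0.3 * vis                                # illustrative weights
background = 0.5 * (ir + vis)
fused = (m * salient + (1 - m) * background).astype(np.uint8)
```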
The traditional EnFCM (Enhanced Fuzzy C-Means) algorithm considers only grey-scale features in image segmentation, giving less than satisfactory results when used for remote sensing woodland image segmentation and extraction. An EnFCM remote sensing forest land extraction method based on PCA multi-feature fusion is therefore proposed. Firstly, histogram equalization is applied to improve the image contrast. Secondly, the texture and edge features of the image are extracted, and a multi-feature fused pixel image is generated using the PCA technique. The fused feature is then used as a feature constraint to measure the difference between pixels, instead of a single grey-scale feature. Finally, an improved feature distance metric calculates the similarity between pixel points and cluster centers to complete the cluster segmentation. The experimental results showed an error between 1.5% and 4.0% relative to forested areas delineated by experts by hand, demonstrating high-accuracy segmentation and extraction. Funding: supported by the National Natural Science Foundation of China (No. 61761027) and the Gansu Young Doctor's Fund for Higher Education Institutions (No. 2021QB-053).
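The PCA fusion stage can be illustrated in a few lines: per-pixel grayscale, edge, and texture features are stacked and projected onto the first principal component to form a fused feature image. The specific extractors below (Sobel magnitude, local standard deviation) are stand-ins for the paper's texture and edge features, and the file name is a placeholder.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA

img = cv2.imread("woodland.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.equalizeHist(img).astype(np.float32)          # contrast boost
# Edge feature: Sobel gradient magnitude.
edge = cv2.magnitude(cv2.Sobel(gray, cv2.CV_32F, 1, 0), cv2.Sobel(gray, cv2.CV_32F, 0, 1))
# Texture feature: local standard deviation over a 7x7 window.
mean = cv2.blur(gray, (7, 7))
texture = np.sqrt(np.maximum(cv2.blur(gray * gray, (7, 7)) - mean * mean, 0))
# Stack per-pixel features and keep the first principal component.
feats = np.stack([gray, edge, texture], axis=-1).reshape(-1, 3)
fused = PCA(n_components=1).fit_transform(feats).reshape(gray.shape)
# `fused` now replaces the single grey-scale feature in the EnFCM distance metric.
```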
Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities to represent the fused information in a single image. One of the critical clinical applications of medical image fusion is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image into its low-rank and salient parts. Then, to exploit features at different scales, a multiscale mechanism using three filters of different sizes extracts features in the encoder network. A hybrid attention module is also introduced to capture more image detail. Finally, the fused images are reconstructed by the decoder network. We conducted experiments with clinical brain computed tomography/magnetic resonance images. The experimental results show that the proposed multiscale hybrid attention-based network outperforms other advanced fusion methods. Funding: supported by the Qingdao Huanghai University School-Level Scientific Research Project (2023KJ14), the Undergraduate Teaching Reform Research Project of the Shandong Provincial Department of Education (M2022328), the National Natural Science Foundation of China under Grant 42472324, and the Qingdao Postdoctoral Foundation under Grant QDBSH202402049.
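A minimal PyTorch sketch of the multiscale mechanism is given below: three parallel convolutions of different kernel sizes whose outputs are concatenated. The channel counts and kernel sizes are illustrative, and the hybrid attention module is beyond this sketch.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Extract features with three filter sizes and concatenate them."""
    def __init__(self, in_ch=1, out_ch=16):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

ct = torch.randn(1, 1, 256, 256)      # toy CT slice
features = MultiScaleBlock()(ct)      # shape: (1, 48, 256, 256)
```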
At present, research on multi-label image classification mainly focuses on exploiting the correlations between labels to improve classification accuracy. In existing methods, however, label correlation is computed from the statistical information of the data; such correlation is global, depends on the dataset, and is not suitable for all samples. Moreover, in the process of extracting image features, the characteristic information of small objects is easily lost, resulting in low classification accuracy for small objects. To this end, this paper proposes a multi-label image classification model based on multi-scale fusion and adaptive label correlation. The main idea is as follows: first, feature maps at multiple scales are fused to enhance the feature information of small objects; semantic guidance then decomposes the fused feature map into per-category feature vectors; the self-attention mechanism of a graph attention network adaptively mines the correlations between the categories present in the image; and feature vectors containing category-related information are obtained for the final classification. The mean average precision of the model on the two public datasets VOC 2007 and MS COCO 2014 reached 95.6% and 83.6%, respectively, and most indicators are better than those of the latest existing methods. Funding: the National Natural Science Foundation of China (Nos. 62167005 and 61966018) and the Key Research Projects of the Jiangxi Provincial Department of Education (No. GJJ200302).
High-intensity underground mining has caused severe ground fissures, resulting in environmental degradation, so prompt detection is crucial to mitigate their environmental impact. However, accurate segmentation of fissures in the complex and variable scenes of visible imagery is a challenging problem. Our method, DeepFissureNets-Infrared-Visible (DFN-IV), highlights the potential of complementing visible images with infrared information for improved ground fissure segmentation. DFN-IV adopts a two-step process. First, a fusion network trained with a dual adversarial learning strategy fuses infrared and visible imagery, providing an integrated representation of fissure targets that combines structural information with textural details. Second, the fused images are processed by a fine-tuned segmentation network, which leverages knowledge injection to learn the distinctive characteristics of fissure targets effectively. Furthermore, an infrared-visible ground fissure dataset (IVGF) is built from an aerial investigation of the Daliuta Coal Mine. Extensive experiments reveal that our approach provides superior accuracy over the single-modality image strategies employed in five segmentation models. Notably, DeeplabV3+ tested with DFN-IV improves pixel accuracy and Intersection over Union (IoU) by 9.7% and 11.13%, respectively, compared to using visible images alone. Moreover, our method surpasses six state-of-the-art image fusion methods, achieving a 5.28% improvement in pixel accuracy and a 1.57% increase in IoU over the second-best method. Ablation studies further validate the significance of the dual adversarial learning module and the integrated knowledge injection strategy. By leveraging DFN-IV, we aim to quantify the impacts of mining-induced ground fissures, facilitating the implementation of intelligent safety measures.
A wavelet-based local and global feature fusion network (LAGN) is proposed for low-light image enhancement, aiming to enhance image details and restore colors in dark areas. This study addresses three key issues in low-light image enhancement: enhancing low-light images with LAGN while preserving image details and colors; extracting image edge information via the wavelet transform to enhance image details; and extracting local and global features of images through convolutional neural networks and a Transformer to improve image contrast. Comparisons with state-of-the-art methods on two datasets verify that LAGN achieves the best performance in terms of detail, brightness, and contrast.
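The wavelet edge-extraction step corresponds to keeping the high-frequency sub-bands of a discrete wavelet transform. A brief PyWavelets sketch is given below; the wavelet choice ("haar") and the random stand-in input are assumptions for illustration.

```python
import numpy as np
import pywt

def wavelet_edges(gray):
    """Single-level 2-D DWT; the H/V/D sub-bands carry the edge information."""
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), "haar")
    edges = np.sqrt(cH**2 + cV**2 + cD**2)  # combined high-frequency energy
    return cA, edges

low = np.random.rand(256, 256)              # stand-in for a low-light image
approx, edge_map = wavelet_edges(low)
# `edge_map` (half resolution) can guide a detail-enhancement branch,
# while `approx` feeds the global brightness/contrast path.
```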
文摘Image fusion has been developing into an important area of research. In remote sensing, the use of the same image sensor in different working modes, or different image sensors, can provide reinforcing or complementary information. Therefore, it is highly valuable to fuse outputs from multiple sensors (or the same sensor in different working modes) to improve the overall performance of the remote images, which are very useful for human visual perception and image processing task. Accordingly, in this paper, we first provide a comprehensive survey of the state of the art of multi-sensor image fusion methods in terms of three aspects: pixel-level fusion, feature-level fusion and decision-level fusion. An overview of existing fusion strategies is then introduced, after which the existing fusion quality measures are summarized. Finally, this review analyzes the development trends in fusion algorithms that may attract researchers to further explore the research in this field.
文摘In order to enhance the image information from multi-sensor and to improve the abilities of the information analysis and the feature extraction, this letter proposed a new fusion approach in pixel level by means of the Wavelet Packet Transform (WPT). The WPT is able to decompose an image into low frequency band and high frequency band in higher scale. It offers a more precise method for image analysis than Wavelet Transform (WT). Firstly, the proposed approach employs HIS (Hue, Intensity, Saturation) transform to obtain the intensity component of CBERS (China-Brazil Earth Resource Satellite) multi-spectral image. Then WPT transform is employed to decompose the intensity component and SPOT (Systeme Pour I'Observation de la Therre ) image into low frequency band and high frequency band in three levels. Next, two high frequency coefficients and low frequency coefficients of the images are combined by linear weighting strategies. Finally, the fused image is obtained with inverse WPT and inverse HIS. The results show the new approach can fuse details of input image successfully, and thereby can obtain a more satisfactory result than that of HM (Histogram Matched)-based fusion algorithm and WT-based fusion approach.
文摘Background:Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed,radiation dose,and image quality,forcing clinicians to work with noisy,incomplete data.Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches.Methods:We present LDM-PIR,a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging(MRI),CT,and low-photon imaging.Unlike traditional iterative methods,which are computationally expensive,or task-specific deep learning approaches lacking generalizability,integrates three innovations.A physics-conditioned diffusion framework that embeds acquisition operators(Fourier/Radon transforms)and noise models directly into the reconstruction process.A multi-model architecture that unifies denoising,inpainting,and super-resolution via shared weight conditioning.A lightweight design(2.1M parameters)enabling rapid inference(0.8s/image on GPU).Through self-supervised fine-tuning with measurement consistency losses adapts to new imaging modalities using fewer annotated samples.Results:Achieves state-of-the-art performance on fastMRI(peak signal-to-noise ratio(PSNR):34.04 for single-coil/31.50 for multi-coil)and Lung Image Database Consortium and Image Database Resource Initiative(28.83 PSNR under Poisson noise).Clinical evaluations demonstrate superior preservation of anatomical structures,with SSIM improvements of 8.8%for single-coil and 4.36%for multi-coil MRI over uDPIR.Conclusion:It offers a flexible,efficient,and scalable solution for medical image reconstruction,addressing the challenges of noise,undersampling,and modality generalization.The model’s lightweight design allows for rapid inference,while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets,making it suitable for real-world clinical applications.
基金supported by the China Three Gorges Corporation(No.NBZZ202300860)the National Natural Science Foundation of China(No.52275104)the Science and Technology Innovation Program of Hunan Province(No.2023RC3097).
文摘Deep learning-based wind turbine blade fault diagnosis has been widely applied due to its advantages in end-to-end feature extraction.However,several challenges remain.First,signal noise collected during blade operation masks fault features,severely impairing the fault diagnosis performance of deep learning models.Second,current blade fault diagnosis often relies on single-sensor data,resulting in limited monitoring dimensions and ability to comprehensively capture complex fault states.To address these issues,a multi-sensor fusion-based wind turbine blade fault diagnosis method is proposed.Specifically,a CNN-Transformer Coupled Feature Learning Architecture is constructed to enhance the ability to learn complex features under noisy conditions,while a Weight-Aligned Data Fusion Module is designed to comprehensively and effectively utilize multi-sensor fault information.Experimental results of wind turbine blade fault diagnosis under different noise interferences show that higher accuracy is achieved by the proposed method compared to models with single-source data input,enabling comprehensive and effective fault diagnosis.
基金supported by the Second Batch of Key Textbook Construction Projects of“14th Five-Year Plan”of Zhejiang Vocational Colleges(SZDJC-2412).
文摘Roadbed disease detection is essential for maintaining road functionality.Ground penetrating radar(GPR)enables non-destructive detection without drilling.However,current identification often relies on manual inspection,which requires extensive experience,suffers from low efficiency,and is highly subjective.As the results are presented as radar images,image processing methods can be applied for fast and objective identification.Deep learning-based approaches now offer a robust solution for automated roadbed disease detection.This study proposes an enhanced Faster Region-based Convolutional Neural Networks(R-CNN)framework integrating ResNet-50 as the backbone and two-dimensional discrete Fourier spectrum transformation(2D-DFT)for frequency-domain feature fusion.A dedicated GPR image dataset comprising 1650 annotated images was constructed and augmented to 6600 images via median filtering,histogram equalization,and binarization.The proposed model segments defect regions,applies binary masking,and fuses frequency-domain features to improve small-target detection under noisy backgrounds.Experimental results show that the improved Faster R-CNN achieves a mean Average Precision(mAP)of 0.92,representing a 0.22 increase over the baseline.Precision improved by 26%while recall remained stable at 87%.The model was further validated on real urban road data,demonstrating robust detection capability even under interference.These findings highlight the potential of combining GPR with deep learning for efficient,non-destructive roadbed health monitoring.
文摘The initial noise present in the depth images obtained with RGB-D sensors is a combination of hardware limitations in addition to the environmental factors,due to the limited capabilities of sensors,which also produce poor computer vision results.The common image denoising techniques tend to remove significant image details and also remove noise,provided they are based on space and frequency filtering.The updated framework presented in this paper is a novel denoising model that makes use of Boruta-driven feature selection using a Long Short-Term Memory Autoencoder(LSTMAE).The Boruta algorithm identifies the most useful depth features that are used to maximize the spatial structure integrity and reduce redundancy.An LSTMAE is then used to process these selected features and model depth pixel sequences to generate robust,noise-resistant representations.The system uses the encoder to encode the input data into a latent space that has been compressed before it is decoded to retrieve the clean image.Experiments on a benchmark data set show that the suggested technique attains a PSNR of 45 dB and an SSIM of 0.90,which is 10 dB higher than the performance of conventional convolutional autoencoders and 15 times higher than that of the wavelet-based models.Moreover,the feature selection step will decrease the input dimensionality by 40%,resulting in a 37.5%reduction in training time and a real-time inference rate of 200 FPS.Boruta-LSTMAE framework,therefore,offers a highly efficient and scalable system for depth image denoising,with a high potential to be applied to close-range 3D systems,such as robotic manipulation and gesture-based interfaces.
基金Supported by the Henan Province Key Research and Development Project(231111211300)the Central Government of Henan Province Guides Local Science and Technology Development Funds(Z20231811005)+2 种基金Henan Province Key Research and Development Project(231111110100)Henan Provincial Outstanding Foreign Scientist Studio(GZS2024006)Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan(Application and Overcoming Technical Barriers)(242103810028)。
文摘The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible images.To meet these requirements,an autoencoder-based method for infrared and visible image fusion is proposed.The encoder designed according to the optimization objective consists of a base encoder and a detail encoder,which is used to extract low-frequency and high-frequency information from the image.This extraction may lead to some information not being captured,so a compensation encoder is proposed to supplement the missing information.Multi-scale decomposition is also employed to extract image features more comprehensively.The decoder combines low-frequency,high-frequency and supplementary information to obtain multi-scale features.Subsequently,the attention strategy and fusion module are introduced to perform multi-scale fusion for image reconstruction.Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
基金supported by the Natural Science Foundation of Shandong Province,China(ZR2022MF237)the National Natural Science Foundation of China Youth Fund(62406155)the Major Innovation Project(2023JBZ02)of Qilu University of Technology(Shandong Academy of Sciences).
文摘The purpose of infrared and visible image fusion is to create a single image containing the texture details and significant object information of the source images,particularly in challenging environments.However,existing image fusion algorithms are generally suitable for normal scenes.In the hazy scene,a lot of texture information in the visible image is hidden,the results of existing methods are filled with infrared information,resulting in the lack of texture details and poor visual effect.To address the aforementioned difficulties,we propose a haze-free infrared and visible fusion method,termed HaIVFusion,which can eliminate the influence of haze and obtain richer texture information in the fused image.Specifically,we first design a scene information restoration network(SIRNet)to mine the masked texture information in visible images.Then,a denoising fusion network(DFNet)is designed to integrate the features extracted from infrared and visible images and remove the influence of residual noise as much as possible.In addition,we use color consistency loss to reduce the color distortion resulting from haze.Furthermore,we publish a dataset of hazy scenes for infrared and visible image fusion to promote research in extreme scenes.Extensive experiments show that HaIVFusion produces fused images with increased texture details and higher contrast in hazy scenes,and achieves better quantitative results,when compared to state-ofthe-art image fusion methods,even combined with state-of-the-art dehazing methods.
基金supported by the National Natural Science Foundation of China(Grant No.62302086)the Natural Science Foundation of Liaoning Province(Grant No.2023-MSBA-070)the Fundamental Research Funds for the Central Universities(Grant No.N2317005).
文摘Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images.However,the inherent differences in the imaging mechanisms of visible and infrared modalities make effective cross-modal fusion challenging.Furthermore,constrained by the physical characteristics of sensors and thermal diffusion effects,infrared images generally suffer from blurred object contours and missing details,making it difficult to extract object features effectively.To address these issues,we propose an infrared-visible image fusion network that realizesmultimodal information fusion of infrared and visible images through a carefully designedmultiscale fusion strategy.First,we design an adaptive gray-radiance enhancement(AGRE)module to strengthen the detail representation in infrared images,improving their usability in complex lighting scenarios.Next,we introduce a channelspatial feature interaction(CSFI)module,which achieves efficient complementarity between the RGB and infrared(IR)modalities via dynamic channel switching and a spatial attention mechanism.Finally,we propose a multi-scale enhanced cross-attention fusion(MSECA)module,which optimizes the fusion ofmulti-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale,thereby enhancing the expressiveness of the fused features.Experiments on the KAIST,M3FD,and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios.On the KAIST dataset,the miss rate drops to 5.99%,and further to 4.26% in night scenes.On the FLIR and M3FD datasets,it achieves AP50 scores of 79.4% and 88.9%,respectively.
基金This researchwas Sponsored by Xinjiang Uygur Autonomous Region Tianshan Talent Programme Project(2023TCLJ02)Natural Science Foundation of Xinjiang Uygur Autonomous Region(2022D01C349).
文摘Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information.However,in low-light scenarios,the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail information from the scene.At this time,relying solely on the target saliency information provided by infrared images is far from sufficient.To address this challenge,this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement,named LLE-Fuse.The method is based on the improvement of the MobileOne Block,using the Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images.The intermediate features at different scales obtained are then fused by a cross-modal attention fusion module.In addition,the Contrast Limited Adaptive Histogram Equalization(CLAHE)algorithm is used for image enhancement of both infrared and visible light images,guiding the network model to learn low-light enhancement capabilities through enhancement loss.Upon completion of network training,the Edge-MobileOne Block is optimized into a direct connection structure similar to MobileNetV1 through structural reparameterization,effectively reducing computational resource consumption.Finally,after extensive experimental comparisons,our method achieved improvements of 4.6%,40.5%,156.9%,9.2%,and 98.6%in the evaluation metrics Standard Deviation(SD),Visual Information Fidelity(VIF),Entropy(EN),and Spatial Frequency(SF),respectively,compared to the best results of the compared algorithms,while only being 1.5 ms/it slower in computation speed than the fastest method.
基金the Special Research Fund for the Natural Science Foundation of Chongqing(No.cstc2019jcyjmsxm1351)the Science and Technology Research Project of Chongqing Education Commission(No.KJQN2020006300)。
文摘There is still a dearth of systematic study on picture stitching techniques for the natural tubular structures of intestines,and traditional stitching techniques have a poor application to endoscopic images with deep scenes.In order to recreate the intestinal wall in two dimensions,a method is developed.The normalized Laplacian algorithm is used to enhance the image and transform it into polar coordinates according to the characteristics that intestinal images are not obvious and usually arranged in a circle,in order to extract the new image segments of the current image relative to the previous image.The improved weighted fusion algorithm is then used to sequentially splice the segment images.The experimental results demonstrate that the suggested approach can improve image clarity and minimize noise while maintaining the information content of intestinal images.In addition,the method's seamless transition between the final portions of a panoramic image also demonstrates that the stitching trace has been removed.
基金partially supported by China Postdoctoral Science Foundation(2023M730741)the National Natural Science Foundation of China(U22B2052,52102432,52202452,62372080,62302078)
文摘The goal of infrared and visible image fusion(IVIF)is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene.However,existing methods struggle to effectively handle modal disparities,resulting in visual degradation of the details and prominent targets of the fused images.To address these challenges,we introduce Prompt Fusion,a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts.Firstly,to better characterize the features of different modalities,a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities,thereby improving the extraction of fine details and textures.We also introduce a prompt learning mechanism using positive and negative prompts,leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images,leading to improved performance in downstream tasks.Furthermore,we employ bi-level asymptotic convergence optimization.This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent.Our approach advances the state-of-the-art,delivering superior fusion quality and boosting the performance of related downstream tasks.Project page:https://github.com/hey-it-s-me/PromptFusion.
基金supports in part by the Natural Science Foundation of China(NSFC)under contract No.62171253the Young Elite Scientists Sponsorship Program by CAST under program No.2022QNRC001,as well as the Fundamental Research Funds for the Central Universities.
文摘Images with complementary spectral information can be recorded using image sensors that can identify visible and near-infrared spectrum.The fusion of visible and nearinfrared(NIR)aims to enhance the quality of images acquired by video monitoring systems for the ease of user observation and data processing.Unfortunately,current fusion algorithms produce artefacts and colour distortion since they cannot make use of spectrum properties and are lacking in information complementarity.Therefore,an information complementarity fusion(ICF)model is designed based on physical signals.In order to separate high-frequency noise from important information in distinct frequency layers,the authors first extracted texture-scale and edge-scale layers using a two-scale filter.Second,the difference map between visible and near-infrared was filtered using the extended-DoG filter to produce the initial visible-NIR complementary weight map.Then,to generate a guide map,the near-infrared image with night adjustment was processed as well.The final complementarity weight map was subsequently derived via an arctanI function mapping using the guide map and the initial weight maps.Finally,fusion images were generated with the complementarity weight maps.The experimental results demonstrate that the proposed approach outperforms the state-of-the-art in both avoiding artificial colours as well as effectively utilising information complementarity.
基金supported by the National Natural Science Foundation of China(62302167,62477013)Natural Science Foundation of Shanghai(No.24ZR1456100)+1 种基金Science and Technology Commission of Shanghai Municipality(No.24DZ2305900)the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries(2211106).
文摘Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images.Obtaining class-specific precise representations at different scales is a key aspect of feature representation.However,existing methods often rely on the single-scale deep feature,neglecting shallow and deeper layer features,which poses challenges when predicting objects of varying scales within the same image.Although some studies have explored multi-scale features,they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales.To address these issues,we propose a two-stage,three-branch Transformer-based framework.The first stage incorporates multi-scale image feature extraction and hierarchical scale attention.This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales,improving the model’s generalization to diverse object scales.The second stage includes a global feature enhancement module and a region selection module.The global feature enhancement module strengthens interconnections between different image regions,mitigating the issue of incomplete represen-tations,while the region selection module models the cross-modal relationships between image features and labels.Together,these components enable the efficient acquisition of class-specific precise feature representations.Extensive experiments on public datasets,including COCO2014,VOC2007,and VOC2012,demonstrate the effectiveness of our proposed method.Our approach achieves consistent performance gains of 0.3%,0.4%,and 0.2%over state-of-the-art methods on the three datasets,respectively.These results validate the reliability and superiority of our approach for multi-label image classification.
基金supported by Universiti Teknologi MARA through UiTM MyRA Research Grant,600-RMC 5/3/GPM(053/2022).
文摘Infrared and visible image fusion technology integrates the thermal radiation information of infrared images with the texture details of visible images to generate more informative fused images.However,existing methods often fail to distinguish salient objects from background regions,leading to detail suppression in salient regions due to global fusion strategies.This study presents a mask-guided latent low-rank representation fusion method to address this issue.First,the GrabCut algorithm is employed to extract a saliency mask,distinguishing salient regions from background regions.Then,latent low-rank representation(LatLRR)is applied to extract deep image features,enhancing key information extraction.In the fusion stage,a weighted fusion strategy strengthens infrared thermal information and visible texture details in salient regions,while an average fusion strategy improves background smoothness and stability.Experimental results on the TNO dataset demonstrate that the proposed method achieves superior performance in SPI,MI,Qabf,PSNR,and EN metrics,effectively preserving salient target details while maintaining balanced background information.Compared to state-of-the-art fusion methods,our approach achieves more stable and visually consistent fusion results.The fusion code is available on GitHub at:https://github.com/joyzhen1/Image(accessed on 15 January 2025).
Funding: supported by the National Natural Science Foundation of China (No. 61761027) and the Gansu Young Doctor's Fund for Higher Education Institutions (No. 2021QB-053).
Abstract: The traditional EnFCM (Enhanced Fuzzy C-Means) algorithm considers only grey-scale features in image segmentation, yielding unsatisfactory results when used to segment and extract woodland from remote sensing images. An EnFCM remote sensing forest land extraction method based on PCA multi-feature fusion is proposed. First, histogram equalization is applied to improve image contrast. Second, the texture and edge features of the image are extracted, and a multi-feature fused pixel image is generated using PCA. The fused feature then serves as the constraint for measuring differences between pixels, in place of the single grey-scale feature. Finally, an improved feature distance metric computes the similarity between pixels and cluster centers to complete the cluster segmentation. Experimental results show an error between 1.5% and 4.0% relative to forest areas delineated by hand by experts, demonstrating high segmentation and extraction accuracy.
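The PCA multi-feature fusion step might look like the following sketch, which stacks equalized grey, edge, and a crude texture response per pixel and projects them to a single fused channel; the Sobel and Laplacian operators stand in for the paper's unspecified edge and texture extractors, and the fuzzy clustering itself is omitted.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA

def pca_fused_feature_image(gray: np.ndarray) -> np.ndarray:
    """Sketch of the multi-feature fusion step on an 8-bit grey image:
    equalize contrast, compute per-pixel edge and texture responses, then
    project the three features to one channel with PCA. The resulting map
    would replace the raw grey level as the clustering feature."""
    eq = cv2.equalizeHist(gray)                    # histogram equalization
    gx = cv2.Sobel(eq, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(eq, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    edges = np.hypot(gx, gy)                       # edge magnitude
    texture = np.abs(cv2.Laplacian(cv2.blur(eq, (5, 5)), cv2.CV_64F))

    feats = np.stack([eq.astype(np.float64).ravel(),
                      edges.ravel(),
                      texture.ravel()], axis=1)    # (H*W, 3) feature matrix
    fused = PCA(n_components=1).fit_transform(feats)
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-9)
    return fused.reshape(gray.shape)               # single fused feature map
```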
Funding: supported by the Qingdao Huanghai University School-Level Scientific Research Project (2023KJ14), the Undergraduate Teaching Reform Research Project of the Shandong Provincial Department of Education (M2022328), the National Natural Science Foundation of China under Grant (42472324), and the Qingdao Postdoctoral Foundation under Grant (QDBSH202402049).
Abstract: Multimodal image fusion plays an important role in image analysis and its applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities into a single fused image. One critical clinical application is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image into low-rank and salient parts. Then, to exploit features at different scales, a multiscale mechanism with three filters of different sizes extracts features in the encoder network, and a hybrid attention module captures additional image detail. Finally, the fused images are reconstructed by the decoding network. Experiments on clinical brain computed tomography/magnetic resonance images show that the proposed multiscale hybrid attention method outperforms other advanced fusion methods.
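A minimal PyTorch sketch of the two ingredients named above follows: a multiscale block with three parallel convolutions of different kernel sizes, and a generic channel-plus-spatial "hybrid attention" module. Kernel sizes 3/5/7 and the squeeze-and-excitation-style design are assumptions rather than MMIF-Net's exact modules.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Three parallel convolutions with different kernel sizes, concatenated
    and fused back to one feature map (sizes 3/5/7 are assumptions)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7))
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class HybridAttention(nn.Module):
    """Channel attention followed by a spatial gate: a common 'hybrid
    attention' pattern, not necessarily the paper's exact module."""
    def __init__(self, ch: int, r: int = 8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // r, 1), nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)   # reweight channels
        return x * self.spatial(x)  # reweight spatial positions

feat = HybridAttention(32)(MultiScaleBlock(1, 32)(torch.rand(1, 1, 64, 64)))
print(feat.shape)  # torch.Size([1, 32, 64, 64])
```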
Funding: supported by the National Natural Science Foundation of China (Nos. 62167005 and 61966018) and the Key Research Projects of the Jiangxi Provincial Department of Education (No. GJJ200302).
Abstract: Current research on multi-label image classification mainly explores the correlation between labels to improve classification accuracy. In existing methods, however, label correlation is computed from the statistical information of the data; such correlation is global and dataset-dependent, and therefore not suitable for every sample. Moreover, the characteristic information of small objects is easily lost during feature extraction, resulting in low classification accuracy for small objects. To this end, this paper proposes a multi-label image classification model based on multiscale fusion and adaptive label correlation. First, feature maps at multiple scales are fused to enhance the feature information of small objects. Semantic guidance then decomposes the fused feature map into per-category feature vectors, and the self-attention mechanism of a graph attention network adaptively mines the correlations between categories within each image, producing feature vectors containing category-related information for the final classification. The mean average precision of the model reaches 95.6% and 83.6% on the public VOC 2007 and MS COCO 2014 datasets, respectively, and most indicators surpass those of the latest existing methods.
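The adaptive, per-image label correlation could be sketched as plain self-attention over the per-category feature vectors, as below; the multiscale fusion and semantic-guided decomposition that produce those vectors are assumed to have already run, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LabelCorrelationAttention(nn.Module):
    """Sketch of adaptive label correlation: each class vector attends to the
    others via self-attention, so the learned correlation depends on the
    sample rather than on dataset-wide statistics."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, 1)  # one score per class vector

    def forward(self, class_feats: torch.Tensor) -> torch.Tensor:
        # class_feats: (B, num_classes, dim), from semantic-guided decomposition
        out, _ = self.attn(class_feats, class_feats, class_feats)
        return self.cls(out).squeeze(-1)  # (B, num_classes) logits

logits = LabelCorrelationAttention(256)(torch.randn(2, 80, 256))
print(logits.shape)  # torch.Size([2, 80])
```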
Funding: supported by the National Science Fund of China (Grant No. 52225402) and the Fund of the Inner Mongolia Research Institute, China University of Mining and Technology (Beijing) (Grant No. IMRI23003).
Abstract: High-intensity underground mining has caused severe ground fissures, resulting in environmental degradation, so prompt detection is crucial to mitigate their impact. However, accurate segmentation of fissures in the complex and variable scenes of visible imagery is challenging. Our method, DeepFissureNets-Infrared-Visible (DFN-IV), demonstrates the potential of complementing visible images with infrared information for improved ground fissure segmentation. DFN-IV adopts a two-step process. First, a fusion network trained with a dual adversarial learning strategy fuses infrared and visible images, providing an integrated representation of fissure targets that combines structural information with textural details. Second, the fused images are processed by a fine-tuned segmentation network, which leverages knowledge injection to learn the distinctive characteristics of fissure targets effectively. Furthermore, an infrared-visible ground fissure dataset (IVGF) is built from an aerial investigation of the Daliuta Coal Mine. Extensive experiments reveal that our approach provides superior accuracy over single-modality image strategies across five segmentation models. Notably, DeeplabV3+ tested with DFN-IV improves pixel accuracy and Intersection over Union (IoU) by 9.7% and 11.13%, respectively, compared to using visible images alone. Moreover, our method surpasses six state-of-the-art image fusion methods, achieving a 5.28% improvement in pixel accuracy and a 1.57% increase in IoU over the second-best method. Ablation studies further validate the significance of the dual adversarial learning module and the integrated knowledge injection strategy. By leveraging DFN-IV, we aim to quantify the impacts of mining-induced ground fissures, facilitating the implementation of intelligent safety measures.
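Stripped of the adversarial training and knowledge injection details, the two-step pipeline at inference time reduces to the following sketch; the channel-concatenation input convention and the toy stand-in networks are assumptions made only so the example runs.

```python
import torch
import torch.nn as nn

def two_step_inference(ir, vis, fusion_net, seg_net):
    """Sketch of a fuse-then-segment pipeline: a trained fusion network merges
    the infrared and visible inputs, then a fine-tuned segmentation network
    labels fissure pixels in the fused image."""
    with torch.no_grad():
        fused = fusion_net(torch.cat([ir, vis], dim=1))  # fused representation
        logits = seg_net(fused)                          # (B, classes, H, W)
    return logits.argmax(dim=1)                          # per-pixel class map

# toy stand-ins so the sketch runs end to end (not the paper's networks)
fusion_net = nn.Conv2d(2, 3, 3, padding=1)
seg_net = nn.Conv2d(3, 2, 3, padding=1)
mask = two_step_inference(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64),
                          fusion_net, seg_net)
print(mask.shape)  # torch.Size([1, 64, 64])
```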
Abstract: A wavelet-based local and global feature fusion network (LAGN) is proposed for low-light image enhancement, aiming to enhance image details and restore colors in dark areas. The study addresses three key issues in low-light image enhancement: enhancing low-light images with LAGN while preserving image details and colors; extracting image edge information via the wavelet transform to enhance detail; and extracting local and global image features with convolutional neural networks and a Transformer to improve contrast. Comparisons with state-of-the-art methods on two datasets verify that LAGN achieves the best performance in terms of detail, brightness, and contrast.
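The wavelet step can be sketched with PyWavelets: a single-level 2-D DWT splits the image into an approximation band and three detail bands whose magnitudes carry the edge information a detail branch would consume. The Haar wavelet and the way the bands are combined here are assumptions.

```python
import numpy as np
import pywt

def wavelet_edge_bands(gray: np.ndarray, wavelet: str = "haar"):
    """Sketch of the wavelet edge-extraction step: a single-level 2-D DWT
    yields an approximation band and horizontal/vertical/diagonal detail
    bands; their combined magnitude is a simple edge map."""
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float64), wavelet)
    edges = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)  # combined detail magnitude
    return cA, edges

approx, edges = wavelet_edge_bands(np.random.rand(256, 256))
print(approx.shape, edges.shape)  # (128, 128) (128, 128)
```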