Funding: Funded by the National Natural Science Foundation of China, grant number 61302188.
Abstract: In recent years, deep convolutional neural networks have shown superior performance in image denoising. However, deep network structures often come with a large number of model parameters, leading to high training costs and long inference times, which limits their practical application in denoising tasks. This paper proposes a new dual convolutional denoising network with skip connections (DECDNet), which achieves an ideal balance between denoising effect and network complexity. The proposed DECDNet consists of a noise estimation network, a multi-scale feature extraction network, a dual convolutional neural network, and dual attention mechanisms. The noise estimation network estimates the noise level map, and the multi-scale feature extraction network is incorporated to improve the model's flexibility in obtaining image features. The dual convolutional neural network uses a two-branch design with interactive connections between convolution and dilated convolution layers, the lower branch consisting of dilated convolution layers and both branches using skip connections. Experiments show that, compared with other models, the proposed DECDNet achieves superior PSNR and SSIM values at all compared noise levels, especially at higher noise levels, showing robustness to heavily corrupted images. It also produces better visual results, maintaining a balance between denoising and detail preservation.
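The two ingredients this abstract combines, dilated convolution for a larger receptive field and a skip connection that adds a branch's input back to its output, can be sketched minimally in NumPy. This is an illustrative single-channel sketch, not the authors' DECDNet implementation; the function names are invented for the example:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-padded 2D cross-correlation with a dilated kernel (single channel).

    A dilation of d spaces the kernel taps d pixels apart, enlarging the
    receptive field without adding parameters.
    """
    kh, kw = kernel.shape
    pad_h = dilation * (kh - 1) // 2
    pad_w = dilation * (kw - 1) // 2
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * dilation:i * dilation + x.shape[0],
                                     j * dilation:j * dilation + x.shape[1]]
    return out

def residual_branch(x, kernel, dilation):
    # Skip connection: the branch output is added back to its input,
    # so the branch only has to learn the residual (e.g., the noise).
    return x + dilated_conv2d(x, kernel, dilation)
```

With an identity kernel (center tap 1, all else 0), `dilated_conv2d` returns its input unchanged, which makes the skip-connection behavior easy to verify.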
Funding: Open Access funding provided by the National Institutes of Health (NIH). Project funding was provided by the NCATS Intramural Fund.
Abstract: Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
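The skip connections this survey highlights serve a concrete purpose: the decoder's coarse, semantically rich map is upsampled and concatenated with the encoder's fine-detail map at the same resolution. A minimal NumPy sketch of that operation, with illustrative names and a simple nearest-neighbour upsampler standing in for a learned one:

```python
import numpy as np

def upsample_nn(feat, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def skip_concat(decoder_feat, encoder_feat):
    """U-Net-style skip connection: upsample the coarse decoder map and
    concatenate the fine encoder map along the channel axis, so the next
    decoder block sees both context and spatial detail."""
    up = upsample_nn(decoder_feat)
    assert up.shape[1:] == encoder_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([up, encoder_feat], axis=0)
```

A 4-channel 8x8 decoder map concatenated with a 2-channel 16x16 encoder map yields a 6-channel 16x16 tensor, which is exactly the shape a subsequent convolution block would consume.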
Funding: Supported by the Scientific Research Fund of Hunan Provincial Natural Science Foundation under Grant 20231J60257, the Hunan Provincial Engineering Research Center for Intelligent Rehabilitation Robotics and Assistive Equipment under Grant 2025SH501, and Inha University and the Design of a Conflict Detection and Validation Tool under Grant HX2024123.
Abstract: Image inpainting refers to synthesizing missing content in an image based on known information to restore occluded or damaged regions. With the increasing complexity of inpainting tasks and the growth of data scale, existing deep learning methods still have some limitations: they lack the ability to capture long-range dependencies, and their performance in handling multi-scale image structures is suboptimal. To solve this problem, this paper proposes an image inpainting method based on a parallel dual-branch learnable Transformer network. The encoder of the proposed model's generator consists of a dual-branch parallel structure with stacked CNN blocks and Transformer blocks, aiming to extract global and local feature information from images. Furthermore, a dual-branch fusion module is adopted to combine the features obtained from both branches. Additionally, a gated full-scale skip connection module is proposed to further enhance the coherence of the inpainting results and alleviate information loss. Finally, experimental results on three public datasets demonstrate the superior performance of the proposed method.
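One common way to fuse a local (CNN) branch with a global (Transformer) branch, in the spirit of the gated fusion module described above, is a learned sigmoid gate that blends the two feature maps per position. The sketch below is a hedged illustration with scalar gate parameters standing in for learned convolution weights; it is not the paper's actual module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(local_feat, global_feat, w_local, w_global, bias):
    """Blend two same-shaped feature maps with a per-position gate.

    gate -> 1 keeps the local (CNN) branch; gate -> 0 keeps the
    global (Transformer) branch. In a real network w_local, w_global
    and bias would be learned convolution parameters.
    """
    gate = sigmoid(w_local * local_feat + w_global * global_feat + bias)
    return gate * local_feat + (1.0 - gate) * global_feat
```

Driving the bias strongly positive or negative collapses the gate to one branch or the other, which is a quick sanity check that the blend is behaving as intended.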
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61673222), the Jiangsu Universities Natural Science Research Project (Grant No. 13KJA510001), and the Major Program of the National Social Science Fund of China (Grant No. 17ZDA092).
Abstract: Visual degradation of images captured in rainy weather, caused by rain streaks, can adversely affect the performance of many open-air vision systems. Hence, it is necessary to address the problem of eliminating rain streaks from a single rainy image. In this work, a deep convolutional neural network (CNN) based method called Rain-Removal Net (R2N) is introduced to solve the single-image de-raining problem. First, we decompose the rainy image into its high-frequency detail layer and low-frequency base layer. Then, we feed the high-frequency detail layer into a carefully designed CNN architecture to learn the mapping between it and its corresponding de-rained high-frequency detail layer. The CNN architecture consists of four convolution layers and four deconvolution layers, as well as three skip connections. Experiments on synthetic and real-world rainy images show that our architecture outperforms the compared state-of-the-art de-raining models with respect to both the quality of de-rained images and computing efficiency.
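The base/detail decomposition that precedes the network is conceptually simple: a low-pass filter gives the base layer, and subtracting it from the image leaves the high-frequency detail layer (where the rain streaks live). A minimal NumPy sketch using a box (mean) filter as the low-pass stand-in; the paper may use a different filter, so treat this as illustrative:

```python
import numpy as np

def box_filter(img, radius):
    """Mean filter over a (2*radius+1)^2 window, edge-padded (low-pass)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

def decompose(img, radius=2):
    """Split an image into a low-frequency base layer and the
    high-frequency detail layer that the de-raining CNN operates on."""
    base = box_filter(img, radius)
    detail = img - base
    return base, detail
```

By construction the two layers sum back to the original image, so after the network cleans the detail layer, adding the base layer back reconstructs the de-rained result.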
Funding: The authors would like to acknowledge financial support from the National Natural Science Foundation of China (Grant No. 41875027).
Abstract: Radar bases and ground observation stations on the Tibetan Plateau are sparsely distributed and cannot support large-scale precipitation monitoring. U-Net, an advanced machine learning (ML) method, is therefore used to develop a robust and rapid precipitating-cloud detection algorithm based on the new-generation geostationary satellite FengYun-4A (FY-4A). First, the algorithm uses real-time multi-band infrared brightness temperatures from FY-4A, combined with Digital Elevation Model (DEM) data, as predictor variables for our model. Second, feature efficiency was improved by replacing the serial connections of U-Net's traditional convolution layers with residual mappings. Then, to address the semantic gap that arises when low-level and high-level features are concatenated directly, we use dense skip pathways to reuse feature maps of different layers, concatenating feature layers from different depths as inputs to subsequent blocks. Finally, according to the characteristics of precipitation clouds, the pooling layers of U-Net were replaced by convolution operations to enable the detection of small precipitation clouds. Experiments show that the Pixel Accuracy (PA) and Mean Intersection over Union (MIoU) of the improved U-Net on the test set reach 0.916 and 0.928, respectively, and precipitation clouds over Tibet are detected well.
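The two evaluation metrics this abstract reports, Pixel Accuracy (PA) and Mean Intersection over Union (MIoU), have standard definitions that are easy to state in NumPy. A small sketch (illustrative function names, binary or multi-class label maps as integer arrays):

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """PA: fraction of pixels whose predicted class matches the ground truth."""
    return float(np.mean(pred == truth))

def mean_iou(pred, truth, n_classes=2):
    """MIoU: per-class intersection-over-union, averaged over the classes
    that appear in either the prediction or the ground truth."""
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (truth == c))
        union = np.sum((pred == c) | (truth == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

For example, with truth `[1, 0, 0, 0]` and prediction `[1, 1, 0, 0]`, PA is 3/4 while MIoU is the mean of 2/3 (class 0) and 1/2 (class 1); MIoU penalizes the false positive on the rare class more heavily than PA does, which is why both are reported.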
Funding: This work was supported in part by the National Natural Science Foundation of China (Nos. 62072074, 62076054, 62027827, 61902054), the Frontier Science and Technology Innovation Projects of the National Key R&D Program (No. 2019QY1405), the Sichuan Science and Technology Innovation Platform and Talent Plan (No. 2020JDJQ0020), the Sichuan Science and Technology Support Plan (No. 2020YFSY0010), and the Natural Science Foundation of Guangdong Province (No. 2018A030313354).
Abstract: As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of health care systems. Fundus photography equipment is connected to the cloud platform through the IoT so as to realize real-time uploading of fundus images and rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged: the data uploaded to the cloud platform involve personal attributes, health status and medical application data of patients, and once leaked, abused or improperly disclosed, personal information security will be violated. Therefore, it is important to address the security and privacy issues of the massive medical and healthcare equipment connected to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while keeping the computational complexity low enough for mobile terminals. In this way, users do not need to upload data to the cloud platform and can analyze and process fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connections between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters is increased by only 2%. When applied to three publicly available datasets, DRIVE, STARE and CHASE DB1, the proposed network achieves accuracy/F1-scores of 96.33%/84.34%, 97.12%/83.17% and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
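A common form for a hybrid loss that handles foreground/background imbalance in vessel segmentation is a weighted sum of pixel-wise binary cross-entropy and region-level Dice loss. The sketch below shows that generic combination in NumPy as an illustration; the paper's actual hybrid loss may differ:

```python
import numpy as np

def bce_loss(p, t, eps=1e-7):
    """Pixel-wise binary cross-entropy between probabilities p and targets t."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def dice_loss(p, t, eps=1e-7):
    """1 - Dice coefficient: a region-overlap loss that is insensitive to the
    large number of easy background pixels."""
    inter = np.sum(p * t)
    return float(1 - (2 * inter + eps) / (np.sum(p) + np.sum(t) + eps))

def hybrid_loss(p, t, alpha=0.5):
    """Weighted sum of the pixel-level and region-level terms; alpha trades
    off smooth gradients (BCE) against class-imbalance robustness (Dice)."""
    return alpha * bce_loss(p, t) + (1 - alpha) * dice_loss(p, t)
```

A perfect prediction drives both terms toward zero, while a prediction with no overlap drives the Dice term toward one regardless of how few foreground pixels there are.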
Funding: Supported by the Natural Science Fund of Guangdong Province (No. 2021A1515011849) and the Key Area Research and Development Program of Guangdong Province (No. 2022A0505050014).
Abstract: In this study, we present a new and innovative framework for acquiring high-quality SVBRDF maps. Our approach addresses the limitations of current methods and proposes a new solution. The core of our method is a simple hardware setup consisting of a consumer-level camera, LED lights, and a carefully designed network that can accurately obtain the high-quality SVBRDF properties of a nearly planar object. By capturing a flexible number of images of an object, our network uses different subnetworks to train different property maps and employs appropriate loss functions for each of them. To further enhance the quality of the maps, we improved the network structure by adding a novel skip connection that connects the encoder and decoder with global features. Through extensive experimentation using both synthetic and real-world materials, our results demonstrate that our method outperforms previous methods and produces superior results. Furthermore, our proposed setup can also be used to acquire physically based rendering maps of special materials.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62073322 and 61633020), the CIE-Tencent Robotics X Rhino-Bird Focused Research Program (No. 2022-07), and the Beijing Natural Science Foundation (No. 2022MQ05).
Abstract: Grasp detection plays a critical role in robot manipulation. Mainstream pixel-wise grasp detection networks with an encoder-decoder structure receive much attention due to their good accuracy and efficiency. However, they usually transmit only the high-level features of the encoder to the decoder, and low-level features are neglected. Low-level features contain abundant detail information, and how to fully exploit them remains unsolved; meanwhile, the channel information in high-level features is also not well mined. Inevitably, the performance of grasp detection is degraded. To solve these problems, we propose a grasp detection network with hierarchical multi-scale feature fusion and an inverted shuffle residual module. Both low-level and high-level features in the encoder are first fused by the designed skip connections with an attention module, and the fused information is then propagated to the corresponding layers of the decoder for in-depth feature fusion. Such hierarchical fusion guarantees the quality of grasp prediction. Furthermore, an inverted shuffle residual module is created, in which the high-level feature from the encoder is split along the channel dimension and the resulting split features are processed in their respective branches. Through such differentiated processing, more high-dimensional channel information is kept, which enhances the representation ability of the network. Besides, an information enhancement module is added before the encoder to reinforce the input information. The proposed method attains 98.9% and 97.8% image-wise and object-wise accuracy on the Cornell grasping dataset, respectively, and the experimental results verify the effectiveness of the method.
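The channel split and the channel shuffle that an inverted shuffle residual module relies on are simple tensor rearrangements (the shuffle originates from ShuffleNet-style blocks). A minimal NumPy sketch with illustrative names, not the paper's module:

```python
import numpy as np

def channel_split(feat):
    """Split a (C, H, W) feature map into two halves along the channel axis,
    so each half can be processed by its own branch."""
    c = feat.shape[0] // 2
    return feat[:c], feat[c:]

def channel_shuffle(feat, groups):
    """Interleave channels across groups so information mixes between the
    branches after grouped processing: reshape to (groups, C/groups, H, W),
    swap the first two axes, and flatten back to (C, H, W)."""
    c, h, w = feat.shape
    return feat.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

With 4 channels and 2 groups, the channel order [0, 1, 2, 3] becomes [0, 2, 1, 3]: each output pair now draws one channel from each group, which is what lets the split branches exchange information.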