Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet th...Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.展开更多
Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure p...Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.展开更多
To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network mo...To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prosta...The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.展开更多
With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods ...With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.展开更多
Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it cha...Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.展开更多
Human pose estimation has received much attention from the research community because of its wide range of applications.However,current research for pose estimation is usually complex and computationally intensive,esp...Human pose estimation has received much attention from the research community because of its wide range of applications.However,current research for pose estimation is usually complex and computationally intensive,especially the feature loss problems in the feature fusion process.To address the above problems,we propose a lightweight human pose estimation network based on multi-attention mechanism(LMANet).In our method,network parameters can be significantly reduced by lightweighting the bottleneck blocks with depth-wise separable convolution on the high-resolution networks.After that,we also introduce a multi-attention mechanism to improve the model prediction accuracy,and the channel attention module is added in the initial stage of the network to enhance the local cross-channel information interaction.More importantly,we inject spatial crossawareness module in the multi-scale feature fusion stage to reduce the spatial information loss during feature extraction.Extensive experiments on COCO2017 dataset and MPII dataset show that LMANet can guarantee a higher prediction accuracy with fewer network parameters and computational effort.Compared with the highresolution network HRNet,the number of parameters and the computational complexity of the network are reduced by 67%and 73%,respectively.展开更多
Semantic segmentation has made significant breakthroughs in various application fields,but achieving both accurate and efficient segmentation with limited computational resources remains a major challenge.To this end,...Semantic segmentation has made significant breakthroughs in various application fields,but achieving both accurate and efficient segmentation with limited computational resources remains a major challenge.To this end,we propose CGMISeg,an efficient semantic segmentation architecture based on a context-guided multi-scale interaction strategy,aiming to significantly reduce computational overhead while maintaining segmentation accuracy.CGMISeg consists of three core components:context-aware attention modulation,feature reconstruction,and crossinformation fusion.Context-aware attention modulation is carefully designed to capture key contextual information through channel and spatial attention mechanisms.The feature reconstruction module reconstructs contextual information from different scales,modeling key rectangular areas by capturing critical contextual information in both horizontal and vertical directions,thereby enhancing the focus on foreground features.The cross-information fusion module aims to fuse the reconstructed high-level features with the original low-level features during upsampling,promoting multi-scale interaction and enhancing the model’s ability to handle objects at different scales.We extensively evaluated CGMISeg on ADE20K,Cityscapes,and COCO-Stuff,three widely used datasets benchmarks,and the experimental results show that CGMISeg exhibits significant advantages in segmentation performance,computational efficiency,and inference speed,clearly outperforming several mainstream methods,including SegFormer,Feedformer,and SegNext.Specifically,CGMISeg achieves 42.9%mIoU(Mean Intersection over Union)and 15.7 FPS(Frames Per Second)on the ADE20K dataset with 3.8 GFLOPs(Giga Floating-point Operations Per Second),outperforming Feedformer and SegNeXt by 3.7%and 1.8%in mIoU,respectively,while also offering reduced computational complexity and faster inference.CGMISeg strikes an excellent balance between accuracy and efficiency,significantly enhancing both computational and inference performance while maintaining high precision,showcasing exceptional practical value and strong potential for widespread applications.展开更多
To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease rec...To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease recognition is proposed.Based on the deep residual network(ResNet18),the multi-scale feature extraction layer is constructed by group convolution to realize the compression model and improve the extraction ability of different sizes of lesion features.By improving the identity mapping structure to reduce information loss.By introducing the efficient channel attention module(ECANet)to suppress noise from a complex background.The experimental results show that the average precision,recall and F1-score of the LW-ResNet on the test set are 97.80%,97.92%and 97.85%,respectively.The parameter memory is 2.32 MB,which is 94%less than that of ResNet18.Compared with the classic lightweight networks SqueezeNet and MobileNetV2,LW-ResNet has obvious advantages in recognition performance,speed,parameter memory requirement and time complexity.The proposed model has the advantages of low computational cost,low storage cost,strong real-time performance,high identification accuracy,and strong practicability,which can meet the needs of real-time identification task of apple leaf disease on resource-constrained devices.展开更多
This study addresses the critical issue of anemia detection using machine learning(ML)techniques.Although a widespread blood disorder with significant health implications,anemia often remains undetected.This necessita...This study addresses the critical issue of anemia detection using machine learning(ML)techniques.Although a widespread blood disorder with significant health implications,anemia often remains undetected.This necessitates timely and efficient diagnostic methods,as traditional approaches that rely on manual assessment are time-consuming and subjective.The present study explored the application of ML-particularly classification models,such as logistic regression,decision trees,random forest,support vector machines,Naïve Bayes,and k-nearest neighbors-in conjunction with innovative models incorporating attention modules and spatial attention to detect anemia.The proposed models demonstrated promising results,achieving high accuracy,precision,recall,and F1 scores for both textual and image datasets.In addition,an integrated approach that combines textual and image data was found to outperform the individual modalities.Specifically,the proposed AlexNet Multiple Spatial Attention model achieved an exceptional accuracy of 99.58%,emphasizing its potential to revolutionize automated anemia detection.The results of ablation studies confirm the significance of key components-including the blue-green-red,multiple,and spatial attentions-in enhancing model performance.Overall,this study presents a comprehensive and innovative framework for noninvasive anemia detection,contributing valuable insights to the field.展开更多
Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-c...Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-class differences by utilizing attention-based methods.However,at the same layer,most attention-based works only consider large-scale attention blocks with the same size as feature maps,and they ignore small-scale attention blocks that are smaller than feature maps.To distinguish subcategories,it is important to exploit small local regions.In this work,a novel multi-scale attention network(MSANet)is proposed to capture large and small regions at the same layer in fine-grained visual classification.Specifically,a novel multi-scale attention layer(MSAL)is proposed,which generates multiple groups in each feature maps to capture different-scale discriminative regions.The groups based on large-scale regions can exploit global features and the groups based on the small-scale regions can extract local subtle features.Then,a simple feature fusion strategy is utilized to fully integrate global features and local subtle features to mine information that are more conducive to FGVC.Comprehensive experiments in Caltech-UCSD Birds-200-2011(CUB),FGVC-Aircraft(AIR)and Stanford Cars(Cars)datasets show that our method achieves the competitive performances,which demonstrate its effectiveness.展开更多
Traditional classification methods for cervical cells heavily rely on manual feature extraction,constraining their versatility due to the intricacies of cytology images.Although deep learning approaches offer remarkab...Traditional classification methods for cervical cells heavily rely on manual feature extraction,constraining their versatility due to the intricacies of cytology images.Although deep learning approaches offer remarkable po-tential,they often sacrifice domain-specific knowledge,particularly the morphological patterns characterizing various cell subtypes during automated feature extraction.To bridge this gap,we introduce a novel hierarchical framework that integrates robust features from color,texture,and morphology with latent representations discovered by an improved attention-based multi-scale local binary convolutional neural networks(MS-LBCNN),designed to facilitate powerful feature extraction mechanism.We enhance the standard 6-class Bethesda system(TBS)classification by incorporating a coarse-to-refine fusion strategy,which optimizes the classification pro-cess.The proposed method is uniquely equipped to manage the complexities present in both individual and clustered cell images.Upon rigorous evaluation across three independent data cohorts,our method consistently surpassed existing state-of-the-art techniques.The experimental results indicated the potential of our method in enhancing the development of automation-aided diagnostic systems,and bolstering both the accuracy and ef-ficiency of cytology screening procedures.展开更多
Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usuall...Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.展开更多
基金Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station(Grant No.2024120011003081)Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas(Grant No.2024120011001866)。
文摘Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
基金supported by the National Natural Science Foundation of China(No.62376287)the International Science and Technology Innovation Joint Base of Machine Vision and Medical Image Processing in Hunan Province(2021CB1013)the Natural Science Foundation of Hunan Province(Nos.2022JJ30762,2023JJ70016).
文摘Globally,diabetic retinopathy(DR)is the primary cause of blindness,affecting millions of people worldwide.This widespread impact underscores the critical need for reliable and precise diagnostic techniques to ensure prompt diagnosis and effective treatment.Deep learning-based automated diagnosis for diabetic retinopathy can facilitate early detection and treatment.However,traditional deep learning models that focus on local views often learn feature representations that are less discriminative at the semantic level.On the other hand,models that focus on global semantic-level information might overlook critical,subtle local pathological features.To address this issue,we propose an adaptive multi-scale feature fusion network called(AMSFuse),which can adaptively combine multi-scale global and local features without compromising their individual representation.Specifically,our model incorporates global features for extracting high-level contextual information from retinal images.Concurrently,local features capture fine-grained details,such as microaneurysms,hemorrhages,and exudates,which are critical for DR diagnosis.These global and local features are adaptively fused using a fusion block,followed by an Integrated Attention Mechanism(IAM)that refines the fused features by emphasizing relevant regions,thereby enhancing classification accuracy for DR classification.Our model achieves 86.3%accuracy on the APTOS dataset and 96.6%RFMiD,both of which are comparable to state-of-the-art methods.
基金supported by National Natural Science Foundation of China(No.61862037)Lanzhou Jiaotong University Tianyou Innovation Team Project(No.TY202002)。
文摘To solve the problems of redundant feature information,the insignificant difference in feature representation,and low recognition accuracy of the fine-grained image,based on the ResNeXt50 model,an MSFResNet network model is proposed by fusing multi-scale feature information.Firstly,a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using different scales of convolution kernels.Meanwhile,the channel attention mechanism is used to increase the global information acquisition of the network.Secondly,the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network,thus reducing the loss of texture details of the deep network feature images,and improving network generalization ability and recognition accuracy.Finally,the validity of the MSFResNet model is verified using public datasets and applied to wild mushroom identification.Experimental results show that compared with ResNeXt50 network model,the accuracy of the MSFResNet model is improved by 6.01%on the FGVC-Aircraft common dataset.It achieves 99.13%classification accuracy on the wild mushroom dataset,which is 0.47%higher than ResNeXt50.Furthermore,the experimental results of the thermal map show that the MSFResNet model significantly reduces the interference of background information,making the network focus on the location of the main body of wild mushroom,which can effectively improve the accuracy of wild mushroom identification.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金This work was supported in part by the National Natural Science Foundation of China(Grant#:82260362)in part by the National Key R&D Program of China(Grant#:2021ZD0111000)+1 种基金in part by the Key R&D Project of Hainan Province(Grant#:ZDYF2021SHFZ243)in part by the Major Science and Technology Project of Haikou(Grant#:2020-009).
文摘The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
文摘With the rapid expansion of drone applications,accurate detection of objects in aerial imagery has become crucial for intelligent transportation,urban management,and emergency rescue missions.However,existing methods face numerous challenges in practical deployment,including scale variation handling,feature degradation,and complex backgrounds.To address these issues,we propose Edge-enhanced and Detail-Capturing You Only Look Once(EHDC-YOLO),a novel framework for object detection in Unmanned Aerial Vehicle(UAV)imagery.Based on the You Only Look Once version 11 nano(YOLOv11n)baseline,EHDC-YOLO systematically introduces several architectural enhancements:(1)a Multi-Scale Edge Enhancement(MSEE)module that leverages multi-scale pooling and edge information to enhance boundary feature extraction;(2)an Enhanced Feature Pyramid Network(EFPN)that integrates P2-level features with Cross Stage Partial(CSP)structures and OmniKernel convolutions for better fine-grained representation;and(3)Dynamic Head(DyHead)with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability.Comprehensive experiments on the Vision meets Drones for Detection(VisDrone-DET)2019 dataset demonstrate that EHDC-YOLO achieves significant improvements,increasing mean Average Precision(mAP)@0.5 from 33.2%to 46.1%(an absolute improvement of 12.9 percentage points)and mAP@0.5:0.95 from 19.5%to 28.0%(an absolute improvement of 8.5 percentage points)compared with the YOLOv11n baseline,while maintaining a reasonable parameter count(2.81 M vs the baseline’s 2.58 M).Further ablation studies confirm the effectiveness of each proposed component,while visualization results highlight EHDC-YOLO’s superior performance in detecting objects and handling occlusions in complex drone scenarios.
基金supported in part by the National Natural Science Foundation of China under Grants 62463002,62062021 and 62473033in part by the Guiyang Scientific Plan Project[2023]48–11,in part by QKHZYD[2023]010 Guizhou Province Science and Technology Innovation Base Construction Project“Key Laboratory Construction of Intelligent Mountain Agricultural Equipment”.
文摘Solar cell defect detection is crucial for quality inspection in photovoltaic power generation modules.In the production process,defect samples occur infrequently and exhibit random shapes and sizes,which makes it challenging to collect defective samples.Additionally,the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions.This paper proposes a novel Lightweight Multiscale Feature Fusion network(LMFF)to address these challenges.The network comprises a feature extraction network,a multi-scale feature fusion module(MFF),and a segmentation network.Specifically,a feature extraction network is proposed to obtain multi-scale feature outputs,and a multi-scale feature fusion module(MFF)is used to fuse multi-scale feature information effectively.In order to capture finer-grained multi-scale information from the fusion features,we propose a multi-scale attention module(MSA)in the segmentation network to enhance the network’s ability for small target detection.Moreover,depthwise separable convolutions are introduced to construct depthwise separable residual blocks(DSR)to reduce the model’s parameter number.Finally,to validate the proposed method’s defect segmentation and localization performance,we constructed three solar cell defect detection datasets:SolarCells,SolarCells-S,and PVEL-S.SolarCells and SolarCells-S are monocrystalline silicon datasets,and PVEL-S is a polycrystalline silicon dataset.Experimental results show that the IOU of our method on these three datasets can reach 68.5%,51.0%,and 92.7%,respectively,and the F1-Score can reach 81.3%,67.5%,and 96.2%,respectively,which surpasses other commonly usedmethods and verifies the effectiveness of our LMFF network.
基金the National Natural Science Foundation of China(Nos.61775139,62072126,61772164,and 61872242)。
文摘Human pose estimation has received much attention from the research community because of its wide range of applications.However,current research for pose estimation is usually complex and computationally intensive,especially the feature loss problems in the feature fusion process.To address the above problems,we propose a lightweight human pose estimation network based on multi-attention mechanism(LMANet).In our method,network parameters can be significantly reduced by lightweighting the bottleneck blocks with depth-wise separable convolution on the high-resolution networks.After that,we also introduce a multi-attention mechanism to improve the model prediction accuracy,and the channel attention module is added in the initial stage of the network to enhance the local cross-channel information interaction.More importantly,we inject spatial crossawareness module in the multi-scale feature fusion stage to reduce the spatial information loss during feature extraction.Extensive experiments on COCO2017 dataset and MPII dataset show that LMANet can guarantee a higher prediction accuracy with fewer network parameters and computational effort.Compared with the highresolution network HRNet,the number of parameters and the computational complexity of the network are reduced by 67%and 73%,respectively.
基金supported by the National Natural Science Foundation of China(62162007)the Guizhou Provincial Basic Research Program(Natural Science)(No.QianKeHeJiChu-ZK[2024]YiBan079).
文摘Semantic segmentation has made significant breakthroughs in various application fields,but achieving both accurate and efficient segmentation with limited computational resources remains a major challenge.To this end,we propose CGMISeg,an efficient semantic segmentation architecture based on a context-guided multi-scale interaction strategy,aiming to significantly reduce computational overhead while maintaining segmentation accuracy.CGMISeg consists of three core components:context-aware attention modulation,feature reconstruction,and crossinformation fusion.Context-aware attention modulation is carefully designed to capture key contextual information through channel and spatial attention mechanisms.The feature reconstruction module reconstructs contextual information from different scales,modeling key rectangular areas by capturing critical contextual information in both horizontal and vertical directions,thereby enhancing the focus on foreground features.The cross-information fusion module aims to fuse the reconstructed high-level features with the original low-level features during upsampling,promoting multi-scale interaction and enhancing the model’s ability to handle objects at different scales.We extensively evaluated CGMISeg on ADE20K,Cityscapes,and COCO-Stuff,three widely used datasets benchmarks,and the experimental results show that CGMISeg exhibits significant advantages in segmentation performance,computational efficiency,and inference speed,clearly outperforming several mainstream methods,including SegFormer,Feedformer,and SegNext.Specifically,CGMISeg achieves 42.9%mIoU(Mean Intersection over Union)and 15.7 FPS(Frames Per Second)on the ADE20K dataset with 3.8 GFLOPs(Giga Floating-point Operations Per Second),outperforming Feedformer and SegNeXt by 3.7%and 1.8%in mIoU,respectively,while also offering reduced computational complexity and faster inference.CGMISeg strikes an excellent balance between accuracy and efficiency,significantly enhancing both computational and inference performance while maintaining high precision,showcasing exceptional practical value and strong potential for widespread applications.
基金funded by the Science and Technology Development Program of Jilin Province(20190301024NY)the Precision Agriculture and Big Data Engineering Research Center of Jilin Province(2020C005).
文摘To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease recognition is proposed.Based on the deep residual network(ResNet18),the multi-scale feature extraction layer is constructed by group convolution to realize the compression model and improve the extraction ability of different sizes of lesion features.By improving the identity mapping structure to reduce information loss.By introducing the efficient channel attention module(ECANet)to suppress noise from a complex background.The experimental results show that the average precision,recall and F1-score of the LW-ResNet on the test set are 97.80%,97.92%and 97.85%,respectively.The parameter memory is 2.32 MB,which is 94%less than that of ResNet18.Compared with the classic lightweight networks SqueezeNet and MobileNetV2,LW-ResNet has obvious advantages in recognition performance,speed,parameter memory requirement and time complexity.The proposed model has the advantages of low computational cost,low storage cost,strong real-time performance,high identification accuracy,and strong practicability,which can meet the needs of real-time identification task of apple leaf disease on resource-constrained devices.
基金supported by the Key Research and Development Program of Hunan Province,No.2023SK2038.
文摘This study addresses the critical issue of anemia detection using machine learning(ML)techniques.Although a widespread blood disorder with significant health implications,anemia often remains undetected.This necessitates timely and efficient diagnostic methods,as traditional approaches that rely on manual assessment are time-consuming and subjective.The present study explored the application of ML-particularly classification models,such as logistic regression,decision trees,random forest,support vector machines,Naïve Bayes,and k-nearest neighbors-in conjunction with innovative models incorporating attention modules and spatial attention to detect anemia.The proposed models demonstrated promising results,achieving high accuracy,precision,recall,and F1 scores for both textual and image datasets.In addition,an integrated approach that combines textual and image data was found to outperform the individual modalities.Specifically,the proposed AlexNet Multiple Spatial Attention model achieved an exceptional accuracy of 99.58%,emphasizing its potential to revolutionize automated anemia detection.The results of ablation studies confirm the significance of key components-including the blue-green-red,multiple,and spatial attentions-in enhancing model performance.Overall,this study presents a comprehensive and innovative framework for noninvasive anemia detection,contributing valuable insights to the field.
基金jointly supported by the National Science and Technology Major Project(2022ZD0117103)the National Natural Science Foundations of China(62272364)+2 种基金the provincial Key Research and Development Program of Shaanxi(2024GH-ZDXM-47)the Research Project on Higher Education Teaching Reform of Shaanxi Province(23JG003)the Natural Science Basic Research Program of Shaanxi(2024JC-YBQN0639).
文摘Fine-grained visual classification(FGVC)is a very challenging task due to distinguishing subcategories under the same super-category.Recent works mainly localize discriminative image regions and capture subtle inter-class differences by utilizing attention-based methods.However,at the same layer,most attention-based works only consider large-scale attention blocks with the same size as feature maps,and they ignore small-scale attention blocks that are smaller than feature maps.To distinguish subcategories,it is important to exploit small local regions.In this work,a novel multi-scale attention network(MSANet)is proposed to capture large and small regions at the same layer in fine-grained visual classification.Specifically,a novel multi-scale attention layer(MSAL)is proposed,which generates multiple groups in each feature maps to capture different-scale discriminative regions.The groups based on large-scale regions can exploit global features and the groups based on the small-scale regions can extract local subtle features.Then,a simple feature fusion strategy is utilized to fully integrate global features and local subtle features to mine information that are more conducive to FGVC.Comprehensive experiments in Caltech-UCSD Birds-200-2011(CUB),FGVC-Aircraft(AIR)and Stanford Cars(Cars)datasets show that our method achieves the competitive performances,which demonstrate its effectiveness.
基金supported by the Beijing Capital Health Development Research Project[Grant no.2024-2-1031]the Beijing Municipal Natural Science Foundation[Grant no.7192105].
文摘Traditional classification methods for cervical cells heavily rely on manual feature extraction,constraining their versatility due to the intricacies of cytology images.Although deep learning approaches offer remarkable po-tential,they often sacrifice domain-specific knowledge,particularly the morphological patterns characterizing various cell subtypes during automated feature extraction.To bridge this gap,we introduce a novel hierarchical framework that integrates robust features from color,texture,and morphology with latent representations discovered by an improved attention-based multi-scale local binary convolutional neural networks(MS-LBCNN),designed to facilitate powerful feature extraction mechanism.We enhance the standard 6-class Bethesda system(TBS)classification by incorporating a coarse-to-refine fusion strategy,which optimizes the classification pro-cess.The proposed method is uniquely equipped to manage the complexities present in both individual and clustered cell images.Upon rigorous evaluation across three independent data cohorts,our method consistently surpassed existing state-of-the-art techniques.The experimental results indicated the potential of our method in enhancing the development of automation-aided diagnostic systems,and bolstering both the accuracy and ef-ficiency of cytology screening procedures.
基金This work was supported by the National Natural Science Foundation of China(Nos.62073322 and 61633020)the CIE-Tencent Robotics X Rhino-Bird Focused Research Program(No.2022-07)the Beijing Natural Science Foundation(No.2022MQ05).
文摘Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.