Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which poses difficulties for precise prediction. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden periodicity of traffic flow, the data are divided into three kinds of periods: hourly, daily, and weekly. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions, and local spatial-temporal dependence is captured by a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism labels position information for all traffic sequences, so that the multi-head self-attention mechanism can recognize the sequence order and allocate weights to different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
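The three-period split described in the abstract above can be illustrated with a minimal sketch. It assumes 5-minute sampling (288 steps per day) and a hypothetical helper named `periodic_windows`; neither is stated in the abstract, and the paper's actual windowing may differ.

```python
import numpy as np

def periodic_windows(series, t, length, steps_per_day=288):
    """Return (hourly, daily, weekly) windows of `length` steps for target index t.

    Assumption: the hourly window is the most recent segment, while the daily and
    weekly windows cover the target's time of day one day and one week earlier.
    """
    steps_per_week = 7 * steps_per_day
    if t - steps_per_week < 0 or t + length > len(series):
        raise ValueError("not enough history or future around index t")
    hourly = series[t - length:t]
    daily = series[t - steps_per_day:t - steps_per_day + length]
    weekly = series[t - steps_per_week:t - steps_per_week + length]
    return hourly, daily, weekly

series = np.arange(3000, dtype=float)  # synthetic flow values, one per time step
hourly, daily, weekly = periodic_windows(series, t=2500, length=12)
```

Each window has the same length, so the three segments can be stacked or fed to separate branches of a model.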
Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
This study proposes a multi-scale simplified residual convolutional neural network (MS-SRCNN) for the precise prediction of Mg-Nd binary alloy compositions from scanning electron microscope (SEM) images. A multi-scale data structure is established by spatially aligning and stacking SEM images at different magnifications. The MS-SRCNN reduces computational runtime by over 90% compared to traditional architectures like ResNet50, VGG16, and VGG19, without compromising prediction accuracy. The model demonstrates superior predictive performance, achieving a >5% increase in R^2 compared to single-scale models. Furthermore, the MS-SRCNN exhibits robust composition prediction capability across other Mg-based binary alloys, including Mg-La, Mg-Sn, Mg-Ce, Mg-Sm, Mg-Ag, and Mg-Y, thereby emphasizing its generalization and extrapolation potential. This research establishes a non-destructive, microstructure-informed composition analysis framework, reduces characterization time compared to traditional experimental methods, and provides insights into the composition-microstructure relationship in diverse material systems.
Video emotion recognition is widely used due to its alignment with the temporal characteristics of human emotional expression, but existing models have significant shortcomings. On the one hand, Transformer multi-head self-attention modeling of global temporal dependency suffers from high computational overhead and feature similarity. On the other hand, fixed-size convolution kernels are often used, which have weak perception ability for emotional regions of different scales. Therefore, this paper proposes a video emotion recognition model that combines multi-scale region-aware convolution with temporal interactive sampling. In terms of space, multi-branch large-kernel stripe convolution is used to perceive emotional region features at different scales, and attention weights are generated for each scale feature. In terms of time, multi-layer odd-even down-sampling is performed on the time series, and odd-even sub-sequence interaction is performed to solve the problem of feature similarity, while reducing computational costs due to the linear relationship between sampling and convolution overhead. The model was tested on CMU-MOSI, CMU-MOSEI, and Hume Reaction, where Acc-2 reached 83.4%, 85.2%, and 81.2%, respectively. The experimental results show that the model can significantly improve the accuracy of emotion recognition.
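The multi-layer odd-even down-sampling mentioned above can be sketched as follows. This is only an illustration of the splitting step under the assumption that each layer splits every sub-sequence into its even- and odd-indexed frames; the interaction between sub-sequences described in the abstract is not modelled, and the function names are hypothetical.

```python
import numpy as np

def odd_even_split(x):
    """Split a (time, features) sequence into even-indexed and odd-indexed halves."""
    return x[0::2], x[1::2]

def multilayer_split(x, levels):
    """Apply the split recursively; returns the 2**levels leaf sub-sequences."""
    leaves = [x]
    for _ in range(levels):
        leaves = [half for seq in leaves for half in odd_even_split(seq)]
    return leaves

frames = np.arange(16).reshape(16, 1)  # 16 time steps, 1 feature per step
leaves = multilayer_split(frames, levels=2)
```

Each level halves the length of every sub-sequence, which is why the cost of a convolution applied per sub-sequence shrinks in proportion to the sampling factor.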
Reliable traffic flow prediction is crucial for mitigating urban congestion. This paper proposes the Attention-based spatiotemporal Interactive Dynamic Graph Convolutional Network (AIDGCN), a novel architecture integrating an Interactive Dynamic Graph Convolution Network (IDGCN) with Temporal Multi-Head Trend-Aware Attention. Its core innovation lies in IDGCN, which splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs, and in a novel attention mechanism that incorporates convolutional operations to capture essential local traffic trends, addressing a critical gap in standard attention for continuous data. For 15- and 60-min forecasting on METR-LA, AIDGCN achieves MAEs of 0.75% and 0.39%, and RMSEs of 1.32% and 0.14%, respectively. In the 60-min long-term forecasting of the PEMS-BAY dataset, AIDGCN outperforms the MRA-BGCN method by 6.28%, 4.93%, and 7.17% in terms of MAE, RMSE, and MAPE, respectively. Experimental results demonstrate the superiority of the proposed model over state-of-the-art methods.
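For reference, the three error metrics named above (MAE, RMSE, MAPE) can be computed as in the following minimal sketch. The choice to skip zero-valued targets in MAPE is an assumption made here to avoid division by zero; the abstract does not say how the paper handles that case.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error (in %), ignoring near-zero targets."""
    mask = np.abs(y_true) > eps
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100.0)

y_true = np.array([50.0, 60.0, 0.0, 40.0])  # synthetic speeds
y_pred = np.array([45.0, 66.0, 3.0, 40.0])
```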
To address the problem that existing convolutional neural network (CNN) methods for power quality disturbance identification extract only single-scale features, which leads to a lack of feature information and weak anti-noise performance, a new approach based on an adaptive Kalman filter (KF) and a multi-scale channel attention (MS-CAM) fused convolutional neural network is proposed. Single and composite disturbance signals are generated through simulation. The adaptive maximum likelihood Kalman filter is employed to reduce noise in the initial disturbance signal, and multi-scale features are then integrated into the conventional CNN architecture. The multi-scale features of the signal are captured by convolution kernels of different sizes so that the model can obtain diverse feature expressions. The attention mechanism (ATT) is introduced to adaptively allocate weights to the extracted features, and the features are fused and selected to obtain the new main features. The Softmax classifier is employed for the classification of power quality disturbances. Finally, the recognition accuracy of the proposed method is compared with that of the CNN, the model using the attention mechanism, the multi-scale bidirectional long short-term memory network (MS-Bi-LSTM), and the multi-scale convolutional neural network (MSCNN) with the attention mechanism. The simulation results demonstrate that the proposed method achieves higher accuracy than CNN, MS-Bi-LSTM, and MSCNN, with an overall recognition rate exceeding 99%, and that it offers robust classification performance. This achievement provides a new perspective for further exploration in the field of power quality disturbance classification.
Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, due to their inability to effectively capture global information from images, CNNs can easily lose contours and textures in segmentation results. The transformer model can effectively capture long-range dependencies in the image, and combining the CNN and the transformer can effectively extract both local details and global contextual features of the image. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network's ability to recognize important features of images and alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets, particularly in the context of preserving contours and textures.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy significantly declines when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
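The KNN-based completion of occluded joints described above can be sketched roughly as below. This is an illustration, not the paper's JI Module: it assumes occluded joints are marked with NaN, that neighbors are chosen by distance in a complete reference pose (`template`), and that the fill value is the plain mean of the k nearest visible joints. The function name `knn_fill_joints` is hypothetical.

```python
import numpy as np

def knn_fill_joints(pose, template, k=2):
    """Fill NaN joints in `pose` with the mean of the k visible joints that are
    nearest to the missing joint in `template` (a complete reference pose)."""
    filled = pose.copy()
    visible = ~np.isnan(pose).any(axis=1)
    vis_idx = np.flatnonzero(visible)
    if len(vis_idx) == 0:
        raise ValueError("no visible joints to interpolate from")
    for j in np.flatnonzero(~visible):
        # Distances are measured in the template, where every joint is known.
        dist = np.linalg.norm(template[vis_idx] - template[j], axis=1)
        nearest = vis_idx[np.argsort(dist)[:k]]
        filled[j] = pose[nearest].mean(axis=0)
    return filled

template = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])
pose = np.array([[0.0, 1.0], [np.nan, np.nan], [2.0, 3.0], [5.0, 9.0]])
filled = knn_fill_joints(pose, template, k=2)
```

Here joint 1 is occluded; its two nearest template neighbors are joints 0 and 2, so it is filled with their average position.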
The convolutional neural network (CNN) with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability, but it faces limitations in capturing global features. The transformer can extract global information well, but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for accurate 3D medical image segmentation by combining the CNN and the transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as Dice similarity coefficient, Intersection over Union, 95% Hausdorff distance, and average symmetric surface distance.
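Two of the overlap metrics listed above, the Dice similarity coefficient and Intersection over Union, can be computed for binary 3D masks as in this minimal sketch. Returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = total - inter
    dice = 2.0 * inter / total if total > 0 else 1.0
    iou = inter / union if union > 0 else 1.0
    return float(dice), float(iou)

pred = np.zeros((4, 4, 4), dtype=bool)
pred[:2] = True          # 32 voxels predicted as foreground
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3] = True       # 32 voxels of ground truth, 16 of them overlapping
dice, iou = dice_and_iou(pred, target)
```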
The application of image super-resolution (SR) has brought significant assistance in the medical field, aiding doctors to make more precise diagnoses. However, solely relying on a convolutional neural network (CNN) for image SR may lead to issues such as blurry details and excessive smoothness. To address these limitations, we proposed an algorithm based on the generative adversarial network (GAN) framework. In the generator network, three different sizes of convolutions connected by a residual dense structure were used to extract detailed features, and an attention mechanism combining channel and spatial information was applied to concentrate the computing power on crucial areas. In the discriminator network, using InstanceNorm to normalize tensors sped up the training process while retaining feature information. The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) compared to other methods, resulting in improved visual quality.
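As a reminder of what the PSNR figure measures, the standard definition PSNR = 10 * log10(MAX^2 / MSE) can be sketched as follows. The 8-bit peak value of 255 is an assumption; medical images often use other bit depths.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.full((8, 8), 100, dtype=np.uint8)
estimate = np.full((8, 8), 110, dtype=np.uint8)  # constant error of 10 gray levels
value = psnr(reference, estimate)
```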
Microseismic monitoring technology is widely used in tunnel and coal mine safety production. For signals generated by ultra-weak microseismic events, traditional sensors encounter limitations in terms of detection sensitivity. Given the complex engineering environment, automatic multi-classification of microseismic data is highly required. In this study, we use acceleration sensors to collect signals and combine the improved Visual Geometry Group with a convolutional block attention module to obtain a new network structure, termed CNN_BAM, for automatic classification and identification of microseismic events. We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model. Results show that the CNN_BAM model exhibits good feature extraction ability, achieving a recognition accuracy of 99.29%, surpassing all its counterparts. The stability and accuracy of the classification algorithm improve remarkably. In addition, through fine-tuning and migration to the Pan Ⅱ Mine Project, the network demonstrates reliable generalization performance. This outcome reflects its adaptability across different projects and promising application prospects.
Hyperspectral image (HSI) classification is crucial for numerous remote sensing applications. Traditional deep learning methods may miss pixel relationships and context, leading to inefficiencies. This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network (SGCCN), a novel approach that uses spectral band graph convolutions to capture long-range relationships, uses the local perception of attention-enhanced multi-level convolutions to extract local spatial features, and employs a dynamic attention mechanism to enhance feature extraction. The SGCCN integrates spectral and spatial features through a self-attention fusion network, significantly improving classification accuracy and efficiency. The proposed method outperforms existing techniques, demonstrating its effectiveness in handling the challenges associated with HSI data.
This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries for accurate tumor localization and tissue visualization enhancement. However, fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is utilized, with residual dilated pyramid pooling for multi-scale feature extraction and position-aware axial attention to capture long-distance dependencies between pixels. To further improve the large deformation and fine structure registration results, a multi-scale context channel attention mechanism is employed, utilizing content information via adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab dataset, Creatis dataset, Learn2Reg dataset, OASIS dataset) and a local dataset. Results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It also proved to be fast in 3D image registration, taking about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
This paper proposes an automated detection framework for transmission facilities using a feature-attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The proposed framework exhibits three key characteristics. First, virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images. This enables the neural network to learn various features of transmission facilities to improve the detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps, enabling the neural network to perform precise object detection in various environments. Third, an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net, considering the intersection over union, center point distance, angle, and shape of the bounding box. This enables accurate detection of power facilities of various sizes. Extensive experiments demonstrated that FAMSR-Net outperforms other neural networks in detecting power facilities. FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
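The intersection over union term that Scylla-IoU builds on can be sketched as below. This shows only the plain IoU of two axis-aligned boxes; the additional distance, angle, and shape terms mentioned in the abstract are not implemented, and the `(x1, y1, x2, y2)` box format is an assumption.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
disjoint = box_iou((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```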
Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only minimal inter-class similarity in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. Initially, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the representation of vehicle features globally by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features. A graph convolutional network then amalgamates these features to understand the relationships among vehicle-related characteristics. Subsequently, we amalgamate feature maps from both the attention and graph-based branches for a more comprehensive representation of vehicle features. The framework then gauges feature similarities and ranks them, thus enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy of existing oceanic biological image classification methods and the inefficiency of manual classification. Firstly, the hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Secondly, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module (CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples and address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating its strong generalization ability to meet the demands of oceanic biological image classification.
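The reweighting performed by Focal Loss can be made concrete with a short sketch of its usual multi-class form, FL = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The values gamma = 2 and a scalar alpha = 1 are assumptions for illustration; the abstract does not state the settings used.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=1.0):
    """Mean focal loss for class probabilities of shape (N, C) and integer labels (N,)."""
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    p_t = np.clip(p_t, 1e-12, 1.0)                # avoid log(0)
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) samples.
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]])
labels = np.array([0, 1, 0])  # the third sample is badly misclassified
loss_focal = focal_loss(probs, labels, gamma=2.0)
loss_ce = focal_loss(probs, labels, gamma=0.0)  # gamma = 0 reduces to cross-entropy
```

With gamma = 0 the expression is ordinary cross-entropy; increasing gamma shifts the total loss toward the hard third sample.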
Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Also, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modelling. Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to solve the problem of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions; this placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
Funding (MSSTGCN traffic flow prediction study): Supported by the National Natural Science Foundation of China (Grant Nos. 62472149, 62376089, 62202147) and the Hubei Provincial Science and Technology Plan Project (2023BCB04100).
Funding (YOLO-SPDNet tomato leaf disease study): Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station (Grant No. 2024120011003081); Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas (Grant No. 2024120011001866).
Abstract: Defect detection in printed circuit boards (PCB) remains challenging due to the difficulty of identifying small-scale defects, the inefficiency of conventional approaches, and the interference from complex backgrounds. To address these issues, this paper proposes SIM-Net, an enhanced detection framework derived from YOLOv11. The model integrates SPDConv to preserve fine-grained features for small object detection, introduces a novel convolutional partial attention module (C2PAM) to suppress redundant background information and highlight salient regions, and employs a multi-scale fusion network (MFN) with a multi-grain contextual module (MGCT) to strengthen contextual representation and accelerate inference. Experimental evaluations demonstrate that SIM-Net achieves 92.4% mAP, 92% accuracy, and 89.4% recall with an inference speed of 75.1 FPS, outperforming existing state-of-the-art methods. These results confirm the robustness and real-time applicability of SIM-Net for PCB defect inspection.
Funding: Funded by the National Natural Science Foundation of China (No. 52204407), the Natural Science Foundation of Jiangsu Province (No. BK20220595), and the China Postdoctoral Science Foundation (No. 2022M723689).
Abstract: This study proposes a multi-scale simplified residual convolutional neural network (MS-SRCNN) for the precise prediction of Mg-Nd binary alloy compositions from scanning electron microscope (SEM) images. A multi-scale data structure is established by spatially aligning and stacking SEM images at different magnifications. The MS-SRCNN reduces computational runtime by over 90% compared to traditional architectures such as ResNet50, VGG16, and VGG19, without compromising prediction accuracy. The model also delivers better predictive performance, achieving a >5% increase in R² compared to single-scale models. Furthermore, the MS-SRCNN exhibits robust composition prediction capability across other Mg-based binary alloys, including Mg-La, Mg-Sn, Mg-Ce, Mg-Sm, Mg-Ag, and Mg-Y, thereby emphasizing its generalization and extrapolation potential. This research establishes a non-destructive, microstructure-informed composition analysis framework, reduces characterization time compared to traditional experimental methods, and provides insights into the composition-microstructure relationship in diverse material systems.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62272236 and 62376128, and in part by the Natural Science Foundation of Jiangsu Province under Grants BK20201136 and BK20191401.
Abstract: Video emotion recognition is widely used due to its alignment with the temporal characteristics of human emotional expression, but existing models have significant shortcomings. On the one hand, Transformer multi-head self-attention modeling of global temporal dependency suffers from high computational overhead and feature similarity. On the other hand, fixed-size convolution kernels are often used, which have weak perception ability for emotional regions of different scales. Therefore, this paper proposes a video emotion recognition model that combines multi-scale region-aware convolution with temporal interactive sampling. In terms of space, multi-branch large-kernel stripe convolution is used to perceive emotional region features at different scales, and attention weights are generated for each scale feature. In terms of time, multi-layer odd-even down-sampling is performed on the time series, and odd-even sub-sequence interaction is performed to solve the problem of feature similarity, while reducing computational costs due to the linear relationship between sampling and convolution overhead. The model was evaluated on CMU-MOSI, CMU-MOSEI, and Hume Reaction, where Acc-2 reached 83.4%, 85.2%, and 81.2%, respectively. The experimental results show that the model can significantly improve the accuracy of emotion recognition.
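The multi-layer odd-even down-sampling mentioned in the abstract above can be sketched in a few lines. This shows only the splitting step; the interaction between odd and even sub-sequences is not specified in the abstract and is left out. The function names are hypothetical.

```python
def odd_even_split(seq):
    """Split a sequence into its even-indexed and odd-indexed elements."""
    return seq[0::2], seq[1::2]


def multi_layer_split(seq, layers):
    """Apply the odd-even split `layers` times.

    Returns the sub-sequences at the final layer. Each layer halves the
    length of every sub-sequence, so the per-branch convolution cost
    shrinks in proportion to the sampling.
    """
    level = [list(seq)]
    for _ in range(layers):
        next_level = []
        for sub in level:
            even, odd = odd_even_split(sub)
            next_level.extend([even, odd])
        level = next_level
    return level


# Eight time steps split over two layers give four sub-sequences of length 2.
parts = multi_layer_split(range(8), layers=2)
```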
Abstract: Reliable traffic flow prediction is crucial for mitigating urban congestion. This paper proposes the Attention-based spatiotemporal Interactive Dynamic Graph Convolutional Network (AIDGCN), a novel architecture integrating an Interactive Dynamic Graph Convolution Network (IDGCN) with Temporal Multi-Head Trend-Aware Attention. Its core innovation lies in IDGCN, which splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs, and in a novel attention mechanism that incorporates convolutional operations to capture essential local traffic trends, addressing a critical gap in standard attention for continuous data. For 15- and 60-min forecasting on METR-LA, AIDGCN achieves MAEs of 0.75% and 0.39%, and RMSEs of 1.32% and 0.14%, respectively. In the 60-min long-term forecasting of the PEMS-BAY dataset, AIDGCN outperforms the MRA-BGCN method by 6.28%, 4.93%, and 7.17% in terms of MAE, RMSE, and MAPE, respectively. Experimental results demonstrate the superiority of the proposed model over state-of-the-art methods.
Funding: The project is supported by the National Natural Science Foundation of China (52067013) and the Key Projects of the Natural Science Foundation of Gansu Provincial Science and Technology Department (22JR5RA318).
Abstract: Existing convolutional neural network (CNN) methods for power quality disturbance identification can only extract single-scale features, which leads to a lack of feature information and weak anti-noise performance. To address this issue, a new approach for identifying power quality disturbances is proposed, based on an adaptive Kalman filter (KF) and a convolutional neural network fused with multi-scale channel attention (MS-CAM). Single and composite disturbance signals are generated through simulation. The adaptive maximum likelihood Kalman filter is employed to reduce noise in the initial disturbance signal, and multi-scale features are then integrated into the conventional CNN architecture. The multi-scale features of the signal are captured by convolution kernels of different sizes so that the model can obtain diverse feature expressions. The attention mechanism (ATT) is introduced to adaptively allocate weights to the extracted features, and the features are fused and selected to obtain the new main features. The Softmax classifier is employed for the classification of power quality disturbances. Finally, the recognition accuracy of the proposed method is compared with that of the convolutional neural network (CNN), the model using the attention mechanism, the bidirectional long short-term memory network (MS-Bi-LSTM), and the multi-scale convolutional neural network (MSCNN) with the attention mechanism. The simulation results demonstrate that the proposed method achieves higher accuracy than CNN, MS-Bi-LSTM, and MSCNN, with an overall recognition rate exceeding 99% and robust classification performance. This achievement provides a new perspective for further exploration in the field of power quality disturbance classification.
Funding: Supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant Nos. 2023AH040149 and 2024AH051915), the Anhui Provincial Natural Science Foundation (Grant No. 2208085MF168), the Science and Technology Innovation Tackle Plan Project of Maanshan (Grant No. 2024RGZN001), and the Scientific Research Fund Project of Anhui Medical University (Grant No. 2023xkj122).
Abstract: Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. Notably, the transformer model can effectively capture long-range dependencies in the image, and combining the CNN and the transformer can effectively extract both local details and global contextual features. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network's ability to recognize important image features and to alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets, particularly in the context of preserving contours and textures.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy significantly declines when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: Joint Interpolation Module (JI Module), Multi-scale Temporal Convolution Network (MS-TCN), and Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as "Happy," "Sad," "Angry," and "Neutral." On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion and achieves significantly higher accuracy than other methods.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2018YFE0206900; China Postdoctoral Science Foundation, Grant/Award Number: 2023M731204; The Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration, Grant/Award Number: SMDTKL-2023-1-01; The Hubei Province Key Research and Development Project, Grant/Award Number: 2023BCB007; CAAI-Huawei MindSpore Open Fund.
Abstract: The convolutional neural network (CNN) with an encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability, but it faces limitations in capturing global features. The transformer can extract global information well, but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient, Intersection over Union, 95% Hausdorff distance, and average symmetric surface distance.
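Two of the quantitative indexes named in the abstract above, the dice similarity coefficient and Intersection over Union, have standard definitions for binary masks: Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The sketch below computes both on flattened 0/1 masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks are empty: treated as a perfect match (assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# One overlapping voxel out of two predicted and two labelled voxels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For any pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.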
Abstract: Image super-resolution (SR) has brought significant assistance to the medical field, helping doctors make more precise diagnoses. However, relying solely on a convolutional neural network (CNN) for image SR may lead to issues such as blurry details and excessive smoothness. To address these limitations, we propose an algorithm based on the generative adversarial network (GAN) framework. In the generator network, three different sizes of convolutions connected by a residual dense structure are used to extract detailed features, and an attention mechanism combining channel and spatial information is applied to concentrate the computing power on crucial areas. In the discriminator network, InstanceNorm is used to normalize tensors, which speeds up the training process while retaining feature information. The experimental results demonstrate that our algorithm achieves higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than other methods, resulting in improved visual quality.
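The peak signal-to-noise ratio reported in the abstract above has a standard definition, PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the two images. A minimal sketch on flattened pixel lists follows; the function name and the 8-bit default `max_value=255.0` are assumptions, since the abstract does not state the image bit depth.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 10 grey levels on an 8-bit image.
value_db = psnr([100, 100, 100, 100], [110, 90, 110, 90])
```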
Funding: Supported by the Key Research and Development Plan of Anhui Province (202104a05020059) and the Excellent Scientific Research and Innovation Team of Anhui Province (2022AH010003). Support from Hefei Comprehensive National Science Center is highly appreciated.
Abstract: Microseismic monitoring technology is widely used in tunnel and coal mine safety production. For signals generated by ultra-weak microseismic events, traditional sensors encounter limitations in terms of detection sensitivity. Given the complex engineering environment, automatic multi-classification of microseismic data is much needed. In this study, we use acceleration sensors to collect signals and combine the improved Visual Geometry Group with a convolutional block attention module to obtain a new network structure, termed CNN_BAM, for automatic classification and identification of microseismic events. We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model. Results show that the CNN_BAM model exhibits good feature extraction ability, achieving a recognition accuracy of 99.29%, surpassing all its counterparts. The stability and accuracy of the classification algorithm improve remarkably. In addition, through fine-tuning and migration to the Pan Ⅱ Mine Project, the network demonstrates reliable generalization performance. This outcome reflects its adaptability across different projects and promising application prospects.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61801214) and the Postgraduate Research Practice Innovation Program of NUAA (No. xcxjh20231504).
Abstract: Hyperspectral image (HSI) classification is crucial for numerous remote sensing applications. Traditional deep learning methods may miss pixel relationships and context, leading to inefficiencies. This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network (SGCCN), a novel approach that uses spectral band graph convolutions to capture long-range relationships, uses the local perception of attention-enhanced multi-level convolutions to extract local spatial features, and employs a dynamic attention mechanism to enhance feature extraction. The SGCCN integrates spectral and spatial features through a self-attention fusion network, significantly improving classification accuracy and efficiency. The proposed method outperforms existing techniques, demonstrating its effectiveness in handling the challenges associated with HSI data.
Funding: Supported in part by the National Natural Science Foundation of China [62301374], Hubei Provincial Natural Science Foundation of China [2022CFB804], Hubei Provincial Education Research Project [B2022057], the Youths Science Foundation of Wuhan Institute of Technology [K202240], and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology [CX2023295].
Abstract: This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries for accurate tumor localization and tissue visualization enhancement. However, fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is utilized, using residual dilated pyramid pooling for multi-scale feature extraction and position-aware axial attention to capture long-distance dependencies between pixels. To further improve the large deformation and fine structure registration results, a multi-scale context channel attention mechanism is employed, utilizing content information via adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab dataset, Creatis dataset, Learn2Reg dataset, OASIS dataset) and a local dataset. Results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It was also fast in 3D image registration, taking about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
Funding: Supported by the Korea Electric Power Corporation (R22TA14, Development of Drone System for Diagnosis of Porcelain Insulators in Overhead Transmission Lines), the National Fire Agency of Korea (RS-2024-00408270, Fire Hazard Analysis and Fire Safety Standards Development for Transportation and Storage Stage of Reuse Battery), and the Ministry of the Interior and Safety of Korea (RS-2024-00408982, Development of Intelligent Fire Detection and Sprinkler Facility Technology Reflecting the Characteristics of Logistics Facilities).
Abstract: This paper proposes an automated detection framework for transmission facilities using a feature-attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The proposed framework exhibits three key characteristics. First, virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images. This enables the neural network to learn various features of transmission facilities to improve the detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps, enabling the neural network to perform precise object detection in various environments. Third, an effective bounding box optimization method called Scylla-IoU is deployed on FAMSR-Net, considering the intersection over union, center point distance, angle, and shape of the bounding box. This enables accurate detection of power facilities of various sizes. Extensive experiments demonstrated that FAMSR-Net outperforms other neural networks in detecting power facilities. FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
Funding: Funded by the National Natural Science Foundation of China (grant number: 62172292).
Abstract: Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only minimal inter-class similarity in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. Initially, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the representation of vehicle features globally by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features. A graph convolutional network then combines these features to model the relationships among vehicle-related characteristics. Subsequently, feature maps from both the attention and graph-based branches are merged for a more comprehensive representation of vehicle features. The framework then gauges feature similarities and ranks them, thus enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
Funding: Open Access funding provided by the National Institutes of Health (NIH). The funding for this project was provided by NCATS Intramural Fund.
Abstract: Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61806107 and 61702135).
Abstract: We propose a model based on a hierarchical multi-scale attention mechanism to address the low accuracy of existing oceanic biological image classification methods and the inefficiency of manual classification. Firstly, the hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Secondly, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, the convolutional block attention module (CBAM) is introduced to enhance the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard-to-classify samples and address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating its strong generalization ability to meet the demands of oceanic biological image classification.
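The Focal Loss mentioned in the abstract above is commonly written as FL(p_t) = -α (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below evaluates it for a single sample; the defaults `gamma=2.0` and `alpha=1.0` are assumptions, since the abstract does not report the hyperparameters used.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    Larger gamma shrinks the loss of well-classified samples (p_true near 1),
    so hard samples dominate the gradient.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# An easy sample (p = 0.9) contributes far less than a hard one (p = 0.5).
easy = focal_loss(0.9)
hard = focal_loss(0.5)
```

In an imbalanced dataset, `alpha` is often set per class so that rare classes receive a larger weight.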
Funding: Supported by the National Natural Science Foundation of China (61903336, 61976190) and the Natural Science Foundation of Zhejiang Province (LY21F030015).
Abstract: Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Also, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modelling. Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to solve the problem of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions; this placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.