Abstract: In contemporary computer vision, convolutional neural networks (CNNs) and vision transformers (ViTs) represent the two primary architectural paradigms for image recognition. While both approaches have been widely adopted in medical imaging applications, they operate on fundamentally different computational principles. This report provides brief application notes on ViTs and CNNs, focusing on the scenarios that guide the selection of one architecture over the other in practical medical implementations. Generally, CNNs rely on convolutional kernels, localized receptive fields, and weight sharing, enabling efficient hierarchical feature extraction. These properties contribute to strong performance in detecting spatially constrained patterns such as textures, edges, and anatomical boundaries, while maintaining relatively low computational requirements. ViTs, on the other hand, decompose images into smaller segments referred to as tokens and employ self-attention mechanisms to model relationships across the entire image. This global modeling capability allows ViTs to capture long-range dependencies that may be difficult for convolution-based architectures to learn. However, ViTs typically achieve optimal performance only when trained on very large datasets or supported by extensive pretraining, as their weaker inductive bias requires greater data exposure to learn robust representations. This report briefly examines the architectural structure, underlying mathematical foundations, and relative performance characteristics of CNNs and ViTs, drawing on recent findings from contemporary research. Emphasis is placed on how differences in data availability, computational resources, and task requirements influence model effectiveness across medical imaging domains. Most importantly, the report serves as a concise application guide for practitioners making informed implementation decisions between these two influential deep learning frameworks.
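The contrast this abstract draws (weight-shared local kernels versus tokenized global self-attention) is easy to see in code. Below is a minimal, self-contained sketch; PyTorch is an assumption, and the module names, patch size, and dimensions are illustrative toys, not the report's implementation.

```python
import torch
import torch.nn as nn

class TinyConvBlock(nn.Module):
    """Local receptive field + weight sharing: one 3x3 kernel slides over the image."""
    def __init__(self, in_ch=1, out_ch=16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # shared weights
        self.act = nn.ReLU()

    def forward(self, x):              # x: (B, C, H, W)
        return self.act(self.conv(x))  # each output pixel sees only a 3x3 neighborhood

class TinyViTBlock(nn.Module):
    """Patch tokenization + self-attention: every token attends to every other token."""
    def __init__(self, in_ch=1, dim=64, patch=16, img=224):
        super().__init__()
        # Non-overlapping patch embedding, implemented as a strided convolution.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img // patch) ** 2, dim))  # learned positions
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x):                                        # x: (B, C, H, W)
        t = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, dim) tokens
        out, _ = self.attn(t, t, t)                              # global all-pairs interaction
        return out

x = torch.randn(2, 1, 224, 224)   # e.g. a batch of grayscale medical slices
print(TinyConvBlock()(x).shape)   # torch.Size([2, 16, 224, 224])
print(TinyViTBlock()(x).shape)    # torch.Size([2, 196, 64])
```

The toy also shows where the data appetite gap comes from: the conv block bakes locality and translation equivariance into its weights, while the attention block must learn any spatial structure from the data, positional embeddings included.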
Funding: Supported by the Anhui Provincial Engineering Research Center for Sports and Health Information Monitoring Technology (KF2023012).
Abstract: WiFi-based human activity recognition (HAR) provides a non-intrusive approach to ubiquitous monitoring; however, achieving high accuracy and robustness simultaneously remains a significant challenge. This paper proposes a Convolutional Neural Network with Enhanced Convolutional Block Attention Module (CNN-ECBAM) framework. The approach systematically converts raw Channel State Information (CSI) into pseudo-color images, preserving the essential signal characteristics for deep neural network processing. The core innovation is an Enhanced Convolutional Block Attention Module (ECBAM), tailored to the characteristics of CSI data, which integrates Efficient Channel Attention (ECA) and Multi-Scale Spatial Attention (MSSA). By employing learnable adaptive fusion weights, it achieves a dynamic synergy between channel and spatial features, enabling the network to capture highly discriminative spatiotemporal patterns. The ECBAM module is integrated into a unified Convolutional Neural Network (CNN) to form the overall CNN-ECBAM model. Experimental results on the UT-HAR and NTU-Fi_HAR datasets demonstrate that CNN-ECBAM achieves competitive recognition accuracy and outperforms mainstream baseline models. Specifically, it attains 99.20% accuracy on UT-HAR (surpassing ResNet-18 at 98.60%) and 100% accuracy on NTU-Fi_HAR (exceeding GAF-CNN at 99.62%). These results validate the effectiveness of the proposed method for high-precision, reliable WiFi-based HAR.
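The abstract does not give ECBAM's internals, so the following PyTorch sketch is a hedged reading of its description: an ECA branch following the published ECA-Net design (a 1D convolution over a pooled channel descriptor), a multi-scale spatial-attention branch, and softmax-normalized learnable fusion weights. The MSSA kernel sizes and the fusion scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: 1D conv over the global-average channel descriptor."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                    # x: (B, C, H, W)
        y = x.mean(dim=(2, 3)).unsqueeze(1)  # (B, 1, C) channel descriptor
        w = torch.sigmoid(self.conv(y)).transpose(1, 2).unsqueeze(-1)  # (B, C, 1, 1)
        return x * w

class MSSA(nn.Module):
    """Multi-scale spatial attention: avg/max channel pooling at several kernel sizes."""
    def __init__(self, kernels=(3, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(2, 1, k, padding=k // 2, bias=False) for k in kernels])

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        maps = sum(b(pooled) for b in self.branches) / len(self.branches)
        return x * torch.sigmoid(maps)       # spatial gate shared across channels

class ECBAM(nn.Module):
    """Fuses the two branches with learnable weights instead of a fixed sequence."""
    def __init__(self, k=3, kernels=(3, 7)):
        super().__init__()
        self.eca, self.mssa = ECA(k), MSSA(kernels)
        self.alpha = nn.Parameter(torch.tensor([0.5, 0.5]))  # adaptive fusion weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return w[0] * self.eca(x) + w[1] * self.mssa(x)

csi = torch.randn(4, 32, 64, 64)  # pseudo-color CSI "image" batch; shapes illustrative
print(ECBAM()(csi).shape)         # torch.Size([4, 32, 64, 64])
```

A block like this would be dropped between convolutional stages of the backbone CNN; the learnable alpha lets training decide how much channel versus spatial reweighting each stage needs, which is the "dynamic synergy" the abstract refers to.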
Abstract: To address the accuracy problems that magnetostriction modeling faces under complex excitation conditions, such as different flux-density amplitudes, frequencies, and harmonic combinations, this paper improves the conventional convolutional neural network (CNN) with a spatial attention mechanism (SAM): SAM is embedded into the CNN to build an improved SAMCNN network. Combined with a bidirectional long short-term memory (BiLSTM) network, a SAMCNN-BiLSTM magnetostriction model for electrical steel sheets is proposed. First, the grey wolf optimization (GWO) algorithm is used to tune the structural parameters of the neural network, enabling accurate characterization of the magnetostrictive effect under complex working conditions. Then, a magnetostrictive strain database is built for complex conditions in the low-to-medium-frequency range, including single-frequency and harmonic-superimposed excitations, and data preprocessing and feature analysis are carried out. Finally, the SAMCNN-BiLSTM model is validated against alternatives. Comparing the main components of the magnetostrictive strain spectrum under excitation with a superimposed third harmonic, the maximum relative error of the SAMCNN-BiLSTM model's predictions is 3.70%; the model characterizes the magnetostrictive effect of electrical steel sheets more accurately than the Jiles-Atherton-Sablik (J-A-S) and quadratic domain-rotation models.
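As with ECBAM above, the abstract does not specify the layer configuration, so this is an illustrative PyTorch sketch of the described pipeline only: a 1D CNN with an embedded spatial-attention (SAM) gate over the time axis, followed by a BiLSTM and a per-step regression head mapping an excitation waveform to magnetostrictive strain. All sizes are placeholders, and the GWO hyperparameter search is omitted.

```python
import torch
import torch.nn as nn

class SpatialAttention1d(nn.Module):
    """SAM over the time axis: avg/max channel pooling -> conv -> sigmoid gate."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                    # x: (B, C, T)
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class SAMCNNBiLSTM(nn.Module):
    """CNN feature extractor with embedded SAM, then a BiLSTM and a regression head."""
    def __init__(self, in_ch=1, ch=32, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_ch, ch, 5, padding=2), nn.ReLU(),
            SpatialAttention1d(),            # SAM embedded inside the CNN stage
            nn.Conv1d(ch, ch, 5, padding=2), nn.ReLU())
        self.bilstm = nn.LSTM(ch, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-time-step strain prediction

    def forward(self, b):                    # b: (B, T) flux-density waveform
        f = self.cnn(b.unsqueeze(1))         # (B, ch, T) local features, SAM-gated
        h, _ = self.bilstm(f.transpose(1, 2))  # (B, T, 2*hidden) bidirectional context
        return self.head(h).squeeze(-1)      # (B, T) magnetostrictive strain

wave = torch.sin(torch.linspace(0, 6.283, 400)).repeat(2, 1)  # toy sinusoidal excitation
print(SAMCNNBiLSTM()(wave).shape)            # torch.Size([2, 400])
```

In a GWO-optimized setup, quantities like the channel count, kernel sizes, and hidden width above would be the search variables, with validation error on the strain database as the fitness function.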