Abstract: This paper proposes SW-YOLO (StarNet Weighted-Conv YOLO), a lightweight human pose estimation network for edge devices. Current mainstream pose estimation algorithms are computationally inefficient and capture features poorly for complex poses and occlusion scenarios. To address these issues, this work introduces a lightweight backbone architecture that integrates WConv (Weighted Convolution) and StarNet modules. Leveraging StarNet’s strengths in multi-level feature fusion and long-range dependency modeling, the architecture improves the model’s spatial perception of human joint structures and its integration of contextual information, which makes it more robust in complex scenarios involving occlusion and deformation. In addition, the WConv operation, based on weight recalibration and receptive field optimization, dynamically adjusts feature importance during convolution. This reduces redundant computation while maintaining or improving feature representation at very low computational cost. As a result, SW-YOLO substantially reduces model complexity and inference latency while preserving high accuracy, and it significantly outperforms existing lightweight networks.
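The abstract does not spell out the internals of either module. As a rough illustration only, the sketch below shows the element-wise multiplication (the "star operation") that StarNet-style blocks are generally built around: the same input passes through two linear branches whose outputs are multiplied feature by feature. The helper names `linear` and `star_op` and the toy weights are hypothetical, and the sketch leaves out activations, depthwise convolutions, and anything specific to WConv or to SW-YOLO itself.

```python
def linear(x, weight, bias):
    """Affine map; `weight` holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def star_op(x, w1, b1, w2, b2):
    """Element-wise product of two linear branches of the same input.

    Hypothetical helper for illustration, not the authors' implementation.
    """
    branch_a = linear(x, w1, b1)
    branch_b = linear(x, w2, b2)
    return [a * b for a, b in zip(branch_a, branch_b)]


# Toy, hand-checkable values (assumed for the example only).
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]   # identity branch
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0], [1.0, -1.0]]  # mixing branch
b2 = [0.0, 1.0]

y = star_op(x, w1, b1, w2, b2)
print(y)
```

Because each output is a product of two linear terms, it contains pairwise products of input features, which is one common explanation for how such blocks gain expressive power at low computational cost.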