Hand pose estimation based on mask prompts and attention
Abstract: Hand pose estimation is an important research direction in computer vision. Traditional methods are susceptible to interference from complex backgrounds, while deep learning methods, although more robust, still struggle in multi-hand scenarios and with fine-grained detail recognition. Therefore, a hand pose estimation method based on mask prompts and attention mechanisms, named HMCA (Hand Mask Prompts and Attention), was proposed. Firstly, hand mask maps generated via object detection and semantic segmentation were used to suppress background noise and provide prior information. Secondly, a Parallel Attention Block (PAB) and a Multi-path Residual Block (MRB) were designed to extract multi-scale features, thereby improving the recognition of complex hand poses, reducing computational complexity, and preventing vanishing gradients. Thirdly, the hand mask maps were used to guide the model to focus on hand regions, thereby addressing multi-hand and occlusion problems. Finally, a penalty term was added to the regression loss to constrain keypoint predictions and accelerate model convergence. Experimental results show that the proposed method outperforms the compared methods in single-hand, multi-hand, and occlusion scenarios, achieving the best Area Under the Curve (AUC) averaged over varying thresholds and the best Mean Per Joint Position Error (MPJPE). On the RHD (Rendered Handpose Dataset), it achieves an AUC of 93.22% over varying thresholds and an MPJPE of 2.15; on the CMU Panoptic dataset, it achieves an AUC of 91.38% over varying thresholds and a mean hand keypoint error of 2.06.
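The first step fuses object detection and semantic segmentation into a hand mask that suppresses the background. The abstract does not say how the two outputs are combined. The sketch below assumes a simple rule: a pixel counts as hand only if its segmentation probability is at least a threshold (0.5 here) and it lies inside a detected hand box. `build_hand_mask`, `apply_mask`, the box format, and the threshold are all hypothetical, and plain Python lists stand in for image tensors.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: (x0, y0, x1, y1), half-open pixel ranges
Grid = List[List[float]]


def build_hand_mask(seg_prob: Grid, boxes: List[Box], thresh: float = 0.5) -> List[List[int]]:
    """Keep segmented 'hand' pixels only inside detected hand boxes (assumed fusion rule)."""
    h, w = len(seg_prob), len(seg_prob[0])
    mask = [[0] * w for _ in range(h)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image so out-of-range detections are safe.
        for y in range(max(0, y0), min(h, y1)):
            for x in range(max(0, x0), min(w, x1)):
                if seg_prob[y][x] >= thresh:
                    mask[y][x] = 1
    return mask


def apply_mask(image: Grid, mask: List[List[int]]) -> Grid:
    """Zero out background pixels so only hand regions reach the pose network."""
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


seg = [
    [0.9, 0.8, 0.1, 0.7],
    [0.6, 0.2, 0.9, 0.9],
    [0.1, 0.1, 0.8, 0.3],
]
img = [[float(10 * r + c) for c in range(4)] for r in range(3)]
mask = build_hand_mask(seg, boxes=[(0, 0, 2, 2)])
masked = apply_mask(img, mask)
print(mask)    # [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
```

In the toy example the segmentation also fires on pixels outside the detected box, and the box restriction removes them. A real pipeline might instead, or additionally, feed the mask to the network as an extra input channel; the abstract does not say which.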
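The abstract gives no internal details for the Multi-path Residual Block (MRB). It says only that the block extracts multi-scale features and helps prevent vanishing gradients. The sketch below illustrates the generic multi-path residual idea, not the authors' block. Several convolution branches with different kernel sizes, and therefore different receptive fields, are added to an identity shortcut. For brevity it works on a 1-D signal. The kernels, the ReLU, and the function names are assumptions, and the Parallel Attention Block (PAB) is not modeled.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1-D cross-correlation; output length equals len(x) for odd kernels."""
    k = len(kernel)
    pad = k // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < len(x):  # positions outside the signal act as zeros
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def multi_path_residual(x: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """y = x + sum_k ReLU(conv_k(x)): several receptive fields plus an identity shortcut."""
    y = list(x)  # identity path
    for kernel in kernels:
        branch = relu(conv1d_same(x, kernel))
        y = [a + b for a, b in zip(y, branch)]
    return y


x = [1.0, 2.0, 3.0]
# Hypothetical branches: 1-tap scaling, 3-tap local sum, and a branch ReLU fully suppresses.
y = multi_path_residual(x, [[0.5], [1.0, 1.0, 1.0], [-1.0]])
print(y)  # [4.5, 9.0, 9.5]
```

Because the output is `x` plus the branch outputs, the derivative of each output element with respect to its own input always contains a constant term of 1 from the shortcut. This is the usual argument for why residual connections ease gradient flow.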
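For the third step, one common way to let a mask guide attention is a masked softmax. Attention logits at positions outside the hand mask are excluded, so all the weight goes to hand pixels. The abstract does not state that HMCA uses this mechanism. `mask_guided_attention` and `attend` are hypothetical names, the spatial map is flattened to a 1-D list, and the uniform fallback for an empty mask is an arbitrary design choice.

```python
import math
from typing import List, Sequence


def mask_guided_attention(scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over flattened spatial positions, restricted to positions where mask == 1."""
    inside = [s for s, m in zip(scores, mask) if m]
    if not inside:
        # No hand pixels: fall back to uniform weights (an arbitrary choice).
        return [1.0 / len(scores)] * len(scores)
    peak = max(inside)  # subtract the max logit for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: Sequence[float], weights: Sequence[float]) -> float:
    """Attention pooling: weighted sum of per-position features."""
    return sum(f * w for f, w in zip(features, weights))


scores = [2.0, 0.0, 5.0, 0.0]  # position 2 has the largest logit but is background
mask = [1, 1, 0, 0]            # only positions 0 and 1 belong to the hand
weights = mask_guided_attention(scores, mask)
pooled = attend([10.0, 20.0, 100.0, 100.0], weights)
```

Position 2 has the highest raw score, yet it receives zero weight because it lies outside the mask. The pooled value is therefore a blend of the two hand positions only.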
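The abstract says only that a penalty term is added to the regression loss to constrain keypoint predictions. The sketch below is an illustration, not the authors' formulation. It assumes the penalty is the mean squared distance by which predicted keypoints fall outside the detected hand box. That penalty is weighted by a hypothetical coefficient `lam` and added to a mean squared error. The function names and the box-based penalty are assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # hypothetical format: (x0, y0, x1, y1)


def mse(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean over keypoints of the squared Euclidean error."""
    return sum((px - gx) ** 2 + (py - gy) ** 2
               for (px, py), (gx, gy) in zip(pred, gt)) / len(pred)


def outside_box_penalty(pred: Sequence[Point], box: Box) -> float:
    """Mean squared distance from each keypoint to the box (0 for points inside it)."""
    x0, y0, x1, y1 = box
    total = 0.0
    for x, y in pred:
        dx = max(x0 - x, 0.0, x - x1)
        dy = max(y0 - y, 0.0, y - y1)
        total += dx * dx + dy * dy
    return total / len(pred)


def penalized_loss(pred: Sequence[Point], gt: Sequence[Point], box: Box, lam: float = 0.1) -> float:
    """Regression loss plus a weighted constraint term (assumed form)."""
    return mse(pred, gt) + lam * outside_box_penalty(pred, box)


gt = [(2.0, 2.0), (4.0, 4.0)]
pred = [(2.0, 3.0), (7.0, 4.0)]  # the second keypoint drifts outside the box
loss = penalized_loss(pred, gt, box=(0.0, 0.0, 5.0, 5.0), lam=0.5)
print(loss)  # 6.0
```

The penalty is zero for keypoints inside the box, so it only pushes stray predictions back toward the hand. `lam` trades this constraint off against the main regression term.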
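The results are reported as MPJPE and as an AUC over varying thresholds. Under the usual definitions, MPJPE is the mean Euclidean distance between predicted and ground-truth joints. The AUC is the area under the PCK curve, normalized by the threshold range, where PCK(t) is the fraction of joints whose error is at most a threshold t. The sketch uses trapezoidal integration over 2-D points. The abstract does not give the paper's threshold range, units, or whether errors are measured in 2-D or 3-D, so the values here are placeholders.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def joint_errors(pred: Sequence[Point], gt: Sequence[Point]) -> List[float]:
    """Euclidean error of each predicted joint."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mpjpe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Per Joint Position Error."""
    errors = joint_errors(pred, gt)
    return sum(errors) / len(errors)


def pck(errors: Sequence[float], t: float) -> float:
    """Fraction of joints whose error is at most t."""
    return sum(e <= t for e in errors) / len(errors)


def pck_auc(errors: Sequence[float], thresholds: Sequence[float]) -> float:
    """Trapezoidal area under the PCK curve, normalized by the threshold range to [0, 1]."""
    ys = [pck(errors, t) for t in thresholds]
    area = sum((thresholds[i + 1] - thresholds[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(thresholds) - 1))
    return area / (thresholds[-1] - thresholds[0])


gt = [(0.0, 0.0)] * 4
pred = [(3.0, 4.0), (0.0, 1.0), (0.0, 2.0), (6.0, 8.0)]
errors = joint_errors(pred, gt)           # [5.0, 1.0, 2.0, 10.0]
print(mpjpe(pred, gt))                    # 4.5
print(pck_auc(errors, [0.0, 5.0, 10.0]))  # 0.625
```

A lower MPJPE and a higher AUC are both better. The AUC summarizes accuracy across all thresholds instead of at a single cut-off.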
Authors: REN Jianhua, CAO Jiahui, JIA Di (School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China; Ordos Research Institute, Liaoning Technical University, Ordos, Inner Mongolia 017004, China)
Source: Journal of Computer Applications (《计算机应用》), Peking University core journal, 2025, No. 12, pp. 4012-4020 (9 pages)
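Funding: National Natural Science Foundation of China (61601213); University-Local Science and Technology Cooperation Cultivation Project of the Ordos Research Institute, Liaoning Technical University (YJY-XD-2023-003).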
Keywords: hand pose estimation; mask prompt; attention mechanism; Convolutional Neural Network (CNN); semantic segmentation