Hand Pose Estimation via Fusing Joint Occlusion Information in Complex Interaction Scenarios

Cited by: 1

Abstract: Vision-based hand pose estimation is an active research topic in computer vision and an important means of understanding human interaction intent. During interaction, the hand is inevitably occluded by itself or by external objects, so key image information is lost. This occlusion degrades the accuracy of hand pose estimation not only in the occluded regions but also in the visible regions. Moreover, existing public datasets lack large-scale occlusion annotations, so little research has focused specifically on the occlusion problem. Motivated by this, we propose a joint occlusion classification method to improve the accuracy of hand pose estimation under occlusion.

For complex interaction scenarios, the hand is divided, based on the MANO hand model and its topology, into independent motion sections that have rigid-body motion properties. Discriminators based on a bone-eye angle strategy and a projection strategy, both following the principle of camera imaging, are then presented for the first time to classify one-hand self-occlusion, two-hand mutual occlusion, and object occlusion. The discriminators can create occlusion labels for most public datasets, significantly enriching their annotations.

To verify that suppressing occluded joints improves hand pose estimation accuracy, we integrate joint occlusion and joint visibility into a dynamic feature selection module, cascade it onto the POEM network, and present a hand pose estimation network that fuses joint occlusion information. The network cascades feature extraction, dynamic feature selection, and feature utilization modules. It passes only strongly related features on for utilization and filters out weakly related ones, which reduces the influence of occluded joints and thereby improves estimation accuracy.

In addition, based on the occlusion classification method and augmented reality, we build a hand occlusion dataset with occlusion labels, named HODA. The joint occlusion classification method annotates it automatically, and augmented reality is used to simulate grasping interactions. Under the guidance of virtual objects, data collection is divided into pre-grasp and grasp motions, which provides real-time feedback on the actual operation and effectively avoids the influence of object occlusion during grasping. To enrich the dataset and address unnatural positioning and semantic incoherence, we use a text-guided diffusion model to generate natural and coherent backgrounds for hand images based on the content, style, and semantics of existing images.

In the experiments, the joint occlusion classification method achieves an accuracy above 95.07%, verified on seven public datasets and HODA covering one-hand, two-hand, and hand-object interactions. Three datasets are divided into non-occlusion, slight occlusion, and severe occlusion subsets according to the number of occluded joints, in order to validate the adverse effect of an increasing number of occluded joints on hand pose estimation. Comparative experiments and weight matrix ablation experiments show that the hand pose estimation network fusing joint occlusion information outperforms existing methods and achieves satisfactory performance on the HO3D dataset; on the DexYCB and HanCo datasets in particular, it achieves state-of-the-art performance. The effectiveness of HODA is confirmed by generalizability, similarity, and complementary dataset experiments.
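The abstract does not give the details of the discriminators, so the following is only a minimal sketch of the general idea behind a projection-based occlusion check and the occlusion-level split, not the authors' method. It assumes a pinhole camera with intrinsics `K`, treats occluders as a set of 3D points, and uses hypothetical names and thresholds (`pixel_radius`, `depth_margin`, `slight_max`) that do not come from the paper.

```python
import numpy as np


def project_points(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def occluded_joints(joints_cam, occluder_cam, K,
                    pixel_radius=8.0, depth_margin=0.01):
    """Flag joints hidden behind occluder points (simplified depth test).

    A joint is marked occluded when some occluder point projects within
    `pixel_radius` pixels of it and lies closer to the camera by more than
    `depth_margin` (same unit as the coordinates). Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    joints_uv = project_points(joints_cam, K)
    occ_uv = project_points(occluder_cam, K)
    # Pairwise image-plane distances, shape (num_joints, num_occluders).
    dist = np.linalg.norm(joints_uv[:, None, :] - occ_uv[None, :, :], axis=-1)
    # True where the occluder point is in front of the joint.
    in_front = occluder_cam[None, :, 2] < joints_cam[:, None, 2] - depth_margin
    return np.any((dist < pixel_radius) & in_front, axis=1)


def occlusion_level(mask, slight_max=5):
    """Bucket a sample by its number of occluded joints.

    `slight_max` is a hypothetical cut-off; the abstract only states that
    the split is based on the number of occluded joints.
    """
    n = int(np.count_nonzero(mask))
    if n == 0:
        return "non-occlusion"
    return "slight occlusion" if n <= slight_max else "severe occlusion"


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    joints = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 0.5]])
    occluder = np.array([[0.0, 0.0, 0.3]])  # directly in front of joint 0
    mask = occluded_joints(joints, occluder, K)
    print(mask, occlusion_level(mask))
```

In practice the occluder points could be sampled from the other hand or from an object mesh, and a mesh-based visibility test would be more faithful than this point-wise approximation.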

Authors: LI Shao-Dong; LUO Kai; HUANG Yuan-Zhi; LIU Xi; SHUANG Feng; GAO Fang (Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004)

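Affiliation: Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University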

Source: Chinese Journal of Computers (《计算机学报》, Peking University Core Journal), 2025, No. 5, pp. 1212-1231 (20 pages)

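Funding: Supported by the Guangxi Natural Science Foundation Youth Fund (No. 2023GXNSFBA026069), the Guangxi Natural Science Foundation General Program (No. 2025GXNSFAA069931), and the Innovation Project of Guangxi Graduate Education (No. YCSW2023056).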

Keywords: joint occlusion classification; hand pose estimation; self-occlusion; mutual-occlusion; object occlusion; hand dataset; diffusion model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

