Hand Pose Estimation via Fusing Joint Occlusion Information in Complex Interaction Scenarios (复杂交互场景下融合关节遮挡信息的手部姿态估计研究)

Cited by: 1
Abstract: Vision-based hand pose estimation is a major research topic in computer vision and an important means of understanding human interaction intent. During interaction, the hand is inevitably occluded by itself or by external objects, so key image information is lost. This occlusion degrades pose estimation accuracy not only in the occluded regions but also in the visible ones. Because existing public datasets lack large-scale occlusion labels, little research has focused specifically on the occlusion problem. Motivated by this gap, we propose a joint occlusion classification method to improve the accuracy of hand pose estimation under occlusion. For complex interaction scenarios, the hand is divided, on the basis of the MANO hand model and its topology, into independent motion sections with rigid-body motion properties. Based on the principles of camera imaging, we then present, for the first time, discriminators that use a bone-eye angle strategy and a projection strategy to classify one-hand self-occlusion, two-hand mutual occlusion, and object occlusion. These discriminators can generate occlusion labels for most public datasets, significantly enriching their annotations. To verify that suppressing occluded joints improves hand pose estimation accuracy, we integrate joint occlusion and joint visibility into a dynamic feature selection module, cascade it onto POEM, and present a hand pose estimation network that fuses joint occlusion information. The network cascades feature extraction, dynamic feature selection, and feature utilization modules. It passes only strongly related features on for utilization and filters out weakly related ones, which reduces the influence of occluded joints and thereby improves estimation accuracy. In addition, using the occlusion classification method and augmented reality, we build HODA, a hand occlusion dataset with interaction intent: the classification method annotates the occlusion labels automatically, and augmented reality is used to simulate grasping interactions. Guided by virtual objects, data collection is divided into pre-grasp and grasp motions, which both provides real-time feedback on the actual operation state and effectively avoids object occlusion during grasping. To enrich the dataset and to address unnatural positioning and semantic incoherence, we use a text-guided diffusion model to generate natural, coherent backgrounds for hand images based on the content, style, and semantics of existing images. In the experiments, the joint occlusion classification method achieves an accuracy of over 95.07% on seven public datasets and HODA, covering one-hand, two-hand, and hand-object interactions. Three datasets are partitioned into no-occlusion, slight-occlusion, and severe-occlusion subsets according to the number of occluded joints, to validate the adverse effect of an increasing number of occluded joints on pose estimation. Comparative experiments and weight-matrix ablation experiments show that the network outperforms existing methods: it performs satisfactorily on the HO3D dataset and achieves state-of-the-art performance on the DexYCB and HanCo datasets. Generalization, similarity, and dataset-expansion experiments confirm the effectiveness of HODA.
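As a rough illustration of what a projection-based occlusion test could look like, the sketch below flags a joint as occluded when another 3D point projects close to it in the image and lies nearer to the camera. This is not the authors' discriminator: the pinhole camera model, the function names `project` and `occluded_joints`, and the parameters `radius_px` and `depth_margin` are all assumptions made for the example, and the abstract does not specify how the paper's projection strategy or bone-eye angle strategy is actually computed.

```python
import numpy as np


def project(points, fx, fy, cx, cy):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    z = points[:, 2]
    u = fx * points[:, 0] / z + cx
    v = fy * points[:, 1] / z + cy
    return np.stack([u, v], axis=1)


def occluded_joints(joints, occluders, fx, fy, cx, cy,
                    radius_px=10.0, depth_margin=0.005):
    """Return a boolean array with True where a joint is (hypothetically) occluded.

    A joint counts as occluded if at least one occluder point projects within
    radius_px pixels of it AND is closer to the camera by more than
    depth_margin (same length unit as the coordinates, e.g. metres).
    Both thresholds are illustrative values, not taken from the paper.
    """
    uv_j = project(joints, fx, fy, cx, cy)      # (J, 2)
    uv_o = project(occluders, fx, fy, cx, cy)   # (O, 2)
    # Pairwise pixel distance between every joint and every occluder: (J, O)
    pixel_dist = np.linalg.norm(uv_j[:, None, :] - uv_o[None, :, :], axis=2)
    # Occluder is in front of the joint along the camera z axis: (J, O)
    in_front = occluders[None, :, 2] < joints[:, None, 2] - depth_margin
    return np.any((pixel_dist < radius_px) & in_front, axis=1)


# Toy example: two joints at 0.5 m depth, one occluder point at 0.3 m
# lying on the same viewing ray as the first joint.
joints = np.array([[0.0, 0.0, 0.5],
                   [0.1, 0.0, 0.5]])
occluders = np.array([[0.0, 0.0, 0.3]])
labels = occluded_joints(joints, occluders, fx=500.0, fy=500.0, cx=0.0, cy=0.0)
# The first joint is flagged as occluded, the second is not.
```

In such a sketch, the occluder points could come from the other fingers, the other hand, or an object mesh, which is one way the same test might cover self-occlusion, mutual occlusion, and object occlusion; per-joint labels like these could then be counted to sort samples by occlusion severity.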
Authors: LI Shao-Dong (李少东); LUO Kai (罗凯); HUANG Yuan-Zhi (黄远智); LIU Xi (刘熹); SHUANG Feng (双丰); GAO Fang (高放). Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004
Source: Chinese Journal of Computers (《计算机学报》), PKU Core journal, 2025, No. 5, pp. 1212-1231 (20 pages)
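Funding: Supported by the Guangxi Natural Science Foundation Youth Fund Project (No. 2023GXNSFBA026069), the Guangxi Natural Science Foundation General Program (No. 2025GXNSFAA069931), and the Innovation Project of Guangxi Graduate Education (No. YCSW2023056).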
Keywords: joint occlusion classification; hand pose estimation; self-occlusion; mutual occlusion; object occlusion; hand dataset; diffusion model