Abstract: With the growing application of intelligent robots in service, manufacturing, and medical fields, efficient and natural interaction between humans and robots has become key to improving collaboration efficiency and user experience. Gesture recognition, as an intuitive and contactless interaction method, can overcome the limitations of traditional interfaces and enable real-time control and feedback of robot movements and behaviors. This study first reviews mainstream gesture recognition algorithms and their application on different sensing platforms (RGB cameras, depth cameras, and inertial measurement units). It then proposes a gesture recognition method based on multimodal feature fusion and a lightweight deep neural network that balances recognition accuracy with computational efficiency. At the system level, a modular human-robot interaction architecture is constructed, comprising perception, decision, and execution layers, and gesture commands are transmitted and mapped to robot actions in real time via the ROS communication protocol. Comparative experiments on public gesture datasets and a self-collected dataset validate the proposed method's superiority in terms of accuracy, response latency, and system robustness, while user-experience tests assess the interface's usability. The results provide a reliable technical foundation for robot collaboration and service in complex scenarios, offering broad prospects for practical application and deployment.
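Abstract: To fully exploit high-order interactions among features and improve the accuracy of click-through rate (CTR) prediction models, this paper proposes VBGA (vector-wise and bit-wise interaction model based on GNN and attention), a CTR prediction model based on graph neural networks and attention. Using a graph neural network and an attention mechanism, the model learns a fine-grained weight for each feature and feeds these fine-grained feature weights into a vector-wise interaction layer and a bit-wise interaction layer, which jointly predict the click-through rate. VBGA consists mainly of the vector-wise interaction layer and the bit-wise interaction layer. The vector-wise interaction layer uses a directed graph to construct vector-wise feature interactions, achieving explicit feature interactions without repetition; this reduces computation while also enabling higher-order feature crosses and thus more accurate predictions. In addition, the paper proposes a cross network for constructing bit-wise feature interactions. VBGA is compared with several state-of-the-art CTR prediction models on the Criteo and Avazu datasets, and the experimental results show that VBGA achieves good prediction results.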
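Abstract: Reconstructing interacting 3D meshes of two hands from a single RGB image is a highly challenging task. Mutual occlusion between the hands and the high similarity of their local appearance make some feature extraction inaccurate, which loses the interaction information between the hands and leaves the reconstructed hand meshes misaligned with the input image. To address these problems, this paper first proposes a two-part feature interaction and adaptation module. The first part, feature interaction, preserves the separate left- and right-hand features while generating two new feature representations, and captures the interaction features of the two hands through an interaction attention module. The second part, feature adaptation, uses the interaction attention module to adapt these interaction features to each hand, injecting global context into the left- and right-hand features. Second, a three-layer graph convolutional refinement network is introduced to accurately regress the mesh vertices of both hands, and an attention-based feature alignment module strengthens the alignment between vertex features and image features, thereby improving the alignment between the reconstructed hand meshes and the input image. A new multilayer perceptron structure is also proposed, which learns multi-scale feature information through downsampling and upsampling operations. Finally, a relative offset loss function is designed to constrain the spatial relationship between the two hands. Quantitative and qualitative experiments on the InterHand2.6M dataset show that the proposed method significantly improves performance compared with existing strong methods, reducing the Mean Per Joint Position Error (MPJPE) and Mean Per Vertex Position Error (MPVPE) to 7.19 mm and 7.33 mm, respectively. In addition, generalization experiments on the RGB2Hands and EgoHands datasets show qualitatively that the proposed method generalizes well and can adapt to hand mesh reconstruction against different environmental backgrounds.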
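For readers unfamiliar with the MPJPE and MPVPE metrics named in the last abstract, the following is a minimal sketch of how such a metric is generally computed: the mean Euclidean distance between predicted and ground-truth 3D points (joints for MPJPE, mesh vertices for MPVPE). The function name `mean_position_error` and the toy coordinates are hypothetical and are not taken from the paper. Evaluation protocols may also align the hands (for example, relative to a root joint) before measuring, which this sketch omits.

```python
import math

def mean_position_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    Applied to joints this gives MPJPE; applied to mesh vertices, MPVPE.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Hypothetical toy data in millimetres (not from any dataset).
gt_joints = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
pred_joints = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 12.0)]

# Per-joint errors are 5, 0 and 12 mm, so the mean is 17/3 mm.
mpjpe = mean_position_error(pred_joints, gt_joints)
print(f"MPJPE = {mpjpe:.2f} mm")  # prints "MPJPE = 5.67 mm"
```

The same function yields MPVPE when given predicted and ground-truth mesh vertices instead of joints.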