Funding: This work was supported by the National Natural Science Foundation of China under grants U1933104 and 62071081, the LiaoNing Revitalization Talents Program under grant XLYC1807019, the Liaoning Province Natural Science Foundation under grant 2019-MS-058, the Dalian Science and Technology Innovation Foundation under grant 2018J12GX044, the Fundamental Research Funds for the Central Universities under grants DUT20LAB113 and DUT20JC07, and the Cooperative Scientific Research Project of Chunhui Plan of Ministry of Education.
Abstract: Device-free gesture recognition is an emerging wireless sensing technique that recognizes gestures by analyzing their influence on surrounding wireless signals, and it may empower wireless networks with augmented sensing ability. Researchers have made great achievements in single-person device-free gesture recognition. However, when multiple persons perform gestures simultaneously, the received signals are mixed together, and traditional methods no longer work well. Moreover, the anonymity of persons and changes in the surrounding environment cause feature shift and mismatch, so recognition accuracy degrades remarkably. To address these problems, we explore and exploit the diversity of spatial information and propose a multidimensional analysis method that separates the gesture features of each person using a focusing sensing strategy. We also present a deep-learning-based robust device-free gesture recognition framework, which leverages an adversarial approach to extract robust gesture features that are insensitive to changes of persons and environment. Furthermore, we develop a 77 GHz mmWave prototype system and evaluate the proposed methods extensively. Experimental results reveal that the proposed system achieves average accuracies of 93% and 84% when 10 gestures are conducted in different environments by two and four persons simultaneously, respectively. (Received: Jun. 18, 2020; Revised: Aug. 06, 2020; Editor: Ning Ge.)
Funding: Supported in part by the Key Program of NSFC (Grant No. U1908214), the Special Project of Central Government Guiding Local Science and Technology Development (Grant No. 2021JH6/10500140), the Program for the Liaoning Distinguished Professor, the Program for Innovative Research Team in University of Liaoning Province (LT2020015), Dalian (2021RT06), and Dalian University (XLJ202010), the Science and Technology Innovation Fund of Dalian (Grant No. 2020JJ25CY001), and the Dalian University Scientific Research Platform Project (No. 202101YB03).
Abstract: Multi-view multi-person 3D human pose estimation is a hot topic in the field of human pose estimation due to its wide range of application scenarios. With the introduction of end-to-end direct regression methods, the field has entered a new stage of development. However, even the best existing method does not regress accurately enough the joints that are more heavily influenced by external factors. In this paper, we propose an effective feature recalibration module based on the channel attention mechanism, together with a relative optimal calibration strategy, and apply them to the multi-view multi-person 3D human pose estimation task to improve detection accuracy for joints that are more severely affected by external factors. Specifically, the recalibration module and strategy achieve a relative optimal weight adjustment of joint feature information, which enables the model to learn the dependencies between joints and the dependencies between people and their corresponding joints. We call this method the Efficient Recalibration Network (ER-Net). Finally, experiments were conducted on two benchmark datasets for this task, Campus and Shelf, on which the PCP reached 97.3% and 98.3%, respectively.
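The abstract above does not spell out the internals of the recalibration module, so the following is only a generic squeeze-and-excitation style illustration of channel attention, not ER-Net itself. The function name `channel_recalibrate`, the toy feature values, and the weight matrices `w1` and `w2` are all hypothetical.

```python
import math


def channel_recalibrate(features, w1, w2):
    """Reweight channels with gates computed from their global averages.

    features: list of C channels, each a flat list of activations.
    w1: bottleneck weights (rows map C inputs to a smaller hidden size).
    w2: expansion weights (rows map the hidden size back to C gates).
    """
    # Squeeze: one global average per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy input (assumed values): 3 channels with 2 activations each.
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[0.5, 0.0, 0.5]]          # 3 -> 1 bottleneck
w2 = [[1.0], [0.0], [-1.0]]     # 1 -> 3 expansion
scaled, gates = channel_recalibrate(features, w1, w2)
```

With these toy weights the first channel is emphasized and the third is suppressed. In a trained network the weights would be learned, so the gates would reflect which channels matter for the task.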
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, China (No. XDA27030600) and the National Natural Science Foundation of China (No. 62206283).
Abstract: Semi-supervised learning is an important approach to learning robust human pose estimation models that perform well on images in the wild. Existing semi-supervised methods for human pose estimation mainly focus on instance-agnostic keypoint detection. In multi-person scenes, the arbitrary number of instances makes pose estimation much more challenging, and current semi-supervised methods cannot fully mine the information in unlabeled data. To leverage the instance information in unlabeled data, we propose an end-to-end semi-supervised training strategy. Unlike previous two-stage semi-supervised methods, our method focuses on detector-free frameworks, including bottom-up and single-stage ones. It not only performs consistency regularization on heatmaps, but also employs a pseudo-labeling approach to generate instance-specific pseudo annotations. On the COCO and CrowdPose benchmarks, the proposed approach outperforms previous instance-agnostic methods under various labeling ratios. Our method is applicable to both bottom-up and single-stage frameworks, showing its general applicability.
Funding: Supported by the National Natural Science Foundation of China A3 Foresight Program under Grant No. 62061146001, the Peking University (PKU)-Nanyang Technological University (NTU) Collaboration Project, the Project funded by China Postdoctoral Science Foundation under Grant No. 2021TQo048, the National Natural Science Foundation of China under Grant No. 62172394, the Beijing Natural Science Foundation under Grant No. L223034, the Beijing Nova Program, and the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 2020109.
Abstract: Monitoring respiration is an important component of personal health care. Although recent developments in Wi-Fi sensing offer a potential tool for contact-free respiration monitoring, existing proposals for Wi-Fi-based multi-person respiration sensing mainly extract each individual's respiration rate in the frequency domain using the fast Fourier transform (FFT) or the multiple signal classification (MUSIC) method. This leads to the following limitations: 1) they are largely ineffective in recovering the breaths of multiple persons from received mixed signals and in differentiating individual breaths, 2) they are unable to acquire the time-varying respiration pattern when the subject has respiratory abnormalities, such as apnea and changing respiration rates, and 3) they have difficulty identifying the real number of subjects when multiple subjects share the same or similar respiration rates. To address these issues, we propose Wi-Fi-enabled MUlti-person SEnsing (WiMUSE), a signal processing pipeline that performs respiration monitoring for multiple persons simultaneously. Essentially, as a pioneering time-domain approach, WiMUSE models the mixed signals of multi-person respiration as a linear superposition of multiple waveforms, so as to form a blind source separation (BSS) problem. The effective separation of the signal sources (respiratory waveforms) further enables us to quantify the differences in the respiratory waveform patterns of multiple subjects, and thus to identify the number of subjects along with their respective respiration waveforms. We implement WiMUSE on commodity Wi-Fi devices and conduct extensive experiments to demonstrate that, compared with approaches based on the FFT or MUSIC method, the 90% error of the respiration rate can be reduced by more than 60%.
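The linear superposition model named in the abstract above can be illustrated with a small sketch. It is not the WiMUSE pipeline: the sampling rate, breathing rates, and mixing matrix are assumed toy values, and the sources are recovered using the known mixing matrix. A real BSS method has to estimate the unmixing blindly, and the abstract does not say how WiMUSE does so.

```python
import math


def mix(matrix, signals):
    """Apply a mixing (or unmixing) matrix to a list of equal-length signals."""
    n = len(signals[0])
    return [
        [sum(a * s[k] for a, s in zip(row, signals)) for k in range(n)]
        for row in matrix
    ]


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


fs = 20.0                    # assumed sampling rate in Hz
n_samples = 1200             # 60 seconds of samples
rates_bpm = [12.0, 18.0]     # assumed breathing rates of two subjects

# Each source is one subject's respiratory waveform, idealized as a sinusoid.
sources = [
    [math.sin(2 * math.pi * (r / 60.0) * (k / fs)) for k in range(n_samples)]
    for r in rates_bpm
]

# Each observation is a weighted sum of both waveforms (linear superposition).
mixing = [[1.0, 0.6], [0.4, 1.0]]
observed = mix(mixing, sources)

# With the mixing matrix known, separation reduces to a matrix inverse.
recovered = mix(invert_2x2(mixing), observed)
```

Because the recovered signals are full time-domain waveforms rather than spectral peaks, patterns such as pauses or rate changes would stay visible in them. This is the motivation the abstract gives for working in the time domain.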
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61672077 and 61532002, the Applied Basic Research Program of Qingdao under Grant No. 161013xx, and the National Science Foundation of USA under Grant Nos. US-0949467, IIS-1047715, IIS-1715985, IIS61672149, and IIS-1049448.
Abstract: For multi-person 2D pose estimation, current deep-learning-based methods have exhibited impressive performance, but the trade-offs among efficiency, robustness, and accuracy in existing approaches remain unavoidable. In principle, bottom-up methods are superior to top-down methods in efficiency, but they perform worse in accuracy. To make full use of their respective advantages, in this paper we design a novel bidirectional optimization coupled lightweight network (BOCLN) architecture for efficient, robust, and general-purpose multi-person 2D (2-dimensional) pose estimation from natural images. Within the BOCLN framework, the bottom-up network focuses on global features, while the top-down network places emphasis on detailed features. The entire framework shares global features along the bottom-up data stream, while the top-down data stream aims to accelerate accurate pose estimation. In particular, to exploit the priors on the relationships between human joints, we propose a probability limb heat map that represents the spatial context of the joints and guides the overall pose skeleton prediction, so that each person's pose estimation in cluttered scenes (involving crowds) can be as accurate and robust as possible. Benefiting from the novel BOCLN architecture, the time-consuming refinement procedure can therefore be simplified to an efficient lightweight network. Extensive experiments and evaluations on public benchmarks have confirmed that our new method is more efficient and robust, yet still attains competitive accuracy compared with state-of-the-art methods. Our BOCLN shows even greater promise in online applications.
Abstract: Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks target footprint tracking and pose tracking separately. Frameworks designed for the former cannot be used for the latter, because they directly obtain 3D positions on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which utilizes fewer key points than pose tracking, weakening multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can adopt monocular 2D bounding boxes and 2D poses as its input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework is shown to provide state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
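The homography projection mentioned in the abstract above can be sketched as follows. The matrix `H` is a made-up example rather than a calibration from any of the datasets, and `project_to_ground` is a hypothetical helper name. The sketch also shows why this route only suits footprints: the mapping is valid only for image points that lie on the ground plane.

```python
def project_to_ground(h, pixel):
    """Map an image pixel (u, v) to ground-plane coordinates (x, y).

    h is a 3x3 image-to-ground homography given as nested lists. The result
    is only meaningful for pixels that depict points on the ground plane.
    """
    u, v = pixel
    # Multiply the homogeneous pixel (u, v, 1) by each row of the homography.
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in h)
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to get metric ground coordinates.
    return (x / w, y / w)


# Assumed example homography (not taken from any real camera).
H = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.001, 1.0],
]
ground_xy = project_to_ground(H, (100.0, 200.0))
```

A joint such as a shoulder sits above the ground plane, so passing its pixel through `H` would give a wrong ground position. Recovering such 3D points needs triangulation from multiple views instead.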
Funding: Partly supported by the National Natural Science Foundation of China (62122058, 62171317, and 62231018).
Abstract: Existing multi-person reconstruction methods require the human bodies in the input image to occupy a considerable portion of the picture. However, low-resolution human objects are ubiquitous because of the trade-off between the field of view and the target distance given a limited camera resolution. In this paper, we propose an end-to-end multi-task framework for multi-person inference from a low-resolution image (MILI). To perceive more information from a low-resolution image, we use pair-wise images at high resolution and low resolution for training, and design a restoration network with a simple loss for better feature extraction from the low-resolution image. To address the occlusion problem in multi-person scenes, we propose an occlusion-aware mask prediction network to estimate the mask of each person during 3D mesh regression. Experimental results on both small-scale and large-scale scenes demonstrate that our method outperforms state-of-the-art methods both quantitatively and qualitatively. The code is available at http://cic.tju.edu.cn/faculty/likun/projects/MILI.
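Abstract: Personalized federated learning has attracted considerable attention for its advantages in handling data heterogeneity and protecting privacy. Existing algorithms focus on balancing the conflict between global information and personalized information, but overlook the interference caused by different label information within the global information. In particular, algorithms that maintain a single global head are prone to convergence difficulties caused by feature conflicts between labels. To this end, a new algorithm, federated learning with global multi-head (FedGMH), is proposed. The algorithm creates multiple global heads on the server, each dedicated to one kind of label information, while each client downloads only the global heads related to its local labels, thereby avoiding interference from irrelevant label information. In addition, FedGMH introduces a parameter-level aggregation mechanism: it evaluates the importance of head parameters and updates the key parameters to the weighted parameters of the global multi-head, which speeds up convergence and improves accuracy. Extensive experiments on three vision datasets show that FedGMH outperforms existing state-of-the-art algorithms.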
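The head-selection step described in the abstract above (one global head per label on the server, with clients fetching only the heads for their local labels) might look roughly like this. It is a minimal sketch under assumptions: `select_heads`, the dictionary layout, and the toy parameter values are hypothetical, and the parameter-level aggregation is not modeled because the abstract does not specify it.

```python
def select_heads(global_heads, local_labels):
    """Return only the global heads whose labels occur in the client's data.

    global_heads: dict mapping a label to that label's head parameters.
    local_labels: iterable of labels present in the client's local dataset.
    """
    wanted = sorted(set(local_labels))
    # Heads for labels the client never sees are skipped, so their
    # parameters cannot interfere with local training.
    return {label: global_heads[label] for label in wanted if label in global_heads}


# Toy server state (assumed values): one small parameter list per label.
global_heads = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}

# A client whose local data contains only labels 0 and 2.
client_heads = select_heads(global_heads, local_labels=[2, 0, 2])
```

Here the client receives the heads for labels 0 and 2 and never downloads the head for label 1.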