Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually us...Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually use convolutional neural networks to extract features directly from raw gesture data for gesture recognition,but the networks are affected by much interference information in the input data and thus fit to some unimportant features.In this paper,we proposed a novel method for encoding spatio-temporal information,which can enhance the key features required for gesture recognition,such as shape,structure,contour,position and hand motion of gestures,thereby improving the accuracy of gesture recognition.This encoding method can encode arbitrarily multiple frames of gesture data into a single frame of the spatio-temporal feature map and use the spatio-temporal feature map as the input to the neural network.This can guide the model to fit important features while avoiding the use of complex recurrent network structures to extract temporal features.In addition,we designed two sub-networks and trained the model using a sub-network pre-training strategy that trains the sub-networks first and then the entire network,so as to avoid the subnetworks focusing too much on the information of a single category feature and being overly influenced by each other’s features.Experimental results on two public gesture datasets show that the proposed spatio-temporal information encoding method achieves advanced accuracy.展开更多
With the rapid advancement of virtual reality,dynamic gesture recognition technology has become an indispensable and critical technique for users to achieve human–computer interaction in virtual environments.The reco...With the rapid advancement of virtual reality,dynamic gesture recognition technology has become an indispensable and critical technique for users to achieve human–computer interaction in virtual environments.The recognition of dynamic gestures is a challenging task due to the high degree of freedom and the influence of individual differences and the change of gesture space.To solve the problem of low recognition accuracy of existing networks,an improved dynamic gesture recognition algorithm based on ResNeXt architecture is proposed.The algorithm employs three-dimensional convolution techniques to effectively capture the spatiotemporal features intrinsic to dynamic gestures.Additionally,to enhance the model’s focus and improve its accuracy in identifying dynamic gestures,a lightweight convolutional attention mechanism is introduced.This mechanism not only augments the model’s precision but also facilitates faster convergence during the training phase.In order to further optimize the performance of the model,a deep attention submodule is added to the convolutional attention mechanism module to strengthen the network’s capability in temporal feature extraction.Empirical evaluations on EgoGesture and NvGesture datasets show that the accuracy of the proposed model in dynamic gesture recognition reaches 95.03%and 86.21%,respectively.When operating in RGB mode,the accuracy reached 93.49%and 80.22%,respectively.These results underscore the effectiveness of the proposed algorithm in recognizing dynamic gestures with high accuracy,showcasing its potential for applications in advanced human–computer interaction systems.展开更多
Gesture recognition utilizing flexible strain sensors is a highly valuable technology widely applied in human-machine interfaces.However,achieving rapid detection of subtle motions and timely processing of dynamic sig...Gesture recognition utilizing flexible strain sensors is a highly valuable technology widely applied in human-machine interfaces.However,achieving rapid detection of subtle motions and timely processing of dynamic signals remain a challenge for sensors.Here,highly resilient and durable ionogels are developed by introducing micro-scale incompatible phases in macroscopic homogeneous polymeric network.The compatible network disperses in conductive ionic liquid to form highly resilient and stretchable skeleton,while incompatible phase forms hydrogen bonds to dissipate energy thus strengthening the ionogels.The ionogels-derived strain sensors show highly sensitivity,fast response time(<10 ms),low detection limit(~50μm),and remarkable durability(>5000 cycles),allowing for precise monitoring of human motions.More importantly,a self-adaptive recognition program empowered by deep-learning algorithms is designed to compensate for sensors,creating a comprehensive system capable of dynamic gesture recognition.This system can comprehensively analyze both the temporal and spatial features of sensor data,enabling deeper understanding of the dynamic process underlying gestures.The system accurately classifies 10 hand gestures across five participants with impressive accuracy of 93.66%.Moreover,it maintains robust recognition performance without the need for further training even when different sensors or subjects are involved.This technological breakthrough paves the way for intuitive and seamless interaction between humans and machines,presenting significant opportunities in diverse applications,such as human-robot interaction,virtual reality control,and assistive devices for the disabled individuals.展开更多
Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities.Unlike existing approaches that often focus on static gestures and require extensive labeled data,the pro...Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities.Unlike existing approaches that often focus on static gestures and require extensive labeled data,the proposed wearable wristband with selfsupervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios.It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes,resulting in high-sensitivity capacitance output.Through wireless transmission from a Wi-Fi module,the proposed algorithm learns latent features from the unlabeled signals of random wrist movements.Remarkably,only few-shot labeled data are sufficient for fine-tuning the model,enabling rapid adaptation to various tasks.The system achieves a high accuracy of 94.9%in different scenarios,including the prediction of eight-direction commands,and air-writing of all numbers and letters.The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training.Its utility has been further extended to enhance human–machine interaction over digital platforms,such as game controls,calculators,and three-language login systems,offering users a natural and intuitive way of communication.展开更多
The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remot...The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remote collaboration.A central challenge in these immersive systems lies in enabling intuitive,efficient,and natural interactions.Hand gesture recognition offers a compelling solution by leveraging the expressiveness of human hands to facilitate seamless control without relying on traditional input devices such as controllers or keyboards,which can limit immersion.However,achieving robust gesture recognition requires overcoming challenges related to accurate hand tracking,complex environmental conditions,and minimizing system latency.This study proposes an artificial intelligence(AI)-driven framework for recognizing both static and dynamic hand gestures in VR and AR environments using skeleton-based tracking compliant with the OpenXR standard.Our approach employs a lightweight neural network architecture capable of real-time classification within approximately 1.3mswhilemaintaining average accuracy of 95%.We also introduce a novel dataset generation method to support training robust models and demonstrate consistent classification of diverse gestures across widespread commercial VR devices.This work represents one of the first studies to implement and validate dynamic hand gesture recognition in real time using standardized VR hardware,laying the groundwork for more immersive,accessible,and user-friendly interaction systems.By advancing AI-driven gesture interfaces,this research has the potential to broaden the adoption of VR and AR across diverse domains and enhance the overall user experience.展开更多
The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached ...The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached to a glove or markers,which hinder the natural human-machine communication.On the other hand,vision-based methods are less restrictive and allow for a more spontaneous communication without the need of an intermediary between human and machine.Therefore,vision gesture recognition has been a popular area of research for the past thirty years.Hand gesture recognition finds its application in many areas,particularly the automotive industry where advanced automotive human-machine interface(HMI)designers are using gesture recognition to improve driver and vehicle safety.However,technology advances go beyond active/passive safety and into convenience and comfort.In this context,one of America’s big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence(CPAMI)at the University of Waterloo to investigate expanding their product segment through machine learning to provide an increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking.The present paper leverages the state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database.We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data,which significantly improves the proposed system performance in a real world environment.We adapt the architecture of end-to-end solution to expand the state-of-the-art video classifier from a single image as input(fed by monocular camera)to a Multiview 360 feed,offered by a six cameras module.Finally,we optimize the proposed solution to work on a limited resource embedded platform(Nvidia Jetson TX2)that is used by automakers for vehicle-based features,without sacrificing the accuracy robustness and real time functionality of the system.展开更多
Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of a...Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study.Assistive learning devices enable students to access institutional resources and facilities fully.The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates.This tool converts the audio signals into sign language videos for the speech and hearing-impaired to follow and converts the sign language to text format for the teachers to follow.This educational tool for the speech and hearing-impaired is implemented by customized deep learning models such as Convolution neural networks(CNN),Residual neural Networks(ResNet),and stacked Long short-term memory(LSTM)network models.This assistive learning tool is a novel framework that interprets the static and dynamic gesture actions in American Sign Language(ASL).Such communicative tools empower the speech and hearing impaired to communicate effectively in a classroom environment and foster inclusivity.Customized deep learning models were developed and experimentally evaluated with the standard performance metrics.The model exhibits an accuracy of 99.7% for all static gesture classification and 99% for specific vocabulary of gesture action words.This two-way communicative and educational tool encourages social inclusion and a promising career for disabled students.展开更多
基金This work was supported,in part,by the National Nature Science Foundation of China under grant numbers 62272236in part,by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136,BK20191401in part,by the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD)fund.
文摘Gesture recognition technology enables machines to read human gestures and has significant application prospects in the fields of human-computer interaction and sign language translation.Existing researches usually use convolutional neural networks to extract features directly from raw gesture data for gesture recognition,but the networks are affected by much interference information in the input data and thus fit to some unimportant features.In this paper,we proposed a novel method for encoding spatio-temporal information,which can enhance the key features required for gesture recognition,such as shape,structure,contour,position and hand motion of gestures,thereby improving the accuracy of gesture recognition.This encoding method can encode arbitrarily multiple frames of gesture data into a single frame of the spatio-temporal feature map and use the spatio-temporal feature map as the input to the neural network.This can guide the model to fit important features while avoiding the use of complex recurrent network structures to extract temporal features.In addition,we designed two sub-networks and trained the model using a sub-network pre-training strategy that trains the sub-networks first and then the entire network,so as to avoid the subnetworks focusing too much on the information of a single category feature and being overly influenced by each other’s features.Experimental results on two public gesture datasets show that the proposed spatio-temporal information encoding method achieves advanced accuracy.
文摘With the rapid advancement of virtual reality,dynamic gesture recognition technology has become an indispensable and critical technique for users to achieve human–computer interaction in virtual environments.The recognition of dynamic gestures is a challenging task due to the high degree of freedom and the influence of individual differences and the change of gesture space.To solve the problem of low recognition accuracy of existing networks,an improved dynamic gesture recognition algorithm based on ResNeXt architecture is proposed.The algorithm employs three-dimensional convolution techniques to effectively capture the spatiotemporal features intrinsic to dynamic gestures.Additionally,to enhance the model’s focus and improve its accuracy in identifying dynamic gestures,a lightweight convolutional attention mechanism is introduced.This mechanism not only augments the model’s precision but also facilitates faster convergence during the training phase.In order to further optimize the performance of the model,a deep attention submodule is added to the convolutional attention mechanism module to strengthen the network’s capability in temporal feature extraction.Empirical evaluations on EgoGesture and NvGesture datasets show that the accuracy of the proposed model in dynamic gesture recognition reaches 95.03%and 86.21%,respectively.When operating in RGB mode,the accuracy reached 93.49%and 80.22%,respectively.These results underscore the effectiveness of the proposed algorithm in recognizing dynamic gestures with high accuracy,showcasing its potential for applications in advanced human–computer interaction systems.
基金supported by the National Key Research and Development Program of China(No.2021YFA1401103)the National Natural Science Foundation of China(Nos.61825403,61921005,and 82370520).
文摘Gesture recognition utilizing flexible strain sensors is a highly valuable technology widely applied in human-machine interfaces.However,achieving rapid detection of subtle motions and timely processing of dynamic signals remain a challenge for sensors.Here,highly resilient and durable ionogels are developed by introducing micro-scale incompatible phases in macroscopic homogeneous polymeric network.The compatible network disperses in conductive ionic liquid to form highly resilient and stretchable skeleton,while incompatible phase forms hydrogen bonds to dissipate energy thus strengthening the ionogels.The ionogels-derived strain sensors show highly sensitivity,fast response time(<10 ms),low detection limit(~50μm),and remarkable durability(>5000 cycles),allowing for precise monitoring of human motions.More importantly,a self-adaptive recognition program empowered by deep-learning algorithms is designed to compensate for sensors,creating a comprehensive system capable of dynamic gesture recognition.This system can comprehensively analyze both the temporal and spatial features of sensor data,enabling deeper understanding of the dynamic process underlying gestures.The system accurately classifies 10 hand gestures across five participants with impressive accuracy of 93.66%.Moreover,it maintains robust recognition performance without the need for further training even when different sensors or subjects are involved.This technological breakthrough paves the way for intuitive and seamless interaction between humans and machines,presenting significant opportunities in diverse applications,such as human-robot interaction,virtual reality control,and assistive devices for the disabled individuals.
基金supported by the Research Grant Fund from Kwangwoon University in 2023,the National Natural Science Foundation of China under Grant(62311540155)the Taishan Scholars Project Special Funds(tsqn202312035)the open research foundation of State Key Laboratory of Integrated Chips and Systems.
文摘Wearable wristband systems leverage deep learning to revolutionize hand gesture recognition in daily activities.Unlike existing approaches that often focus on static gestures and require extensive labeled data,the proposed wearable wristband with selfsupervised contrastive learning excels at dynamic motion tracking and adapts rapidly across multiple scenarios.It features a four-channel sensing array composed of an ionic hydrogel with hierarchical microcone structures and ultrathin flexible electrodes,resulting in high-sensitivity capacitance output.Through wireless transmission from a Wi-Fi module,the proposed algorithm learns latent features from the unlabeled signals of random wrist movements.Remarkably,only few-shot labeled data are sufficient for fine-tuning the model,enabling rapid adaptation to various tasks.The system achieves a high accuracy of 94.9%in different scenarios,including the prediction of eight-direction commands,and air-writing of all numbers and letters.The proposed method facilitates smooth transitions between multiple tasks without the need for modifying the structure or undergoing extensive task-specific training.Its utility has been further extended to enhance human–machine interaction over digital platforms,such as game controls,calculators,and three-language login systems,offering users a natural and intuitive way of communication.
基金supported by research fund from Chosun University,2024.
文摘The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remote collaboration.A central challenge in these immersive systems lies in enabling intuitive,efficient,and natural interactions.Hand gesture recognition offers a compelling solution by leveraging the expressiveness of human hands to facilitate seamless control without relying on traditional input devices such as controllers or keyboards,which can limit immersion.However,achieving robust gesture recognition requires overcoming challenges related to accurate hand tracking,complex environmental conditions,and minimizing system latency.This study proposes an artificial intelligence(AI)-driven framework for recognizing both static and dynamic hand gestures in VR and AR environments using skeleton-based tracking compliant with the OpenXR standard.Our approach employs a lightweight neural network architecture capable of real-time classification within approximately 1.3mswhilemaintaining average accuracy of 95%.We also introduce a novel dataset generation method to support training robust models and demonstrate consistent classification of diverse gestures across widespread commercial VR devices.This work represents one of the first studies to implement and validate dynamic hand gesture recognition in real time using standardized VR hardware,laying the groundwork for more immersive,accessible,and user-friendly interaction systems.By advancing AI-driven gesture interfaces,this research has the potential to broaden the adoption of VR and AR across diverse domains and enhance the overall user experience.
文摘The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached to a glove or markers,which hinder the natural human-machine communication.On the other hand,vision-based methods are less restrictive and allow for a more spontaneous communication without the need of an intermediary between human and machine.Therefore,vision gesture recognition has been a popular area of research for the past thirty years.Hand gesture recognition finds its application in many areas,particularly the automotive industry where advanced automotive human-machine interface(HMI)designers are using gesture recognition to improve driver and vehicle safety.However,technology advances go beyond active/passive safety and into convenience and comfort.In this context,one of America’s big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence(CPAMI)at the University of Waterloo to investigate expanding their product segment through machine learning to provide an increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking.The present paper leverages the state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database.We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data,which significantly improves the proposed system performance in a real world environment.We adapt the architecture of end-to-end solution to expand the state-of-the-art video classifier from a single image as input(fed by monocular camera)to a Multiview 360 feed,offered by a six cameras module.Finally,we optimize the proposed solution to work on a limited resource embedded platform(Nvidia Jetson TX2)that is used by automakers for vehicle-based features,without sacrificing the accuracy robustness and real time functionality of the system.
基金sponsored by Prince Sattam Bin Abdulaziz University(PSAU)as part of funding for its SDG Roadmap Research Funding Programme project number PSAU-2023-SDG-2023/SDG/31.
文摘Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study.Assistive learning devices enable students to access institutional resources and facilities fully.The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates.This tool converts the audio signals into sign language videos for the speech and hearing-impaired to follow and converts the sign language to text format for the teachers to follow.This educational tool for the speech and hearing-impaired is implemented by customized deep learning models such as Convolution neural networks(CNN),Residual neural Networks(ResNet),and stacked Long short-term memory(LSTM)network models.This assistive learning tool is a novel framework that interprets the static and dynamic gesture actions in American Sign Language(ASL).Such communicative tools empower the speech and hearing impaired to communicate effectively in a classroom environment and foster inclusivity.Customized deep learning models were developed and experimentally evaluated with the standard performance metrics.The model exhibits an accuracy of 99.7% for all static gesture classification and 99% for specific vocabulary of gesture action words.This two-way communicative and educational tool encourages social inclusion and a promising career for disabled students.