The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remot...The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remote collaboration.A central challenge in these immersive systems lies in enabling intuitive,efficient,and natural interactions.Hand gesture recognition offers a compelling solution by leveraging the expressiveness of human hands to facilitate seamless control without relying on traditional input devices such as controllers or keyboards,which can limit immersion.However,achieving robust gesture recognition requires overcoming challenges related to accurate hand tracking,complex environmental conditions,and minimizing system latency.This study proposes an artificial intelligence(AI)-driven framework for recognizing both static and dynamic hand gestures in VR and AR environments using skeleton-based tracking compliant with the OpenXR standard.Our approach employs a lightweight neural network architecture capable of real-time classification within approximately 1.3mswhilemaintaining average accuracy of 95%.We also introduce a novel dataset generation method to support training robust models and demonstrate consistent classification of diverse gestures across widespread commercial VR devices.This work represents one of the first studies to implement and validate dynamic hand gesture recognition in real time using standardized VR hardware,laying the groundwork for more immersive,accessible,and user-friendly interaction systems.By advancing AI-driven gesture interfaces,this research has the potential to broaden the adoption of VR and AR across diverse domains and enhance the overall user experience.展开更多
The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached ...The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached to a glove or markers,which hinder the natural human-machine communication.On the other hand,vision-based methods are less restrictive and allow for a more spontaneous communication without the need of an intermediary between human and machine.Therefore,vision gesture recognition has been a popular area of research for the past thirty years.Hand gesture recognition finds its application in many areas,particularly the automotive industry where advanced automotive human-machine interface(HMI)designers are using gesture recognition to improve driver and vehicle safety.However,technology advances go beyond active/passive safety and into convenience and comfort.In this context,one of America’s big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence(CPAMI)at the University of Waterloo to investigate expanding their product segment through machine learning to provide an increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking.The present paper leverages the state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database.We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data,which significantly improves the proposed system performance in a real world environment.We adapt the architecture of end-to-end solution to expand the state-of-the-art video classifier from a single image as input(fed by monocular camera)to a Multiview 360 feed,offered by a six cameras module.Finally,we optimize the proposed solution to work on a limited resource embedded platform(Nvidia Jetson TX2)that is used by automakers for vehicle-based features,without sacrificing the accuracy robustness and real time functionality of the system.展开更多
基金supported by research fund from Chosun University,2024.
文摘The rapid evolution of virtual reality(VR)and augmented reality(AR)technologies has significantly transformed human-computer interaction,with applications spanning entertainment,education,healthcare,industry,and remote collaboration.A central challenge in these immersive systems lies in enabling intuitive,efficient,and natural interactions.Hand gesture recognition offers a compelling solution by leveraging the expressiveness of human hands to facilitate seamless control without relying on traditional input devices such as controllers or keyboards,which can limit immersion.However,achieving robust gesture recognition requires overcoming challenges related to accurate hand tracking,complex environmental conditions,and minimizing system latency.This study proposes an artificial intelligence(AI)-driven framework for recognizing both static and dynamic hand gestures in VR and AR environments using skeleton-based tracking compliant with the OpenXR standard.Our approach employs a lightweight neural network architecture capable of real-time classification within approximately 1.3mswhilemaintaining average accuracy of 95%.We also introduce a novel dataset generation method to support training robust models and demonstrate consistent classification of diverse gestures across widespread commercial VR devices.This work represents one of the first studies to implement and validate dynamic hand gesture recognition in real time using standardized VR hardware,laying the groundwork for more immersive,accessible,and user-friendly interaction systems.By advancing AI-driven gesture interfaces,this research has the potential to broaden the adoption of VR and AR across diverse domains and enhance the overall user experience.
文摘The use of hand gestures can be the most intuitive human-machine interaction medium.The early approaches for hand gesture recognition used device-based methods.These methods use mechanical or optical sensors attached to a glove or markers,which hinder the natural human-machine communication.On the other hand,vision-based methods are less restrictive and allow for a more spontaneous communication without the need of an intermediary between human and machine.Therefore,vision gesture recognition has been a popular area of research for the past thirty years.Hand gesture recognition finds its application in many areas,particularly the automotive industry where advanced automotive human-machine interface(HMI)designers are using gesture recognition to improve driver and vehicle safety.However,technology advances go beyond active/passive safety and into convenience and comfort.In this context,one of America’s big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence(CPAMI)at the University of Waterloo to investigate expanding their product segment through machine learning to provide an increased driver convenience and comfort with the particular application of hand gesture recognition for autonomous car parking.The present paper leverages the state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database.We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data,which significantly improves the proposed system performance in a real world environment.We adapt the architecture of end-to-end solution to expand the state-of-the-art video classifier from a single image as input(fed by monocular camera)to a Multiview 360 feed,offered by a six cameras module.Finally,we optimize the proposed solution to work on a limited resource embedded platform(Nvidia Jetson TX2)that is used by automakers for vehicle-based features,without sacrificing the accuracy robustness and real time functionality of the system.