The Hand Gesture Recognition (HGR) system can be employed to facilitate communication between humans and computers in place of special input and output devices. These devices may complicate communication with computers, especially for people with disabilities. Hand gestures are a natural human-to-human communication method that can also be used in human-computer interaction. Many researchers have developed techniques that aim to understand and recognize specific hand gestures by employing one or two machine learning algorithms with reasonable accuracy. This work aims to develop a powerful hand gesture recognition model with a 100% recognition rate. We propose an ensemble classification model that combines the most powerful machine learning classifiers to obtain diversity and improve accuracy. Majority voting is used to aggregate the predictions produced by each classifier into the final classification result. Our model was trained on a self-constructed dataset containing 1600 images of ten different hand gestures. Combining the Canny edge detector and the histogram of oriented gradients (HOG) method with the ensemble classifier proved highly effective for the recognition rate. The experimental results show the robustness of the proposed model: Logistic Regression and Support Vector Machine achieved 100% accuracy. The model was also validated on two public datasets, where it outperformed the compared studies.
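A minimal sketch of the described pipeline — Canny edges followed by HOG features feeding a hard-voting ensemble — assuming scikit-learn/scikit-image equivalents; the dataset size, image resolution, and exact member classifiers here are placeholders, not the authors' configuration.

```python
import numpy as np
from skimage.feature import canny, hog
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def extract_features(gray_image):
    """Edge map first, then HOG on the edges, as the abstract combines both."""
    edges = canny(gray_image, sigma=1.0)
    return hog(edges.astype(float), orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Dummy stand-in for the 1600-image, 10-gesture dataset.
rng = np.random.default_rng(0)
images = rng.random((160, 64, 64))
labels = rng.integers(0, 10, size=160)
X = np.array([extract_features(im) for im in images])

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(kernel="rbf")),
                ("rf", RandomForestClassifier(n_estimators=100))],
    voting="hard")  # majority voting over class predictions
ensemble.fit(X, labels)
print(ensemble.predict(X[:5]))
```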
Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with large multimodal model (LMM)-guided semantics to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensional-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce reliance on gloss annotations and mitigate domain shift, we distill semantics from factory-tuned vision-language models and pre-train with masked language modeling and video-text contrastive objectives, aligning visual features with a shared text space. On ML2HP and SHREC'17, the NAS-derived architecture attains 94.7% accuracy with 86 ms inference latency and about 5.9 W power on a Jetson Nano. Under occlusion, lighting shifts, and motion blur, accuracy remains above 82%. For safety-critical commands, the emergency-stop gesture achieves 72 ms 99th-percentile latency with 99.7% fail-safe triggering. Ablation studies confirm the contribution of the spatiotemporal tubelet extractor and text-side pre-training, and we observe gains in translation quality (BLEU-4 of 22.33). These results show that Industrial EdgeSign provides accurate, resource-aware, and safety-aligned gesture recognition suitable for deployment in smart factory settings.
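The abstract does not specify where the ResNet50 front end is truncated; the PyTorch sketch below assumes the deepest stage and classifier head are dropped, with a 1x1 convolution standing in for the dimensional-reduction network before tubelet attention.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)
# Keep conv1 .. layer3; drop layer4/avgpool/fc to save compute on edge devices
# (assumed truncation point — the paper's exact cut is not given).
trunk = nn.Sequential(*list(backbone.children())[:7])

frames = torch.randn(2, 3, 224, 224)        # a batch of video frames
feats = trunk(frames)                        # -> (2, 1024, 14, 14)
# A 1x1 conv as a stand-in dimensional-reduction layer.
reduce = nn.Conv2d(1024, 256, kernel_size=1)
print(reduce(feats).shape)                   # torch.Size([2, 256, 14, 14])
```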
Gesture recognition is of great importance to intelligent human-computer interaction technology, but it is also very difficult, especially when the environment is complex. In this paper, a recognition algorithm for dynamic and combined gestures based on multi-feature fusion is proposed. First, in the image segmentation stage, the algorithm extracts the region of interest for gestures in the color and depth maps by combining depth information. Then, to establish a support vector machine (SVM) model for static hand gesture recognition, the algorithm fuses weighted Hu invariant moments of the depth map into the histogram of oriented gradients (HOG) of the color image. Finally, a hidden Markov model (HMM) toolbox supporting multi-dimensional continuous data input is adopted for training and recognition. Experimental results show that the proposed algorithm not only overcomes the influence of skin-colored objects, multiple moving objects, and interfering hand gestures in the background, but is also real-time and practical for human-computer interaction.
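A hedged sketch of the stated feature fusion — weighted Hu invariant moments from the depth map concatenated with HOG from the color image, classified by an SVM. The fusion weight and image sizes are assumptions; the paper's values are not given in the abstract.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.measure import moments_central, moments_hu, moments_normalized
from sklearn.svm import SVC

HU_WEIGHT = 0.5  # hypothetical fusion weight

def fused_features(color_rgb, depth):
    hog_vec = hog(rgb2gray(color_rgb), orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    nu = moments_normalized(moments_central(depth))
    hu_vec = moments_hu(nu)                 # 7 Hu invariant moments
    return np.concatenate([hog_vec, HU_WEIGHT * hu_vec])

rng = np.random.default_rng(0)
X = np.array([fused_features(rng.random((64, 64, 3)), rng.random((64, 64)))
              for _ in range(40)])
y = rng.integers(0, 5, size=40)             # stand-in static gesture labels
SVC(kernel="rbf").fit(X, y)
```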
With the growing application of intelligent robots in service, manufacturing, and medical fields, efficient and natural interaction between humans and robots has become key to improving collaboration efficiency and user experience. Gesture recognition, as an intuitive and contactless interaction method, can overcome the limitations of traditional interfaces and enable real-time control of and feedback on robot movements and behaviors. This study first reviews mainstream gesture recognition algorithms and their application on different sensing platforms (RGB cameras, depth cameras, and inertial measurement units). It then proposes a gesture recognition method based on multimodal feature fusion and a lightweight deep neural network that balances recognition accuracy with computational efficiency. At the system level, a modular human-robot interaction architecture is constructed, comprising perception, decision, and execution layers; gesture commands are transmitted and mapped to robot actions in real time via the ROS communication protocol. Through multiple comparative experiments on public gesture datasets and a self-collected dataset, the proposed method's superiority is validated in terms of accuracy, response latency, and system robustness, while user-experience tests assess the interface's usability. The results provide a reliable technical foundation for robot collaboration and service in complex scenarios, offering broad prospects for practical application and deployment.
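A minimal rospy sketch of how recognized gesture labels might be mapped to robot commands over ROS; the topic, message type, and gesture-action table are all assumptions, since the paper's interface is not specified.

```python
import rospy
from geometry_msgs.msg import Twist

GESTURE_TO_TWIST = {              # hypothetical gesture-action mapping
    "palm_forward": (0.2, 0.0),   # (linear x, angular z)
    "fist":         (0.0, 0.0),   # stop
    "swipe_left":   (0.0, 0.5),
}

def publish_gesture(gesture, pub):
    lin, ang = GESTURE_TO_TWIST.get(gesture, (0.0, 0.0))
    msg = Twist()
    msg.linear.x, msg.angular.z = lin, ang
    pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("gesture_bridge")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        publish_gesture("palm_forward", pub)  # recognizer output goes here
        rate.sleep()
```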
Hand gesture recognition (HGR) is used in numerous applications, including medical healthcare, industrial applications, and sports detection. We have developed a real-time hand gesture recognition system using inertial sensors for the smart home application. Developing such a model facilitates the medical health field (for elderly or disabled users). Home automation has also been proven to be a tremendous benefit for the elderly and disabled. Residents are admitted to smart homes for comfort, luxury, improved quality of life, and protection against intrusion and burglars. This paper proposes a novel system that uses principal component analysis and linear discriminant analysis for feature extraction, and a random forest as a classifier, to improve HGR accuracy. We achieved an accuracy of 94% on a publicly benchmarked HGR dataset. The proposed system can be used to detect hand gestures in the healthcare industry as well as in the industrial and educational sectors.
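A scikit-learn sketch of the stated pipeline — PCA and LDA for feature extraction, random forest as the classifier. Component counts and data shapes are assumptions; the abstract does not report them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 120))            # stand-in inertial-sensor feature windows
y = rng.integers(0, 6, size=200)      # six hypothetical gesture classes

model = make_pipeline(
    PCA(n_components=30),
    LinearDiscriminantAnalysis(n_components=5),  # at most n_classes - 1
    RandomForestClassifier(n_estimators=200, random_state=0))
model.fit(X, y)
print(model.score(X, y))
```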
Permeable electronics promise improved physiological comfort, but remain constrained by limited functional integration and poor mechanical robustness. Here, we report a three-dimensional (3D) permeable electronic system that overcomes these challenges by combining electrospun SEBS nanofiber mats, high-resolution liquid metal conductors patterned via thermal imprinting (50 μm), and strain isolators (SILs) that protect vertical interconnects (VIAs) from stress concentration. This architecture achieves ultrahigh air permeability (>5.09 mL cm^(-2) min^(-1)), exceptional stretchability (750% fracture strain), and reliable conductivity maintained through more than 32,500 strain cycles. Leveraging these advances, we have integrated multilayer circuits, strain sensors, and a three-axis accelerometer into a fully integrated, stretchable, permeable wireless glove for real-time gesture recognition. The system enables accurate sign language interpretation (98%) and seamless robotic hand control, demonstrating its potential for assistive technologies. By uniting comfort, durability, and high-density integration, this work establishes a versatile platform for next-generation wearable electronics and interactive human-robot interfaces.
To address the tendency of the back propagation (BP) neural network to fall into local minima and its low convergence speed in gesture recognition, a new method that combines the chaos algorithm with the genetic algorithm (CGA) is proposed. Exploiting the ergodicity of the chaos algorithm and the global convergence of the genetic algorithm, the basic idea of this paper is to encode the weights and thresholds of the BP neural network, obtain a general optimal solution with the genetic algorithm, and then refine this general solution into an accurate optimal solution by adding chaotic disturbance. The optimal results of the chaotic genetic algorithm are used as the initial weights and thresholds of the BP neural network for gesture recognition. Simulation and experimental results show that both the real-time performance and the accuracy of gesture recognition are greatly improved with CGA.
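A toy sketch of the CGA idea under stated assumptions: a simplified genetic search over a weight vector, followed by logistic-map chaotic perturbation around the best candidate to escape local minima. The fitness function here is a placeholder standing in for the BP network's training error.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(w):                      # stand-in for (negated) BP training error
    return -np.sum((w - 0.3) ** 2)   # toy objective with optimum at 0.3

def ga_search(dim=20, pop=40, gens=50):
    P = rng.uniform(-1, 1, (pop, dim))
    for _ in range(gens):
        scores = np.array([fitness(w) for w in P])
        parents = P[np.argsort(scores)[-pop // 2:]]   # selection of the fittest half
        kids = (parents[rng.integers(0, len(parents), pop - len(parents))]
                + 0.1 * rng.standard_normal((pop - len(parents), dim)))
        P = np.vstack([parents, kids])                # simplified crossover/mutation
    return P[np.argmax([fitness(w) for w in P])]

def chaotic_refine(w, steps=200, scale=0.05):
    x = 0.345                        # logistic-map state in (0, 1)
    best = w.copy()
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)      # ergodic logistic map as chaotic disturbance
        cand = best + scale * (2 * x - 1)
        if fitness(cand) > fitness(best):
            best = cand
    return best

w0 = chaotic_refine(ga_search())     # would initialize the BP network's weights
print(fitness(w0))
```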
Hand gestures are a natural way for human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction; the ConvNets for all groups share parameters. To learn long-term features, outputs from all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, namely the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
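A minimal PyTorch sketch of the described structure: frames from each group pass through a shared ConvNet, and the per-group features feed an LSTM. Channel counts, the number of groups, and the RGB/optical-flow fusion by channel concatenation are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GroupCNNLSTM(nn.Module):
    def __init__(self, n_classes=27, n_groups=8):
        super().__init__()
        self.cnn = nn.Sequential(             # shared across all frame groups
            nn.Conv2d(5, 32, 3, stride=2, padding=1), nn.ReLU(),  # 3 RGB + 2 flow
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.fc = nn.Linear(128, n_classes)
        self.n_groups = n_groups

    def forward(self, x):                     # x: (B, groups, 5, H, W)
        b, g = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, g, -1)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])            # classify from the last step

clip = torch.randn(4, 8, 5, 112, 112)         # fused RGB + flow per sampled frame
print(GroupCNNLSTM()(clip).shape)             # torch.Size([4, 27])
```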
Dynamic hand gesture recognition is a desirable alternative means of human-computer interaction. This paper presents a hand gesture recognition system designed for controlling the flight of unmanned aerial vehicles (UAV). A data representation model is introduced that represents a dynamic gesture sequence by converting the 4-D spatiotemporal data to a 2-D matrix and a 1-D array. To train the system to recognize the designed gestures, skeleton data collected from a Leap Motion Controller are converted into these two data models. A total of 9124 training samples and 1938 testing samples are created to train and test three proposed deep learning networks: a 2-layer fully connected neural network, a 5-layer fully connected neural network, and an 8-layer convolutional neural network. Static testing shows that the 2-layer fully connected network achieves an average accuracy of 96.7% on scaled datasets and 12.3% on non-scaled datasets; the 5-layer fully connected network achieves 98.0% on scaled datasets and 89.1% on non-scaled datasets; and the 8-layer convolutional network achieves 89.6% on scaled datasets and 96.9% on non-scaled datasets. Testing on a DroneKit simulator and a real drone shows that the system is feasible for drone flight control.
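A small sketch of the stated data representation — a skeleton sequence reshaped into a 2-D matrix and a 1-D array, with the min-max scaling the scaled/non-scaled results refer to. Frame and joint counts are assumptions.

```python
import numpy as np

T, J = 60, 21                               # frames, hand joints (hypothetical)
seq = np.random.default_rng(0).random((T, J, 3))   # stand-in Leap Motion skeleton

matrix_2d = seq.reshape(T, J * 3)           # one row per frame -> (60, 63)
array_1d = matrix_2d.ravel()                # flattened input   -> (3780,)

# Per-feature min-max scaling (the "scaled dataset" variant).
lo, hi = matrix_2d.min(axis=0), matrix_2d.max(axis=0)
scaled = (matrix_2d - lo) / np.where(hi > lo, hi - lo, 1.0)
print(matrix_2d.shape, array_1d.shape, scaled.min(), scaled.max())
```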
In this article, to reduce the complexity and improve the generalization ability of current gesture recognition systems, we propose a novel SE-CNN attention architecture for sEMG-based hand gesture recognition. The proposed algorithm introduces a temporal squeeze-and-excite block into a simple CNN architecture and uses it to recalibrate the weights of the feature outputs from the convolutional layer. By enhancing important features while suppressing useless ones, the model performs gesture recognition efficiently. Finally, the algorithm applies a simple attention mechanism to enhance the learned representations of sEMG signals for multi-channel sEMG-based gesture recognition tasks. To evaluate the effectiveness and accuracy of the proposed algorithm, we conduct experiments on the multi-gesture datasets Ninapro DB4 and Ninapro DB5 for both inter-session validation and subject-wise cross-validation. In a series of comparisons with previous models, the proposed algorithm effectively increases robustness, with improved gesture recognition performance and generalization ability.
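One plausible reading of the temporal squeeze-and-excite block in PyTorch (not the authors' code): channel descriptors are pooled over the temporal axis and used to recalibrate the convolutional feature maps.

```python
import torch
import torch.nn as nn

class TemporalSE(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):               # x: (B, C, T) conv feature maps
        s = x.mean(dim=2)               # squeeze over the temporal axis
        w = self.fc(s).unsqueeze(-1)    # per-channel excitation weights
        return x * w                    # recalibrate the feature outputs

emg = torch.randn(8, 16, 200)            # batch, sEMG channels, time steps
feat = nn.Conv1d(16, 32, 5, padding=2)(emg)
print(TemporalSE(32)(feat).shape)         # torch.Size([8, 32, 200])
```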
Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Depth maps of hand gestures captured via Kinect sensors are used in our method, from which the 3D hand shapes can be segmented out of cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points constitute the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping (DTW) algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in both accuracy and efficiency.
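A minimal sketch of the matching stage named in the abstract — dynamic time warping over per-frame descriptor sequences. The 3D Shape Context extraction itself is stubbed with random features here.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping with Euclidean cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
g1 = rng.random((40, 64))    # stand-ins for per-frame shape-context features
g2 = rng.random((55, 64))
print(dtw_distance(g1, g2))  # smaller = more similar gesture
```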
In this study, we developed a system based on deep space-time neural networks for gesture recognition. When users change or the number of gesture categories increases, the accuracy of gesture recognition decreases considerably, because most gesture recognition systems cannot accommodate both user differentiation and gesture diversity. To overcome the limitations of existing methods, we designed a one-dimensional parallel long short-term memory-fully convolutional network (LSTM-FCN) model to extract gesture features of different dimensions. The LSTM can learn complex temporal dynamics, whereas the FCN can predict gestures efficiently by extracting deep, abstract features of gestures in the spatial dimension. In the experiment, 50 types of gestures from five users were collected and evaluated. The experimental results demonstrate the effectiveness of this system and its robustness to various gestures and individual changes. Statistical analysis of the recognition results indicated that an average accuracy of approximately 98.9% was achieved.
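A sketch of one standard reading of a 1-D parallel LSTM-FCN (not the authors' exact model): an LSTM branch for temporal dynamics and a 1-D convolutional branch for spatial features, concatenated before classification. Layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class LSTMFCN(nn.Module):
    def __init__(self, in_ch=6, n_classes=50):
        super().__init__()
        self.lstm = nn.LSTM(in_ch, 64, batch_first=True)
        self.fcn = nn.Sequential(
            nn.Conv1d(in_ch, 128, 8, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 128, 5, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.fc = nn.Linear(64 + 128, n_classes)

    def forward(self, x):                       # x: (B, T, C) sensor frames
        h, _ = self.lstm(x)
        t_feat = h[:, -1]                       # temporal branch
        s_feat = self.fcn(x.transpose(1, 2))    # spatial branch on (B, C, T)
        return self.fc(torch.cat([t_feat, s_feat], dim=1))

print(LSTMFCN()(torch.randn(4, 120, 6)).shape)  # torch.Size([4, 50])
```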
Gesture recognition has been widely used for human-robot interaction. A current problem in gesture recognition is that researchers have not used knowledge learned in existing domains to discover and recognize gestures in new domains. For each new domain, a large amount of data must be collected and annotated, and the training of the algorithm does not benefit from prior knowledge, leading to redundant computation and excessive time investment. To address this problem, this paper proposes a method for transferring gesture data across domains. We use a red-green-blue (RGB) camera to collect images of the gestures, and a Leap Motion to collect the coordinates of 21 joint points of the human hand. We then extract a set of novel feature descriptors from the two different data distributions for the study of transfer learning. This paper compares the effects of three classification algorithms, i.e., support vector machine (SVM), broad learning system (BLS), and deep learning (DL). We also compare learning performance with and without the joint distribution adaptation (JDA) algorithm. The experimental results show that the proposed method can effectively solve the transfer problem between the RGB camera and the Leap Motion. In addition, we found that when using DL to classify the data, excessive training on the source domain may reduce recognition accuracy in the target domain.
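A small sketch of the problem the JDA step addresses: measuring the distribution mismatch between RGB-camera and Leap Motion feature sets with a linear maximum mean discrepancy (MMD). This illustrates the setup only; the full JDA algorithm additionally aligns conditional (per-class) distributions.

```python
import numpy as np

def linear_mmd(Xs, Xt):
    """Squared MMD with a linear kernel: distance between feature means."""
    d = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(d @ d)

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, (300, 32))   # stand-in RGB-camera descriptors
target = rng.normal(0.5, 1.2, (300, 32))   # stand-in Leap Motion descriptors
print(linear_mmd(source, target))          # > 0 indicates distribution mismatch
```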
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation, and sign language. With increasing motion sensor development, multiple data sources have become available, which has led to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depends on a unimodal system, it is difficult to classify similar motion patterns. To solve this problem, a novel approach that integrates motion, audio, and video models is proposed, using a dataset captured with Kinect. The proposed system recognizes observed gestures using the three models; their recognition results are integrated by the proposed framework, and the output becomes the final result. The motion and audio models are learned using hidden Markov models, while a random forest is used as the video classifier to learn the video model. In the experiments testing the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All the experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison results show that the multi-modal model composed of the three models scores the highest recognition rate, meaning that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides application technology for understanding human actions of daily life more precisely.
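A minimal sketch of the integration step under an assumed fusion rule (weighted late fusion of class posteriors); the paper's exact combination scheme is not given in the abstract.

```python
import numpy as np

def fuse(p_motion, p_audio, p_video, weights=(1.0, 1.0, 1.0)):
    """Combine per-model class posteriors and pick the top class."""
    scores = (weights[0] * p_motion + weights[1] * p_audio
              + weights[2] * p_video)
    return int(np.argmax(scores))

n_classes = 20
rng = np.random.default_rng(0)
# Stand-ins for HMM (motion/audio) and random forest (video) outputs.
p_m, p_a, p_v = (rng.dirichlet(np.ones(n_classes)) for _ in range(3))
print("fused gesture class:", fuse(p_m, p_a, p_v))
```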
Recognition of dynamic hand gestures in real time is a difficult task because the system can never know when or from where a gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used to extract spectral and spatial features, which are then given to the LSTM network, through which classification is carried out. The proposed model is a lightweight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. It was trained on 2000 video clips per class, split into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on the training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared to MobileNetv2 + LSTM.
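A hedged PyTorch sketch of the 3D-CNN + LSTM combination; the layer sizes are assumptions chosen small to echo the lightweight design, not the reported 3.7M-parameter model.

```python
import torch
import torch.nn as nn

class C3DLSTM(nn.Module):
    def __init__(self, n_classes=15):
        super().__init__()
        self.c3d = nn.Sequential(                       # spatial-temporal features
            nn.Conv3d(3, 16, (3, 3, 3), padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, (3, 3, 3), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((8, 1, 1)))            # keep 8 temporal steps
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                               # x: (B, 3, T, H, W)
        f = self.c3d(x).squeeze(-1).squeeze(-1)         # (B, 32, 8)
        out, _ = self.lstm(f.transpose(1, 2))           # (B, 8, 64)
        return self.fc(out[:, -1])                      # classify from last step

print(C3DLSTM()(torch.randn(2, 3, 16, 64, 64)).shape)   # torch.Size([2, 15])
```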
Recently, vision-based gesture recognition (VGR) has become a hot research topic in human-computer interaction (HCI). Unlike gesture recognition methods that rely on data gloves or other wearable sensors, vision-based gesture recognition can lead to more natural and intuitive HCI. This paper reviews state-of-the-art vision-based gesture recognition methods across the stages of the gesture recognition process, i.e., (1) image acquisition and pre-processing, (2) gesture segmentation, (3) gesture tracking, (4) feature extraction, and (5) gesture classification. The paper also analyzes the advantages and disadvantages of these various methods in detail. Finally, the challenges of vision-based gesture recognition in haptic rendering and future research directions are discussed.
Sign language recognition is vital for enhancing communication accessibility among the Deaf and hard-of-hearing communities. In Japan, approximately 360,000 individuals with hearing and speech disabilities rely on Japanese Sign Language (JSL) for communication. However, existing JSL recognition systems have faced significant performance limitations due to inherent complexities. In response to these challenges, we present a novel JSL recognition system that employs a strategic fusion approach, combining joint skeleton-based handcrafted features and pixel-based deep learning features. Our system incorporates two distinct streams: the first stream extracts crucial handcrafted features, emphasizing the capture of hand and body movements within JSL gestures; simultaneously, a deep learning-based transfer learning stream captures hierarchical representations of JSL gestures. We then concatenate the critical information from the first stream with the hierarchical features of the second stream to produce multi-level fusion features, aiming to create a comprehensive representation of the JSL gestures. After reducing the feature dimensionality, a feature selection approach and a kernel-based support vector machine (SVM) were used for classification. To assess the effectiveness of our approach, we conducted extensive experiments on our Lab JSL dataset and a publicly available Arabic sign language (ArSL) dataset. Our results unequivocally demonstrate that our fusion approach significantly enhances JSL recognition accuracy and robustness compared to individual feature sets or traditional recognition methods.
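A scikit-learn sketch of the fusion-and-classification stage with the two feature extractors stubbed: handcrafted skeleton features and deep features are concatenated, reduced, filtered, and fed to an RBF-kernel SVM. Dimensions and the particular selector are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
handcrafted = rng.random((300, 80))    # stand-in skeleton-stream features
deep = rng.random((300, 512))          # stand-in transfer-learning features
X = np.hstack([handcrafted, deep])     # the fusion step (concatenation)
y = rng.integers(0, 40, size=300)      # hypothetical JSL sign classes

clf = make_pipeline(PCA(n_components=64),           # dimensionality reduction
                    SelectKBest(f_classif, k=32),   # feature selection
                    SVC(kernel="rbf"))              # kernel-based SVM
clf.fit(X, y)
print(clf.score(X, y))
```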
Sign language, a visual-gestural language used by the deaf and hard-of-hearing community, plays a crucial role in facilitating communication and promoting inclusivity. Sign language recognition (SLR), the process of automatically recognizing and interpreting sign language gestures, has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world. The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR. This paper presents a comprehensive and up-to-date analysis of the advancements, challenges, and opportunities in deep learning-based sign language recognition, focusing on the past five years of research. We explore various aspects of SLR, including sign data acquisition technologies, sign language datasets, evaluation methods, and different types of neural networks. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have shown promising results in fingerspelling and isolated sign recognition. However, the continuous nature of sign language poses challenges, leading to the exploration of advanced neural network models such as the Transformer for continuous sign language recognition (CSLR). Despite significant advancements, several challenges remain in the field of SLR: expanding sign language datasets, achieving user independence in recognition systems, exploring different input modalities, effectively fusing features, modeling co-articulation, and improving semantic and syntactic understanding. Additionally, developing lightweight network architectures for mobile applications is crucial for practical implementation. By addressing these challenges, we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
Device-free gesture recognition is an emerging wireless sensing technique that recognizes gestures by analyzing their influence on surrounding wireless signals; it may empower wireless networks with augmented sensing ability. Researchers have made great achievements in single-person device-free gesture recognition. However, when multiple persons conduct gestures simultaneously, the received signals are mixed together, and traditional methods no longer work well. Moreover, the anonymity of persons and changes in the surrounding environment cause feature shift and mismatch, so recognition accuracy degrades remarkably. To address these problems, we explore and exploit the diversity of spatial information and propose a multidimensional analysis method that separates the gesture feature of each person using a focusing sensing strategy. We also present a deep-learning-based robust device-free gesture recognition framework, which leverages an adversarial approach to extract robust gesture features that are insensitive to changes of person and environment. Furthermore, we develop a 77 GHz mmWave prototype system and evaluate the proposed methods extensively. Experimental results reveal that the proposed system achieves average accuracies of 93% and 84% when 10 gestures are conducted in different environments by two and four persons simultaneously, respectively.
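One plausible PyTorch realization of the adversarial idea (an interpretation of the abstract, not the paper's code): a gradient reversal layer trains the feature extractor to defeat a person/environment discriminator, yielding change-insensitive gesture features.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip the gradient sign

features = nn.Sequential(nn.Linear(256, 128), nn.ReLU())   # shared extractor
gesture_head = nn.Linear(128, 10)                          # gesture classes
domain_head = nn.Linear(128, 4)                            # person/environment IDs

x = torch.randn(32, 256)               # stand-in mmWave signal features
f = features(x)
gesture_logits = gesture_head(f)
domain_logits = domain_head(GradReverse.apply(f, 1.0))
# Training minimizes gesture loss + domain loss; the reversal makes the
# extractor *maximize* domain confusion, i.e., robustness to person change.
print(gesture_logits.shape, domain_logits.shape)
```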
Gestures are one of the most natural and intuitive approaches to human-computer interaction. Compared with traditional camera-based or wearable-sensor-based solutions, gesture recognition using millimeter wave radar has attracted growing attention for its contact-free, privacy-preserving, and less environment-dependent characteristics. Although there have been many recent studies on hand gesture recognition, existing methods still fall short in recognition accuracy and generalization ability in short-range applications. In this paper, we present a hand gesture recognition method named multiscale feature fusion (MSFF) to accurately identify micro hand gestures. MSFF takes into account not only the overall motion of the palm but also the subtle movements of the fingers. Specifically, we fuse multi-angle Doppler-time and gesture-trajectory range-angle map features to comprehensively extract hand gesture characteristics, and combine them with high-level deep neural networks so that the model pays more attention to subtle finger movements. We evaluate the proposed method using data collected from 10 users, and our solution achieves an average recognition accuracy of 99.7%. Extensive experiments on a public mmWave gesture dataset demonstrate the superior effectiveness of the proposed system.
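A minimal two-branch fusion sketch (an interpretation of MSFF, not the paper's network): one CNN branch for Doppler-time maps, one for range-angle maps, with concatenated embeddings for classification.

```python
import torch
import torch.nn as nn

def branch():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class MSFFNet(nn.Module):
    def __init__(self, n_classes=8):
        super().__init__()
        self.doppler = branch()          # Doppler-time feature branch
        self.range_angle = branch()      # range-angle trajectory branch
        self.fc = nn.Linear(64, n_classes)

    def forward(self, dt_map, ra_map):
        fused = torch.cat([self.doppler(dt_map),
                           self.range_angle(ra_map)], dim=1)
        return self.fc(fused)

dt = torch.randn(4, 1, 64, 64)           # stand-in Doppler-time maps
ra = torch.randn(4, 1, 64, 64)           # stand-in range-angle maps
print(MSFFNet()(dt, ra).shape)            # torch.Size([4, 8])
```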