Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dyn...Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dynamic sign language requires identifying keyframes that best represent the signs, and missing these keyframes reduces accuracy. Secondly, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition. Our approach prioritizes informative frames and hand regions effectively. To tackle the first issue, we designed a keyframe extraction module enhanced by a convolutional autoencoder, which focuses on selecting information-rich frames and eliminating redundant ones from the video sequences. For the second issue, we developed a soft attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves effective dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, outperforming most of the typical methods in sign language recognition tasks.展开更多
Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;howev...Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;however,these methods face multiple challenges that include high gesture variability,occlusions,limited signer diversity,and the scarcity of large annotated datasets.Existing methods,often relying solely on either skeletal data or video-based features,struggle with generalization and robustness,especially in dynamic and real-world conditions.This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet(I3D).By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost,Random Forest,and Support Vector Machine classifiers,the framework robustly captures both spatial configurations and motion dynamics of sign gestures.Feature selection using the Pearson Correlation Coefficient further enhances efficiency by reducing redundancy.Extensive experiments on the ArabSign dataset,which includes RGB videos and corresponding skeletal data,demonstrate that the proposed approach significantly outperforms state-of-the-art methods,achieving an average F1-score of 97%using a majority-voting ensemble of XGBoost,Random Forest,and SVM classifiers,and improving recognition accuracy by more than 7%over previous best methods.This work not only advances the technical stateof-the-art in ArSL recognition but also provides a scalable,real-time solution for practical deployment in educational,social,and assistive communication technologies.Even though this study is about Arabic Sign Language,the framework proposed here can be extended to different sign languages,creating possibilities for potentially worldwide applicability in sign language recognition tasks.展开更多
Language barrier is the main cause of disagreement.Sign language,which is a common language in all the worldwide language families,is difficult to be entirely popularized due to the high cost of learning as well as th...Language barrier is the main cause of disagreement.Sign language,which is a common language in all the worldwide language families,is difficult to be entirely popularized due to the high cost of learning as well as the technical barrier in real-time translation.To solve these problems,here,we constructed a wearable organohydrogel-based electronic skin(e-skin)with fast self-healing,strong adhesion,extraor-dinary anti-freezing and moisturizing properties for sign language recognition under complex environ-ments.The e-skin was obtained by using an acrylic network as the main body,aluminum(III)and bay-berry tannin as the crosslinking agent,water/ethylene glycol as the solvent system,and a polyvinyl al-cohol network to optimize the network performance.Using this e-skin,a smart glove was further built,which could carry out the large-scale data collection of common gestures and sign languages.With the help of the deep learning method,specific recognition and translation for various gestures and sign lan-guages could be achieved.The accuracy was 93.5%,showing the ultra-high classification accuracy of a sign language interpreter.In short,by integrating multiple characteristics and combining deep learning technology with hydrogel materials,the e-skin achieved an important breakthrough in human-computer interaction and artificial intelligence,and provided a feasible strategy for solving the dilemma of mutual exclusion between flexible electronic devices and human bodies.展开更多
Sign language,a visual-gestural language used by the deaf and hard-of-hearing community,plays a crucial role in facilitating communication and promoting inclusivity.Sign language recognition(SLR),the process of automa...Sign language,a visual-gestural language used by the deaf and hard-of-hearing community,plays a crucial role in facilitating communication and promoting inclusivity.Sign language recognition(SLR),the process of automatically recognizing and interpreting sign language gestures,has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world.The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR.This paper presents a comprehensive and up-to-date analysis of the advancements,challenges,and opportunities in deep learning-based sign language recognition,focusing on the past five years of research.We explore various aspects of SLR,including sign data acquisition technologies,sign language datasets,evaluation methods,and different types of neural networks.Convolutional Neural Networks(CNN)and Recurrent Neural Networks(RNN)have shown promising results in fingerspelling and isolated sign recognition.However,the continuous nature of sign language poses challenges,leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition(CSLR).Despite significant advancements,several challenges remain in the field of SLR.These challenges include expanding sign language datasets,achieving user independence in recognition systems,exploring different input modalities,effectively fusing features,modeling co-articulation,and improving semantic and syntactic understanding.Additionally,developing lightweight network architectures for mobile applications is crucial for practical implementation.By addressing these challenges,we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.展开更多
Research on Chinese Sign Language(CSL)provides convenience and support for individuals with hearing impairments to communicate and integrate into society.This article reviews the relevant literature on Chinese Sign La...Research on Chinese Sign Language(CSL)provides convenience and support for individuals with hearing impairments to communicate and integrate into society.This article reviews the relevant literature on Chinese Sign Language Recognition(CSLR)in the past 20 years.Hidden Markov Models(HMM),Support Vector Machines(SVM),and Dynamic Time Warping(DTW)were found to be the most commonly employed technologies among traditional identificationmethods.Benefiting from the rapid development of computer vision and artificial intelligence technology,Convolutional Neural Networks(CNN),3D-CNN,YOLO,Capsule Network(CapsNet)and various deep neural networks have sprung up.Deep Neural Networks(DNNs)and their derived models are integral tomodern artificial intelligence recognitionmethods.In addition,technologies thatwerewidely used in the early days have also been integrated and applied to specific hybrid models and customized identification methods.Sign language data collection includes acquiring data from data gloves,data sensors(such as Kinect,LeapMotion,etc.),and high-definition photography.Meanwhile,facial expression recognition,complex background processing,and 3D sign language recognition have also attracted research interests among scholars.Due to the uniqueness and complexity of Chinese sign language,accuracy,robustness,real-time performance,and user independence are significant challenges for future sign language recognition research.Additionally,suitable datasets and evaluation criteria are also worth pursuing.展开更多
The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand an...The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information.In addition,the signs have different lengths,whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling,which disturbs the perception of complete signs.In this study,we propose a Multi-Scale Context-Aware network(MSCA-Net)to solve the aforementioned problems.Our MSCA-Net contains two main modules:(1)Multi-Scale Motion Attention(MSMA),which uses the differences among frames to perceive information of the hands and face in multiple spatial scales,replacing the heavy feature extractors;and(2)Multi-Scale Temporal Modeling(MSTM),which explores crucial temporal information in the sign language video from different temporal scales.We conduct extensive experiments using three widely used sign language datasets,i.e.,RWTH-PHOENIX-Weather-2014,RWTH-PHOENIX-Weather-2014T,and CSL-Daily.The proposed MSCA-Net achieve state-of-the-art performance,demonstrating the effectiveness of our approach.展开更多
Continuous sign language recognition(CSLR)is challenging due to the complexity of video background,hand gesture variability,and temporal modeling difficulties.This work proposes a CSLR method based on a spatialtempora...Continuous sign language recognition(CSLR)is challenging due to the complexity of video background,hand gesture variability,and temporal modeling difficulties.This work proposes a CSLR method based on a spatialtemporal graph attention network to focus on essential features of video series.The method considers local details of sign language movements by taking the information on joints and bones as inputs and constructing a spatialtemporal graph to reflect inter-frame relevance and physical connections between nodes.The graph-based multihead attention mechanism is utilized with adjacent matrix calculation for better local-feature exploration,and short-term motion correlation modeling is completed via a temporal convolutional network.We adopted BLSTM to learn the long-termdependence and connectionist temporal classification to align the word-level sequences.The proposed method achieves competitive results regarding word error rates(1.59%)on the Chinese Sign Language dataset and the mean Jaccard Index(65.78%)on the ChaLearn LAP Continuous Gesture Dataset.展开更多
This document presents a computer vision system for the automatic recognition of Mexican Sign Language (MSL), based on normalized moments as invariant (to translation and scale transforms) descriptors, using artificia...This document presents a computer vision system for the automatic recognition of Mexican Sign Language (MSL), based on normalized moments as invariant (to translation and scale transforms) descriptors, using artificial neural networks as pattern recognition model. An experimental feature selection was performed to reduce computational costs due to this work focusing on automatic recognition. The computer vision system includes four LED-reflectors of 700 lumens each in order to improve image acquisition quality;this illumination system allows reducing shadows in each sign of the MSL. MSL contains 27 signs in total but 6 of them are expressed with movement;this paper presents a framework for the automatic recognition of 21 static signs of MSL. The proposed system achieved 93% of recognition rate.展开更多
Sign language is mainly utilized in communication with people who have hearing disabilities.Sign language is used to communicate with people hav-ing developmental impairments who have some or no interaction skills.The...Sign language is mainly utilized in communication with people who have hearing disabilities.Sign language is used to communicate with people hav-ing developmental impairments who have some or no interaction skills.The inter-action via Sign language becomes a fruitful means of communication for hearing and speech impaired persons.A Hand gesture recognition systemfinds helpful for deaf and dumb people by making use of human computer interface(HCI)and convolutional neural networks(CNN)for identifying the static indications of Indian Sign Language(ISL).This study introduces a shark smell optimization with deep learning based automated sign language recognition(SSODL-ASLR)model for hearing and speaking impaired people.The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign lan-guage provided by deaf and dumb people.The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign lan-guage classification.In thefirst stage,the Mask Region based Convolution Neural Network(Mask RCNN)model is exploited for sign language recognition.Sec-ondly,SSO algorithm with soft margin support vector machine(SM-SVM)model can be utilized for sign language classification.To assure the enhanced classifica-tion performance of the SSODL-ASLR model,a brief set of simulations was car-ried out.The extensive results portrayed the supremacy of the SSODL-ASLR model over other techniques.展开更多
Hand gesture recognition system is considered as a way for more intuitive and proficient human computer interaction tool. The range of applications includes virtual prototyping, sign language analysis and medical trai...Hand gesture recognition system is considered as a way for more intuitive and proficient human computer interaction tool. The range of applications includes virtual prototyping, sign language analysis and medical training. In this paper, an efficient Indian Sign Language Recognition System (ISLR) is proposed for deaf and dump people using hand gesture images. The proposed ISLR system is considered as a pattern recognition technique that has two important modules: feature extraction and classification. The joint use of Discrete Wavelet Transform (DWT) based feature extraction and nearest neighbour classifier is used to recognize the sign language. The experimental results show that the proposed hand gesture recognition system achieves maximum 99.23% classification accuracy while using cosine distance classifier.展开更多
Wearable sign language recognition helps hearing/speech impaired people communicate with non-signers.However current technologies still unsatisfy practical uses due to the limitations of sensing and decoding capabilit...Wearable sign language recognition helps hearing/speech impaired people communicate with non-signers.However current technologies still unsatisfy practical uses due to the limitations of sensing and decoding capabilities.Here,A continuous sign language recognition system is proposed with multimodal hand/finger movement sensing and fuzzy encoding,trained with small word-level samples from one user,but applicable to sentence-level language recognition for new untrained users,achieving data-efficient universal recognition.A stretchable fabric strain sensor is developed by printing conductive poly(3,4-ethylenedioxythiophene):poly(styrenesulfonate)(PEDOT:PSS)ink on a pre-stretched fabric wrapping rubber band,allowing the strain sensor with superior performances of wide sensing range,high sensitivity,good linearity,fast dynamic response,low hysteresis,and good long-term reliability.A flexible e-skin with a homemade micro-flow sensor array is further developed to accurately capture three-dimensional hand movements.Benefitting from fabric strain sensors for finger movement sensing,microflow sensor array for 3D hand movement sensing,and human-inspired fuzzy encoding for semantic comprehension,sign language is captured accurately without the interferences from individual action differences.Experiment results show that the semantic comprehension accuracy reaches 99.7%and 95%,respectively,in recognizing 100 isolated words and 50 sentences for a trained user,and achieves 80%in recognizing 50 sentences for new untrained users.展开更多
Skeleton-based sign language recognition(SLR)is a challenging research area mainly due to the fast and complex hand movement.Currently,graph convolution networks(GCNs)have been employed in skeleton-based SLR and achie...Skeleton-based sign language recognition(SLR)is a challenging research area mainly due to the fast and complex hand movement.Currently,graph convolution networks(GCNs)have been employed in skeleton-based SLR and achieved remarkable performance.However,existing GCN-based SLR methods suffer from a lack of explicit attention to hand topology which plays an important role in the sign language representation.To address this issue,we propose a novel hand-aware graph convolution network(HA-GCN)to focus on hand topological relationships of skeleton graph.Specifically,a hand-aware graph convolution layer is designed to capture both global body and local hand information,in which two sub-graphs are defined and incorporated to represent hand topology information.In addition,in order to eliminate the over-fitting problem,an adaptive DropGraph is designed in construction of hand-aware graph convolution block to remove the spatial and temporal redundancy in the sign language representation.With the aim to further improve the performance,the joints information,bones,together with their motion information are simultaneously modeled in a multi-stream framework.Extensive experiments on the two open-source datasets,AUTSL and INCLUDE,demonstrate that our proposed algorithm outperforms the state-of-the-art with a significant margin.Our code is available at https://github.com/snorlaxse/HA-SLR-GCN.展开更多
Accessible communication based on sign language recognition(SLR)is the key to emergency medical assistance for the hearing-impaired community.Balancing the capture of both local and global information in SLR for emerg...Accessible communication based on sign language recognition(SLR)is the key to emergency medical assistance for the hearing-impaired community.Balancing the capture of both local and global information in SLR for emergency medicine poses a significant challenge.To address this,we propose a novel approach based on the inter-learning of visual features between global and local information.Specifically,our method enhances the perception capabilities of the visual feature extractor by strategically leveraging the strengths of convolutional neural network(CNN),which are adept at capturing local features,and visual transformers which perform well at perceiving global features.Furthermore,to mitigate the issue of overfitting caused by the limited availability of sign language data for emergency medical applications,we introduce an enhanced short temporal module for data augmentation through additional subsequences.Experimental results on three publicly available sign language datasets demonstrate the efficacy of the proposed approach.展开更多
Sign language dataset is essential in sign language recognition and translation(SLRT). Current public sign language datasets are small and lack diversity, which does not meet the practical application requirements for...Sign language dataset is essential in sign language recognition and translation(SLRT). Current public sign language datasets are small and lack diversity, which does not meet the practical application requirements for SLRT. However, making a large-scale and diverse sign language dataset is difficult as sign language data on the Internet is scarce. In making a large-scale and diverse sign language dataset, some sign language data qualities are not up to standard. This paper proposes a two information streams transformer(TIST) model to judge whether the quality of sign language data is qualified. To verify that TIST effectively improves sign language recognition(SLR), we make two datasets, the screened dataset and the unscreened dataset. In this experiment, this paper uses visual alignment constraint(VAC) as the baseline model. The experimental results show that the screened dataset can achieve better word error rate(WER) than the unscreened dataset.展开更多
Hand gestures have been used as a significant mode of communication since the advent of human civilization.By facilitating human-computer interaction(HCI),hand gesture recognition(HGRoc)technology is crucial for seaml...Hand gestures have been used as a significant mode of communication since the advent of human civilization.By facilitating human-computer interaction(HCI),hand gesture recognition(HGRoc)technology is crucial for seamless and error-free HCI.HGRoc technology is pivotal in healthcare and communication for the deaf community.Despite significant advancements in computer vision-based gesture recognition for language understanding,two considerable challenges persist in this field:(a)limited and common gestures are considered,(b)processing multiple channels of information across a network takes huge computational time during discriminative feature extraction.Therefore,a novel hand vision-based convolutional neural network(CNN)model named(HVCNNM)offers several benefits,notably enhanced accuracy,robustness to variations,real-time performance,reduced channels,and scalability.Additionally,these models can be optimized for real-time performance,learn from large amounts of data,and are scalable to handle complex recognition tasks for efficient human-computer interaction.The proposed model was evaluated on two challenging datasets,namely the Massey University Dataset(MUD)and the American Sign Language(ASL)Alphabet Dataset(ASLAD).On the MUD and ASLAD datasets,HVCNNM achieved a score of 99.23% and 99.00%,respectively.These results demonstrate the effectiveness of CNN as a promising HGRoc approach.The findings suggest that the proposed model have potential roles in applications such as sign language recognition,human-computer interaction,and robotics.展开更多
This paper is a pilot study that investigates the attitudes towards the official recognition of Hong Kong Sign Language(HKSL)by Hong Kong citizens.We used video-chat software(mainly WhatsApp,and Facebook Messenger,but...This paper is a pilot study that investigates the attitudes towards the official recognition of Hong Kong Sign Language(HKSL)by Hong Kong citizens.We used video-chat software(mainly WhatsApp,and Facebook Messenger,but also FaceTime)to conduct long-distance semi-structured interviews with 30 participants grouped as deaf,hearing-related(hearing people that are closely involved in the Deaf community),and hearing-unrelated(hearing people that have little contact with deaf people and the Deaf community).Results show that the majority of participants(N=22)holds a supportive attitude towards the recognition of HKSL;Five participants hold a neutral position,and three participants hold a negative attitude towards it.We discussed each type of attitude in detail.Results show that participants’attitudes are positively related to their awareness of deaf people’s need,the understanding of‘language recognition’,and personal world views.In other words,the more participants are aware,the more they foster official recognition,at least as a general trend.Results also indicate that hearing people who are not involved in the Deaf community know very little about deaf people and the Deaf community,in general.At the end of the paper,we also reflect on two issues:we argue that the standardization of HKSL plays an important role in deaf education and empowering citizenship awareness and participation.展开更多
Continuous Chinese sign language recognition(CCSLR)methods have shown their strong ability to learn excellent model architectures from datasets.However,due to data insufficiency,it is difficult to complete the CCSLR t...Continuous Chinese sign language recognition(CCSLR)methods have shown their strong ability to learn excellent model architectures from datasets.However,due to data insufficiency,it is difficult to complete the CCSLR task.In this work,we focus on a simple but important solution to alleviate data insufficiency:how to refine the model architecture of a CCSLR network to improve the robustness of feature processing by using some better-quality non-Chinese sign language datasets.To this end,a simple empirical study wasfirst conducted to verify the feasibility of knowledge transfer in the CCSLR task.Surprisingly,just by pre-training of our recognition model on a foreign sign language dataset,we can refine the model architecture and improve its robustness significantly.To make it more practical,the key issue of how tofine-tune the existing feature processing models for effective guidance should be carefully investigated.Then,we propose a novel scheme forfine-tuning of pre-trained models named FTP,which updates the spatial feature extractor initialized by a pre-trained backbone and freezes the temporal feature extractor implemented by a better shareable transformer encoder.Compared with the baseline method,our FTP method can achieve significant performance improvement on the public dataset USTC-CCSL.展开更多
Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of a...Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study.Assistive learning devices enable students to access institutional resources and facilities fully.The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates.This tool converts the audio signals into sign language videos for the speech and hearing-impaired to follow and converts the sign language to text format for the teachers to follow.This educational tool for the speech and hearing-impaired is implemented by customized deep learning models such as Convolution neural networks(CNN),Residual neural Networks(ResNet),and stacked Long short-term memory(LSTM)network models.This assistive learning tool is a novel framework that interprets the static and dynamic gesture actions in American Sign Language(ASL).Such communicative tools empower the speech and hearing impaired to communicate effectively in a classroom environment and foster inclusivity.Customized deep learning models were developed and experimentally evaluated with the standard performance metrics.The model exhibits an accuracy of 99.7% for all static gesture classification and 99% for specific vocabulary of gesture action words.This two-way communicative and educational tool encourages social inclusion and a promising career for disabled students.展开更多
Sign language includes the motion of the arms and hands to communicate with people with hearing disabilities.Several models have been available in the literature for sign language detection and classification for enha...Sign language includes the motion of the arms and hands to communicate with people with hearing disabilities.Several models have been available in the literature for sign language detection and classification for enhanced outcomes.But the latest advancements in computer vision enable us to perform signs/gesture recognition using deep neural networks.This paper introduces an Arabic Sign Language Gesture Classification using Deer Hunting Optimization with Machine Learning(ASLGC-DHOML)model.The presented ASLGC-DHOML technique mainly concentrates on recognising and classifying sign language gestures.The presented ASLGC-DHOML model primarily pre-processes the input gesture images and generates feature vectors using the densely connected network(DenseNet169)model.For gesture recognition and classification,a multilayer perceptron(MLP)classifier is exploited to recognize and classify the existence of sign language gestures.Lastly,the DHO algorithm is utilized for parameter optimization of the MLP model.The experimental results of the ASLGC-DHOML model are tested and the outcomes are inspected under distinct aspects.The comparison analysis highlighted that the ASLGC-DHOML method has resulted in enhanced gesture classification results than other techniques with maximum accuracy of 92.88%.展开更多
The scientific achievements appraising conference of multi-font printed Mongolian (mixed with Chinese and English) document recognition system and unified platform-based ethnic languages document recognition system ...The scientific achievements appraising conference of multi-font printed Mongolian (mixed with Chinese and English) document recognition system and unified platform-based ethnic languages document recognition system organized by the Ministry of Education was held on January 29, 2007 at Tsinghua University.展开更多
基金supported by the National Natural Science Foundation of China under Grant Nos.62076117 and 62166026the Jiangxi Provincial Key Laboratory of Virtual Reality under Grant No.2024SSY03151.
文摘Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dynamic sign language requires identifying keyframes that best represent the signs, and missing these keyframes reduces accuracy. Secondly, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition. Our approach prioritizes informative frames and hand regions effectively. To tackle the first issue, we designed a keyframe extraction module enhanced by a convolutional autoencoder, which focuses on selecting information-rich frames and eliminating redundant ones from the video sequences. For the second issue, we developed a soft attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves effective dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, outperforming most of the typical methods in sign language recognition tasks.
基金funding this work through Research Group No.KS-2024-376.
文摘Arabic Sign Language(ArSL)recognition plays a vital role in enhancing the communication for the Deaf and Hard of Hearing(DHH)community.Researchers have proposed multiple methods for automated recognition of ArSL;however,these methods face multiple challenges that include high gesture variability,occlusions,limited signer diversity,and the scarcity of large annotated datasets.Existing methods,often relying solely on either skeletal data or video-based features,struggle with generalization and robustness,especially in dynamic and real-world conditions.This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet(I3D).By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost,Random Forest,and Support Vector Machine classifiers,the framework robustly captures both spatial configurations and motion dynamics of sign gestures.Feature selection using the Pearson Correlation Coefficient further enhances efficiency by reducing redundancy.Extensive experiments on the ArabSign dataset,which includes RGB videos and corresponding skeletal data,demonstrate that the proposed approach significantly outperforms state-of-the-art methods,achieving an average F1-score of 97%using a majority-voting ensemble of XGBoost,Random Forest,and SVM classifiers,and improving recognition accuracy by more than 7%over previous best methods.This work not only advances the technical stateof-the-art in ArSL recognition but also provides a scalable,real-time solution for practical deployment in educational,social,and assistive communication technologies.Even though this study is about Arabic Sign Language,the framework proposed here can be extended to different sign languages,creating possibilities for potentially worldwide applicability in sign language recognition tasks.
基金supported by the National Natural Science Foundation of China(No.21978180).
文摘Language barrier is the main cause of disagreement.Sign language,which is a common language in all the worldwide language families,is difficult to be entirely popularized due to the high cost of learning as well as the technical barrier in real-time translation.To solve these problems,here,we constructed a wearable organohydrogel-based electronic skin(e-skin)with fast self-healing,strong adhesion,extraor-dinary anti-freezing and moisturizing properties for sign language recognition under complex environ-ments.The e-skin was obtained by using an acrylic network as the main body,aluminum(III)and bay-berry tannin as the crosslinking agent,water/ethylene glycol as the solvent system,and a polyvinyl al-cohol network to optimize the network performance.Using this e-skin,a smart glove was further built,which could carry out the large-scale data collection of common gestures and sign languages.With the help of the deep learning method,specific recognition and translation for various gestures and sign lan-guages could be achieved.The accuracy was 93.5%,showing the ultra-high classification accuracy of a sign language interpreter.In short,by integrating multiple characteristics and combining deep learning technology with hydrogel materials,the e-skin achieved an important breakthrough in human-computer interaction and artificial intelligence,and provided a feasible strategy for solving the dilemma of mutual exclusion between flexible electronic devices and human bodies.
基金supported from the National Philosophy and Social Sciences Foundation(Grant No.20BTQ065).
文摘Sign language,a visual-gestural language used by the deaf and hard-of-hearing community,plays a crucial role in facilitating communication and promoting inclusivity.Sign language recognition(SLR),the process of automatically recognizing and interpreting sign language gestures,has gained significant attention in recent years due to its potential to bridge the communication gap between the hearing impaired and the hearing world.The emergence and continuous development of deep learning techniques have provided inspiration and momentum for advancing SLR.This paper presents a comprehensive and up-to-date analysis of the advancements,challenges,and opportunities in deep learning-based sign language recognition,focusing on the past five years of research.We explore various aspects of SLR,including sign data acquisition technologies,sign language datasets,evaluation methods,and different types of neural networks.Convolutional Neural Networks(CNN)and Recurrent Neural Networks(RNN)have shown promising results in fingerspelling and isolated sign recognition.However,the continuous nature of sign language poses challenges,leading to the exploration of advanced neural network models such as the Transformer model for continuous sign language recognition(CSLR).Despite significant advancements,several challenges remain in the field of SLR.These challenges include expanding sign language datasets,achieving user independence in recognition systems,exploring different input modalities,effectively fusing features,modeling co-articulation,and improving semantic and syntactic understanding.Additionally,developing lightweight network architectures for mobile applications is crucial for practical implementation.By addressing these challenges,we can further advance the field of deep learning for sign language recognition and improve communication for the hearing-impaired community.
基金supported by National Social Science Foundation Annual Project“Research on Evaluation and Improvement Paths of Integrated Development of Disabled Persons”(Grant No.20BRK029)the National Language Commission’s“14th Five-Year Plan”Scientific Research Plan 2023 Project“Domain Digital Language Service Resource Construction and Key Technology Research”(YB145-72)the National Philosophy and Social Sciences Foundation(Grant No.20BTQ065).
文摘Research on Chinese Sign Language(CSL)provides convenience and support for individuals with hearing impairments to communicate and integrate into society.This article reviews the relevant literature on Chinese Sign Language Recognition(CSLR)in the past 20 years.Hidden Markov Models(HMM),Support Vector Machines(SVM),and Dynamic Time Warping(DTW)were found to be the most commonly employed technologies among traditional identificationmethods.Benefiting from the rapid development of computer vision and artificial intelligence technology,Convolutional Neural Networks(CNN),3D-CNN,YOLO,Capsule Network(CapsNet)and various deep neural networks have sprung up.Deep Neural Networks(DNNs)and their derived models are integral tomodern artificial intelligence recognitionmethods.In addition,technologies thatwerewidely used in the early days have also been integrated and applied to specific hybrid models and customized identification methods.Sign language data collection includes acquiring data from data gloves,data sensors(such as Kinect,LeapMotion,etc.),and high-definition photography.Meanwhile,facial expression recognition,complex background processing,and 3D sign language recognition have also attracted research interests among scholars.Due to the uniqueness and complexity of Chinese sign language,accuracy,robustness,real-time performance,and user independence are significant challenges for future sign language recognition research.Additionally,suitable datasets and evaluation criteria are also worth pursuing.
基金Supported by the National Natural Science Foundation of China(62072334).
文摘The hands and face are the most important parts for expressing sign language morphemes in sign language videos.However,we find that existing Continuous Sign Language Recognition(CSLR)methods lack the mining of hand and face information in visual backbones or use expensive and time-consuming external extractors to explore this information.In addition,the signs have different lengths,whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling,which disturbs the perception of complete signs.In this study,we propose a Multi-Scale Context-Aware network(MSCA-Net)to solve the aforementioned problems.Our MSCA-Net contains two main modules:(1)Multi-Scale Motion Attention(MSMA),which uses the differences among frames to perceive information of the hands and face in multiple spatial scales,replacing the heavy feature extractors;and(2)Multi-Scale Temporal Modeling(MSTM),which explores crucial temporal information in the sign language video from different temporal scales.We conduct extensive experiments using three widely used sign language datasets,i.e.,RWTH-PHOENIX-Weather-2014,RWTH-PHOENIX-Weather-2014T,and CSL-Daily.The proposed MSCA-Net achieve state-of-the-art performance,demonstrating the effectiveness of our approach.
基金supported by the Key Research&Development Plan Project of Shandong Province,China(No.2017GGX10127).
文摘Continuous sign language recognition(CSLR)is challenging due to the complexity of video background,hand gesture variability,and temporal modeling difficulties.This work proposes a CSLR method based on a spatialtemporal graph attention network to focus on essential features of video series.The method considers local details of sign language movements by taking the information on joints and bones as inputs and constructing a spatialtemporal graph to reflect inter-frame relevance and physical connections between nodes.The graph-based multihead attention mechanism is utilized with adjacent matrix calculation for better local-feature exploration,and short-term motion correlation modeling is completed via a temporal convolutional network.We adopted BLSTM to learn the long-termdependence and connectionist temporal classification to align the word-level sequences.The proposed method achieves competitive results regarding word error rates(1.59%)on the Chinese Sign Language dataset and the mean Jaccard Index(65.78%)on the ChaLearn LAP Continuous Gesture Dataset.
文摘This document presents a computer vision system for the automatic recognition of Mexican Sign Language (MSL), based on normalized moments as invariant (to translation and scale transforms) descriptors, using artificial neural networks as pattern recognition model. An experimental feature selection was performed to reduce computational costs due to this work focusing on automatic recognition. The computer vision system includes four LED-reflectors of 700 lumens each in order to improve image acquisition quality;this illumination system allows reducing shadows in each sign of the MSL. MSL contains 27 signs in total but 6 of them are expressed with movement;this paper presents a framework for the automatic recognition of 21 static signs of MSL. The proposed system achieved 93% of recognition rate.
文摘Sign language is mainly utilized in communication with people who have hearing disabilities.Sign language is used to communicate with people hav-ing developmental impairments who have some or no interaction skills.The inter-action via Sign language becomes a fruitful means of communication for hearing and speech impaired persons.A Hand gesture recognition systemfinds helpful for deaf and dumb people by making use of human computer interface(HCI)and convolutional neural networks(CNN)for identifying the static indications of Indian Sign Language(ISL).This study introduces a shark smell optimization with deep learning based automated sign language recognition(SSODL-ASLR)model for hearing and speaking impaired people.The presented SSODL-ASLR technique majorly concentrates on the recognition and classification of sign lan-guage provided by deaf and dumb people.The presented SSODL-ASLR model encompasses a two stage process namely sign language detection and sign lan-guage classification.In thefirst stage,the Mask Region based Convolution Neural Network(Mask RCNN)model is exploited for sign language recognition.Sec-ondly,SSO algorithm with soft margin support vector machine(SM-SVM)model can be utilized for sign language classification.To assure the enhanced classifica-tion performance of the SSODL-ASLR model,a brief set of simulations was car-ried out.The extensive results portrayed the supremacy of the SSODL-ASLR model over other techniques.
文摘Hand gesture recognition system is considered as a way for more intuitive and proficient human computer interaction tool. The range of applications includes virtual prototyping, sign language analysis and medical training. In this paper, an efficient Indian Sign Language Recognition System (ISLR) is proposed for deaf and dump people using hand gesture images. The proposed ISLR system is considered as a pattern recognition technique that has two important modules: feature extraction and classification. The joint use of Discrete Wavelet Transform (DWT) based feature extraction and nearest neighbour classifier is used to recognize the sign language. The experimental results show that the proposed hand gesture recognition system achieves maximum 99.23% classification accuracy while using cosine distance classifier.
文摘Wearable sign language recognition helps hearing/speech impaired people communicate with non-signers.However current technologies still unsatisfy practical uses due to the limitations of sensing and decoding capabilities.Here,A continuous sign language recognition system is proposed with multimodal hand/finger movement sensing and fuzzy encoding,trained with small word-level samples from one user,but applicable to sentence-level language recognition for new untrained users,achieving data-efficient universal recognition.A stretchable fabric strain sensor is developed by printing conductive poly(3,4-ethylenedioxythiophene):poly(styrenesulfonate)(PEDOT:PSS)ink on a pre-stretched fabric wrapping rubber band,allowing the strain sensor with superior performances of wide sensing range,high sensitivity,good linearity,fast dynamic response,low hysteresis,and good long-term reliability.A flexible e-skin with a homemade micro-flow sensor array is further developed to accurately capture three-dimensional hand movements.Benefitting from fabric strain sensors for finger movement sensing,microflow sensor array for 3D hand movement sensing,and human-inspired fuzzy encoding for semantic comprehension,sign language is captured accurately without the interferences from individual action differences.Experiment results show that the semantic comprehension accuracy reaches 99.7%and 95%,respectively,in recognizing 100 isolated words and 50 sentences for a trained user,and achieves 80%in recognizing 50 sentences for new untrained users.
基金supported by Young Scientists Fund of the National Natural Science Foundation of China(62202356,62302373)Fundamental Research Funds for the Central Universities(ZYTS24092,QTZX24085).
文摘Skeleton-based sign language recognition(SLR)is a challenging research area mainly due to the fast and complex hand movement.Currently,graph convolution networks(GCNs)have been employed in skeleton-based SLR and achieved remarkable performance.However,existing GCN-based SLR methods suffer from a lack of explicit attention to hand topology which plays an important role in the sign language representation.To address this issue,we propose a novel hand-aware graph convolution network(HA-GCN)to focus on hand topological relationships of skeleton graph.Specifically,a hand-aware graph convolution layer is designed to capture both global body and local hand information,in which two sub-graphs are defined and incorporated to represent hand topology information.In addition,in order to eliminate the over-fitting problem,an adaptive DropGraph is designed in construction of hand-aware graph convolution block to remove the spatial and temporal redundancy in the sign language representation.With the aim to further improve the performance,the joints information,bones,together with their motion information are simultaneously modeled in a multi-stream framework.Extensive experiments on the two open-source datasets,AUTSL and INCLUDE,demonstrate that our proposed algorithm outperforms the state-of-the-art with a significant margin.Our code is available at https://github.com/snorlaxse/HA-SLR-GCN.
基金supported by the National Natural Science Foundation of China(No.62376197)the Tianjin Science and Technology Program(No.23JCYBJC00360)the Tianjin Health Research Project(No.TJWJ2025MS045).
文摘Accessible communication based on sign language recognition(SLR)is the key to emergency medical assistance for the hearing-impaired community.Balancing the capture of both local and global information in SLR for emergency medicine poses a significant challenge.To address this,we propose a novel approach based on the inter-learning of visual features between global and local information.Specifically,our method enhances the perception capabilities of the visual feature extractor by strategically leveraging the strengths of convolutional neural network(CNN),which are adept at capturing local features,and visual transformers which perform well at perceiving global features.Furthermore,to mitigate the issue of overfitting caused by the limited availability of sign language data for emergency medical applications,we introduce an enhanced short temporal module for data augmentation through additional subsequences.Experimental results on three publicly available sign language datasets demonstrate the efficacy of the proposed approach.
基金supported by the National Language Commission to research on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No.ZDI145-70)。
文摘Sign language dataset is essential in sign language recognition and translation(SLRT). Current public sign language datasets are small and lack diversity, which does not meet the practical application requirements for SLRT. However, making a large-scale and diverse sign language dataset is difficult as sign language data on the Internet is scarce. In making a large-scale and diverse sign language dataset, some sign language data qualities are not up to standard. This paper proposes a two information streams transformer(TIST) model to judge whether the quality of sign language data is qualified. To verify that TIST effectively improves sign language recognition(SLR), we make two datasets, the screened dataset and the unscreened dataset. In this experiment, this paper uses visual alignment constraint(VAC) as the baseline model. The experimental results show that the screened dataset can achieve better word error rate(WER) than the unscreened dataset.
基金funded by Researchers Supporting Project Number(RSPD2024 R947),King Saud University,Riyadh,Saudi Arabia.
文摘Hand gestures have been used as a significant mode of communication since the advent of human civilization.By facilitating human-computer interaction(HCI),hand gesture recognition(HGRoc)technology is crucial for seamless and error-free HCI.HGRoc technology is pivotal in healthcare and communication for the deaf community.Despite significant advancements in computer vision-based gesture recognition for language understanding,two considerable challenges persist in this field:(a)limited and common gestures are considered,(b)processing multiple channels of information across a network takes huge computational time during discriminative feature extraction.Therefore,a novel hand vision-based convolutional neural network(CNN)model named(HVCNNM)offers several benefits,notably enhanced accuracy,robustness to variations,real-time performance,reduced channels,and scalability.Additionally,these models can be optimized for real-time performance,learn from large amounts of data,and are scalable to handle complex recognition tasks for efficient human-computer interaction.The proposed model was evaluated on two challenging datasets,namely the Massey University Dataset(MUD)and the American Sign Language(ASL)Alphabet Dataset(ASLAD).On the MUD and ASLAD datasets,HVCNNM achieved a score of 99.23% and 99.00%,respectively.These results demonstrate the effectiveness of CNN as a promising HGRoc approach.The findings suggest that the proposed model have potential roles in applications such as sign language recognition,human-computer interaction,and robotics.
文摘This paper is a pilot study that investigates the attitudes towards the official recognition of Hong Kong Sign Language(HKSL)by Hong Kong citizens.We used video-chat software(mainly WhatsApp,and Facebook Messenger,but also FaceTime)to conduct long-distance semi-structured interviews with 30 participants grouped as deaf,hearing-related(hearing people that are closely involved in the Deaf community),and hearing-unrelated(hearing people that have little contact with deaf people and the Deaf community).Results show that the majority of participants(N=22)holds a supportive attitude towards the recognition of HKSL;Five participants hold a neutral position,and three participants hold a negative attitude towards it.We discussed each type of attitude in detail.Results show that participants’attitudes are positively related to their awareness of deaf people’s need,the understanding of‘language recognition’,and personal world views.In other words,the more participants are aware,the more they foster official recognition,at least as a general trend.Results also indicate that hearing people who are not involved in the Deaf community know very little about deaf people and the Deaf community,in general.At the end of the paper,we also reflect on two issues:we argue that the standardization of HKSL plays an important role in deaf education and empowering citizenship awareness and participation.
基金supported by the National Natural Science Foundation of China(Nos.62376197,62020106004,92048301,and 62202332).
文摘Continuous Chinese sign language recognition(CCSLR)methods have shown their strong ability to learn excellent model architectures from datasets.However,due to data insufficiency,it is difficult to complete the CCSLR task.In this work,we focus on a simple but important solution to alleviate data insufficiency:how to refine the model architecture of a CCSLR network to improve the robustness of feature processing by using some better-quality non-Chinese sign language datasets.To this end,a simple empirical study wasfirst conducted to verify the feasibility of knowledge transfer in the CCSLR task.Surprisingly,just by pre-training of our recognition model on a foreign sign language dataset,we can refine the model architecture and improve its robustness significantly.To make it more practical,the key issue of how tofine-tune the existing feature processing models for effective guidance should be carefully investigated.Then,we propose a novel scheme forfine-tuning of pre-trained models named FTP,which updates the spatial feature extractor initialized by a pre-trained backbone and freezes the temporal feature extractor implemented by a better shareable transformer encoder.Compared with the baseline method,our FTP method can achieve significant performance improvement on the public dataset USTC-CCSL.
基金sponsored by Prince Sattam Bin Abdulaziz University(PSAU)as part of funding for its SDG Roadmap Research Funding Programme project number PSAU-2023-SDG-2023/SDG/31.
文摘Hearing and Speech impairment can be congenital or acquired.Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges.However,the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study.Assistive learning devices enable students to access institutional resources and facilities fully.The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates.This tool converts the audio signals into sign language videos for the speech and hearing-impaired to follow and converts the sign language to text format for the teachers to follow.This educational tool for the speech and hearing-impaired is implemented by customized deep learning models such as Convolution neural networks(CNN),Residual neural Networks(ResNet),and stacked Long short-term memory(LSTM)network models.This assistive learning tool is a novel framework that interprets the static and dynamic gesture actions in American Sign Language(ASL).Such communicative tools empower the speech and hearing impaired to communicate effectively in a classroom environment and foster inclusivity.Customized deep learning models were developed and experimentally evaluated with the standard performance metrics.The model exhibits an accuracy of 99.7% for all static gesture classification and 99% for specific vocabulary of gesture action words.This two-way communicative and educational tool encourages social inclusion and a promising career for disabled students.
基金Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2023R263)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia+1 种基金The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura Universitysupporting this work by Grant Code:22UQU4310373DSR54.
文摘Sign language includes the motion of the arms and hands to communicate with people with hearing disabilities.Several models have been available in the literature for sign language detection and classification for enhanced outcomes.But the latest advancements in computer vision enable us to perform signs/gesture recognition using deep neural networks.This paper introduces an Arabic Sign Language Gesture Classification using Deer Hunting Optimization with Machine Learning(ASLGC-DHOML)model.The presented ASLGC-DHOML technique mainly concentrates on recognising and classifying sign language gestures.The presented ASLGC-DHOML model primarily pre-processes the input gesture images and generates feature vectors using the densely connected network(DenseNet169)model.For gesture recognition and classification,a multilayer perceptron(MLP)classifier is exploited to recognize and classify the existence of sign language gestures.Lastly,the DHO algorithm is utilized for parameter optimization of the MLP model.The experimental results of the ASLGC-DHOML model are tested and the outcomes are inspected under distinct aspects.The comparison analysis highlighted that the ASLGC-DHOML method has resulted in enhanced gesture classification results than other techniques with maximum accuracy of 92.88%.
文摘The scientific achievements appraising conference of multi-font printed Mongolian (mixed with Chinese and English) document recognition system and unified platform-based ethnic languages document recognition system organized by the Ministry of Education was held on January 29, 2007 at Tsinghua University.