Funding: This work was supported through Research Group No. KS-2024-376.
Abstract: Arabic Sign Language (ArSL) recognition plays a vital role in enhancing communication for the Deaf and Hard of Hearing (DHH) community. Researchers have proposed multiple methods for automated recognition of ArSL; however, these methods face challenges that include high gesture variability, occlusions, limited signer diversity, and the scarcity of large annotated datasets. Existing methods, often relying solely on either skeletal data or video-based features, struggle with generalization and robustness, especially in dynamic, real-world conditions. This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet (I3D). By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost, Random Forest, and Support Vector Machine (SVM) classifiers, the framework robustly captures both the spatial configurations and the motion dynamics of sign gestures. Feature selection using the Pearson Correlation Coefficient further enhances efficiency by reducing redundancy. Extensive experiments on the ArabSign dataset, which includes RGB videos and corresponding skeletal data, demonstrate that the proposed approach significantly outperforms state-of-the-art methods, achieving an average F1-score of 97% and improving recognition accuracy by more than 7% over the previous best methods. This work not only advances the technical state of the art in ArSL recognition but also provides a scalable, real-time solution for practical deployment in educational, social, and assistive communication technologies. Although this study focuses on Arabic Sign Language, the proposed framework can be extended to other sign languages, opening the possibility of worldwide applicability in sign language recognition tasks.
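To make the pipeline described above concrete, the following is a minimal sketch of feature-level fusion, Pearson-correlation redundancy filtering, and hard (majority) voting over the three named classifiers, using scikit-learn and XGBoost. The feature dimensions, class count, correlation threshold, and hyperparameters are illustrative assumptions not specified in the abstract, and random arrays stand in for the real skeletal and I3D features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Stand-ins for the two modalities: geometric skeletal features
# (joint distances/angles) and I3D video embeddings.
X_skel = rng.normal(size=(200, 40))
X_i3d = rng.normal(size=(200, 128))
y = rng.integers(0, 5, size=200)  # five hypothetical sign classes

X = np.hstack([X_skel, X_i3d])  # feature-level fusion

# Pearson-correlation redundancy filter: drop the second feature of
# any pair whose absolute correlation exceeds the threshold (0.9 is
# an assumed value, not taken from the paper).
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = np.ones(X.shape[1], dtype=bool)
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        if keep[i] and keep[j] and corr[i, j] > 0.9:
            keep[j] = False
X = X[:, keep]

# Majority-voting ensemble over the three classifiers named in the abstract.
ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200)),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("svm", SVC(kernel="rbf")),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```

Hard voting keeps the three classifiers independent and takes the most common predicted label, which matches the abstract's "majority-voting" description; soft voting, which averages class probabilities, would be a different design choice.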
Abstract: In many animal-related studies, a high-performance animal behavior recognition system can help researchers reduce or eliminate the limitations of human assessment and make experiments easier to reproduce. Although deep learning models currently hold state-of-the-art performance on human action recognition tasks, their application to animal behavior recognition remains under-studied. One reason is the lack of the large datasets required to train such deep models well. In this research, we investigated two state-of-the-art deep learning models for human action recognition, I3D and R(2+1)D, on a mouse behavior recognition task. We compared their performance with models from previous studies; the results show that deep learning models pre-trained on human action datasets and then fine-tuned on the mouse behavior dataset can outperform previously reported models. The results also show promise for applying these deep learning models to other animal behavior recognition tasks without significant modification of the models' architecture: one only needs to collect a suitable dataset for the task and fine-tune the pre-trained models on it.
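As an illustration of the pre-train-then-fine-tune recipe described above, here is a minimal sketch using torchvision's R(2+1)D-18 backbone with Kinetics-400 (human action) weights, with the classification head swapped for the target task. The class count, optimizer settings, and dummy clip batch are hypothetical; the abstract does not give the original study's exact architecture variant or training schedule.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18, R2Plus1D_18_Weights

# Load an R(2+1)D backbone pre-trained on Kinetics-400 (a human action
# dataset), then replace the classification head for the new task.
model = r2plus1d_18(weights=R2Plus1D_18_Weights.KINETICS400_V1)
num_behaviors = 8  # hypothetical number of mouse behavior classes
model.fc = nn.Linear(model.fc.in_features, num_behaviors)

# Fine-tune: here every layer is trainable; freezing early blocks is a
# common variant when the target dataset is small.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy clip batch shaped
# (N, C=3, T=16 frames, 112x112 pixels).
clips = torch.randn(2, 3, 16, 112, 112)
labels = torch.randint(0, num_behaviors, (2,))
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()
print(float(loss))
```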