Abstract: Timely identification and treatment of medical conditions could facilitate faster recovery and better health. Existing systems address this issue using custom-built sensors, which are invasive and difficult to generalize. A low-complexity, scalable process is proposed to detect and identify medical conditions from 2D skeletal movements in video feed data. A minimal set of features relevant to distinguishing medical conditions (AMF, PVF, and GDF) is derived from skeletal data on frames sampled across the entire action. Angular motion features (AMF) capture the angular motion of the limbs during a specific action. Positional variation features (PVF) represent the relative positions of the joints. Global displacement features (GDF) identify the direction of overall skeletal movement. The discriminative capability of these features is illustrated by their variance across time for different actions. The classification of medical conditions is approached in two stages. In the first stage, a low-complexity binary LSTM classifier is trained to distinguish visual medical conditions from general human actions. In the second stage, a multi-class LSTM classifier is trained to identify the exact medical condition from a given set of visually interpretable medical conditions. The proposed features are extracted from the 2D skeletal data of the NTU RGB+D dataset and then used to train the binary and multi-class LSTM classifiers. The binary and multi-class classifiers achieved average F1 scores of 77% and 73%, respectively, while the overall system produced an average F1 score of 69% and a weighted average F1 score of 80%. The multi-class classifier uses 10 to 100 times fewer parameters than existing 2D CNN-based models while producing similar levels of accuracy.
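To make the feature design concrete, the following sketch derives per-frame descriptors in the spirit of AMF, PVF, and GDF from 2D skeletal keypoints sampled across an action. The joint indices, limb triplets, sample count, and pooling choices are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

# Hypothetical joint indices for a 2D skeleton (an assumption, not the paper's layout).
SHOULDER, ELBOW, WRIST, HIP, KNEE, ANKLE = 0, 1, 2, 3, 4, 5

def limb_angle(a, b, c):
    """Angle (radians) at joint b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def frame_features(joints):
    """joints: (num_joints, 2) array of 2D keypoints for one frame."""
    # AMF-like terms: limb angles at the elbow and the knee.
    amf = [limb_angle(joints[SHOULDER], joints[ELBOW], joints[WRIST]),
           limb_angle(joints[HIP], joints[KNEE], joints[ANKLE])]
    # PVF-like terms: joint positions relative to the hip.
    pvf = (joints - joints[HIP]).ravel()
    return np.concatenate([amf, pvf])

def action_features(frames, num_samples=16):
    """frames: (T, num_joints, 2) array for one action; frames are sampled uniformly across it."""
    idx = np.linspace(0, len(frames) - 1, num_samples).astype(int)
    sampled = frames[idx]
    per_frame = np.stack([frame_features(f) for f in sampled])
    # GDF-like terms: frame-to-frame displacement of the skeleton centroid,
    # capturing the direction of overall skeletal movement.
    centroids = sampled.mean(axis=1)
    disp = np.vstack([np.zeros((1, 2)), np.diff(centroids, axis=0)])
    return np.hstack([per_frame, disp])  # (num_samples, feature_dim), one row per sampled frame
```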
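The two-stage classification can then be sketched as below, assuming a Keras-style API; the layer sizes, class count, and decision threshold are placeholders rather than the paper's tuned configuration. The binary model first decides whether a clip shows a medical condition at all, and the multi-class model is queried only for flagged clips.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder dimensions, not the values used in the paper.
NUM_FRAMES, FEATURE_DIM, NUM_CONDITIONS = 16, 54, 12

def build_lstm(num_outputs, activation):
    return keras.Sequential([
        layers.Input(shape=(NUM_FRAMES, FEATURE_DIM)),
        layers.LSTM(64),  # single low-complexity recurrent layer (assumed size)
        layers.Dense(num_outputs, activation=activation),
    ])

# Stage 1: medical condition vs. general human action.
binary_clf = build_lstm(1, "sigmoid")
binary_clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stage 2: which medical condition, applied only to clips flagged by stage 1.
condition_clf = build_lstm(NUM_CONDITIONS, "softmax")
condition_clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

def predict_condition(clip_features, threshold=0.5):
    """clip_features: (NUM_FRAMES, FEATURE_DIM) array of AMF/PVF/GDF-style features."""
    x = clip_features[np.newaxis, ...]
    if binary_clf.predict(x, verbose=0)[0, 0] < threshold:
        return None  # general action, no medical condition detected
    return int(np.argmax(condition_clf.predict(x, verbose=0)[0]))
```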
Funding: This work was funded through Research Group No. KS-2024-376.
Abstract: Arabic Sign Language (ArSL) recognition plays a vital role in enhancing communication for the Deaf and Hard of Hearing (DHH) community. Researchers have proposed multiple methods for automated recognition of ArSL; however, these methods face challenges that include high gesture variability, occlusions, limited signer diversity, and the scarcity of large annotated datasets. Existing methods, often relying solely on either skeletal data or video-based features, struggle with generalization and robustness, especially in dynamic, real-world conditions. This paper proposes a novel multimodal ensemble classification framework that integrates geometric features derived from 3D skeletal joint distances and angles with temporal features extracted from RGB videos using the Inflated 3D ConvNet (I3D). By fusing these complementary modalities at the feature level and applying a majority-voting ensemble of XGBoost, Random Forest, and Support Vector Machine classifiers, the framework robustly captures both the spatial configurations and the motion dynamics of sign gestures. Feature selection using the Pearson correlation coefficient further enhances efficiency by reducing redundancy. Extensive experiments on the ArabSign dataset, which includes RGB videos and corresponding skeletal data, demonstrate that the proposed approach significantly outperforms state-of-the-art methods, achieving an average F1-score of 97% and improving recognition accuracy by more than 7% over the previous best methods. This work not only advances the technical state of the art in ArSL recognition but also provides a scalable, real-time solution for practical deployment in educational, social, and assistive communication technologies. Although this study focuses on Arabic Sign Language, the proposed framework can be extended to other sign languages, opening the possibility of worldwide applicability in sign language recognition tasks.
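As a rough sketch of the skeletal branch and the fusion step, the code below computes pairwise 3D joint distances and a few joint angles for one sign sequence, pools them over time, concatenates them with a precomputed I3D clip embedding, and drops highly correlated columns using the Pearson correlation coefficient. The angle triplets, temporal pooling, correlation threshold, and the name `i3d_embedding` are illustrative assumptions; I3D feature extraction itself is not shown here.

```python
import numpy as np
from itertools import combinations

def joint_angle(a, b, c):
    """Angle (radians) at joint b formed by the segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hypothetical (parent, joint, child) triplets, assuming a 25-joint Kinect-style skeleton.
ANGLE_TRIPLETS = [(4, 5, 6), (8, 9, 10), (20, 4, 8)]

def skeletal_features(sequence):
    """sequence: (T, num_joints, 3) array of 3D joints for one sign; returns a fixed-size vector."""
    per_frame = []
    for joints in sequence:
        dists = [np.linalg.norm(joints[i] - joints[j])
                 for i, j in combinations(range(len(joints)), 2)]
        angles = [joint_angle(joints[a], joints[b], joints[c]) for a, b, c in ANGLE_TRIPLETS]
        per_frame.append(dists + angles)
    return np.mean(per_frame, axis=0)  # simple temporal pooling (an assumption)

def fuse(skeletal_vec, i3d_embedding):
    """Feature-level fusion: concatenate the skeletal and I3D descriptors."""
    return np.concatenate([skeletal_vec, i3d_embedding])

def drop_redundant(X, threshold=0.9):
    """Drop one feature from each highly correlated pair (|Pearson r| > threshold).
    X: (num_samples, num_features) matrix of fused features; returns retained column indices."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return np.array(keep)
```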
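A minimal sketch of the majority-voting ensemble described above, using scikit-learn's VotingClassifier with XGBoost, Random Forest, and SVM base learners on the reduced fused features; the hyperparameters shown are placeholder defaults, not the tuned values from the paper.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hard (majority) voting over the three base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="mlogloss")),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ],
    voting="hard",
)

# X_train / y_train are hypothetical names for the reduced fused feature matrix
# and the sign class labels:
# ensemble.fit(X_train, y_train)
# y_pred = ensemble.predict(X_test)
```

Hard voting returns the class predicted by the majority of the three base learners, which corresponds to the majority-voting scheme described in the abstract.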