Funding: Supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2504).
Abstract: Reliable human action recognition (HAR) in video sequences is critical for a wide range of applications, such as security surveillance, healthcare monitoring, and human-computer interaction. Several automated systems have been designed for this purpose. However, existing methods such as two-stream networks or 3D convolutional neural networks (CNNs) often struggle to integrate the spatial and temporal information in input samples effectively, which limits their accuracy in discriminating among numerous human actions. This study therefore introduces a novel deep learning framework, called ARNet, designed for robust HAR. ARNet consists of two main modules: a refined InceptionResNet-V2-based CNN and a bidirectional long short-term memory (Bi-LSTM) network. The refined InceptionResNet-V2 uses a parametric rectified linear unit (PReLU) activation within its convolutional layers to enhance spatial feature extraction from individual video frames. PReLU improves the model's ability to capture spatial information because it uses learnable parameters to adaptively control the slope of the negative part of the activation function. This allows a richer gradient flow during backpropagation, which results in robust information capture and stable model training. The resulting spatial features, which hold essential pixel characteristics, are then processed by the Bi-LSTM module for temporal analysis, which helps ARNet model the dynamic behavior of actions over time. ARNet adds three dense layers after the Bi-LSTM module to ensure comprehensive computation over both spatial and temporal patterns and to further strengthen the feature representation. The model is experimentally validated on three benchmark datasets, HMDB51, KTH, and UCF Sports, where it achieves accuracies of 93.82%, 99%, and 99.16%, respectively. On HMDB51, KTH, and UCF Sports, the precision values are 97.41%, 99.54%, and 99.01%, the recall values are 98.87%, 98.60%, and 99.08%, and the F1-scores are 98.13%, 99.07%, and 99.04%, respectively. These results highlight the robustness of ARNet and its potential as a versatile tool for accurate HAR across various real-world applications.
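The PReLU activation described above can be written as f(x) = max(0, x) + a·min(0, x), where the slope a is learned. The following is a minimal scalar sketch in plain Python, not the authors' implementation. It shows the forward pass, the two local gradients, and one plain gradient-descent update of a single shared slope. The function names, the initial slope, and the learning rate are illustrative assumptions. In practice, a framework layer such as PyTorch's `torch.nn.PReLU` learns one shared slope or one slope per channel through autograd.

```python
from typing import List, Tuple


def prelu(x: float, a: float) -> float:
    """PReLU forward pass: identity for positive inputs, slope `a` otherwise."""
    return x if x > 0 else a * x


def prelu_grads(x: float, a: float) -> Tuple[float, float]:
    """Local gradients (df/dx, df/da) of PReLU at a single input."""
    if x > 0:
        return 1.0, 0.0
    # Negative side: f = a * x, so df/dx = a and df/da = x.
    return a, x


def update_slope(xs: List[float], upstream: List[float], a: float, lr: float = 0.1) -> float:
    """One gradient-descent step on a slope `a` shared by all inputs.

    `upstream` holds dL/df for each input; the chain rule gives
    dL/da = sum(dL/df * df/da).
    """
    grad_a = sum(g * prelu_grads(x, a)[1] for x, g in zip(xs, upstream))
    return a - lr * grad_a


if __name__ == "__main__":
    a = 0.25  # hypothetical initial slope (also the default init of torch.nn.PReLU)
    print([prelu(x, a) for x in (-2.0, 0.0, 3.0)])
    print(update_slope([-2.0, 1.0], [1.0, 1.0], a))
```

ReLU has a gradient of 0 with respect to x for every negative input. PReLU instead passes a gradient of a for those inputs, and a itself receives a gradient. This is the "richer gradient flow" the abstract refers to.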
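The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be cross-checked. The short Python sketch below recomputes F1 from the precision and recall values in the abstract. It assumes the reported F1 was computed directly from those aggregate precision and recall values; a per-class macro average would not, in general, satisfy this identity exactly. Recomputed this way, the values round to 98.13%, 99.07%, and 99.04%, which matches the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), all in percent, as stated in the abstract.
reported = {
    "HMDB51": (97.41, 98.87, 98.13),
    "KTH": (99.54, 98.60, 99.07),
    "UCF Sports": (99.01, 99.08, 99.04),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}%, reported = {f1_reported:.2f}%")
```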
Funding: This research is funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R54), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Social media provide interactive digital technologies that facilitate information sharing and exchange among individuals. During disasters in particular, a massive volume of content is posted on platforms such as Twitter. Eyewitness accounts can benefit humanitarian organizations and agencies, but identifying the eyewitness tweets related to a disaster among millions of tweets is difficult. Different approaches have been developed to address this problem. The most recent state-of-the-art system was based on a manually created dictionary, and this approach was further refined by introducing linguistic rules. However, these approaches are limited because they are dataset-dependent and not scalable. In this paper, we propose a method to identify eyewitness tweets on Twitter. For our experiments, we used 13 features, identified by the pioneering work in this domain, that can be used to classify tweets as eyewitness accounts. For each feature, a dictionary of words was created with the Word Dictionary Maker algorithm, which is the key contribution of this research. The algorithm is initialized with a few terms relevant to a specific feature and then builds the word dictionary. Keyword matching is then performed on the tweets for each feature. If a feature is present in a tweet, it is encoded as 1; otherwise, it is encoded as 0. Repeating this for all 13 features, we created a file that records which features occur in each tweet. To classify the tweets based on these features, we used Naïve Bayes, Random Forest, and Neural Network classifiers. The approach was applied to different disasters, including earthquakes, floods, hurricanes, and forest fires. The results were compared with those of the state-of-the-art linguistic rule-based system, which achieved an F-measure of 0.81, whereas the proposed approach achieved an F-measure of 0.88. The results are comparable, and because the proposed approach is not dataset-dependent, it can be used to identify eyewitness accounts.
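The keyword-matching step, which turns each tweet into a binary feature vector, can be sketched as follows. This illustration rests on stated assumptions and is not the Word Dictionary Maker algorithm itself, which the abstract does not detail. The two feature names and their word lists are hypothetical placeholders for the 13 per-feature dictionaries the paper builds. Tokenization is a simple lowercase regex split.

```python
import re
from typing import Dict, List, Set

# Hypothetical per-feature dictionaries: placeholders, not the authors' 13 dictionaries.
FEATURE_DICTIONARIES: Dict[str, Set[str]] = {
    "first_person_pronoun": {"i", "me", "my", "we", "us"},
    "perceptual_verb": {"saw", "heard", "felt", "shaking", "smell"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def feature_vector(tweet: str, dictionaries: Dict[str, Set[str]]) -> List[int]:
    """Return 1 for each feature whose dictionary shares a word with the tweet, else 0."""
    tokens = set(tokenize(tweet))
    return [1 if tokens & words else 0 for words in dictionaries.values()]


def build_feature_file(tweets: List[str], dictionaries: Dict[str, Set[str]]) -> List[List[int]]:
    """One binary row per tweet and one column per feature (the 'file' of features)."""
    return [feature_vector(t, dictionaries) for t in tweets]


if __name__ == "__main__":
    tweets = ["I felt the ground shaking!", "Flood warning issued for the county"]
    for row in build_feature_file(tweets, FEATURE_DICTIONARIES):
        print(row)
```

Using sets makes each dictionary lookup a set intersection. Column order follows the insertion order of `FEATURE_DICTIONARIES`, which Python dictionaries preserve.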
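Any of the three classifiers can then be trained on the binary feature file. As one example, the sketch below implements a Bernoulli Naïve Bayes classifier with Laplace smoothing in plain Python. The abstract does not say which Naïve Bayes variant was used, so the following are all assumptions: the Bernoulli variant (a natural fit for 0/1 features), the smoothing parameter, and the toy labels (1 = eyewitness, 0 = non-eyewitness). scikit-learn's `sklearn.naive_bayes.BernoulliNB` provides a maintained implementation of the same model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, float], Dict[int, List[float]]]


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int], alpha: float = 1.0) -> Model:
    """Fit class priors and per-feature P(x_j = 1 | class) with Laplace smoothing."""
    n_features = len(X[0])
    priors: Dict[int, float] = {}
    likelihoods: Dict[int, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return priors, likelihoods


def predict_bernoulli_nb(model: Model, x: Sequence[int]) -> int:
    """Return the class with the highest log-posterior for one binary feature vector."""
    priors, likelihoods = model
    best_class, best_score = -1, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for xj, p in zip(x, likelihoods[c]):
            # Present features contribute log p, absent ones log(1 - p).
            score += math.log(p if xj else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    # Toy binary feature rows with hypothetical labels (1 = eyewitness, 0 = not).
    X = [[1, 1], [1, 1], [1, 0], [0, 0], [0, 0], [0, 1]]
    y = [1, 1, 1, 0, 0, 0]
    model = train_bernoulli_nb(X, y)
    print([predict_bernoulli_nb(model, x) for x in ([1, 1], [0, 0])])
```

The model sums log-probabilities instead of multiplying raw probabilities. This avoids numerical underflow when many per-feature probabilities are combined.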