期刊文献+
共找到94篇文章
< 1 2 5 >
每页显示 20 50 100
ARNet:Integrating Spatial and Temporal Deep Learning for Robust Action Recognition in Videos
1
作者 Hussain Dawood Marriam Nawaz +3 位作者 Tahira Nazir Ali Javed Abdul Khader Jilani Saudagar Hatoon S.AlSagri 《Computer Modeling in Engineering & Sciences》 2025年第7期429-459,共31页
Reliable human action recognition(HAR)in video sequences is critical for a wide range of applications,such as security surveillance,healthcare monitoring,and human-computer interaction.Several automated systems have b... Reliable human action recognition(HAR)in video sequences is critical for a wide range of applications,such as security surveillance,healthcare monitoring,and human-computer interaction.Several automated systems have been designed for this purpose;however,existing methods often struggle to effectively integrate spatial and temporal information from input samples such as 2-stream networks or 3D convolutional neural networks(CNNs),which limits their accuracy in discriminating numerous human actions.Therefore,this study introduces a novel deeplearning framework called theARNet,designed for robustHAR.ARNet consists of two mainmodules,namely,a refined InceptionResNet-V2-based CNN and a Bi-LSTM(Long Short-Term Memory)network.The refined InceptionResNet-V2 employs a parametric rectified linear unit(PReLU)activation strategy within convolutional layers to enhance spatial feature extraction fromindividual video frames.The inclusion of the PReLUmethod improves the spatial informationcapturing ability of the approach as it uses learnable parameters to adaptively control the slope of the negative part of the activation function,allowing richer gradient flow during backpropagation and resulting in robust information capturing and stable model training.These spatial features holding essential pixel characteristics are then processed by the Bi-LSTMmodule for temporal analysis,which assists the ARNet in understanding the dynamic behavior of actions over time.The ARNet integrates three additional dense layers after the Bi-LSTM module to ensure a comprehensive computation of both spatial and temporal patterns and further boost the feature representation.The experimental validation of the model is conducted on 3 benchmark datasets named HMDB51,KTH,and UCF Sports and reports accuracies of 93.82%,99%,and 99.16%,respectively.The Precision results of HMDB51,KTH,and UCF Sports datasets are 97.41%,99.54%,and 99.01%;the Recall values are 98.87%,98.60%,99.08%,and the F1-Score is 98.13%,99.07%,99.04%,respectively.These results highlight the robustness of the ARNet approach and its potential as a versatile tool for accurate HAR across various real-world applications. 展开更多
关键词 action recognition Bi-LSTM computer vision deep learning InceptionResNet-V2 PReLU
在线阅读 下载PDF
A Novel Attention-Based Parallel Blocks Deep Architecture for Human Action Recognition
2
作者 Yasir Khan Jadoon Yasir Noman Khalid +4 位作者 Muhammad Attique Khan Jungpil Shin Fatimah Alhayan Hee-Chan Cho Byoungchol Chang 《Computer Modeling in Engineering & Sciences》 2025年第7期1143-1164,共22页
Real-time surveillance is attributed to recognizing the variety of actions performed by humans.Human Action Recognition(HAR)is a technique that recognizes human actions from a video stream.A range of variations in hum... Real-time surveillance is attributed to recognizing the variety of actions performed by humans.Human Action Recognition(HAR)is a technique that recognizes human actions from a video stream.A range of variations in human actions makes it difficult to recognize with considerable accuracy.This paper presents a novel deep neural network architecture called Attention RB-Net for HAR using video frames.The input is provided to the model in the form of video frames.The proposed deep architecture is based on the unique structuring of residual blocks with several filter sizes.Features are extracted from each frame via several operations with specific parameters defined in the presented novel Attention-based Residual Bottleneck(Attention-RB)DCNN architecture.A fully connected layer receives an attention-based features matrix,and final classification is performed.Several hyperparameters of the proposed model are initialized using Bayesian Optimization(BO)and later utilized in the trained model for testing.In testing,features are extracted from the self-attention layer and passed to neural network classifiers for the final action classification.Two highly cited datasets,HMDB51 and UCF101,were used to validate the proposed architecture and obtained an average accuracy of 87.70%and 97.30%,respectively.The deep convolutional neural network(DCNN)architecture is compared with state-of-the-art(SOTA)methods,including pre-trained models,inside blocks,and recently published techniques,and performs better. 展开更多
关键词 Human action recognition self-attention video streams residual bottleneck classification neural networks
在线阅读 下载PDF
An Efficient Temporal Decoding Module for Action Recognition
3
作者 HUANG Qiubo MEI Jianmin +3 位作者 ZHAO Wupeng LU Yiru WANG Mei CHEN Dehua 《Journal of Donghua University(English Edition)》 2025年第2期187-196,共10页
Action recognition,a fundamental task in the field of video understanding,has been extensively researched and applied.In contrast to an image,a video introduces an extra temporal dimension.However,many existing action... Action recognition,a fundamental task in the field of video understanding,has been extensively researched and applied.In contrast to an image,a video introduces an extra temporal dimension.However,many existing action recognition networks either perform simple temporal fusion through averaging or rely on pre-trained models from image recognition,resulting in limited temporal information extraction capabilities.This work proposes a highly efficient temporal decoding module that can be seamlessly integrated into any action recognition backbone network to enhance the focus on temporal relationships between video frames.Firstly,the decoder initializes a set of learnable queries,termed video-level action category prediction queries.Then,they are combined with the video frame features extracted by the backbone network after self-attention learning to extract video context information.Finally,these prediction queries with rich temporal features are used for category prediction.Experimental results on HMDB51,MSRDailyAct3D,Diving48 and Breakfast datasets show that using TokShift-Transformer and VideoMAE as encoders results in a significant improvement in Top-1 accuracy compared to the original models(TokShift-Transformer and VideoMAE),after introducing the proposed temporal decoder.The introduction of the temporal decoder results in an average performance increase exceeding 11%for TokShift-Transformer and nearly 5%for VideoMAE across the four datasets.Furthermore,the work explores the combination of the decoder with various action recognition networks,including Timesformer,as encoders.This results in an average accuracy improvement of more than 3.5%on the HMDB51 dataset.The code is available at https://github.com/huangturbo/TempDecoder. 展开更多
关键词 action recognition video understanding temporal relationship temporal decoder TRANSFORMER
在线阅读 下载PDF
Dual-branch spatial-temporal decoupled fusion transformer for safety action recognition in smart grid substation
4
作者 HAO Yu ZHENG Hao +3 位作者 WANG Tongwen WANG Yu SUN Wei ZHANG Shujuan 《Optoelectronics Letters》 2025年第8期507-512,共6页
Smart grid substation operations often take place in hazardous environments and pose significant threats to the safety of power personnel.Relying solely on manual supervision can lead to inadequate oversight.In respon... Smart grid substation operations often take place in hazardous environments and pose significant threats to the safety of power personnel.Relying solely on manual supervision can lead to inadequate oversight.In response to the demand for technology to identify improper operations in substation work scenarios,this paper proposes a substation safety action recognition technology to avoid the misoperation and enhance the safety management.In general,this paper utilizes a dual-branch transformer network to extract spatial and temporal information from the video dataset of operational behaviors in complex substation environments.Firstly,in order to capture the spatial-temporal correlation of people's behaviors in smart grid substation,we devise a sparse attention module and a segmented linear attention module that are embedded into spatial branch transformer and temporal branch transformer respectively.To avoid the redundancy of spatial and temporal information,we fuse the temporal and spatial features using a tensor decomposition fusion module by a decoupled manner.Experimental results indicate that our proposed method accurately detects improper operational behaviors in substation work scenarios,outperforming other existing methods in terms of detection and recognition accuracy. 展开更多
关键词 identify improper operations manual supervision avoid misoperation spatial temporal substation safety action recognition technology dual branch decoupled fusion enhance safety managementin
原文传递
Skeleton-Based Action Recognition Using Graph Convolutional Network with Pose Correction and Channel Topology Refinement
5
作者 Yuxin Gao Xiaodong Duan Qiguo Dai 《Computers, Materials & Continua》 2025年第4期701-718,共18页
Graph convolutional network(GCN)as an essential tool in human action recognition tasks have achieved excellent performance in previous studies.However,most current skeleton-based action recognition using GCN methods u... Graph convolutional network(GCN)as an essential tool in human action recognition tasks have achieved excellent performance in previous studies.However,most current skeleton-based action recognition using GCN methods use a shared topology,which cannot flexibly adapt to the diverse correlations between joints under different motion features.The video-shooting angle or the occlusion of the body parts may bring about errors when extracting the human pose coordinates with estimation algorithms.In this work,we propose a novel graph convolutional learning framework,called PCCTR-GCN,which integrates pose correction and channel topology refinement for skeleton-based human action recognition.Firstly,a pose correction module(PCM)is introduced,which corrects the pose coordinates of the input network to reduce the error in pose feature extraction.Secondly,channel topology refinement graph convolution(CTR-GC)is employed,which can dynamically learn the topology features and aggregate joint features in different channel dimensions so as to enhance the performance of graph convolution networks in feature extraction.Finally,considering that the joint stream and bone stream of skeleton data and their dynamic information are also important for distinguishing different actions,we employ a multi-stream data fusion approach to improve the network’s recognition performance.We evaluate the model using top-1 and top-5 classification accuracy.On the benchmark datasets iMiGUE and Kinetics,the top-1 classification accuracy reaches 55.08%and 36.5%,respectively,while the top-5 classification accuracy reaches 89.98%and 59.2%,respectively.On the NTU dataset,for the two benchmark RGB+Dsettings(X-Sub and X-View),the classification accuracy achieves 89.7%and 95.4%,respectively. 展开更多
关键词 Pose correction multi-stream fusion GCN action recognition
在线阅读 下载PDF
Dual-channel graph convolutional network with multi-order information fusion for skeleton-based action recognition
6
作者 JIANG Tao HU Zhentao +2 位作者 WANG Kaige QIU Qian REN Xing 《High Technology Letters》 2025年第3期257-265,共9页
Skeleton-based human action recognition focuses on identifying actions from dynamic skeletal data,which contains both temporal and spatial characteristics.However,this approach faces chal-lenges such as viewpoint vari... Skeleton-based human action recognition focuses on identifying actions from dynamic skeletal data,which contains both temporal and spatial characteristics.However,this approach faces chal-lenges such as viewpoint variations,low recognition accuracy,and high model complexity.Skeleton-based graph convolutional network(GCN)generally outperform other deep learning methods in rec-ognition accuracy.However,they often underutilize temporal features and suffer from high model complexity,leading to increased training and validation costs,especially on large-scale datasets.This paper proposes a dual-channel graph convolutional network with multi-order information fusion(DM-AGCN)for human action recognition.The network integrates high frame rate skeleton chan-nels to capture action dynamics and low frame rate channels to preserve static semantic information,effectively balancing temporal and spatial features.This dual-channel architecture allows for separate processing of temporal and spatial information.Additionally,DM-AGCN extracts joint keypoints and bidirectional bone vectors from skeleton sequences,and employs a three-stream graph convolu-tional structure to extract features that describe human movement.Experimental results on the NTU-RGB+D dataset demonstrate that DM-AGCN achieves an accuracy of 89.4%on the X-Sub and 95.8%on the X-View,while reducing model complexity to 3.68 GFLOPs(Giga Floating-point Oper-ations Per Second).On the Kinetics-Skeleton dataset,the model achieves a Top-1 accuracy of 37.2%and a Top-5 accuracy of 60.3%,further validating its effectiveness across different benchmarks. 展开更多
关键词 human action recognition graph convolutional network spatiotemporal fusion feature extraction
在线阅读 下载PDF
Video Action Recognition Method Based on Personalized Federated Learning and Spatiotemporal Features
7
作者 Rongsen Wu Jie Xu +6 位作者 Yuhang Zhang Changming Zhao Yiweng Xie Zelei Wu Yunji Li Jinhong Guo Shiyang Tang 《Computers, Materials & Continua》 2025年第6期4961-4978,共18页
With the rapid development of artificial intelligence and Internet of Things technologies,video action recognition technology is widely applied in various scenarios,such as personal life and industrial production.Howe... With the rapid development of artificial intelligence and Internet of Things technologies,video action recognition technology is widely applied in various scenarios,such as personal life and industrial production.However,while enjoying the convenience brought by this technology,it is crucial to effectively protect the privacy of users’video data.Therefore,this paper proposes a video action recognition method based on personalized federated learning and spatiotemporal features.Under the framework of federated learning,a video action recognition method leveraging spatiotemporal features is designed.For the local spatiotemporal features of the video,a new differential information extraction scheme is proposed to extract differential features with a single RGB frame as the center,and a spatialtemporal module based on local information is designed to improve the effectiveness of local feature extraction;for the global temporal features,a method of extracting action rhythm features using differential technology is proposed,and a timemodule based on global information is designed.Different translational strides are used in the module to obtain bidirectional differential features under different action rhythms.Additionally,to address user data privacy issues,the method divides model parameters into local private parameters and public parameters based on the structure of the video action recognition model.This approach enhancesmodel training performance and ensures the security of video data.The experimental results show that under personalized federated learning conditions,an average accuracy of 97.792%was achieved on the UCF-101 dataset,which is non-independent and identically distributed(non-IID).This research provides technical support for privacy protection in video action recognition. 展开更多
关键词 Video action recognition personalized federated learning spatiotemporal features data privacy
在线阅读 下载PDF
Lightweight Classroom Student Action Recognition Method Based on Spatiotemporal Multimodal Feature Fusion
8
作者 Shaodong Zou Di Wu +2 位作者 Jianhou Gan Juxiang Zhou Jiatian Mei 《Computers, Materials & Continua》 2025年第4期1101-1116,共16页
The task of student action recognition in the classroom is to precisely capture and analyze the actions of students in classroom videos,providing a foundation for realizing intelligent and accurate teaching.However,th... The task of student action recognition in the classroom is to precisely capture and analyze the actions of students in classroom videos,providing a foundation for realizing intelligent and accurate teaching.However,the complex nature of the classroom environment has added challenges and difficulties in the process of student action recognition.In this research article,with regard to the circumstances where students are prone to be occluded and classroom computing resources are restricted in real classroom scenarios,a lightweight multi-modal fusion action recognition approach is put forward.This proposed method is capable of enhancing the accuracy of student action recognition while concurrently diminishing the number of parameters of the model and the Computation Amount,thereby achieving a more efficient and accurate recognition performance.In the feature extraction stage,this method fuses the keypoint heatmap with the RGB(Red-Green-Blue color model)image.In order to fully utilize the unique information of different modalities for feature complementarity,a Feature Fusion Module(FFE)is introduced.The FFE encodes and fuses the unique features of the two modalities during the feature extraction process.This fusion strategy not only achieves fusion and complementarity between modalities,but also improves the overall model performance.Furthermore,to reduce the computational load and parameter scale of the model,we use keypoint information to crop RGB images.At the same time,the first three networks of the lightweight feature extraction network X3D are used to extract dual-branch features.These methods significantly reduce the computational load and parameter scale.The number of parameters of the model is 1.40 million,and the computation amount is 5.04 billion floating-point operations per second(GFLOPs),achieving an efficient lightweight design.In the Student Classroom Action Dataset(SCAD),the accuracy of the model is 88.36%.In NTU 60(Nanyang Technological University Red-Green-Blue-Depth RGB+Ddataset with 60 categories),the accuracies on X-Sub(The people in the training set are different from those in the test set)and X-View(The perspectives of the training set and the test set are different)are 95.76%and 98.82%,respectively.On the NTU 120 dataset(Nanyang Technological University Red-Green-Blue-Depth dataset with 120 categories),RGB+Dthe accuracies on X-Sub and X-Set(the perspectives of the training set and the test set are different)are 91.97%and 93.45%,respectively.The model has achieved a balance in terms of accuracy,computation amount,and the number of parameters. 展开更多
关键词 action recognition student classroom action multimodal fusion lightweight model design
在线阅读 下载PDF
Two-Stream Auto-Encoder Network for Unsupervised Skeleton-Based Action Recognition
9
作者 WANG Gang GUAN Yaonan LI Dewei 《Journal of Shanghai Jiaotong university(Science)》 2025年第2期330-336,共7页
Representation learning from unlabeled skeleton data is a challenging task.Prior unsupervised learning algorithms mainly rely on the modeling ability of recurrent neural networks to extract the action representations.... Representation learning from unlabeled skeleton data is a challenging task.Prior unsupervised learning algorithms mainly rely on the modeling ability of recurrent neural networks to extract the action representations.However,the structural information of the skeleton data,which also plays a critical role in action recognition,is rarely explored in existing unsupervised methods.To deal with this limitation,we propose a novel twostream autoencoder network to combine the topological information with temporal information of skeleton data.Specifically,we encode the graph structure by graph convolutional network(GCN)and integrate the extracted GCN-based representations into the gate recurrent unit stream.Then we design a transfer module to merge the representations of the two streams adaptively.According to the characteristics of the two-stream autoencoder,a unified loss function composed of multiple tasks is proposed to update the learnable parameters of our model.Comprehensive experiments on NW-UCLA,UWA3D,and NTU-RGBD 60 datasets demonstrate that our proposed method can achieve an excellent performance among the unsupervised skeleton-based methods and even perform a similar or superior performance over numerous supervised skeleton-based methods. 展开更多
关键词 representation learning skeleton-based action recognition unsupervised deep learning
原文传递
Workout Action Recognition in Video Streams Using an Attention Driven Residual DC-GRU Network 被引量:2
10
作者 Arnab Dey Samit Biswas Dac-Nhuong Le 《Computers, Materials & Continua》 SCIE EI 2024年第5期3067-3087,共21页
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers thelikelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions i... Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers thelikelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in videostreams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enableinstant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing actiondatasets often lack diversity and specificity for workout actions, hindering the development of accurate recognitionmodels. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significantcontribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated toencompass various exercises performed by numerous individuals in different settings. This research proposes aninnovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDCGRU)network for workout action recognition in video streams. Unlike image-based action recognition, videoscontain spatio-temporal information, making the task more complex and challenging. While substantial progresshas been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions,and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attentionmodel demonstrated exceptional classification performance with 95.81% accuracy in classifying workout actionvideos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101,respectively, showcasing its superiority and robustness in action recognition. The findings suggest practicalimplications in real-world scenarios where precise video action recognition is paramount, addressing the persistingchallenges in the field. TheWAVd dataset serves as a catalyst for the development ofmore robust and effective fitnesstracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis. 展开更多
关键词 Workout action recognition video stream action recognition residual network GRU ATTENTION
在线阅读 下载PDF
Abnormal Action Recognition with Lightweight Pose Estimation Network in Electric Power Training Scene
11
作者 Yunfeng Cai Ran Qin +3 位作者 Jin Tang Long Zhang Xiaotian Bi Qing Yang 《Computers, Materials & Continua》 SCIE EI 2024年第6期4979-4994,共16页
Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(... Electric power training is essential for ensuring the safety and reliability of the system.In this study,we introduce a novel Abnormal Action Recognition(AAR)system that utilizes a Lightweight Pose Estimation Network(LPEN)to efficiently and effectively detect abnormal fall-down and trespass incidents in electric power training scenarios.The LPEN network,comprising three stages—MobileNet,Initial Stage,and Refinement Stage—is employed to swiftly extract image features,detect human key points,and refine them for accurate analysis.Subsequently,a Pose-aware Action Analysis Module(PAAM)captures the positional coordinates of human skeletal points in each frame.Finally,an Abnormal Action Inference Module(AAIM)evaluates whether abnormal fall-down or unauthorized trespass behavior is occurring.For fall-down recognition,three criteria—falling speed,main angles of skeletal points,and the person’s bounding box—are considered.To identify unauthorized trespass,emphasis is placed on the position of the ankles.Extensive experiments validate the effectiveness and efficiency of the proposed system in ensuring the safety and reliability of electric power training. 展开更多
关键词 Abnormal action recognition action recognition lightweight pose estimation electric power training
在线阅读 下载PDF
BCCLR:A Skeleton-Based Action Recognition with Graph Convolutional Network Combining Behavior Dependence and Context Clues 被引量:4
12
作者 Yunhe Wang Yuxin Xia Shuai Liu 《Computers, Materials & Continua》 SCIE EI 2024年第3期4489-4507,共19页
In recent years,skeleton-based action recognition has made great achievements in Computer Vision.A graph convolutional network(GCN)is effective for action recognition,modelling the human skeleton as a spatio-temporal ... In recent years,skeleton-based action recognition has made great achievements in Computer Vision.A graph convolutional network(GCN)is effective for action recognition,modelling the human skeleton as a spatio-temporal graph.Most GCNs define the graph topology by physical relations of the human joints.However,this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs,resulting in a low recognition rate for specific actions with implicit correlation between joint pairs.In addition,existing methods ignore the trend correlation between adjacent frames within an action and context clues,leading to erroneous action recognition with similar poses.Therefore,this study proposes a learnable GCN based on behavior dependence,which considers implicit joint correlation by constructing a dynamic learnable graph with extraction of specific behavior dependence of joint pairs.By using the weight relationship between the joint pairs,an adaptive model is constructed.It also designs a self-attention module to obtain their inter-frame topological relationship for exploring the context of actions.Combining the shared topology and the multi-head self-attention map,the module obtains the context-based clue topology to update the dynamic graph convolution,achieving accurate recognition of different actions with similar poses.Detailed experiments on public datasets demonstrate that the proposed method achieves better results and realizes higher quality representation of actions under various evaluation protocols compared to state-of-the-art methods. 展开更多
关键词 action recognition deep learning GCN behavior dependence context clue self-attention
在线阅读 下载PDF
Explore human parsing modality for action recognition 被引量:1
13
作者 Jinfu Liu Runwei Ding +5 位作者 Yuhang Wen Nan Dai Fanyang Meng Fang-Lue Zhang Shen Zhao Mengyuan Liu 《CAAI Transactions on Intelligence Technology》 2024年第6期1623-1633,共11页
Multimodal-based action recognition methods have achieved high success using pose and RGB modality.However,skeletons sequences lack appearance depiction and RGB images suffer irrelevant noise due to modality limitatio... Multimodal-based action recognition methods have achieved high success using pose and RGB modality.However,skeletons sequences lack appearance depiction and RGB images suffer irrelevant noise due to modality limitations.To address this,the authors introduce human parsing feature map as a novel modality,since it can selectively retain effective semantic features of the body parts while filtering out most irrelevant noise.The authors propose a new dual-branch framework called ensemble human parsing and pose network(EPP-Net),which is the first to leverage both skeletons and human parsing modalities for action recognition.The first human pose branch feeds robust skeletons in the graph convolutional network to model pose features,while the second human parsing branch also leverages depictive parsing feature maps to model parsing features via convolutional backbones.The two high-level features will be effectively combined through a late fusion strategy for better action recognition.Extensive experiments on NTU RGB t D and NTU RGB t D 120 benchmarks consistently verify the effectiveness of our proposed EPP-Net,which outperforms the existing action recognition methods.Our code is available at https://github.com/liujf69/EPP-Net-Action. 展开更多
关键词 action recognition human parsing human skeletons
在线阅读 下载PDF
HgaNets:Fusion of Visual Data and Skeletal Heatmap for Human Gesture Action Recognition
14
作者 Wuyan Liang Xiaolong Xu 《Computers, Materials & Continua》 SCIE EI 2024年第4期1089-1103,共15页
Recognition of human gesture actions is a challenging issue due to the complex patterns in both visual andskeletal features. Existing gesture action recognition (GAR) methods typically analyze visual and skeletal data... Recognition of human gesture actions is a challenging issue due to the complex patterns in both visual andskeletal features. Existing gesture action recognition (GAR) methods typically analyze visual and skeletal data,failing to meet the demands of various scenarios. Furthermore, multi-modal approaches lack the versatility toefficiently process both uniformand disparate input patterns.Thus, in this paper, an attention-enhanced pseudo-3Dresidual model is proposed to address the GAR problem, called HgaNets. This model comprises two independentcomponents designed formodeling visual RGB (red, green and blue) images and 3Dskeletal heatmaps, respectively.More specifically, each component consists of two main parts: 1) a multi-dimensional attention module forcapturing important spatial, temporal and feature information in human gestures;2) a spatiotemporal convolutionmodule that utilizes pseudo-3D residual convolution to characterize spatiotemporal features of gestures. Then,the output weights of the two components are fused to generate the recognition results. Finally, we conductedexperiments on four datasets to assess the efficiency of the proposed model. The results show that the accuracy onfour datasets reaches 85.40%, 91.91%, 94.70%, and 95.30%, respectively, as well as the inference time is 0.54 s andthe parameters is 2.74M. These findings highlight that the proposed model outperforms other existing approachesin terms of recognition accuracy. 展开更多
关键词 Gesture action recognition multi-dimensional attention pseudo-3D skeletal heatmap
在线阅读 下载PDF
Enhancing Human Action Recognition with Adaptive Hybrid Deep Attentive Networks and Archerfish Optimization
15
作者 Ahmad Yahiya Ahmad Bani Ahmad Jafar Alzubi +3 位作者 Sophers James Vincent Omollo Nyangaresi Chanthirasekaran Kutralakani Anguraju Krishnan 《Computers, Materials & Continua》 SCIE EI 2024年第9期4791-4812,共22页
In recent years,wearable devices-based Human Activity Recognition(HAR)models have received significant attention.Previously developed HAR models use hand-crafted features to recognize human activities,leading to the e... In recent years,wearable devices-based Human Activity Recognition(HAR)models have received significant attention.Previously developed HAR models use hand-crafted features to recognize human activities,leading to the extraction of basic features.The images captured by wearable sensors contain advanced features,allowing them to be analyzed by deep learning algorithms to enhance the detection and recognition of human actions.Poor lighting and limited sensor capabilities can impact data quality,making the recognition of human actions a challenging task.The unimodal-based HAR approaches are not suitable in a real-time environment.Therefore,an updated HAR model is developed using multiple types of data and an advanced deep-learning approach.Firstly,the required signals and sensor data are accumulated from the standard databases.From these signals,the wave features are retrieved.Then the extracted wave features and sensor data are given as the input to recognize the human activity.An Adaptive Hybrid Deep Attentive Network(AHDAN)is developed by incorporating a“1D Convolutional Neural Network(1DCNN)”with a“Gated Recurrent Unit(GRU)”for the human activity recognition process.Additionally,the Enhanced Archerfish Hunting Optimizer(EAHO)is suggested to fine-tune the network parameters for enhancing the recognition process.An experimental evaluation is performed on various deep learning networks and heuristic algorithms to confirm the effectiveness of the proposed HAR model.The EAHO-based HAR model outperforms traditional deep learning networks with an accuracy of 95.36,95.25 for recall,95.48 for specificity,and 95.47 for precision,respectively.The result proved that the developed model is effective in recognizing human action by taking less time.Additionally,it reduces the computation complexity and overfitting issue through using an optimization approach. 展开更多
关键词 Human action recognition multi-modal sensor data and signals adaptive hybrid deep attentive network enhanced archerfish hunting optimizer 1D convolutional neural network gated recurrent units
在线阅读 下载PDF
Action recognition using a hierarchy of feature groups
16
作者 周同驰 程旭 +3 位作者 李拟珺 徐勤军 周琳 吴镇扬 《Journal of Southeast University(English Edition)》 EI CAS 2015年第3期327-332,共6页
To improve the recognition performance of video human actions,an approach that models the video actions in a hierarchical way is proposed. This hierarchical model summarizes the action contents with different spatio-t... To improve the recognition performance of video human actions,an approach that models the video actions in a hierarchical way is proposed. This hierarchical model summarizes the action contents with different spatio-temporal domains according to the properties of human body movement.First,the temporal gradient combined with the constraint of coherent motion pattern is utilized to extract stable and dense motion features that are viewed as point features,then the mean-shift clustering algorithm with the adaptive scale kernel is used to label these features.After pooling the features with the same label to generate part-based representation,the visual word responses within one large scale volume are collected as video object representation.On the benchmark KTH(Kungliga Tekniska H?gskolan)and UCF (University of Central Florida)-sports action datasets,the experimental results show that the proposed method enhances the representative and discriminative power of action features, and improves recognition rates.Compared with other related literature,the proposed method obtains superior performance. 展开更多
关键词 action recognition coherent motion pattern feature groups part-based representation
在线阅读 下载PDF
Study of Human Action Recognition Based on Improved Spatio-temporal Features 被引量:7
17
作者 Xiao-Fei Ji Qian-Qian Wu +1 位作者 Zhao-Jie Ju Yang-Yang Wang 《International Journal of Automation and computing》 EI CSCD 2014年第5期500-509,共10页
Most of the exist action recognition methods mainly utilize spatio-temporal descriptors of single interest point while ignoring their potential integral information, such as spatial distribution information. By combin... Most of the exist action recognition methods mainly utilize spatio-temporal descriptors of single interest point while ignoring their potential integral information, such as spatial distribution information. By combining local spatio-temporal feature and global positional distribution information(PDI) of interest points, a novel motion descriptor is proposed in this paper. The proposed method detects interest points by using an improved interest point detection method. Then, 3-dimensional scale-invariant feature transform(3D SIFT) descriptors are extracted for every interest point. In order to obtain a compact description and efficient computation, the principal component analysis(PCA) method is utilized twice on the 3D SIFT descriptors of single frame and multiple frames. Simultaneously, the PDI of the interest points are computed and combined with the above features. The combined features are quantified and selected and finally tested by using the support vector machine(SVM) recognition algorithm on the public KTH dataset. The testing results have showed that the recognition rate has been significantly improved and the proposed features can more accurately describe human motion with high adaptability to scenarios. 展开更多
关键词 action recognition spatio-temporal interest points 3-dimensional scale-invariant feature transform (3D SIFT) positional distribution information dimension reduction
原文传递
Multiple Feature Fusion in Convolutional Neural Networks for Action Recognition 被引量:5
18
作者 LI Hongyang CHEN Jun HU Ruimin 《Wuhan University Journal of Natural Sciences》 CAS CSCD 2017年第1期73-78,共6页
Action recognition is important for understanding the human behaviors in the video,and the video representation is the basis for action recognition.This paper provides a new video representation based on convolution n... Action recognition is important for understanding the human behaviors in the video,and the video representation is the basis for action recognition.This paper provides a new video representation based on convolution neural networks(CNN).For capturing human motion information in one CNN,we take both the optical flow maps and gray images as input,and combine multiple convolutional features by max pooling across frames.In another CNN,we input single color frame to capture context information.Finally,we take the top full connected layer vectors as video representation and train the classifiers by linear support vector machine.The experimental results show that the representation which integrates the optical flow maps and gray images obtains more discriminative properties than those which depend on only one element.On the most challenging data sets HMDB51 and UCF101,this video representation obtains competitive performance. 展开更多
关键词 action recognition video deep-learned representa-tion convolutional neural network feature fusion
原文传递
Human Action Recognition Based on Supervised Class-Specific Dictionary Learning with Deep Convolutional Neural Network Features 被引量:6
19
作者 Binjie Gu 《Computers, Materials & Continua》 SCIE EI 2020年第4期243-262,共20页
Human action recognition under complex environment is a challenging work.Recently,sparse representation has achieved excellent results of dealing with human action recognition problem under different conditions.The ma... Human action recognition under complex environment is a challenging work.Recently,sparse representation has achieved excellent results of dealing with human action recognition problem under different conditions.The main idea of sparse representation classification is to construct a general classification scheme where the training samples of each class can be considered as the dictionary to express the query class,and the minimal reconstruction error indicates its corresponding class.However,how to learn a discriminative dictionary is still a difficult work.In this work,we make two contributions.First,we build a new and robust human action recognition framework by combining one modified sparse classification model and deep convolutional neural network(CNN)features.Secondly,we construct a novel classification model which consists of the representation-constrained term and the coefficients incoherence term.Experimental results on benchmark datasets show that our modified model can obtain competitive results in comparison to other state-of-the-art models. 展开更多
关键词 action recognition deep CNN features sparse model supervised dictionary learning
在线阅读 下载PDF
Multi-Layered Deep Learning Features Fusion for Human Action Recognition 被引量:4
20
作者 Sadia Kiran Muhammad Attique Khan +5 位作者 Muhammad Younus Javed Majed Alhaisoni Usman Tariq Yunyoung Nam Robertas Damaševicius Muhammad Sharif 《Computers, Materials & Continua》 SCIE EI 2021年第12期4061-4075,共15页
Human Action Recognition(HAR)is an active research topic in machine learning for the last few decades.Visual surveillance,robotics,and pedestrian detection are the main applications for action recognition.Computer vis... Human Action Recognition(HAR)is an active research topic in machine learning for the last few decades.Visual surveillance,robotics,and pedestrian detection are the main applications for action recognition.Computer vision researchers have introduced many HAR techniques,but they still face challenges such as redundant features and the cost of computing.In this article,we proposed a new method for the use of deep learning for HAR.In the proposed method,video frames are initially pre-processed using a global contrast approach and later used to train a deep learning model using domain transfer learning.The Resnet-50 Pre-Trained Model is used as a deep learning model in this work.Features are extracted from two layers:Global Average Pool(GAP)and Fully Connected(FC).The features of both layers are fused by the Canonical Correlation Analysis(CCA).Then features are selected using the Shanon Entropy-based threshold function.The selected features are finally passed to multiple classifiers for final classification.Experiments are conducted on five publicly available datasets as IXMAS,UCF Sports,YouTube,UT-Interaction,and KTH.The accuracy of these data sets was 89.6%,99.7%,100%,96.7%and 96.6%,respectively.Comparison with existing techniques has shown that the proposed method provides improved accuracy for HAR.Also,the proposed method is computationally fast based on the time of execution. 展开更多
关键词 action recognition transfer learning features fusion features selection CLASSIFICATION
在线阅读 下载PDF
上一页 1 2 5 下一页 到第
使用帮助 返回顶部