Journal Articles
3 articles found
An Efficient Temporal Decoding Module for Action Recognition
1
Authors: HUANG Qiubo, MEI Jianmin, ZHAO Wupeng, LU Yiru, WANG Mei, CHEN Dehua. Journal of Donghua University (English Edition), 2025, No. 2, pp. 187-196 (10 pages)
Action recognition, a fundamental task in the field of video understanding, has been extensively researched and applied. In contrast to an image, a video introduces an extra temporal dimension. However, many existing action recognition networks either perform simple temporal fusion through averaging or rely on pre-trained models from image recognition, resulting in limited temporal information extraction capabilities. This work proposes a highly efficient temporal decoding module that can be seamlessly integrated into any action recognition backbone network to enhance the focus on temporal relationships between video frames. Firstly, the decoder initializes a set of learnable queries, termed video-level action category prediction queries. Then, they are combined with the video frame features extracted by the backbone network after self-attention learning to extract video context information. Finally, these prediction queries with rich temporal features are used for category prediction. Experimental results on the HMDB51, MSRDailyAct3D, Diving48 and Breakfast datasets show that using TokShift-Transformer and VideoMAE as encoders yields a significant improvement in Top-1 accuracy over the original models once the proposed temporal decoder is introduced. The temporal decoder brings an average performance increase exceeding 11% for TokShift-Transformer and nearly 5% for VideoMAE across the four datasets. Furthermore, the work explores combining the decoder with various other action recognition networks, including Timesformer, as encoders, which yields an average accuracy improvement of more than 3.5% on the HMDB51 dataset. The code is available at https://github.com/huangturbo/TempDecoder.
Keywords: action recognition, video understanding, temporal relationship, temporal decoder, Transformer
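The decoder design described in the abstract above can be illustrated with a minimal PyTorch sketch. This is not the released TempDecoder implementation; the module and parameter names (TemporalDecoder, num_queries, feat_dim) are hypothetical, and the code only mirrors the general idea of learnable prediction queries that first interact through self-attention and then attend to backbone frame features before classification.

```python
import torch
import torch.nn as nn

class TemporalDecoder(nn.Module):
    """Sketch: learnable video-level prediction queries attend over per-frame features."""
    def __init__(self, feat_dim=768, num_queries=4, num_classes=51, num_heads=8):
        super().__init__()
        # Learnable video-level action-category prediction queries
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim))
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(feat_dim)
        self.norm2 = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) produced by any backbone encoder
        b = frame_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        # Self-attention among the prediction queries
        q = self.norm1(q + self.self_attn(q, q, q)[0])
        # Cross-attention: queries gather temporal context from the frame features
        q = self.norm2(q + self.cross_attn(q, frame_feats, frame_feats)[0])
        # Pool the enriched queries and predict the video-level action category
        return self.classifier(q.mean(dim=1))

# Example: 2 videos, 8 frames, 768-d frame features (e.g. ViT-style backbone output)
logits = TemporalDecoder()(torch.randn(2, 8, 768))   # -> shape (2, 51)
```

In this reading, the queries act like class-agnostic "summarizers" of the whole clip, so the same module can be bolted onto different encoders without changing their frame-level feature extraction.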
Space cannot substitute for time in the study of the ecosystem services-human wellbeing relationship
2
Authors: Lumeng Liu, Jianguo Wu. Geography and Sustainability, 2025, No. 2, pp. 57-68 (12 pages)
The relationship between ecosystem services (ES) and human well-being (HWB) is fundamental to the science and practice of sustainability. However, studies have shown conflicting results, which has been attributed to the influences of indicators, contexts, and scales. Yet another potential factor, which has been overlooked, may be the mixed use of spatial and temporal approaches. Using twelve ES and seven well-being indicators and multiple statistical methods, we quantified and compared the spatial and temporal ES–HWB relationships for Inner Mongolia, China. The spatial and temporal relationships differed in both correlation direction and strength. Most relationships of economic and employment-related indicators with food provisioning and supporting services were temporally positive but spatially nonsignificant or negative. Some relationships of economic and employment-related indicators with water retention, sandstorm prevention, and wind erosion were temporally negative but spatially complex. However, the spatial and temporal ES–HWB relationships could also be similar in some cases. We conclude that although both the spatial and temporal approaches have merits, space generally cannot substitute for time in the study of the ES–HWB relationship. Our study helps reconcile the seemingly conflicting findings in the literature and suggests that future studies should explicitly distinguish between the spatial and temporal ES–HWB relationships.
Keywords: ecosystem services, objective human well-being, space-for-time substitution, spatial relationship, temporal relationship
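The space-for-time distinction at the heart of this abstract can be made concrete with a small sketch on synthetic data: a spatial relationship correlates ES and HWB across regions within a year, while a temporal relationship correlates them across years within a region. The panel layout, indicator values, and effect sizes below are invented for illustration and are not the paper's data or statistical methods.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: one ES indicator and one HWB indicator for 10 regions over 16 years
rng = np.random.default_rng(0)
rows = []
for r in range(10):
    for y in range(2000, 2016):
        es = 50 + 2 * (y - 2000) + rng.normal(0, 5)   # ecosystem-service indicator
        hwb = 0.4 * es + rng.normal(0, 8)              # well-being indicator
        rows.append((f"R{r}", y, es, hwb))
df = pd.DataFrame(rows, columns=["region", "year", "es", "hwb"])

# Spatial relationship: correlate across regions within each year, then average
spatial_r = df.groupby("year")[["es", "hwb"]].apply(lambda g: g["es"].corr(g["hwb"])).mean()

# Temporal relationship: correlate across years within each region, then average
temporal_r = df.groupby("region")[["es", "hwb"]].apply(lambda g: g["es"].corr(g["hwb"])).mean()

print(f"mean spatial r = {spatial_r:.2f}, mean temporal r = {temporal_r:.2f}")
```

Because the two correlations answer different questions, they can legitimately disagree in sign and strength, which is the kind of divergence the study reports.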
Action Recognition Using Multi-Scale Temporal Shift Module and Temporal Feature Difference Extraction Based on 2D CNN
3
Authors: Kun-Hsuan Wu, Ching-Te Chiu. Journal of Software Engineering and Applications, 2021, No. 5, pp. 172-188 (17 pages)
Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between different frames by extracting the features of multiple frames simultaneously, this results in an explosive number of parameters and a high calculation cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, calculating the corresponding optical flow incurs additional computation and necessitates another model to learn the features of the optical flow. We proposed an action recognition framework based on a two-dimensional convolutional neural network; it was therefore necessary to resolve the lack of temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, which was then combined with a temporal feature difference extraction module to extract the differences between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. After compression, the accuracy remained high, at 72.19% on HMDB51 and 95.57% on UCF-101. The final model was more compact than most related works.
Keywords: action recognition, convolutional neural network, 2D CNN, temporal relationship
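A minimal PyTorch sketch of the two ideas named in the abstract above, a multi-scale temporal shift and a temporal feature difference, is given below. The function names, channel split (fold_div), and stride set are assumptions made for illustration and are not the authors' exact design; the shift follows the general TSM idea of moving a fraction of channels forward and backward in time at several strides.

```python
import torch

def temporal_shift(x, stride, fold_div=8):
    """Shift a fraction of channels forward/backward in time by `stride` frames.
    x: (batch, frames, channels, h, w); vacated frames are zero-padded."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :t - stride, :fold] = x[:, stride:, :fold]                    # shift backward
    out[:, stride:, fold:2 * fold] = x[:, :t - stride, fold:2 * fold]    # shift forward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]                            # rest unchanged
    return out

def multi_scale_shift(x, strides=(1, 2)):
    """Average shifts at several temporal strides to widen the temporal receptive field."""
    return torch.stack([temporal_shift(x, s) for s in strides]).mean(0)

def temporal_difference(x):
    """Frame-to-frame feature differences as an explicit motion cue."""
    return x[:, 1:] - x[:, :-1]

feats = torch.randn(2, 8, 64, 14, 14)      # (batch, frames, channels, h, w)
shifted = multi_scale_shift(feats)         # same shape as feats
motion = temporal_difference(feats)        # (2, 7, 64, 14, 14)
```

Both operations are parameter-free, which is consistent with the abstract's emphasis on keeping a 2D-CNN framework compact while still capturing temporal relationships.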