Abstract: The ever-growing volume of available visual data (i.e., videos and pictures uploaded by internet users) has attracted the attention of the computer vision research community. Finding efficient ways to extract knowledge from these sources is therefore imperative. Recently, the BlazePose system was released for skeleton extraction from images, oriented to mobile devices. With this skeleton graph representation in place, a Spatial-Temporal Graph Convolutional Network can be implemented to predict the action. We hypothesize that simply replacing the skeleton input data with a different set of joints that offers more information about the action of interest can increase the performance of the Spatial-Temporal Graph Convolutional Network on HAR tasks. Hence, in this study, we present the first implementation of the BlazePose skeleton topology on this architecture for action recognition. Moreover, we propose the Enhanced-BlazePose topology, which achieves better results than its predecessor. Additionally, we propose different skeleton detection thresholds that improve accuracy even further. We reached a top-1 accuracy of 40.1% on the Kinetics dataset. For the NTU-RGB+D dataset, we achieved 87.59% and 92.1% accuracy under the Cross-Subject and Cross-View evaluation criteria, respectively.
Abstract: Action recognition has been recognized as an activity in which individuals’ behaviour can be observed. Assembling profiles of regular activities, such as activities of daily living, can support identifying trends in the data during critical events. A skeleton representation of the human body has proven effective for this task. The skeletons are represented as graphs. However, the topology of a graph is not structured like Euclidean-based data. Therefore, a new set of methods to perform the convolution operation on the skeleton graph is proposed. Our proposal is based on the Spatial Temporal Graph Convolutional Network (ST-GCN) framework. In this study, we propose an improved set of label mapping methods for the ST-GCN framework. We introduce three split techniques (full distance split, connection split, and index split) as an alternative approach to the convolution operation. The experiments presented in this study were trained and evaluated on two benchmark datasets: NTU-RGB+D and Kinetics. Our results indicate that our split techniques outperform the previous partition strategies and are more stable during training, without using the additional edge importance weighting training parameter. Therefore, our proposal can provide a more realistic solution for real-time applications centred on recognizing activities of daily living in indoor environments.
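To make the idea of label mapping concrete, the sketch below assigns each neighbour of a root joint to a subset according to its hop distance, in the spirit of the distance partitioning used by the original ST-GCN. It is not the authors' full distance, connection, or index split, which the abstract does not define; the 5-joint skeleton, `hop_distances`, and `distance_label_map` are hypothetical names introduced only for illustration.

```python
from collections import deque


def hop_distances(edges, num_joints, root):
    """Breadth-first search: hop distance from `root` to every reachable joint."""
    neighbors = {j: [] for j in range(num_joints)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def distance_label_map(edges, num_joints, root, max_hop=1):
    """Map each joint within `max_hop` of `root` to a subset label (its hop distance)."""
    dist = hop_distances(edges, num_joints, root)
    return {joint: d for joint, d in dist.items() if d <= max_hop}


# Hypothetical toy skeleton (an assumption, not a real topology): 0-1-2-3 plus a branch 1-4.
TOY_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
labels = distance_label_map(TOY_EDGES, num_joints=5, root=1, max_hop=1)
print(labels)
```

Each subset label would then select its own convolution weights, so joints at different distances from the root contribute through different parameters.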
Funding: Supported by the Key R&D Program of Gansu Province (No. 23YFGA0063); the National Natural Science Foundation of China (Nos. 62363022, 61663021); the Natural Science Foundation of Gansu Province (Nos. 22JR5RA226, 23JRRA886); and the Gansu Provincial Department of Education: Industrial Support Plan Project (No. 2023CYZC-35).
Abstract: Traffic flow forecasting plays a crucial role in intelligent traffic systems (ITS) and is the key technology for realizing dynamic traffic guidance and active traffic control. To address the complex local and global spatial-temporal dynamics of traffic flow, this paper proposes a new traffic flow forecasting model, the spatial-temporal attention graph neural network (STA-GNN), which combines an attention mechanism (AM) with a spatial-temporal convolutional network. The model learns the hidden dynamic local spatial correlations of the traffic network by combining the dynamic adjacency matrix constructed by the graph learning layer with a graph convolutional network (GCN). The local temporal correlations of traffic flow at different scales are extracted by stacking multiple convolutional kernels in a temporal convolutional network (TCN). The global spatial-temporal dependencies of long traffic flow sequences are captured by the spatial-temporal attention mechanism (STAtt), which enhances the global spatial-temporal modeling and the representational ability of the model. Experimental results on two datasets, METR-LA and PEMS-BAY, show that the proposed STA-GNN model outperforms common baseline models in forecasting accuracy.
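The neighbour aggregation at the heart of a graph convolution can be sketched in a few lines. The example below adds self-loops to an adjacency matrix, row-normalizes it, and averages each sensor's reading with those of its neighbours. It assumes a fixed, hand-written adjacency matrix and leaves out the learned weights, nonlinearity, and learned dynamic adjacency that a model such as STA-GNN would use; `ADJ`, `FLOW`, and the function names are illustrative.

```python
def normalize_adjacency(adj):
    """Add self-loops, then scale every row so that it sums to 1."""
    n = len(adj)
    normalized = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def propagate(adj, features):
    """One aggregation step: each node averages its own and its neighbours' features."""
    a_hat = normalize_adjacency(adj)
    n, dim = len(features), len(features[0])
    return [
        [sum(a_hat[i][k] * features[k][d] for k in range(n)) for d in range(dim)]
        for i in range(n)
    ]


# Hypothetical 3-sensor road segment: sensor 1 sits between sensors 0 and 2.
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
FLOW = [[10.0], [20.0], [60.0]]  # one made-up flow reading per sensor
smoothed = propagate(ADJ, FLOW)
print(smoothed)
```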
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using K-Nearest Neighbors (KNN) interpolation. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources such as BML, ICT-Pollick, and ELMD, plus 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, 1835 of which are labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion and achieves significantly higher accuracy than other methods.
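One plausible reading of KNN interpolation for occluded joints is sketched below: a pose with missing joints is compared, on its visible joints only, against a set of complete reference poses, and each missing joint is filled with the mean of that joint over the k closest references. The abstract does not specify how the JI Module selects neighbours, so this matching rule, the `knn_fill` helper, and the toy 2-D poses are all assumptions.

```python
import math


def knn_fill(pose, references, k=2):
    """Fill joints that are None using the k reference poses closest on the visible joints."""
    visible = [i for i, joint in enumerate(pose) if joint is not None]

    def distance(ref):
        # Euclidean distance restricted to the joints that are visible in `pose`.
        return math.sqrt(sum(
            (pose[i][axis] - ref[i][axis]) ** 2
            for i in visible
            for axis in range(len(ref[i]))
        ))

    nearest = sorted(references, key=distance)[:k]
    filled = []
    for i, joint in enumerate(pose):
        if joint is not None:
            filled.append(joint)
            continue
        dims = len(nearest[0][i])
        filled.append(tuple(
            sum(ref[i][axis] for ref in nearest) / len(nearest)
            for axis in range(dims)
        ))
    return filled


# Made-up 2-D poses with three joints each; the third joint of `occluded` is missing.
REFERENCES = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)],
    [(5.0, 5.0), (6.0, 5.0), (7.0, 5.0)],
]
occluded = [(0.0, 0.0), (1.0, 0.0), None]
completed = knn_fill(occluded, REFERENCES, k=2)
print(completed)
```

The two references that agree with the visible joints are averaged, while the distant third pose is ignored.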
Funding: Supported by the Urgent Need for Overseas Talent Project of Jiangxi Province (Grant No. 20223BCJ25040); the Thousand Talents Plan of Jiangxi Province (Grant No. jxsg2023101085); the National Natural Science Foundation of China (Grant No. 62106093); the Natural Science Foundation of Jiangxi (Grant Nos. 20224BAB212011, 20232BAB212008, 20242BAB25078, and 20232BAB202051); and the Youth Talent Cultivation Innovation Fund Project of Nanchang University (Grant No. XX202506030015). Also funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project (No. PNURSP2025R759), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Predicting human motion from historical motion sequences is a fundamental problem in computer vision and is at the core of many applications. Existing approaches primarily focus on encoding spatial dependencies among human joints while ignoring temporal cues and the complex relationships across non-consecutive frames. These limitations hinder a model’s ability to generate accurate predictions over longer time horizons and in scenarios with complex motion patterns. To address these problems, we propose a novel multi-level spatial and temporal learning model, which consists of a Cross Spatial Dependencies Encoding Module (CSM) and a Dynamic Temporal Connection Encoding Module (DTM). Specifically, the CSM is designed to capture complementary local and global spatial dependency information at both the joint level and the joint-pair level. We further present the DTM to encode diverse temporal evolution contexts and compress motion features to a deep level, enabling the model to capture both short-term and long-term dependencies efficiently. Extensive experiments conducted on the Human3.6M and CMU Mocap datasets demonstrate that our model achieves state-of-the-art performance in both short-term and long-term predictions, outperforming existing methods by up to 20.3% in accuracy. Furthermore, ablation studies confirm the significant contributions of the CSM and DTM to prediction accuracy.
Funding: Supported by the National Natural Science Foundation of China under Grant 42172161; by the Heilongjiang Provincial Natural Science Foundation of China under Grant LH2020F003; by the Heilongjiang Provincial Department of Education Project of China under Grant UNPYSCT-2020144; by the Innovation Guidance Fund of Heilongjiang Province of China under Grant 15071202202; and by the Science and Technology Bureau Project of Qinhuangdao Province of China under Grant 202101A226.
Abstract: Spatiotemporal heterogeneous data is the data foundation for decision-making in many fields, and checking its accuracy provides data support for those decisions. Because of the randomness, complexity, and global and local correlation of spatiotemporal heterogeneous data in the temporal and spatial dimensions, traditional detection methods cannot guarantee both detection speed and accuracy. Therefore, this article proposes a method for detecting the accuracy of spatiotemporal heterogeneous data by fusing graph convolution and temporal convolution networks. First, the geographic weighting function is introduced and improved to quantify the degree of association between nodes and to calculate the weighted adjacency value, which simplifies the complex topology. Second, spatiotemporal convolutional units are designed based on graph convolutional neural networks and temporal convolutional networks to improve detection speed and accuracy. Finally, the proposed method is compared with three methods, ARIMA, T-GCN, and STGCN, in real scenarios to verify its effectiveness in terms of detection speed, detection accuracy, and stability. The experimental results show that the RMSE, MAE, and MAPE of this method are the smallest in the cases of simple and complex connectivity degree, at 13.82/12.08, 2.77/2.41, and 16.70/14.73, respectively. It also has the shortest detection time, 672.31/887.36, respectively. In addition, the evaluation results are the same under different processing time periods and in complex topology environments, which indicates that the detection accuracy of this method is the highest and that it has good research value and application prospects.
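The three error measures reported above have standard definitions, shown here as a small sketch. The sample values are invented for illustration and are unrelated to the paper's experiments; `mape` assumes that no true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up observations and predictions, used only to exercise the formulas.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the size of each true value.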
Funding: Supported by the Major Project of the National Social Science Fund of China, titled “Design Path Selection for the Mechanism of New and Old Growth Driver Conversion” (Grant No. 18ZDA077), and by the Joint Special Major Research Project of the Yangtze River Delta Economics and Social Development Research Center at Nanjing University and the Collaborative Innovation Center for China Economy (CICCE), titled “Practicing Innovation in China’s Development Economics for the Yangtze River Delta: From Industrial Clusters to Technological Clusters” (Grant No. CYD2022006).
Abstract: The global clustering of inventive talent shapes innovation capacity and drives economic growth. For China, this process is especially crucial in sustaining its development momentum. This paper draws on data from the EPO Worldwide Patent Statistical Database (PATSTAT) to extract global inventive talent mobility information and analyzes the spatial structural evolution of the global inventive talent flow network. The study finds that this network is undergoing a multi-polar transformation, characterized by the rising importance of a few central countries (such as the United States, Germany, and China) and the increasing marginalization of many peripheral countries. In response to this phenomenon, the paper constructs an endogenous migration model and tests it empirically using the Temporal Exponential Random Graph Model (TERGM). The results reveal several endogenous mechanisms driving global inventive talent flows, including reciprocity, path dependence, convergence effects, transitivity, and cyclic structures, all of which contribute to the network’s multi-polar trend. In addition, differences in regional industrial structures significantly influence talent mobility choices and are a decisive factor in the formation of poles within the multi-polar landscape. Based on these findings, the paper suggests fostering two-way channels for talent exchange between China and other global innovation hubs in order to enhance international collaboration and knowledge flow. It also recommends reducing the migration costs and institutional barriers faced by R&D personnel, thereby encouraging greater mobility of high-skilled talent. Furthermore, the government is advised to leverage regional strengths in high-tech industries strategically to capture competitive advantages in emerging technologies and products, ultimately strengthening the country’s position in the global innovation landscape.
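Of the mechanisms listed above, reciprocity is the simplest to measure directly on a directed flow network: it is the share of directed ties whose reverse tie also exists. The sketch below computes that descriptive statistic only; it does not estimate a TERGM, and the country codes and ties are invented for illustration rather than drawn from PATSTAT.

```python
def reciprocity(edges):
    """Fraction of directed ties (a, b) for which the reverse tie (b, a) also exists."""
    edge_set = {(a, b) for a, b in edges if a != b}  # ignore self-loops
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


# Hypothetical talent flows (origin, destination); not real data.
TOY_FLOWS = [("CN", "US"), ("US", "CN"), ("DE", "US"), ("IN", "US")]
share = reciprocity(TOY_FLOWS)
print(share)
```

Here two of the four ties are reciprocated, so half of the flows run in both directions.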
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61872122 and 61502131); the Zhejiang Provincial Natural Science Foundation of China (No. LY18F020015); the Open Project Program of the State Key Lab of CAD&CG, China (No. 1802); and the Zhejiang Provincial Key Research and Development Program, China (No. 2020C01067).
Abstract: Video summarization has established itself as a fundamental technique for generating compact and concise videos, which eases the management and browsing of large-scale video data. Existing methods fail to fully consider the local and global relations among video frames, which degrades summarization performance. To address this problem, we propose a graph convolutional attention network (GCAN) for video summarization. GCAN consists of two parts, embedding learning and context fusion, where embedding learning includes a temporal branch and a graph branch. In particular, GCAN uses dilated temporal convolution to model local cues and temporal self-attention to exploit global cues for video frames. It learns graph embeddings via a multi-layer graph convolutional network to reveal the intrinsic structure of frame samples. The context fusion part combines the output streams from the temporal branch and the graph branch to create a context-aware representation of frames, on which importance scores are evaluated to select representative frames for the video summary. Experiments on two benchmark databases, SumMe and TVSum, show that the proposed GCAN approach outperforms several state-of-the-art alternatives in three evaluation settings.
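A dilated temporal convolution widens the span of frames a kernel covers without adding parameters, by leaving gaps between its taps. The sketch below implements a single-channel "valid" version in plain Python (in cross-correlation form, as deep learning libraries do); the kernel and signal are illustrative, and GCAN's actual layer sizes are not given in the abstract.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D convolution whose kernel taps are `dilation` time steps apart."""
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    return [
        sum(weight * signal[t + j * dilation] for j, weight in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Made-up per-frame feature values and a simple difference kernel.
frames = [1, 2, 3, 4, 5, 6]
diff_kernel = [1, 0, -1]
dense = dilated_conv1d(frames, diff_kernel, dilation=1)
dilated = dilated_conv1d(frames, diff_kernel, dilation=2)
print(dense, dilated)
```

With `dilation=2` the same three weights compare frames four steps apart instead of two, which is how stacked dilated layers reach a longer temporal context.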
Abstract: Background: Despite recent progress in 3D point cloud processing using deep convolutional neural networks, the inability to extract local features remains a challenging problem. In addition, existing methods consider only the spatial domain in the feature extraction process. Methods: In this paper, we propose a spectral and spatial aggregation convolutional network (S²ANet), which combines spectral and spatial features for point cloud processing. First, we calculate the local frequency of the point cloud in the spectral domain. Then, we use the local frequency to group points and provide a spectral aggregation convolution module to extract the features of the points grouped by local frequency. We simultaneously extract local features in the spatial domain to supplement the final features. Results: S²ANet was applied to several point cloud analysis tasks; it achieved state-of-the-art classification accuracies of 93.8%, 88.0%, and 83.1% on the ModelNet40, ShapeNetCore, and ScanObjectNN datasets, respectively. For indoor scene segmentation, training and testing were performed on the S3DIS dataset, and the mean intersection over union was 62.4%. Conclusions: The proposed S²ANet can effectively capture the local geometric information of point clouds, thereby improving accuracy on various tasks.
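The mean intersection over union reported for segmentation averages, over classes, the ratio of correctly labeled points to all points carrying that class in either the prediction or the ground truth. The sketch below follows that definition on per-point labels; skipping classes that appear in neither list is an assumed convention, and the labels are invented for illustration.

```python
def mean_iou(true_labels, pred_labels, num_classes):
    """Mean over classes of |intersection| / |union| of true and predicted points."""
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(true_labels, pred_labels))
        intersection = sum(1 for t, p in pairs if t == cls and p == cls)
        union = sum(1 for t, p in pairs if t == cls or p == cls)
        if union:  # assumption: classes absent from both lists are skipped
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up per-point labels for six points and three classes.
truth = [0, 0, 1, 1, 2, 2]
guess = [0, 1, 1, 1, 2, 0]
score = mean_iou(truth, guess, num_classes=3)
print(score)
```

The per-class values here are 1/3, 2/3, and 1/2, giving a mean of 0.5.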