Abstract: Reliable traffic flow prediction is crucial for mitigating urban congestion. This paper proposes the Attention-based spatiotemporal Interactive Dynamic Graph Convolutional Network (AIDGCN), a novel architecture integrating an Interactive Dynamic Graph Convolution Network (IDGCN) with Temporal Multi-Head Trend-Aware Attention. Its core innovation lies in IDGCN, which splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs, and in an attention mechanism that incorporates convolutional operations to capture essential local traffic trends, addressing a critical gap in standard attention for continuous data. For 15- and 60-min forecasting on METR-LA, AIDGCN achieves MAEs of 0.75% and 0.39%, and RMSEs of 1.32% and 0.14%, respectively. In 60-min long-term forecasting on the PEMS-BAY dataset, AIDGCN outperforms the MRA-BGCN method by 6.28%, 4.93%, and 7.17% in terms of MAE, RMSE, and MAPE, respectively. Experimental results demonstrate the superiority of the proposed model over state-of-the-art methods.
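For readers unfamiliar with the three error measures quoted in the abstract above, the following is a minimal sketch of how MAE, RMSE, and MAPE are conventionally computed, plus the relative-improvement percentage used when one model is compared against a baseline. The function names are illustrative, and skipping zero-valued targets in MAPE is an assumption (a common convention on traffic benchmarks), not something the abstract states.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumption: targets equal to zero are skipped to avoid division by zero.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

Under this definition, a model with an error of 1.5 against a baseline error of 2.0 shows a relative improvement of 25%.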
Funding: Supported by the Key Research & Development Plan Project of Shandong Province, China (No. 2017GGX10127).
Abstract: Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network to focus on essential features of video series. The method considers local details of sign language movements by taking the information on joints and bones as inputs and constructing a spatial-temporal graph to reflect inter-frame relevance and physical connections between nodes. The graph-based multi-head attention mechanism is combined with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation is modeled by a temporal convolutional network. We adopt BLSTM to learn the long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results regarding word error rate (1.59%) on the Chinese Sign Language dataset and mean Jaccard Index (65.78%) on the ChaLearn LAP Continuous Gesture Dataset.
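The word error rate reported in the abstract above is conventionally the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The sketch below illustrates that standard definition; the function names are illustrative and this is not the authors' evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens reaches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref, hyp):
    """WER in percent: edit distance normalized by the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

For example, recognizing a four-word reference with one substitution and one deletion gives a WER of 50%.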
Abstract: Multivariate time series forecasting plays a crucial role in decision-making for systems like energy grids and transportation networks, where temporal patterns emerge across diverse scales, from short-term fluctuations to long-term trends. However, existing Transformer-based methods often process data at a single resolution or handle multiple scales independently, overlooking critical cross-scale interactions that influence prediction accuracy. To address this gap, we introduce the Hierarchical Attention Transformer (HAT), which enables direct information exchange between temporal hierarchies through a novel cross-scale attention mechanism. HAT extracts multi-scale features using hierarchical convolutional-recurrent blocks, fuses them via temperature-controlled mechanisms, and optimizes gradient flow with residual connections for stable training. Evaluations on eight benchmark datasets show HAT outperforming state-of-the-art baselines, with average reductions of 8.2% in MSE and 7.5% in MAE across horizons, while achieving a 6.1× training speedup over patch-based methods. These advancements highlight HAT’s potential for applications requiring multi-resolution temporal modeling.
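The abstract above does not spell out what its "temperature-controlled" fusion looks like. One plausible reading, sketched here purely as an assumption, is a softmax over per-scale relevance scores whose sharpness is set by a temperature, followed by a weighted sum of the per-scale feature vectors. The names `softmax_with_temperature` and `fuse_scales` are hypothetical and do not come from the paper.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Softmax of scores / temperature; lower temperature gives sharper weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, scores, temperature=1.0):
    """Weighted sum of per-scale feature vectors (hypothetical fusion step).

    features: one equal-length feature vector per temporal scale.
    scores:   one relevance score per temporal scale.
    """
    weights = softmax_with_temperature(scores, temperature)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

With a high temperature the scales are blended almost uniformly; as the temperature approaches zero the fusion approaches selecting the single highest-scoring scale.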
Funding: Supported in part by the National Natural Science Foundation of China (62372385).
Abstract: Dear Editor, This letter proposes the graph tensor alliance attention network (GT-A²T) to represent a dynamic graph (DG) precisely. Its main ideas are 1) establishing a unified spatio-temporal message propagation framework on a DG via the tensor product to capture the complex, cohesive spatio-temporal interdependencies precisely, and 2) acquiring the alliance attention scores from node features and favorable high-order structural correlations.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: a Joint Interpolation Module (JI Module), a Multi-scale Temporal Convolution Network (MS-TCN), and a Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
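The abstract above says only that occluded joints are completed by KNN interpolation. As a rough illustration of the idea, and not the JI Module itself, the sketch below fills each missing joint with the mean position of its k nearest visible joints. Measuring "nearest" on a reference (template) pose is an assumption made here so that the example is well defined; `knn_fill_joints` and `template` are hypothetical names.

```python
import math


def knn_fill_joints(frame, template, k=2):
    """Fill occluded joints with the mean of their k nearest visible joints.

    frame:    list of joint coordinates (tuples), with None for occluded joints.
    template: reference-pose coordinates for every joint, used only to decide
              which joints count as neighbours (an assumption of this sketch).
    """
    visible = [i for i, joint in enumerate(frame) if joint is not None]
    if not visible:
        raise ValueError("at least one joint must be visible")
    filled = list(frame)
    for i, joint in enumerate(frame):
        if joint is not None:
            continue
        # Rank visible joints by their distance to joint i in the template pose.
        nearest = sorted(visible, key=lambda v: math.dist(template[i], template[v]))[:k]
        dim = len(frame[nearest[0]])
        filled[i] = tuple(sum(frame[n][d] for n in nearest) / len(nearest)
                          for d in range(dim))
    return filled
```

A temporal variant, which would look up the same joint in neighbouring frames instead, is an equally plausible reading of the abstract.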
Funding: Supported by the Guangxi University Young and Middle-aged Teachers Basic Ability Improvement Project (No. 2023KY1692) and the Guilin University of Information Technology 2022 Research Project (No. XJ202207).
Abstract: Purpose: Human behavior recognition poses a pivotal challenge in intelligent computing and cybernetics, significantly impacting engineering and management systems. With the rapid advancement of autonomous systems and intelligent manufacturing, there is an increasing demand for precise and efficient human behavior recognition technologies. However, traditional methods often suffer from insufficient accuracy and limited generalization ability when dealing with complex and diverse human actions. Therefore, this study aims to enhance the precision of human behavior recognition by proposing an innovative framework, dynamic graph convolutional networks with multi-scale position attention (DGCN-MPA) to sup. Design/methodology/approach: The primary applications are in autonomous systems and intelligent manufacturing. The main objective of this study is to develop an efficient human behavior recognition framework that leverages advanced techniques to improve the prediction and interpretation of human actions. This framework aims to address the shortcomings of existing methods in handling the complexity and variability of human actions, providing more reliable and precise solutions for practical applications. The proposed DGCN-MPA framework integrates the strengths of convolutional neural networks and graph-based models. It incorporates a wavelet packet transform to extract time-frequency characteristics and an MPA module to enhance the representation of skeletal node positions. The core innovation lies in the fusion of dynamic graph convolution with hierarchical attention mechanisms, which selectively attend to relevant features and spatial relationships, adjusting their importance across scales to address the variability in human actions. Findings: To validate the effectiveness of the DGCN-MPA framework, rigorous evaluations were conducted on benchmark datasets such as NTU-RGB+D and Kinetics-Skeleton. The results demonstrate that the framework achieves an F1 score of 62.18% and an accuracy of 75.93% on NTU-RGB+D, and an F1 score of 69.34% and an accuracy of 76.86% on Kinetics-Skeleton, outperforming existing models. These findings underscore the framework’s capability to capture complex behavior patterns with high precision. Originality/value: By introducing a dynamic graph convolutional approach combined with multi-scale position attention mechanisms, this study represents a significant advancement in human behavior recognition technologies. The innovative design and superior performance of the DGCN-MPA framework contribute to its potential for real-world applications, particularly in integrating behavior recognition into engineering and autonomous systems. In the future, this framework has the potential to further propel the development of intelligent computing, cybernetics, and related fields.
Funding: Supported by the National Natural Science Foundation of China (62072334).
Abstract: The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, we find that existing Continuous Sign Language Recognition (CSLR) methods either fail to mine hand and face information in their visual backbones or rely on expensive and time-consuming external extractors to obtain it. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. Our MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information about the hands and face at multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments using three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Funding: Supported by the Key R&D Program of Gansu Province (No. 23YFGA0063), the National Natural Science Foundation of China (No. 62363022, 61663021), the Natural Science Foundation of Gansu Province (No. 22JR5RA226, 23JRRA886), and the Gansu Provincial Department of Education: Industrial Support Plan Project (No. 2023CYZC-35).
Abstract: Traffic flow forecasting is a key technology for realizing dynamic traffic guidance and active traffic control in intelligent traffic systems (ITS). Aiming at the complex local and global spatial-temporal dynamic characteristics of traffic flow, this paper proposes a new traffic flow forecasting model, the spatial-temporal attention graph neural network (STA-GNN), which combines an attention mechanism (AM) with a spatial-temporal convolutional network. The model learns the hidden dynamic local spatial correlations of the traffic network by combining the dynamic adjacency matrix constructed by the graph learning layer with a graph convolutional network (GCN). The local temporal correlations of traffic flow at different scales are extracted by stacking multiple convolutional kernels in a temporal convolutional network (TCN). The global spatial-temporal dependencies of long traffic flow sequences are captured by the spatial-temporal attention mechanism (STAtt), which enhances the global spatial-temporal modeling and the representational ability of the model. Experimental results on two datasets, METR-LA and PEMS-BAY, show that the proposed STA-GNN model outperforms common baseline models in forecasting accuracy.
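The abstract above mentions a dynamic adjacency matrix produced by a graph learning layer but gives no formula. One widely used construction, shown here only as an assumed stand-in, derives the matrix from two sets of learnable node embeddings as a row-wise softmax of the rectified inner products, and then uses it to propagate node features. The names `learned_adjacency` and `graph_conv` are hypothetical.

```python
import math


def learned_adjacency(src_emb, dst_emb):
    """Row-normalized adjacency: softmax(relu(src_emb @ dst_emb^T)) per row.

    src_emb, dst_emb: one embedding vector per node (learnable in a real
    model; plain lists here for illustration).
    """
    adj = []
    for e1 in src_emb:
        scores = [max(0.0, sum(a * b for a, b in zip(e1, e2))) for e2 in dst_emb]
        m = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def graph_conv(adj, x):
    """One propagation step: each node takes a weighted average of all nodes' features."""
    n, dim = len(x), len(x[0])
    return [[sum(adj[i][j] * x[j][d] for j in range(n)) for d in range(dim)]
            for i in range(len(adj))]
```

Because each row of the adjacency sums to one, the propagation step is a convex combination of node features; a full GCN layer would additionally apply a learned weight matrix and a nonlinearity.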
Funding: Supported by the National Key Research & Development Program of China (No. 2017YFC0820503), the National Natural Science Foundation of China (No. 62072149), the National Social Science Foundation of China (No. 19ZDA348), the Primary Research & Development Plan of Zhejiang (No. 2021C03156), and the Public Welfare Research Program of Zhejiang (No. LGG19F020017).
Abstract: Event temporal relation extraction is an important part of natural language processing. With the development of deep learning, many models have been applied to this task. However, most existing methods cannot accurately obtain the degree of association between different tokens and events, and event-related information cannot be effectively integrated. In this paper, we propose an event information integration model that integrates event information through multilayer bidirectional long short-term memory (Bi-LSTM) and an attention mechanism. Although this scheme improves extraction performance, it can still be further optimized. To that end, we propose a novel relational graph attention network that incorporates edge attributes. In this approach, we first build a semantic dependency graph through dependency parsing, then model a semantic graph that considers the edges’ attributes by using top-k attention mechanisms to learn hidden semantic contextual representations, and finally predict event temporal relations. We evaluate the proposed models on the TimeBank-Dense dataset. Compared to previous baselines, the Micro-F1 scores obtained by our models improve by 3.9% and 14.5%, respectively.
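The abstract above names a top-k attention mechanism without defining it. A common interpretation, sketched here as an assumption rather than the authors' formulation, keeps only the k largest attention scores for a query, sets the remaining weights to zero, and applies a softmax over the survivors before averaging the value vectors. The function name `top_k_attention` is illustrative.

```python
import math


def top_k_attention(scores, values, k):
    """Attend only to the k highest-scoring positions.

    scores: one raw attention score per position.
    values: one value vector per position.
    Returns (weights, context); weights are zero outside the top k.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of positions")
    # Indices of the k largest scores; ties are broken by original position.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in keep)  # subtract the maximum for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(scores))]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

Setting k equal to the number of positions recovers ordinary softmax attention, while small k forces each token to draw context from only its most relevant neighbours in the graph.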