Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which makes precise prediction difficult. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden periodicity of traffic flow, the data are divided into three kinds of periods: hourly, daily, and weekly. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions, and local spatial-temporal dependence is captured by a T-GCN module. Thirdly, a transformer layer is introduced to learn long-term temporal dependence. A position embedding mechanism labels position information for all traffic sequences, so that the multi-head self-attention mechanism can recognize the sequence order and allocate weights to different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
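The split into hourly, daily, and weekly periods can be illustrated with a small sketch. The abstract does not give the exact windowing, so the function below is an assumption: it follows the common practice of taking, for a prediction window starting at step t, the window just before it, the same window one day earlier, and the same window one week earlier. The name periodic_segments and all of its parameters are hypothetical.

```python
def periodic_segments(series, t, horizon, steps_per_day,
                      n_recent=1, n_daily=1, n_weekly=1):
    """Collect input segments for predicting series[t:t + horizon].

    Assumed scheme (not confirmed by the abstract): each segment has the
    same length as the prediction window and is shifted back by a multiple
    of the window length, of one day, or of one week.
    """
    def window(offset):
        start = t - offset
        if start < 0:
            raise ValueError("not enough history for the requested period")
        return series[start:start + horizon]

    # Oldest segment first, so each list is in chronological order.
    recent = [window(k * horizon) for k in range(n_recent, 0, -1)]
    daily = [window(k * steps_per_day) for k in range(n_daily, 0, -1)]
    weekly = [window(k * 7 * steps_per_day) for k in range(n_weekly, 0, -1)]
    return recent, daily, weekly


# Example: 5-minute sampling (288 steps per day), one-hour horizon (12 steps).
flow = list(range(3000))
recent, daily, weekly = periodic_segments(flow, t=2100, horizon=12, steps_per_day=288)
```

With 5-minute data, the recent segment covers the hour immediately before the target, while the daily and weekly segments cover the same hour one day and one week earlier.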
The ability to accurately predict urban traffic flows is crucial for optimising city operations. Consequently, various methods for forecasting urban traffic have been developed, focusing on analysing historical data to understand complex mobility patterns. Deep learning techniques, such as graph neural networks (GNNs), are popular for their ability to capture spatio-temporal dependencies. However, these models often become overly complex due to the large number of hyper-parameters involved. In this study, we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks (DMST-GNODE), a framework based on ordinary differential equations (ODEs) that autonomously discovers effective spatial-temporal graph neural network (STGNN) architectures for traffic prediction tasks. A comparative analysis against baseline models indicates that DMST-GNODE performs best across multiple datasets, consistently achieving the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values alongside the highest accuracy. On the BKK (Bangkok) dataset, it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval, maintaining this trend across 40 and 60 min. Similarly, on the PeMS08 dataset, DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min, demonstrating its effectiveness over longer periods. The Los_Loop dataset results further emphasise this model's advantage, with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min, consistently maintaining superiority across all time intervals. These results indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.
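For readers unfamiliar with the reported metrics, the following minimal sketch computes RMSE and MAE for a pair of sequences. The abstract does not define "accuracy"; the accuracy function here assumes the norm-ratio definition used in some traffic forecasting papers (one minus the ratio of the error norm to the norm of the ground truth), which may differ from the authors' definition.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Assumed definition: 1 - ||y - y_hat|| / ||y|| (Euclidean norms)."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)))
    ref = math.sqrt(sum(a ** 2 for a in y_true))
    return 1.0 - err / ref


observed = [3.0, 4.0]
predicted = [3.0, 2.0]
scores = (rmse(observed, predicted), mae(observed, predicted), accuracy(observed, predicted))
```

Lower RMSE and MAE are better, while the assumed accuracy measure approaches 1 as predictions approach the observations.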
Fall behavior is closely related to high mortality in the elderly, so fall detection has become an important and urgent research area. However, existing fall detection methods are difficult to apply in daily life because of their heavy computation and poor detection accuracy. To solve these problems, this paper proposes a dense spatial-temporal graph convolutional network based on lightweight OpenPose. Lightweight OpenPose uses MobileNet as the feature extraction network, and the prediction layer uses a bottleneck-asymmetric structure, thus reducing the size of the network. The bottleneck-asymmetric structure compresses the number of input channels of the feature maps by 1×1 convolution and replaces the 7×7 convolution structure with an asymmetric structure of 1×7 convolution, 7×1 convolution, and 7×7 convolution in parallel. The spatial-temporal graph convolutional network divides the multi-layer convolution into dense blocks, and the convolutional layers in each dense block are connected, which improves feature propagation, enhances the network's ability to extract features, and thereby improves detection accuracy. Two representative datasets, the Multiple Cameras Fall dataset (MCF) and the Nanyang Technological University Red Green Blue+Depth Action Recognition dataset (NTU RGB+D), are selected for our experiments; NTU RGB+D has two evaluation benchmarks. The results show that the proposed model is superior to current fall detection models. The accuracy of this network on the MCF dataset is 96.3%, and the accuracies on the two evaluation benchmarks of the NTU RGB+D dataset are 85.6% and 93.5%, respectively.
Multivariate Time Series (MTS) prediction explores the interrelationships among variables at historical moments and extracts their relevant characteristics. It is widely used in finance, weather, complex industries, and other fields, and is also important for constructing digital twin systems. However, existing methods do not take full advantage of the potential properties of variables, which results in poor prediction accuracy. In this paper, we propose the Adaptive Fused Spatial-Temporal Graph Convolutional Network (AFSTGCN). First, to address the problem of the unknown spatial-temporal structure, we construct the Adaptive Fused Spatial-Temporal Graph (AFSTG) layer. Specifically, we fuse the spatial-temporal graph based on the interrelationship of spatial graphs. Simultaneously, we construct the adaptive adjacency matrix of the spatial-temporal graph using node embedding methods. Subsequently, to overcome the insufficient extraction of disordered correlation features, we construct the Adaptive Fused Spatial-Temporal Graph Convolutional (AFSTGC) module. The module forces the reordering of disordered temporal, spatial, and spatial-temporal dependencies into rule-like data. AFSTGCN dynamically and synchronously acquires potential temporal, spatial, and spatial-temporal correlations, thereby fully extracting rich hierarchical feature information to enhance prediction accuracy. Experiments on different types of MTS datasets demonstrate that the model achieves state-of-the-art single-step and multi-step performance compared with eight other deep learning models.
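One widely used way to build an adaptive adjacency matrix from node embeddings is to take the row-wise softmax of the rectified product of two embedding tables. The sketch below illustrates that general idea only; it is an assumption and not necessarily the AFSTG construction, and the embeddings are fixed numbers here, whereas in a real model they would be learned parameters.

```python
import math


def adaptive_adjacency(source_emb, target_emb):
    """Row-wise softmax(ReLU(E1 @ E2^T)) over plain Python lists.

    source_emb and target_emb are lists of equal-length embedding vectors,
    one per node. Illustrative formulation, not the paper's exact layer.
    """
    adjacency = []
    for src in source_emb:
        # ReLU of the dot product between this node and every target node.
        scores = [max(0.0, sum(a * b for a, b in zip(src, dst))) for dst in target_emb]
        # Numerically stable softmax over the row.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency


emb = [[1.0, 0.0], [0.0, 1.0]]
adj = adaptive_adjacency(emb, emb)
```

Each row sums to 1, so the result can be used directly as a normalized weight matrix in a graph convolution.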
Accurate traffic pattern prediction in large-scale networks is of great importance for intelligent system management and automatic resource allocation. System-level mobile traffic forecasting faces significant challenges due to the tremendous temporal and spatial dynamics introduced by diverse Internet user behaviors and frequent traffic migration. Spatial-temporal graph modeling is an efficient approach for analyzing the spatial relations and temporal trends of mobile traffic in a large system. Previous research may not reflect the optimal dependency because it either ignores inter-base-station dependency or predetermines the explicit geographical distance as the interrelationship of base stations. To overcome the limitations of the graph structure, this study proposes an adaptive graph convolutional network (AGCN) that captures the latent spatial dependency by developing self-adaptive dependency matrices and acquires temporal dependency using recurrent neural networks. Evaluated on two mobile network datasets, the experimental results demonstrate that this method outperforms other baselines and reduces the mean absolute error by 3.7% and 5.6% compared to time-series-based approaches.
In recent years, aquaculture has developed rapidly, especially in coastal and open ocean areas. In practice, water quality prediction is of critical importance. However, traditional water quality prediction models face limitations in handling complex spatiotemporal patterns. To address this challenge, a water quality prediction model is proposed, namely an adaptive multi-channel temporal graph convolutional network (AMTGCN). The AMTGCN integrates adaptive graph construction, a multi-channel spatiotemporal graph convolutional network, and fusion layers, and can comprehensively capture the spatial relationships and spatiotemporal patterns in aquaculture water quality data. Onsite aquaculture water quality data were collected, and the metrics MAE, RMSE, MAPE, and R² were used to validate the AMTGCN. On these four metrics, respectively, the AMTGCN shows an average improvement of 34.01%, 34.59%, 36.05%, and 17.71% over LSTM; 64.84%, 56.78%, 64.82%, and 153.16% over the STGCN; 55.25%, 48.67%, 57.01%, and 209.00% over GCN-LSTM; and 7.05%, 5.66%, 7.42%, and 2.47% over TCN. This indicates that the AMTGCN, which integrates adaptive graph construction with a multi-channel spatiotemporal graph convolutional network, could provide an efficient solution for water quality prediction in aquaculture.
Energy expenditure estimation can be used to measure the exercise load and physical condition of different individuals, such as soldiers, athletes, and firefighters, during their training and work. Energy expenditure estimation methods based on computer vision have developed rapidly in recent years. Compared with sensor-based methods, such methods are capable of monitoring several target persons at the same time, and the subjects do not need to wear different sensor devices that hamper their movement. In this paper, we propose a cross-attention spatial–temporal convolutional neural network to predict the energy expenditure of people under different exercise intensities. The model explores the relationship between changes in the human skeleton and energy expenditure intensity. In addition, a cross-attention correction module is used to reduce the negative effects of individual physical fitness characteristics during energy expenditure estimation. The experimental results show that our proposed method achieves high accuracy for energy expenditure estimation and performs better than existing computer vision-based energy expenditure estimation methods. The proposed method can be widely used in various physical activity scenarios to measure energy expenditure, increasing the convenience of usage.
Analyzing the vulnerability of power systems in cascading failures is generally regarded as a challenging problem. Although existing studies can extract some critical rules, they fail to capture the complex subtleties under different operational conditions. In recent years, several deep learning methods have been applied to address this issue. However, most existing deep learning methods consider only the grid topology of a power system in terms of topological connections and do not incorporate the system's spatial information, such as electrical distance, to increase the accuracy of graph convolution. In this paper, we construct a novel power-weighted line graph that uses power system topology and spatial information to optimize the edge weight assignment of the line graph. We then propose a multi-graph convolutional network (MGCN) based on a graph classification task, which preserves a power system's spatial correlations and captures the relationships among physical components. Our model can better handle power systems that have parallel lines, maintaining desirable accuracy when modeling systems with these extra topology features. To increase the interpretability of the model, we explain the MGCN using layer-wise relevance propagation and quantify the factors contributing to the model's classification.
Accurately predicting the State of Health (SOH) of lithium-ion batteries is a critical challenge in ensuring their reliability and safety in energy storage systems, such as electric vehicles and renewable energy grids. The intricate battery degradation process is influenced by evolving spatial and temporal interactions among health indicators. Existing methods often fail to capture the dynamic interactions between health indicators over time, resulting in limited predictive accuracy. To address these challenges, we propose a novel framework, Dynamic Graph Learning with Spatial-Temporal Fusion Attention (DGL-STFA), which transforms health indicator time-series data into time-evolving graph representations. The framework employs multi-scale convolutional neural networks to capture diverse temporal patterns, a self-attention mechanism to construct dynamic adjacency matrices that adapt over time, and a temporal attention mechanism to identify and prioritize key moments that influence battery degradation. This combination enables DGL-STFA to effectively model both dynamic spatial relationships and long-term temporal dependencies, enhancing SOH prediction accuracy. Extensive experiments were conducted on the NASA and CALCE battery datasets, comparing this framework with traditional time-series prediction methods and other graph-based prediction methods. The results demonstrate that our framework significantly improves prediction accuracy, with a mean absolute error more than 30% lower than other methods. Further analysis demonstrated the robustness of DGL-STFA across various battery life stages, including early, mid, and end-of-life phases. These results highlight the capability of DGL-STFA to accurately predict SOH, addressing critical challenges in advancing battery health monitoring for energy storage applications.
Deep neural networks (DNNs) now have powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanism of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a striking DNN paradigm and have achieved remarkable improvements over CNNs on many vision tasks. It is natural to ask how brain-like ViTs are. Beyond the model paradigm, we are also interested in the effects of factors such as model size, multimodality, and temporality on the ability of networks to model the human visual pathway, especially considering that existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 kinds of computer vision models, ranging from CNNs and ViTs to their hybrids, from the perspective of explaining brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither the CNN nor the transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway, as CNNs do. Moreover, we find that multi-modal and temporal networks can better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on the design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
Methanol-to-olefins, a promising non-oil pathway for the synthesis of light olefins, has been successfully industrialized. Accurate prediction of process variables can yield significant benefits for advanced process control and optimization. The difficulty of this task is underscored by the failure of traditional methods to capture the complex characteristics of industrial processes, such as high nonlinearity, dynamics, and data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. Firstly, a data normalization technique called reversible instance normalization is employed to handle differing data distributions. Subsequently, a convolutional neural network integrated with a self-attention mechanism is utilized to extract the temporal patterns. Meanwhile, a multi-graph convolutional network is leveraged to model the spatial interactions. Afterward, the extracted temporal and spatial features are fused and fed into a fully connected neural network to complete the prediction. Finally, the outputs are denormalized to obtain the final results. Monitoring the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrates that our model not only achieves superior prediction performance but also reveals complex spatial-temporal relationships through the learned attention matrices and adjacency matrices, making the model more interpretable. Lastly, the model is deployed on an end-to-end Industrial Internet Platform, where it achieves effective practical results.
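The normalize-then-denormalize step can be sketched as follows. This is a minimal illustration of the idea behind reversible instance normalization: each input window is standardized with its own mean and standard deviation, and the same statistics are reused to map the model output back to the original scale. The learnable affine parameters of the published technique are omitted, and the function names are hypothetical.

```python
import math


def instance_normalize(window, eps=1e-5):
    """Standardize one input window with its own statistics."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(variance + eps)
    normalized = [(x - mean) / std for x in window]
    return normalized, (mean, std)


def instance_denormalize(values, stats):
    """Map values back to the original scale using the stored statistics."""
    mean, std = stats
    return [v * std + mean for v in values]


raw = [10.0, 12.0, 14.0]
normed, stats = instance_normalize(raw)
# A real model would transform `normed` into a forecast; here the identity
# stands in for the model so that the round trip can be checked.
restored = instance_denormalize(normed, stats)
```

Because the statistics are computed per window, inputs recorded under different operating conditions are brought to a comparable scale before they reach the network.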
Marine oil spill emulsions are difficult to recover, and the damage to the environment is not easy to eliminate. The use of remote sensing to accurately identify oil spill emulsions is highly important for the protection of marine environments. However, the spectrum of oil emulsions changes with water content. Hyperspectral remote sensing and deep learning can use spectral and spatial information to identify different types of oil emulsions. Nonetheless, hyperspectral data can also cause information redundancy, reducing classification accuracy and efficiency and even leading to overfitting in machine learning models. To address these problems, an oil emulsion deep-learning identification model with spatial-spectral feature fusion is established, and feature bands that can distinguish between crude oil, seawater, water-in-oil emulsion (WO), and oil-in-water emulsion (OW) are filtered based on a standard deviation threshold–mutual information method. Using oil spill airborne hyperspectral data, we conducted identification experiments on oil emulsions in different background waters and under different spatial and temporal conditions, analyzed the transferability of the model, and explored the effects of feature band selection and spectral resolution on the identification of oil emulsions. The results show the following. (1) The standard deviation–mutual information feature selection method is able to effectively extract feature bands that can distinguish between WO, OW, oil slick, and seawater. The number of bands was reduced from 224 to 134 after feature selection on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) data and from 126 to 100 on the S185 data. (2) With feature selection, the overall accuracy and Kappa of the identification results for the training area are 91.80% and 0.86, respectively, improved by 2.62% and 0.04, and the overall accuracy and Kappa of the identification results for the migration area are 86.53% and 0.80, respectively, improved by 3.45% and 0.05. (3) The oil emulsion identification model has a certain degree of transferability and can effectively identify oil spill emulsions for AVIRIS data at different times and locations, with an overall accuracy of more than 80%, a Kappa coefficient of more than 0.7, and an F1 score of 0.75 or more for each category. (4) As the spectral resolution decreases, the model yields different degrees of misclassification for areas with a mixed distribution of oil slick and seawater or a mixed distribution of WO and OW. Based on the above experimental results, we demonstrate that the oil emulsion identification model with spatial–spectral feature fusion achieves high accuracy in identifying oil emulsions using airborne hyperspectral data and can be applied to images under different spatial and temporal conditions. Furthermore, we elucidate the impact of factors such as spectral resolution and background water bodies on the identification process. These findings provide a new reference for future work on automated marine oil spill detection.
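The overall accuracy and Kappa values reported here are standard summaries of a confusion matrix. As a minimal sketch, the function below computes both from a square matrix whose rows are reference classes and whose columns are predicted classes; the example matrix is made up for illustration and is not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (e.g. emulsion vs. seawater pixels).
example = [[40, 10],
           [5, 45]]
oa, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy for the same matrix.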
Production safety is of great significance to the development of enterprises and society. Because of the particular nature of the electric power environment, accidents often cause great losses. Therefore, it is important to improve safety supervision and protection in the electric power environment. In this paper, we simulate an actual electric power operation scenario with monitoring equipment and propose a real-time method for detecting illegal actions based on human body key points, in order to ensure safe behavior in real time. In this method, the human body key points in video frames are first extracted by a high-resolution network and then classified in real time by a spatial-temporal graph convolutional network. Experimental results show that this method can effectively detect illegal actions in the simulated scene.
Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as a classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end model of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition model uses transformers with multi-headed attention. For the language model, a Recurrent Neural Network is used. The performance of the proposed system has been tested on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a word error rate that is 10% lower on average under varying illumination ratios.
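The word error rate used in this comparison is the word-level edit distance between the reference and the recognized sentence, divided by the number of reference words. The following sketch shows the standard computation; the example sentences are invented, and the whitespace tokenization is a simplifying assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In the example, one substitution and one deletion against six reference words give a word error rate of one third.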
The prediction of regional traffic flows is important for traffic control and management in an intelligent traffic system. Among deep neural networks, the convolutional neural network or residual neural network, which can be applied only to regular grids, is commonly adopted to capture the spatial dependence for flow prediction. However, real regions are always irregular because of the road network and administrative boundaries; thus, dividing the city into grids is inaccurate for prediction. In this paper, we propose a new model based on a multi-graph convolutional network and gated recurrent unit (MGCN-GRU) to predict traffic flows for irregular regions. Specifically, we first construct heterogeneous inter-region graphs for a city to reflect the relationships among regions. In each graph, nodes represent the irregular regions and edges represent the relationship types between regions. Then, we propose a multi-graph convolutional network to fuse different inter-region graphs and additional attributes. The GRU is further used to capture the temporal dependence and to predict future traffic flows. Experimental results based on three real-world large-scale datasets (a public bicycle system dataset, a taxi dataset, and a dockless bike-sharing dataset) show that our MGCN-GRU model outperforms a variety of existing methods.
Coping with non-frontal head poses during facial expression recognition is challenging and considerably reduces accuracy and robustness when capturing expressions that occur during natural communication. In this paper, we attempt to recognize facial expressions under poses with large rotation angles from 2D videos. A depth-patch-based 4D expression representation model is proposed. It is reconstructed from 2D dynamic images to delineate continuous spatial changes and temporal context in non-frontal cases. Furthermore, we present an effective deep neural network classifier, which can accurately capture pose-variant expression features from the depth patches and recognize non-frontal expressions. Experimental results on the BU-4DFE database show that the proposed method achieves a high recognition accuracy of 86.87% for non-frontal facial expressions within a range of head rotation angles of up to 52°, outperforming existing methods. We also present a quantitative analysis of the components contributing to the performance gain through tests on the BU-4DFE and Multi-PIE datasets.
With the gradual advancement of digital transformation in the tourism industry, exploiting implicit information about tourism demand for prediction has gradually become mainstream in this research field. Among these works, the research on unidirectional implicit information is well developed, whereas studies on interactive implicit information are scarce. Therefore, to further enhance the performance of tourism forecasting, this study employs a LangChain-based interactive context sentiment analysis model in the realm of feature engineering. By incorporating the sentiment tendencies found in online tourism reviews into tourism demand forecasting research, the model's inferential capabilities are significantly improved. In terms of model processing, a new tourism demand prediction fusion model, EMD-STGCN-GRU-LSTM-Transformer (abbreviated as EST-Net), has been developed to address the unique spatio-temporal characteristics and imbalance of tourism data, thereby enhancing the model's ability to accurately extract spatio-temporal sequences. Additionally, the PCA method is utilized to aggregate multiple key indicators of sentiment attention and natural environmental factors, constructing a tourism prediction indicator system to further correct the overall framework bias.
Photovoltaic (PV) power forecasting is essential for the secure operation of a power system. Effective prediction of PV power can improve new energy consumption capacity, help power system planning, promote the development of smart grids, and ultimately support the construction of smart energy cities. However, unlike centralized PV power forecasting, distributed PV power forecasting encounters three critical challenges: 1) the lack of on-site meteorological observation, 2) leveraging extraneous data to enhance forecasting performance, and 3) spatial-temporal modelling of meteorological information around the distributed PV stations. To address these issues, we propose a Graph Spatial-Temporal Attention Neural Network (GSTANN) to predict the very short-term power of distributed PV. First, we use satellite remote sensing data covering a specific geographical area to supplement meteorological information for all PV stations. Then, we apply a graph convolution block to model the non-Euclidean local and global spatial dependence and design an attention mechanism to simultaneously derive temporal and spatial correlations. Subsequently, we propose a data fusion module to solve the time misalignment between satellite remote sensing data and surrounding measured on-site data and design a power approximation block to map the conversion from solar irradiance to PV power. Experiments conducted with real-world case study datasets demonstrate that GSTANN outperforms five state-of-the-art baselines in prediction performance.
Crowd flow prediction has become a strategically important task in urban computing and is a prerequisite for traffic management, urban planning, and public safety. However, because crowd flows are diverse, multiple hidden correlations among urban regions affect the flows. In addition, crowd flows are influenced by the distribution of Points of Interest (POIs), transitional functional zones, environmental climate, and different time slots of the dynamic urban environment. Thus, we exploit multiple correlations between urban regions by considering these factors comprehensively, rather than geographical distance alone, and propose multi-graph convolution gated recurrent units (MGCGRU) to capture these multiple spatial correlations. To adapt to dynamic mobile data, we leverage multiple spatial correlations and the temporal dependency to build an urban flow prediction framework that uses only a small amount of recent data as input but can mine rich internal modes. Hence, the framework can mitigate the influence of unstable data distributions in highly dynamic environments on prediction. The experimental results on two real-world datasets in Shanghai show that our model is superior to state-of-the-art methods for crowd flow prediction.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62472149, 62376089, 62202147) and the Hubei Provincial Science and Technology Plan Project (2023BCB04100).
Abstract: Accurate traffic flow prediction has a profound impact on modern traffic management. Traffic flow has complex spatial-temporal correlations and periodicity, which make precise prediction difficult. To address this problem, a Multi-head Self-attention and Spatial-Temporal Graph Convolutional Network (MSSTGCN) for multiscale traffic flow prediction is proposed. Firstly, to capture the hidden periodicity of traffic flow, the flow data are divided into three kinds of periods: hourly, daily, and weekly. Secondly, a graph attention residual layer is constructed to learn the global spatial features across regions, and local spatial-temporal dependence is captured by a T-GCN module. Thirdly, a transformer layer is introduced to learn the long-term dependence in time. A position embedding mechanism labels position information for all traffic sequences, so that the multi-head self-attention mechanism can recognize the sequence order and allocate weights to different time nodes. Experimental results on four real-world datasets show that the MSSTGCN performs better than the baseline methods and can be successfully adapted to traffic prediction tasks.
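The division into hourly, daily, and weekly periods can be illustrated with a small sketch. The abstract does not give the exact slicing rule, so the function below is an assumption: it takes the steps immediately before the prediction window as the hourly (recent) segment, and the same time-of-day windows from previous days and previous weeks as the daily and weekly segments. The name `multi_period_inputs` and all parameters are hypothetical.

```python
def multi_period_inputs(series, t, horizon, steps_per_day,
                        n_recent=2, n_daily=1, n_weekly=1):
    """Slice three input segments for predicting series[t:t + horizon].

    Assumed slicing rule (not taken from the paper):
      recent  - the n_recent * horizon steps directly before t
      daily   - the target window shifted back by 1..n_daily days
      weekly  - the target window shifted back by 1..n_weekly weeks
    """
    steps_per_week = 7 * steps_per_day
    longest_lookback = max(n_recent * horizon,
                           n_daily * steps_per_day,
                           n_weekly * steps_per_week)
    if t - longest_lookback < 0 or t + horizon > len(series):
        raise ValueError("not enough data around index t")

    recent = list(series[t - n_recent * horizon:t])

    daily = []
    for k in range(n_daily, 0, -1):          # oldest day first
        start = t - k * steps_per_day
        daily.extend(series[start:start + horizon])

    weekly = []
    for k in range(n_weekly, 0, -1):         # oldest week first
        start = t - k * steps_per_week
        weekly.extend(series[start:start + horizon])

    return recent, daily, weekly


# Toy example: 4 samples per day, predict 2 steps starting at index 60.
toy_series = list(range(100))
recent, daily, weekly = multi_period_inputs(
    toy_series, t=60, horizon=2, steps_per_day=4,
    n_recent=2, n_daily=2, n_weekly=1)
```

In a real model each segment would hold one value per sensor per step; a flat list is used here only to keep the indexing visible.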
Abstract: The ability to accurately predict urban traffic flows is crucial for optimising city operations. Consequently, various methods for forecasting urban traffic have been developed, focusing on analysing historical data to understand complex mobility patterns. Deep learning techniques, such as graph neural networks (GNNs), are popular for their ability to capture spatio-temporal dependencies. However, these models often become overly complex due to the large number of hyper-parameters involved. In this study, we introduce Dynamic Multi-Graph Spatial-Temporal Graph Neural Ordinary Differential Equation Networks (DMST-GNODE), a framework based on ordinary differential equations (ODEs) that autonomously discovers effective spatial-temporal graph neural network (STGNN) architectures for traffic prediction tasks. The comparative analysis of DMST-GNODE and baseline models indicates that the DMST-GNODE model demonstrates superior performance across multiple datasets, consistently achieving the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values, alongside the highest accuracy. On the BKK (Bangkok) dataset, it outperformed other models with an RMSE of 3.3165 and an accuracy of 0.9367 for a 20-min interval, maintaining this trend across 40 and 60 min. Similarly, on the PeMS08 dataset, DMST-GNODE achieved the best performance with an RMSE of 19.4863 and an accuracy of 0.9377 at 20 min, demonstrating its effectiveness over longer periods. The Los_Loop dataset results further emphasise this model's advantage, with an RMSE of 3.3422 and an accuracy of 0.7643 at 20 min, consistently maintaining superiority across all time intervals. These numerical highlights indicate that DMST-GNODE not only outperforms baseline models but also achieves higher accuracy and lower errors across different time intervals and datasets.
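For readers comparing the reported error figures, the two error metrics named in the abstract follow their standard definitions. The sketch below uses made-up numbers, not values from the paper, and the "accuracy" score is left out because the abstract does not define it.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalises large errors more than MAE."""
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Illustrative values only (hypothetical flows, not from any dataset).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
example_mae = mae(observed, predicted)
example_rmse = rmse(observed, predicted)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions.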
Funding: Supported in part by the National Natural Science Foundation of China under Grant Numbers 62272236 and 62376128, and in part by the Natural Science Foundation of Jiangsu Province under Grant Numbers BK20201136 and BK20191401.
Abstract: Falls are closely related to high mortality in the elderly, so fall detection has become an important and urgent research area. However, existing fall detection methods are difficult to apply in daily life because of their heavy computation and poor detection accuracy. To solve these problems, this paper proposes a dense spatial-temporal graph convolutional network based on lightweight OpenPose. Lightweight OpenPose uses MobileNet as the feature extraction network, and the prediction layer uses a bottleneck-asymmetric structure, thus reducing the size of the network. The bottleneck-asymmetric structure compresses the number of input channels of the feature maps by 1×1 convolution and replaces the 7×7 convolution structure with an asymmetric structure of 1×7 convolution, 7×1 convolution, and 7×7 convolution in parallel. The spatial-temporal graph convolutional network divides the multi-layer convolution into dense blocks, and the convolutional layers in each dense block are connected, which improves feature propagation and the network's ability to extract features, and in turn the detection accuracy. Two representative datasets, the Multiple Cameras Fall dataset (MCF) and the Nanyang Technological University Red Green Blue+Depth Action Recognition dataset (NTU RGB+D), are selected for our experiments; NTU RGB+D has two evaluation benchmarks. The results show that the proposed model is superior to current fall detection models. The accuracy of this network on the MCF dataset is 96.3%, and the accuracies on the two evaluation benchmarks of the NTU RGB+D dataset are 85.6% and 93.5%, respectively.
Funding: Supported by the China Scholarship Council and the CERNET Innovation Project under Grant No. 20170111.
Abstract: Multivariate Time Series (MTS) prediction explores the interrelationships among variables at historical moments and extracts their relevant characteristics. It is widely used in finance, weather, complex industries, and other fields, and it is important for constructing digital twin systems. However, existing methods do not take full advantage of the potential properties of the variables, which results in poor prediction accuracy. In this paper, we propose the Adaptive Fused Spatial-Temporal Graph Convolutional Network (AFSTGCN). First, to address the problem of the unknown spatial-temporal structure, we construct the Adaptive Fused Spatial-Temporal Graph (AFSTG) layer. Specifically, we fuse the spatial-temporal graph based on the interrelationship of spatial graphs. Simultaneously, we construct the adaptive adjacency matrix of the spatial-temporal graph using node embedding methods. Subsequently, to overcome the insufficient extraction of disordered correlation features, we construct the Adaptive Fused Spatial-Temporal Graph Convolutional (AFSTGC) module. The module forces the reordering of disordered temporal, spatial, and spatial-temporal dependencies into rule-like data. AFSTGCN dynamically and synchronously acquires potential temporal, spatial, and spatial-temporal correlations, thereby fully extracting rich hierarchical feature information to enhance the prediction accuracy. Experiments on different types of MTS datasets demonstrate that the model achieves state-of-the-art single-step and multi-step performance compared with eight other deep learning models.
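An adaptive adjacency matrix built from node embeddings can be sketched as follows. The abstract does not state the formula AFSTGCN uses, so this example assumes one widely used construction, a row-wise softmax over the rectified inner products of two embedding tables. The function name and the toy embeddings are hypothetical, and in practice the embeddings would be learned parameters.

```python
import math


def adaptive_adjacency(e_src, e_dst):
    """Build a row-normalised adjacency matrix from two embedding tables.

    Assumed form: A = softmax(relu(E_src @ E_dst^T)), softmax per row.
    e_src and e_dst are lists of equal-length embedding vectors,
    one vector per node.
    """
    adjacency = []
    for u in e_src:
        # Rectified similarity between source node u and every target node.
        scores = [max(0.0, sum(a * b for a, b in zip(u, v))) for v in e_dst]
        # Numerically stable softmax over the row.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        adjacency.append([x / total for x in exps])
    return adjacency


# Two nodes with 2-dimensional embeddings (hand-picked toy values).
toy_src = [[1.0, 0.0], [0.0, 1.0]]
toy_dst = [[1.0, 0.0], [-1.0, 0.0]]
toy_adj = adaptive_adjacency(toy_src, toy_dst)
```

Each row sums to one, so the matrix can be used directly as a set of aggregation weights in a graph convolution without a separate normalisation step.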
Funding: Supported by the National Natural Science Foundation of China (61975020, 62171053).
Abstract: Accurate traffic pattern prediction in large-scale networks is of great importance for intelligent system management and automatic resource allocation. System-level mobile traffic forecasting faces significant challenges due to the tremendous temporal and spatial dynamics introduced by diverse Internet user behaviors and frequent traffic migration. Spatial-temporal graph modeling is an efficient approach for analyzing the spatial relations and temporal trends of mobile traffic in a large system. Previous research may not reflect the optimal dependency because it either ignores inter-base-station dependency or pre-determines the explicit geographical distance as the interrelationship of base stations. To overcome these limitations of the graph structure, this study proposes an adaptive graph convolutional network (AGCN) that captures the latent spatial dependency by developing self-adaptive dependency matrices and acquires temporal dependency using recurrent neural networks. Evaluated on two mobile network datasets, the experimental results demonstrate that this method outperforms other baselines and reduces the mean absolute error by 3.7% and 5.6% compared to time-series based approaches.
Funding: Funded by the National Key Research and Development Program of China: Sino-Malta Fund 2022 "Autonomous Biomimetic Underwater Vehicle for Digital Cage Monitoring" (Grant No. 2022YFE0107100).
Abstract: In recent years, aquaculture has developed rapidly, especially in coastal and open ocean areas. In practice, water quality prediction is of critical importance. However, traditional water quality prediction models face limitations in handling complex spatiotemporal patterns. To address this challenge, a water quality prediction model was proposed, namely an adaptive multi-channel temporal graph convolutional network (AMTGCN). The AMTGCN integrates adaptive graph construction, a multi-channel spatiotemporal graph convolutional network, and fusion layers, and can comprehensively capture the spatial relationships and spatiotemporal patterns in aquaculture water quality data. On-site aquaculture water quality data were collected, and the metrics MAE, RMSE, MAPE, and R² were used to validate the AMTGCN. The results show that the AMTGCN presents an average improvement of 34.01%, 34.59%, 36.05%, and 17.71% compared to LSTM, respectively; an average improvement of 64.84%, 56.78%, 64.82%, and 153.16% compared to the STGCN, respectively; an average improvement of 55.25%, 48.67%, 57.01%, and 209.00% compared to GCN-LSTM, respectively; and an average improvement of 7.05%, 5.66%, 7.42%, and 2.47% compared to TCN, respectively. This indicates that the AMTGCN, integrating the innovative structure of adaptive graph construction and a multi-channel spatiotemporal graph convolutional network, could provide an efficient solution for water quality prediction in aquaculture.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3600403) and the Emerging Interdisciplinary Platform for Medicine and Engineering in Sports (EIPMES), Beijing, China.
Abstract: Energy expenditure estimation can be used to measure the exercise load and physical condition of different individuals, such as soldiers, athletes, and firemen, during their training and work. Energy expenditure estimation methods based on computer vision have developed rapidly in recent years. Compared with sensor-based methods, such methods are capable of monitoring several target persons at the same time, and the subjects do not need to wear sensor devices that hamper their movement. In this paper, we propose a cross-attention spatial–temporal convolutional neural network to predict the energy expenditure of people under different exercise intensities. The model explores the relationship between changes in the human skeleton and energy expenditure intensity. In addition, a cross-attention correction module is used to reduce the negative effects of individual physical fitness characteristics during energy expenditure estimation. The experimental results show that our proposed method achieves high accuracy for energy expenditure estimation and performs better than existing computer vision-based energy expenditure estimation methods. The proposed method can be widely used in various physical activity scenarios to measure energy expenditure, making such measurement more convenient.
Funding: Project supported by the National Natural Science Foundation of China (No. U1866602) and the Natural Science Foundation of Zhejiang Province, China (No. LZ22F020015).
Abstract: Analyzing the vulnerability of power systems in cascading failures is generally regarded as a challenging problem. Although existing studies can extract some critical rules, they fail to capture the complex subtleties under different operational conditions. In recent years, several deep learning methods have been applied to address this issue. However, most of the existing deep learning methods consider only the grid topology of a power system in terms of topological connections and do not incorporate a power system's spatial information, such as the electrical distance, to increase the accuracy of graph convolution. In this paper, we construct a novel power-weighted line graph that uses power system topology and spatial information to optimize the edge weight assignment of the line graph. Then we propose a multi-graph convolutional network (MGCN) based on a graph classification task, which preserves a power system's spatial correlations and captures the relationships among physical components. Our model can better handle power systems that have parallel lines, maintaining desirable accuracy in modeling systems with these extra topology features. To increase the interpretability of the model, we analyze the MGCN using layer-wise relevance propagation and quantify the contributing factors of model classification.
Funding: Sponsored by the National Key Research and Development Program of China (No. 2023YFB4606200), the Key Program of Science and Technology of Yunnan Province, China (No. 202302AB080020), and the Key Project of Shanghai Zhangjiang National Independent Innovation Demonstration Zone, China (No. ZJ2021-2D-006).
Abstract: Accurately predicting the State of Health (SOH) of lithium-ion batteries is a critical challenge in ensuring their reliability and safety in energy storage systems, such as electric vehicles and renewable energy grids. The intricate battery degradation process is influenced by evolving spatial and temporal interactions among health indicators. Existing methods often fail to capture the dynamic interactions between health indicators over time, resulting in limited predictive accuracy. To address these challenges, we propose a novel framework, Dynamic Graph Learning with Spatial-Temporal Fusion Attention (DGL-STFA), which transforms health indicator time-series data into time-evolving graph representations. The framework employs multi-scale convolutional neural networks to capture diverse temporal patterns, a self-attention mechanism to construct dynamic adjacency matrices that adapt over time, and a temporal attention mechanism to identify and prioritize key moments that influence battery degradation. This combination enables DGL-STFA to effectively model both dynamic spatial relationships and long-term temporal dependencies, enhancing SOH prediction accuracy. Extensive experiments were conducted on the NASA and CALCE battery datasets, comparing this framework with traditional time-series prediction methods and other graph-based prediction methods. The results demonstrate that our framework significantly improves prediction accuracy, with a mean absolute error more than 30% lower than other methods. Further analysis demonstrated the robustness of DGL-STFA across various battery life stages, including early, mid, and end-of-life phases. These results highlight the capability of DGL-STFA to accurately predict SOH, addressing critical challenges in advancing battery health monitoring for energy storage applications.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61976209 and 62020106015), the CAS International Collaboration Key Project, China (No. 173211KYSB20190024), and the Strategic Priority Research Program of CAS, China (No. XDB32040000).
Abstract: Deep neural networks (DNNs) now have powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanism of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a striking DNN paradigm and have achieved remarkable improvements over CNNs on many vision tasks. It is natural to ask how brain-like ViTs are. Beyond the model paradigm, we are also interested in the effects of factors such as model size, multimodality, and temporality on the ability of networks to model the human visual pathway, especially considering that existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 kinds of computer vision models, varying from CNNs and ViTs to their hybrids, from the perspective of explaining brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither CNN nor transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway, as CNNs do. Moreover, we find that multi-modal and temporal networks can better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on the design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 21991093), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA29050200), the Dalian Institute of Chemical Physics (DICP I202135), and the Energy Science and Technology Revolution Project (Grant No. E2010412).
Abstract: Methanol-to-olefins, a promising non-oil pathway for the synthesis of light olefins, has been successfully industrialized. Accurate prediction of process variables can yield significant benefits for advanced process control and optimization. The challenge of this task is underscored by the failure of traditional methods to capture the complex characteristics of industrial processes, such as high nonlinearities, dynamics, and data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. Firstly, a data normalization technique called reversible instance normalization is employed to solve the problem of different data distributions. Subsequently, a convolutional neural network integrated with a self-attention mechanism is utilized to extract the temporal patterns. Meanwhile, a multi-graph convolutional network is leveraged to model the spatial interactions. Afterward, the extracted temporal and spatial features are fused as input into a fully connected neural network to complete the prediction. Finally, the outputs are denormalized to obtain the ultimate results. The monitoring results of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrate that our model not only achieves superior prediction performance but also can reveal complex spatial-temporal relationships using the learned attention matrices and adjacency matrices, making the model more interpretable. Lastly, this model is deployed onto an end-to-end Industrial Internet Platform, which achieves effective practical results.
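The normalize-then-denormalize pattern described above (reversible instance normalization at the input, denormalization of the outputs) can be sketched in a few lines. This is a simplified illustration: it normalises a single univariate window with its own mean and standard deviation and leaves out any learnable scale and shift that a full implementation may include. The function names and the stand-in "model" are hypothetical.

```python
import math


def revin_normalize(window, eps=1e-5):
    """Normalise one input window using its own statistics.

    Returns the normalised values and the (mean, std) pair needed
    to undo the transformation later.
    """
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(variance + eps)
    return [(x - mean) / std for x in window], (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the window."""
    mean, std = stats
    return [v * std + mean for v in values]


# Toy window of a process variable (made-up values).
toy_window = [10.0, 12.0, 14.0, 16.0]
toy_normed, toy_stats = revin_normalize(toy_window)

# Stand-in for a forecasting model: repeat the last normalised value.
toy_forecast_normed = [toy_normed[-1]] * 2
toy_forecast = revin_denormalize(toy_forecast_normed, toy_stats)
```

Because the statistics are computed per window, two windows recorded under different operating conditions are presented to the model on the same scale, and the stored statistics restore each forecast to its own range.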
Funding: The National Natural Science Foundation of China under contract Nos 61890964 and 42206177; the Joint Funds of the National Natural Science Foundation of China under contract No. U1906217.
Abstract: Marine oil spill emulsions are difficult to recover, and the damage they cause to the environment is not easy to eliminate. The use of remote sensing to accurately identify oil spill emulsions is highly important for the protection of marine environments. However, the spectrum of oil emulsions changes with water content. Hyperspectral remote sensing and deep learning can use spectral and spatial information to identify different types of oil emulsions. Nonetheless, hyperspectral data can also cause information redundancy, reducing classification accuracy and efficiency, and even causing overfitting in machine learning models. To address these problems, an oil emulsion deep-learning identification model with spatial-spectral feature fusion is established, and feature bands that can distinguish between crude oil, seawater, water-in-oil emulsion (WO), and oil-in-water emulsion (OW) are filtered based on a standard deviation threshold–mutual information method. Using oil spill airborne hyperspectral data, we conducted identification experiments on oil emulsions in different background waters and under different spatial and temporal conditions, analyzed the transferability of the model, and explored the effects of feature band selection and spectral resolution on the identification of oil emulsions. The results show the following. (1) The standard deviation–mutual information feature selection method is able to effectively extract feature bands that can distinguish between WO, OW, oil slick, and seawater. The number of bands was reduced from 224 to 134 after feature selection on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) data and from 126 to 100 on the S185 data. (2) With feature selection, the overall accuracy and Kappa of the identification results for the training area are 91.80% and 0.86, respectively, improved by 2.62% and 0.04, and the overall accuracy and Kappa of the identification results for the migration area are 86.53% and 0.80, respectively, improved by 3.45% and 0.05. (3) The oil emulsion identification model has a certain degree of transferability and can effectively identify oil spill emulsions for AVIRIS data at different times and locations, with an overall accuracy of more than 80%, a Kappa coefficient of more than 0.7, and an F1 score of 0.75 or more for each category. (4) As the spectral resolution decreases, the model yields different degrees of misclassification for areas with a mixed distribution of oil slick and seawater or a mixed distribution of WO and OW. Based on the above experimental results, we demonstrate that the oil emulsion identification model with spatial–spectral feature fusion achieves a high accuracy rate in identifying oil emulsions using airborne hyperspectral data and can be applied to images under different spatial and temporal conditions. Furthermore, we also elucidate the impact of factors such as spectral resolution and background water bodies on the identification process. These findings provide a new reference for future endeavors in automated marine oil spill detection.
Funding: Supported by the Science and Technology Program of State Grid Corporation of China (No. 5211TZ1900S6).
Abstract: Safe production is of great significance to the development of enterprises and society. Because of the particular environment of electric power operations, accidents often cause great losses. Therefore, it is important to improve safety supervision and protection in the electric power environment. In this paper, we simulate the actual electric power operation scenario using monitoring equipment and propose a real-time detection method for illegal actions based on human body key points to ensure safe behavior in real time. In this method, the human body key points in video frames are first extracted by a high-resolution network and then classified in real time by a spatial-temporal graph convolutional network. Experimental results show that this method can effectively detect illegal actions in the simulated scene.
Abstract: Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper attempts to use phonemes as a classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end model of the system consists of a Spatial-Temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition model uses Transformers with multi-headed attention. For the language model, a Recurrent Neural Network is used. The performance of the proposed system has been tested on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with the state-of-the-art approaches in lip-reading sentences, the proposed system has demonstrated improved performance, with a 10% lower word error rate on average under varying illumination ratios.
Funding: Supported by the National Natural Science Foundation of China (No. 61903109) and the Zhejiang Provincial Natural Science Foundation of China (No. LY19F030021).
Abstract: The prediction of regional traffic flows is important for traffic control and management in an intelligent traffic system. Among deep neural networks, the convolutional neural network or residual neural network, which can be applied only to regular grids, is commonly adopted to capture the spatial dependence for flow prediction. However, the regions obtained are always irregular once the road network and administrative boundaries are taken into account; thus, dividing the city into grids is inaccurate for prediction. In this paper, we propose a new model based on a multi-graph convolutional network and gated recurrent unit (MGCN-GRU) to predict traffic flows for irregular regions. Specifically, we first construct heterogeneous inter-region graphs for a city to reflect the relationships among regions. In each graph, nodes represent the irregular regions and edges represent the relationship types between regions. Then, we propose a multi-graph convolutional network to fuse different inter-region graphs and additional attributes. The GRU is further used to capture the temporal dependence and to predict future traffic flows. Experimental results based on three real-world large-scale datasets (a public bicycle system dataset, a taxi dataset, and a dockless bike-sharing dataset) show that our MGCN-GRU model outperforms a variety of existing methods.
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFBI001405, and the National Natural Science Foundation of China under Grant Nos. 61232013, 61422212, and 61661146002.
Abstract: The challenge of coping with non-frontal head poses during facial expression recognition results in a considerable reduction of accuracy and robustness when capturing expressions that occur during natural communication. In this paper, we attempt to recognize facial expressions under poses with large rotation angles from 2D videos. A depth-patch based 4D expression representation model is proposed. It is reconstructed from 2D dynamic images to delineate continuous spatial changes and temporal context in non-frontal cases. Furthermore, we present an effective deep neural network classifier, which can accurately capture pose-variant expression features from the depth patches and recognize non-frontal expressions. Experimental results on the BU-4DFE database show that the proposed method achieves a high recognition accuracy of 86.87% for non-frontal facial expressions within a range of head rotation angles of up to 52°, outperforming existing methods. We also present a quantitative analysis of the components contributing to the performance gain through tests on the BU-4DFE and Multi-PIE datasets.
Funding: Supported by the National Social Science Fund of China (24BJY088) and the Natural Science Foundation of Guangdong Province (2025A1515011633).
Abstract: With the gradual advancement of digital transformation in the tourism industry, exploiting implicit information about tourism demand for prediction has become mainstream in this research field. Among these works, research on unidirectional implicit information is well developed, whereas studies on interactive implicit information are scarce. Therefore, to further enhance the performance of tourism forecasting, this study employs a LangChain-based interactive context sentiment analysis model for feature engineering. By incorporating the sentiment tendencies found in online tourism reviews into tourism demand forecasting, the model's inferential capabilities are significantly improved. In terms of model processing, a new tourism demand prediction fusion model, EMD-STGCN-GRU-LSTM-Transformer (abbreviated as EST-Net), has been developed to address the unique spatio-temporal characteristics and imbalance of tourism data, thereby enhancing the model's ability to accurately extract spatio-temporal sequences. Additionally, the PCA method is used to aggregate multiple key indicators of sentiment attention and natural environmental factors, constructing a tourism prediction indicator system to further correct the overall framework bias.
Funding: Supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA27000000).
Abstract: Photovoltaic (PV) power forecasting is essential for the secure operation of a power system. Effective prediction of PV power can improve new energy consumption capacity, help power system planning, promote the development of smart grids, and ultimately support the construction of smart energy cities. However, unlike centralized PV power forecasting, distributed PV power forecasting encounters three critical challenges: 1) the lack of on-site meteorological observation, 2) leveraging extraneous data to enhance forecasting performance, and 3) spatial-temporal modelling of meteorological information around the distributed PV stations. To address these issues, we propose a Graph Spatial-Temporal Attention Neural Network (GSTANN) to predict the very short-term power of distributed PV. First, we use satellite remote sensing data covering a specific geographical area to supplement meteorological information for all PV stations. Then, we apply a graph convolution block to model the non-Euclidean local and global spatial dependence and design an attention mechanism to simultaneously derive temporal and spatial correlations. Subsequently, we propose a data fusion module to solve the time misalignment between satellite remote sensing data and surrounding measured on-site data, and design a power approximation block to map the conversion from solar irradiance to PV power. Experiments conducted with real-world case study datasets demonstrate that GSTANN outperforms five state-of-the-art baselines.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61572253 and the Aviation Science Fund of China under Grant No. 2016ZC52030.
Abstract: Crowd flow prediction has become a strategically important task in urban computing and is a prerequisite for traffic management, urban planning, and public safety. However, because crowd flows are diverse, multiple hidden correlations among urban regions affect the flows. Besides, crowd flows are also influenced by the distribution of Points-of-Interest (POIs), transitional functional zones, environmental climate, and different time slots of the dynamic urban environment. Thus, we exploit multiple correlations between urban regions by considering these factors comprehensively rather than the geographical distance alone, and propose multi-graph convolution gated recurrent units (MGCGRU) for capturing these multiple spatial correlations. To adapt to dynamic mobile data, we leverage multiple spatial correlations and the temporal dependency to build an urban flow prediction framework that uses only a small amount of recent data as input but can mine rich internal modes. Hence, the framework can mitigate the influence of unstable data distributions in highly dynamic environments on prediction. The experimental results on two real-world datasets in Shanghai show that our model is superior to state-of-the-art methods for crowd flow prediction.