Object detection is a fundamental task in computer vision that involves identifying and localizing objects within an image. Local features, extracted by operations such as convolutions, capture fine-grained details such as edges and textures, while global features, extracted by operations such as fully connected layers, represent the overall structure and long-range relationships within the image. Both kinds of features are crucial for accurate object detection, yet most existing methods focus on aggregating local and global features and often overlook medium-range dependencies. To address this gap, we propose a novel full perception module (FP-Module), a simple yet effective feature extraction module designed to simultaneously capture local details, medium-range dependencies, and long-range dependencies. Building on this, we construct a full perception head (FP-Head) by cascading multiple FP-Modules, enabling the prediction layer to leverage the most informative features. Experimental results on the MS COCO dataset demonstrate that our approach significantly enhances object recognition and localization, achieving gains of 2.7–5.7 APval when integrated into standard object detectors. Notably, the FP-Module is a universal solution that can be seamlessly incorporated into existing detectors to boost performance. The code will be released at https://github.com/Idcogroup/FP-Head.
The deployment of vehicle micro-motors has expanded with progress in electrification and intelligent technologies. However, some micro-motors exhibit design deficiencies, component wear, assembly errors, and other imperfections that arise during the design or manufacturing phases. Consequently, these micro-motors may generate anomalous noises during operation, which substantially reduces the comfort of drivers and passengers. Automobile micro-motors come in a diverse array of structural variations, which leads to a multitude of distinctive abnormal noises. To identify these diverse forms of abnormal noise, this research presents a novel approach based on a vibro-acoustic fusion-convolutional neural network (VAF-CNN). The method deploys distinct network branches, each capturing different features from the multi-sensor data, while accounting for the perception traits of the human auditory system. The intermediate layer applies adaptive weighting to the multi-sensor features, providing a calibration mechanism for the features from multiple sensors and enabling further refinement of features within the branch network. For optimal model efficacy, a feature fusion mechanism is implemented in the final layer. To substantiate the efficacy of the proposed approach, this paper first applies a data augmentation method inspired by a modified SpecAugment to the dataset of abnormal noise samples, covering scenarios both with and without in-vehicle interior noise. This mitigates the issue of limited sample availability. Subsequent comparative evaluations contrast the performance of the model based on single-sensor data against other feature fusion models that rely on multi-sensor data. The experimental results show that the proposed method yields higher recognition accuracy and greater resilience against interference. Moreover, it has notable practical significance in engineering, as it provides valuable support for the targeted management of noise from vehicle micro-motors.
Gait, the unique pattern of how a person walks, has emerged as one of the most promising biometric features in modern intelligent sensing. Unlike fingerprints or facial characteristics, gait can be captured unobtrusively and at a distance, without requiring the subject’s awareness or cooperation. This makes it highly suitable for long-range surveillance, forensic investigation, and smart environments where contactless recognition is crucial. Traditional gait-recognition systems rely either on silhouettes, which capture the outer appearance of a person, or on skeletons, which describe the internal structure of human motion. Each modality provides only a partial understanding of gait. Silhouettes emphasize shape and contour but are easily distorted by clothing or carried objects; skeletons describe motion dynamics and limb coordination but lose discriminative details about body shape. This article presents the concept of Complementary Semantic Embedding (CSE), a unified framework that merges silhouette and skeleton information into a comprehensive semantic representation of human walking. By modeling the complementary nature of appearance and structure, the approach achieves more robust and accurate gait recognition even under challenging conditions.
Accessible communication based on sign language recognition (SLR) is the key to emergency medical assistance for the hearing-impaired community. Balancing the capture of local and global information in SLR for emergency medicine poses a significant challenge. To address this, we propose a novel approach based on the inter-learning of visual features between global and local information. Specifically, our method enhances the perception capabilities of the visual feature extractor by strategically leveraging the strengths of convolutional neural networks (CNNs), which are adept at capturing local features, and visual transformers, which perform well at perceiving global features. Furthermore, to mitigate the overfitting caused by the limited availability of sign language data for emergency medical applications, we introduce an enhanced short temporal module for data augmentation through additional subsequences. Experimental results on three publicly available sign language datasets demonstrate the efficacy of the proposed approach.
Imaging hyperspectral technology has the distinctive advantages of non-destructive, non-contact measurement and the integration of spectral and spatial data. These characteristics offer new methodologies for intelligent geological sensing in tunnels and other underground engineering projects. However, the in situ acquisition and rapid classification of hyperspectral images in underground settings still face great challenges, including the difficulty of obtaining uniform hyperspectral images and the complexity of deploying sophisticated models on mobile platforms. This study proposes an intelligent lithology identification method based on partition feature extraction of hyperspectral images. Firstly, pixel-level hyperspectral information from representative lithological regions is extracted and fused to obtain rock hyperspectral image partition features. Subsequently, an SG-SNV-PCA-DNN (SSPD) model, specifically designed for optimizing rock hyperspectral data, performing spectral dimensionality reduction, and identifying lithology, is integrated. In an experimental study involving 3420 hyperspectral images, the SSPD identification model achieved the highest accuracy on the testing set, reaching 98.77%. Moreover, the SSPD model was found to be 18.5% faster than the unprocessed model, with an accuracy improvement of 5.22%. In contrast, the ResNet-101 model, used for point-by-point identification based on non-partitioned features, achieved a maximum accuracy of 97.86% on the testing set. In addition, the partition feature extraction method significantly reduces computational complexity. An objective evaluation of various models demonstrated that the SSPD model exhibited superior performance, achieving a precision (P) of 99.46%, a recall (R) of 99.44%, and an F1 score (F1) of 99.45%. Additionally, pioneering in situ detection work was carried out in a tunnel using underground hyperspectral imaging technology.
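The precision, recall, and F1 figures quoted for the SSPD model can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; the abstract does not state whether the reported values are averaged per class, so this is a plausibility check rather than a reproduction of the authors' evaluation. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, in percent.
reported_precision = 99.46
reported_recall = 99.44

# Assumption: F1 is computed directly from the aggregate P and R.
print(round(f1_score(reported_precision, reported_recall), 2))  # prints 99.45
```

Under that assumption the result matches the reported F1 of 99.45%, so the three figures are mutually consistent.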
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods address only one of these two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method that jointly and accurately perceives the AD environment and forecasts the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space, which serves as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that the proposed ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in both scene perception and trajectory forecasting. The proposed approach also outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.
Northeast China, as the most important production base of agriculture, forestry, and livestock breeding as well as the old industrial base of the whole country, has been playing a key role in the construction and development of China's economy. However, after the policy of reform and opening-up was adopted in China, the speed and efficiency of economic development in this area became evidently lower than those of the coastal area and the national average, which is the so-called 'Northeast Phenomenon' and 'Neo-Northeast Phenomenon'. With regard to those phenomena, this paper first reviews the spatial and temporal features of the regional evolution of this area so as to unveil the underlying causes of the 'Northeast Phenomenon' and 'Neo-Northeast Phenomenon'. The paper then further explores the status quo of this region and its causes by analyzing its gross economic output, industrial structure, product structure, regional eco-categories, etc. At the end of the paper, the authors put forward basic coordinated development strategies for Northeast China: namely, the area can be revitalized by means of adjustment of the economic structure, regional coordination, planning urban and rural areas as a whole, institutional innovation, etc.
The success of intelligent transportation systems relies heavily on accurate traffic prediction, in which how to model the underlying spatial-temporal information from traffic data has come under the spotlight. Most existing frameworks utilize separate modules for modeling spatial and temporal correlations. However, this stepwise pattern may limit the effectiveness and efficiency of spatial-temporal feature extraction and cause important information to be overlooked in some steps. Furthermore, modeling based on a given spatial adjacency graph (e.g., one derived from the geodesic distance or approximate connectivity) lacks sufficient guidance from prior information and may not reflect the actual interaction between nodes. To overcome those limitations, our paper proposes a spatial-temporal graph synchronous aggregation (STGSA) model to extract the localized and long-term spatial-temporal dependencies simultaneously. Specifically, a tailored graph aggregation method in the vertex domain is designed to extract spatial and temporal features in one graph convolution process. In each STGSA block, we devise a directed temporal correlation graph to represent the localized and long-term dependencies between nodes, and the potential temporal dependence is further fine-tuned by an adaptive weighting operation. Meanwhile, we construct an elaborated spatial adjacency matrix to represent the road sensor graph by considering both physical distance and node similarity in a data-driven manner. Then, inspired by the multi-head attention mechanism, which can jointly emphasize information from different representation subspaces, we construct a multi-stream module based on the STGSA blocks to capture global information. It projects the embedding input repeatedly with multiple different channels. Finally, the predicted values are generated by stacking several multi-stream modules. Extensive experiments are conducted on six real-world datasets, and numerical results show that the proposed STGSA model significantly outperforms the benchmarks.
An objective identification technique is used to detect regional extreme low temperature events (RELTE) in China during 1960-2009. Their spatial-temporal characteristics are analyzed. The results indicate that the lowest temperatures of RELTE, together with the frequency distribution of the geometric latitude center, exhibit a double-peak feature. RELTE frequently occurred near the geometric areas of 30°N and 42°N before the mid-1980s, but shifted afterwards to 30°N. During 1960-2009, the frequency, intensity, and maximum impacted area of RELTE show overall decreasing trends. Owing to the contribution of RELTE with long duration and large spatial range, which account for 10% of the total RELTE, there is a significant turning point in the late 1980s. A change to a much steadier state after the late 1990s is identified. In addition, the integrated indices of RELTE are classified and analyzed.
Compared to 3D object detection using a single camera, multiple cameras can overcome some limitations on field-of-view, occlusion, and low detection confidence. This study employs multiple surveillance cameras and develops a cooperative 3D object detection and tracking framework by incorporating temporal and spatial information. The framework consists of a 3D vehicle detection model, a cooperative spatial-temporal relation scheme, and a heuristic camera constellation method. Specifically, the proposed cross-camera association scheme combines the geometric relationship between multiple cameras and objects in corresponding detections. The spatial-temporal method is designed to associate vehicles between different points of view at a single timestamp and to track vehicles over time. The proposed framework is evaluated on a synthetic cooperative dataset and shows high reliability: cooperative perception can recall more than 66% of the trajectory, compared with 11% for single-point sensing. This could contribute to full-range surveillance for intelligent transportation systems.
Marine oil spill emulsions are difficult to recover, and the damage they cause to the environment is not easy to eliminate. The use of remote sensing to accurately identify oil spill emulsions is highly important for the protection of marine environments. However, the spectrum of oil emulsions changes with water content. Hyperspectral remote sensing and deep learning can use spectral and spatial information to identify different types of oil emulsions. Nonetheless, hyperspectral data can also introduce information redundancy, which reduces classification accuracy and efficiency and can even cause overfitting in machine learning models. To address these problems, an oil emulsion deep-learning identification model with spatial-spectral feature fusion is established, and feature bands that can distinguish between crude oil, seawater, water-in-oil emulsion (WO), and oil-in-water emulsion (OW) are filtered based on a standard deviation threshold–mutual information method. Using oil spill airborne hyperspectral data, we conducted identification experiments on oil emulsions in different background waters and under different spatial and temporal conditions, analyzed the transferability of the model, and explored the effects of feature band selection and spectral resolution on the identification of oil emulsions. The results show the following. (1) The standard deviation–mutual information feature selection method effectively extracts feature bands that can distinguish between WO, OW, oil slick, and seawater. The number of bands was reduced from 224 to 134 after feature selection on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) data and from 126 to 100 on the S185 data. (2) With feature selection, the overall accuracy and Kappa of the identification results for the training area are 91.80% and 0.86, respectively, improved by 2.62% and 0.04, and the overall accuracy and Kappa of the identification results for the migration area are 86.53% and 0.80, respectively, improved by 3.45% and 0.05. (3) The oil emulsion identification model has a certain degree of transferability and can effectively identify oil spill emulsions in AVIRIS data acquired at different times and locations, with an overall accuracy of more than 80%, a Kappa coefficient of more than 0.7, and an F1 score of 0.75 or more for each category. (4) As the spectral resolution decreases, the model yields different degrees of misclassification for areas with a mixed distribution of oil slick and seawater or a mixed distribution of WO and OW. Based on these experimental results, we demonstrate that the oil emulsion identification model with spatial-spectral feature fusion achieves high accuracy in identifying oil emulsions using airborne hyperspectral data and can be applied to images acquired under different spatial and temporal conditions. Furthermore, we elucidate the impact of factors such as spectral resolution and background water bodies on the identification process. These findings provide a new reference for future work on automated marine oil spill detection.
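The overall accuracy and Kappa values quoted in the oil emulsion abstract are both derived from a confusion matrix. The following is a minimal sketch of that computation, assuming Cohen's kappa (the abstract does not say which variant is used). The function name `accuracy_and_kappa` and the 2×2 matrix are made up for illustration and are not data from the paper.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: product of row and column marginals, summed per class.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (not taken from the paper).
toy_matrix = [[45, 5],
              [10, 40]]
acc, kappa = accuracy_and_kappa(toy_matrix)
print(f"overall accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than the overall accuracy for the same matrix, which is why the abstract reports the two side by side.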
Funding (FP-Head object detection paper): supported by the National Natural Science Foundation of China (62371350, 62171324, 62471338, U1903214).
Funding (VAF-CNN micro-motor noise paper): the author received funding from the Sichuan Natural Science Foundation (2022NSFSC1892).
Funding (sign language recognition paper): supported by the National Natural Science Foundation of China (No. 62376197), the Tianjin Science and Technology Program (No. 23JCYBJC00360), and the Tianjin Health Research Project (No. TJWJ2025MS045).
Funding (SSPD hyperspectral lithology identification paper): support from the National Natural Science Foundation of China (Grant Nos. 52379103, 52279103) and the Natural Science Foundation of Shandong Province (Grant No. ZR2023YQ049).
Funding (ST-SIGMA autonomous driving paper): Basic and Advanced Research Projects of CSTC, Grant/Award Number: cstc2019jcyj-zdxmX0008; Science and Technology Research Program of Chongqing Municipal Education Commission, Grant/Award Numbers: KJQN202100634, KJZDK201900605; National Natural Science Foundation of China, Grant/Award Number: 62006065.
Funding (Northeast China regional development paper): under the auspices of the National Natural Science Foundation of China (No. 40471040).
Funding: Partially supported by the National Key Research and Development Program of China (2020YFB2104001).
Abstract: The success of intelligent transportation systems relies heavily on accurate traffic prediction, in which how to model the underlying spatial-temporal information from traffic data has come under the spotlight. Most existing frameworks utilize separate modules for modeling spatial and temporal correlations. However, this stepwise pattern may limit the effectiveness and efficiency of spatial-temporal feature extraction and cause important information to be overlooked in some steps. Furthermore, modeling based on a given spatial adjacency graph (e.g., one derived from the geodesic distance or approximate connectivity) lacks sufficient guidance from prior information and may not reflect the actual interaction between nodes. To overcome these limitations, this paper proposes a spatial-temporal graph synchronous aggregation (STGSA) model to extract the localized and long-term spatial-temporal dependencies simultaneously. Specifically, a tailored graph aggregation method in the vertex domain is designed to extract spatial and temporal features in one graph convolution process. In each STGSA block, we devise a directed temporal correlation graph to represent the localized and long-term dependencies between nodes, and the potential temporal dependence is further fine-tuned by an adaptive weighting operation. Meanwhile, we construct an elaborated spatial adjacency matrix to represent the road sensor graph by considering both physical distance and node similarity in a data-driven manner. Then, inspired by the multi-head attention mechanism, which can jointly emphasize information from different representation subspaces, we construct a multi-stream module based on the STGSA blocks to capture global information. It projects the embedding input repeatedly with multiple different channels. Finally, the predicted values are generated by stacking several multi-stream modules. Extensive experiments are conducted on six real-world datasets, and numerical results show that the proposed STGSA model significantly outperforms the benchmarks.
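The abstract does not say how physical distance and node similarity are combined into the adjacency matrix. The sketch below is only one plausible reading, not the STGSA authors' construction: it assumes a Gaussian kernel on pairwise distance, cosine similarity between historical sensor readings, a convex combination of the two, and a sparsifying threshold. The function names and the parameters `sigma`, `alpha`, and `threshold` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_adjacency(distances, series, sigma=1.0, alpha=0.5, threshold=0.1):
    """Blend distance-based proximity with data-driven similarity.

    distances[i][j]: physical distance between sensors i and j.
    series[i]: historical readings of sensor i.
    All weighting choices here are illustrative assumptions.
    """
    n = len(distances)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            # Gaussian kernel: nearby sensors get a weight close to 1.
            proximity = math.exp(-(distances[i][j] ** 2) / (sigma ** 2))
            # Negative correlations are clipped so weights stay non-negative.
            similarity = max(0.0, cosine_similarity(series[i], series[j]))
            weight = alpha * proximity + (1 - alpha) * similarity
            # Drop weak edges to keep the graph sparse.
            adjacency[i][j] = weight if weight >= threshold else 0.0
    return adjacency
```

With this choice, two sensors that are far apart can still be connected when their readings behave alike, which is the kind of interaction a purely distance-based graph would miss.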
Funding: Supported by the Special Scientific Research Projects for Public Interest (Nos. GYHY201006021 and GYHY201106016) and the National Natural Science Foundation of China (Nos. 41205040 and 40930952)
Abstract: An objective identification technique is used to detect regional extreme low temperature events (RELTE) in China during 1960-2009, and their spatial-temporal characteristics are analyzed. The results indicate that the lowest temperatures of RELTE, together with the frequency distribution of the geometric latitude center, exhibit a double-peak feature. RELTE were frequently centered near 30°N and 42°N before the mid-1980s, but shifted afterwards to 30°N. During 1960-2009, the frequency, intensity, and maximum impacted area of RELTE show overall decreasing trends. Owing to the contribution of RELTE with long duration and large spatial range, which account for 10% of all RELTE, there is a significant turning point in the late 1980s. A change to a much steadier state after the late 1990s is identified. In addition, the integrated indices of RELTE are classified and analyzed.
Funding: The National Natural Science Foundation of China (No. 61873167) and the Automotive Industry Science and Technology Development Foundation of Shanghai (No. 1904).
Abstract: Compared to 3D object detection using a single camera, multiple cameras can overcome some limitations on field of view, occlusion, and low detection confidence. This study employs multiple surveillance cameras and develops a cooperative 3D object detection and tracking framework by incorporating temporal and spatial information. The framework consists of a 3D vehicle detection model, a cooperative spatial-temporal relation scheme, and a heuristic camera constellation method. Specifically, the proposed cross-camera association scheme combines the geometric relationship between multiple cameras and objects in corresponding detections. The spatial-temporal method is designed to associate vehicles between different points of view at a single timestamp and to track vehicles over time. The proposed framework is evaluated on a synthetic cooperative dataset and shows high reliability: cooperative perception can recall more than 66% of the trajectory, compared with 11% for single-point sensing. This could contribute to full-range surveillance for intelligent transportation systems.
Funding: The National Natural Science Foundation of China under contract Nos 61890964 and 42206177; the Joint Funds of the National Natural Science Foundation of China under contract No. U1906217.
Abstract: Marine oil spill emulsions are difficult to recover, and the damage to the environment is not easy to eliminate. The use of remote sensing to accurately identify oil spill emulsions is highly important for the protection of marine environments. However, the spectrum of oil emulsions changes with water content. Hyperspectral remote sensing and deep learning can use spectral and spatial information to identify different types of oil emulsions. Nonetheless, hyperspectral data can also introduce information redundancy, reducing classification accuracy and efficiency and even causing overfitting in machine learning models. To address these problems, an oil emulsion deep-learning identification model with spatial-spectral feature fusion is established, and feature bands that can distinguish between crude oil, seawater, water-in-oil emulsion (WO), and oil-in-water emulsion (OW) are filtered based on a standard deviation threshold–mutual information method. Using oil spill airborne hyperspectral data, we conducted identification experiments on oil emulsions in different background waters and under different spatial and temporal conditions, analyzed the transferability of the model, and explored the effects of feature band selection and spectral resolution on the identification of oil emulsions. The results show the following. (1) The standard deviation–mutual information feature selection method is able to effectively extract feature bands that can distinguish between WO, OW, oil slick, and seawater. The number of bands was reduced from 224 to 134 after feature selection on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) data and from 126 to 100 on the S185 data. (2) With feature selection, the overall accuracy and Kappa of the identification results for the training area are 91.80% and 0.86, respectively, improved by 2.62% and 0.04, and the overall accuracy and Kappa of the identification results for the migration area are 86.53% and 0.80, respectively, improved by 3.45% and 0.05. (3) The oil emulsion identification model has a certain degree of transferability and can effectively identify oil spill emulsions for AVIRIS data at different times and locations, with an overall accuracy of more than 80%, a Kappa coefficient of more than 0.7, and an F1 score of 0.75 or more for each category. (4) As the spectral resolution decreases, the model yields different degrees of misclassification for areas with a mixed distribution of oil slick and seawater or a mixed distribution of WO and OW. Based on the above experimental results, we demonstrate that the oil emulsion identification model with spatial–spectral feature fusion achieves a high accuracy rate in identifying oil emulsions using airborne hyperspectral data and can be applied to images under different spatial and temporal conditions. Furthermore, we also elucidate the impact of factors such as spectral resolution and background water bodies on the identification process. These findings provide a new reference for future endeavors in automated marine oil spill detection.
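For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how overall accuracy and Cohen's Kappa are conventionally computed from a confusion matrix. It uses the standard textbook definitions; the function name and the example matrix are illustrative assumptions and are not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, the observed agreement is 0.85 and the chance agreement is 0.5, giving a Kappa of 0.7. Kappa is lower than overall accuracy because it discounts the agreement expected by chance.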