For some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another, and the scenes have stable prior features. Yet existing technologies do not take full advantage of this information. To take object recognition in these applications beyond existing algorithms, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first employs YOLOv3 as the base algorithm to recognize objects in single-frame images. It then uses the DeepSort algorithm to associate the potential objects recognized in images taken at different moments. Finally, it applies the confidence fusion method and temporal boundary processing method designed herein to fuse temporal sequence information with scene prior information at the decision level. Experiments on public datasets and self-built industrial scene datasets show that, because the information sources are expanded, the quality of individual frames has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing information across multiple classes. Any object recognition algorithm that simultaneously outputs object class, location information, and recognition confidence can be integrated into this framework to improve its performance.
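The decision-level fusion step can be pictured with a small sketch. The abstract does not specify the confidence fusion or temporal boundary processing methods, so everything below is a hypothetical stand-in. This includes the function `fuse_track_confidences`, the sliding `window`, and the multiplicative use of `scene_prior`. The sketch only illustrates the general idea: pool the per-frame (class, confidence) outputs of a detector along one tracker ID, then weight them by a scene prior.

```python
from collections import defaultdict


def fuse_track_confidences(detections, scene_prior, window=5):
    """Hypothetical decision-level fusion for one tracked object.

    detections: per-frame (class_name, confidence) pairs for a single
        track ID, in temporal order (detector output linked by a tracker).
    scene_prior: hypothetical dict class_name -> prior probability of that
        class appearing in the current scene.
    window: hypothetical number of most recent frames to fuse.
    Returns (best_class, normalized_score), or (None, 0.0) if nothing fuses.
    """
    recent = detections[-window:]
    if not recent:
        return None, 0.0
    summed = defaultdict(float)
    for cls, conf in recent:
        summed[cls] += conf
    # Temporal evidence: mean confidence per class over the window,
    # then weighted by how plausible that class is in this scene.
    fused = {cls: (s / len(recent)) * scene_prior.get(cls, 0.0)
             for cls, s in summed.items()}
    total = sum(fused.values())
    if total == 0.0:
        return None, 0.0
    best = max(fused, key=fused.get)
    return best, fused[best] / total


track = [("forklift", 0.9), ("person", 0.55), ("forklift", 0.8), ("forklift", 0.7)]
print(fuse_track_confidences(track, {"forklift": 0.6, "person": 0.4}))
```

In this toy track, one low-quality frame misclassified as `person` is outvoted by the rest of the track. This mirrors the observation above that single-frame quality matters less once temporal information is fused.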
To address the difficulty of fusing multi-mode sensor data from complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method was proposed. First, the multi-mode features extracted synchronously by the CCAE were stacked and fed to multi-channel convolution layers for fusion. Then, the fused data were passed to fully connected layers for compression and fed to a Softmax module for classification. Finally, the coefficients of the coupling loss function and the network parameters were optimized adaptively using the gray wolf optimization (GWO) algorithm. Experimental comparisons showed that the proposed ADCCAE fusion model outperformed existing models for multi-mode data fusion.
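The gray wolf optimization step can be illustrated with a generic GWO sketch that follows the standard update rules. The coefficient `a` decays from 2 to 0, and each wolf is pulled toward the alpha, beta, and delta wolves. The abstract does not say how the coupling-loss coefficients and network parameters are encoded as a wolf position. The quadratic `sphere` objective below is therefore a placeholder assumption. In the described setting, it would be replaced by something like the validation loss of the ADCCAE model.

```python
import random


def gwo_minimize(f, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Grey wolf optimizer (GWO) minimizing f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    # alpha, beta, delta: the three best positions found so far
    leaders = sorted(wolves, key=f)[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for j in range(dim):
                pulls = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    pulls.append(leader[j] - A * D)
                # average of the moves toward alpha, beta and delta, kept in bounds
                new_x.append(min(max(sum(pulls) / 3.0, lo), hi))
            new_wolves.append(new_x)
        wolves = new_wolves
        leaders = sorted(leaders + wolves, key=f)[:3]
    return leaders[0], f(leaders[0])


def sphere(x):
    """Placeholder objective standing in for a model's validation loss."""
    return sum(v * v for v in x)


best, value = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, value)
```

Keeping the three best positions across iterations makes the search elitist, so the reported best value never gets worse from one iteration to the next.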
Abstract: Face anti-spoofing has received considerable attention because it strengthens the security of face recognition systems, which are commonly used for authentication in surveillance applications. However, attackers try to compromise these systems with spoofing techniques, such as presenting photos or videos of legitimate users, to gain access to services or information. Many existing face anti-spoofing methods struggle in new scenarios, especially when the background, lighting, and other environmental factors vary. Recent multimodal deep learning methods have proven effective for face anti-spoofing, surpassing single-modal methods. However, these approaches often generate many features, which can lead to data dimensionality issues. In this study, we introduce a multimodal deep fusion network for face anti-spoofing that incorporates cross-axial attention and deep reinforcement learning techniques. The network operates at three patch levels and analyzes images from three modalities (RGB, IR, and depth). First, our design includes an axial attention network (XANet) model that extracts deep hidden features from the multimodal images. Next, a bidirectional fusion technique that attends in both directions combines the features from each modality effectively. We further optimize the features with the Enhanced Pity Beetle Optimization (EPBO) algorithm, which selects features to address the dimensionality problem. Moreover, the proposed model employs a hybrid federated reinforcement learning (FDDRL) approach to detect and classify face spoofing attacks, achieving a better trade-off between detection rate and false positive rate. We evaluated the proposed approach on publicly available datasets, including CASIA-SURF and GREATFASD-S, and achieved 98.985% and 97.956% classification accuracy, respectively. In addition, the method outperforms other state-of-the-art methods in terms of precision, recall, and F-measure. Overall, the developed methodology improves the model's effectiveness in detecting various types of spoofing attempts.
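A bidirectional fusion step of this kind can be sketched as cross-attention run in both directions between two modalities. This is a minimal illustration built on several assumptions and is not the paper's XANet or fusion module. It uses plain scaled dot-product attention with identity projections, mean pooling, and concatenation. The names `cross_attend` and `bidirectional_fuse` are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention of each query token over another modality.

    Identity projections are used for Q, K and V (a simplification).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * c[i] for w, c in zip(weights, context)) for i in range(d)])
    return out


def mean_pool(tokens):
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def bidirectional_fuse(feat_a, feat_b):
    """Fuse two modalities by attending A->B and B->A, then concatenating."""
    a_attends_b = cross_attend(feat_a, feat_b)
    b_attends_a = cross_attend(feat_b, feat_a)
    return mean_pool(a_attends_b) + mean_pool(b_attends_a)


rgb_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # hypothetical RGB patch features
depth_tokens = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]  # hypothetical depth patch features
fused = bidirectional_fuse(rgb_tokens, depth_tokens)
print(fused)
```

Each direction produces features expressed in terms of the other modality, so the concatenated vector carries both views. With three modalities, the same routine could be applied pairwise.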
Funding: This work is partially supported by the National Natural Foundation of China (Grant No. 61772561), the Key Research & Development Plan of Hunan Province (Grant No. 2018NK2012), the Degree & Postgraduate Education Reform Project of Hunan Province (Grant No. 2019JGYB154), the Postgraduate Excellent Teaching Team Project of Hunan Province (Grant [2019] 370-133), and the Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 20180682).
Abstract: With the development of Deep Convolutional Neural Networks (DCNNs), the features extracted for image recognition tasks have shifted from low-level features to the high-level semantic features of DCNNs. Previous studies have shown that the deeper the network, the more abstract its features. However, insufficient training samples can limit the recognition ability of deep features. To address this problem, this paper derives an improved Deep Fusion Convolutional Neural Network (DF-Net). DF-Net makes full use of the differences and complementarities that arise during network learning and enhances feature expression when datasets are limited. Specifically, DF-Net organizes two identical subnets that extract features from the input image in parallel. A well-designed fusion module is introduced in the deep layers of DF-Net to fuse the subnets' features at multiple scales. This creates more complex mappings and extracts richer and more accurate fused features, improving recognition accuracy. Furthermore, a corresponding training strategy is proposed to speed up convergence and reduce the computational overhead of network training. Finally, DF-Nets based on the well-known ResNet, DenseNet, and MobileNetV2 are evaluated on CIFAR100, Stanford Dogs, and UECFOOD-100. Theoretical analysis and experimental results demonstrate that DF-Net enhances the performance of DCNNs and increases image recognition accuracy.
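The idea of fusing two parallel subnets at several scales can be shown on toy 2D feature maps. The abstract does not detail DF-Net's fusion module, so the choices below are assumptions made for illustration: non-overlapping average pooling at scales 1, 2, and 4, element-wise summation of the two subnets, then concatenation. The function names are also hypothetical.

```python
def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling of a 2D feature map."""
    h, w = len(fmap), len(fmap[0])
    if h % k or w % k:
        raise ValueError("map size must be divisible by the pooling size")
    return [[sum(fmap[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, w, k)]
            for i in range(0, h, k)]


def multiscale_fuse(map_a, map_b, scales=(1, 2, 4)):
    """Sum the two subnets' pooled maps at each scale, then concatenate all scales."""
    fused = []
    for k in scales:
        pooled_a, pooled_b = avg_pool(map_a, k), avg_pool(map_b, k)
        for row_a, row_b in zip(pooled_a, pooled_b):
            fused.extend(x + y for x, y in zip(row_a, row_b))
    return fused


subnet_1 = [[1.0] * 4 for _ in range(4)]  # hypothetical 4x4 map from subnet 1
subnet_2 = [[2.0] * 4 for _ in range(4)]  # hypothetical 4x4 map from subnet 2
descriptor = multiscale_fuse(subnet_1, subnet_2)
print(len(descriptor))  # 16 + 4 + 1 values
```

Coarser scales summarize larger regions, so the concatenated descriptor mixes local and global evidence from both subnets.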
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFB2705000).
Abstract: With the rise of encrypted traffic, traditional network analysis methods have become less effective, prompting a shift toward deep learning-based approaches. Among these, multimodal learning-based classification methods have gained attention because they leverage diverse feature sets from encrypted traffic to improve classification accuracy. However, existing research relies predominantly on late fusion, which prevents full use of the deep features in the data. To address this limitation, we propose a novel multimodal encrypted traffic classification model that synchronizes modality fusion with multiscale feature extraction. Specifically, our approach fuses the modalities in real time at each stage of feature extraction, enhancing the feature representation at every level and preserving inter-level correlations for more effective learning. This continuous fusion strategy improves the model's ability to detect subtle variations in encrypted traffic while boosting its robustness and adaptability to evolving network conditions. Experiments on two real-world encrypted traffic datasets show that our method achieves classification accuracies of 98.23% and 97.63%, outperforming existing multimodal learning-based methods.
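The difference between late fusion and fusion at every stage can be made concrete with a toy pipeline. The stage function (`ReLU(scale * x)`), the averaging fusion, and the two modality vectors are all hypothetical. They are not the paper's architecture. The point is only structural: late fusion combines the modalities once at the end. Per-stage fusion combines them after each stage, lets both streams continue from the fused features, and keeps every level's output.

```python
def stage(x, scale):
    """Hypothetical feature-extraction stage: element-wise ReLU(scale * x)."""
    return [max(0.0, scale * v) for v in x]


def average(a, b):
    return [(u + v) / 2 for u, v in zip(a, b)]


def late_fusion(x_a, x_b, scales):
    """Each modality runs through every stage alone; fuse only at the end."""
    for s in scales:
        x_a, x_b = stage(x_a, s), stage(x_b, s)
    return average(x_a, x_b)


def per_stage_fusion(x_a, x_b, scales):
    """Fuse after every stage and keep the fused features of all levels."""
    levels = []
    for s in scales:
        x_a, x_b = stage(x_a, s), stage(x_b, s)
        fused = average(x_a, x_b)
        levels.append(fused)
        # both streams continue from the fused representation
        x_a, x_b = fused, fused
    # concatenate every level into one multiscale descriptor
    return [v for level in levels for v in level]


payload_feat = [1.0, 2.0]   # hypothetical payload-byte modality
header_feat = [3.0, 4.0]    # hypothetical header/statistics modality
print(late_fusion(payload_feat, header_feat, (1.0, 2.0, 0.5)))
print(per_stage_fusion(payload_feat, header_feat, (1.0, 2.0, 0.5)))
```

Late fusion returns only the final-level vector. The per-stage variant returns one fused vector per level, which is what allows inter-level information to reach the classifier.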
Abstract: Visual question answering (VQA) is a multimodal task. It requires a deep understanding of both the image scene and the meaning of the question, and it requires capturing the relevant correlations between the two modalities to infer the appropriate answer. In this paper, we propose a VQA system that answers yes/no questions about real-world images in Arabic. To build a robust VQA system, we work in two directions. (1) We use deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to represent the given image and question semantically in a fine-grained manner. (2) We study the role of the multimodal bilinear pooling fusion technique in the trade-off between model complexity and overall model performance. Some fusion techniques can significantly increase model complexity, which seriously limits their applicability to VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, we conduct a comparative analysis of eight bilinear pooling fusion techniques in terms of their ability to reduce model complexity and improve model performance in this type of VQA system. Experiments indicate that these multimodal bilinear pooling fusion techniques improve the VQA model's performance, with the best performance reaching 89.25%. Further, experiments show that the number of answers in the developed VQA system is a critical factor that affects how well these techniques achieve their main objective of reducing model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique shows the best balance between model complexity and performance for VQA systems designed to answer yes/no questions.
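Why the number of answers matters for complexity can be seen by comparing full bilinear pooling with a low-rank variant in the style of multimodal low-rank bilinear (MLB) pooling. This is one of several known factorizations, not necessarily one of the eight techniques compared here, and not MLPB. The image dimension of 2048 matches a pooled ResNet-152 feature. The GRU size (1024) and joint dimension (1200) are assumptions.

```python
def full_bilinear(x, y, W):
    """Full bilinear pooling: z_k = x^T W_k y, with W of shape (o, dx, dy)."""
    return [sum(x[i] * Wk[i][j] * y[j]
                for i in range(len(x)) for j in range(len(y)))
            for Wk in W]


def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear pooling: z = P^T ((U^T x) * (V^T y)).

    U: dx x d, V: dy x d, P: d x o, where d is the joint (rank) dimension.
    """
    d = len(U[0])
    ux = [sum(x[i] * U[i][r] for i in range(len(x))) for r in range(d)]
    vy = [sum(y[j] * V[j][r] for j in range(len(y))) for r in range(d)]
    h = [a * b for a, b in zip(ux, vy)]  # Hadamard product in the joint space
    return [sum(h[r] * P[r][k] for r in range(d)) for k in range(len(P[0]))]


def param_counts(dx, dy, o, d):
    """Number of weights (biases ignored) for each pooling variant."""
    return {"full": dx * dy * o, "low_rank": dx * d + dy * d + d * o}


# dx = 2048 (pooled ResNet-152 feature); dy = 1024 and d = 1200 are assumptions.
for n_answers in (2, 1000):
    print(n_answers, param_counts(2048, 1024, n_answers, 1200))
```

With only two answers (yes/no), the full model needs 4,194,304 weights and the low-rank one 3,688,800, so factorization saves little. With 1,000 answers, the counts are 2,097,152,000 versus 4,886,400. This is consistent with the finding that the number of answers governs how much these techniques reduce complexity.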
Funding: Financially supported by the Key Project of the National Natural Science Foundation of China (No. 41930431), the National Natural Science Foundation of China (Nos. 41904121, 41804133, and 41974116), and the Joint Guidance Project of the Natural Science Foundation of Heilongjiang Province (No. LH2020D006).
Abstract: Based on a CNN-LSTM fusion deep neural network, this paper proposes a seismic velocity model building method that simultaneously estimates the root mean square (RMS) velocity and the interval velocity from the common-midpoint (CMP) gather. In the proposed method, a convolutional neural network (CNN) encoder and two long short-term memory networks (LSTMs) extract spatial and temporal features from seismic signals, respectively. A CNN decoder then recovers the RMS velocity and interval velocity of the subsurface media from the various feature vectors. Deep neural network training suffers from unstable gradients and a tendency to fall into local minima. To address these problems, we use Kaiming normal initialization with a zero negative slope for the rectified units, and we adjust the learning process by optimizing the mean square error (MSE) loss function with a freezing factor. Experiments on the test dataset show that the CNN-LSTM fusion network predicts both RMS velocity and interval velocity more accurately, and that its inversion accuracy is superior to that of single neural network models. The predictions on complex structures and the Marmousi model are consistent with the true velocity trends. The predictions on field data effectively correct the seismic events, improving their lateral continuity and the quality of the stack section. These results indicate the effectiveness and good generalization capability of the proposed method.
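The initialization rule mentioned above can be written out explicitly. Kaiming (He) normal initialization draws weights from a zero-mean Gaussian with standard deviation gain / sqrt(fan_in), where gain = sqrt(2 / (1 + a^2)) and a is the negative slope of the rectifier. With a = 0, this reduces to sqrt(2 / fan_in). The sketch assumes fan-in mode, which is the default of PyTorch's `torch.nn.init.kaiming_normal_`. It does not model the paper's freezing factor, which the abstract does not define.

```python
import math
import random


def kaiming_normal(fan_in, fan_out, negative_slope=0.0, seed=None):
    """Kaiming (He) normal initialization, fan_in mode, for a fan_out x fan_in weight.

    gain = sqrt(2 / (1 + a^2)); with a = 0 (plain ReLU) this is sqrt(2),
    so std = sqrt(2 / fan_in).
    """
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    std = gain / math.sqrt(fan_in)
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
    return weights, std


W, std = kaiming_normal(fan_in=256, fan_out=64, seed=1)
samples = [w for row in W for w in row]
# root-mean-square of the zero-mean samples approximates their standard deviation
empirical_std = math.sqrt(sum(w * w for w in samples) / len(samples))
print(std, empirical_std)
```

Scaling by fan-in keeps the variance of each layer's pre-activations roughly constant through ReLU layers. This is the property that helps avoid the unstable gradients described above.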
Funding: Supported by the National Natural Science Foundation of China (62073330).
Abstract: Natural events have a significant impact on overall flight activity, and the aviation industry plays a vital role in helping society cope with them. Typhoon season is one of the most disruptive weather periods. When it arrives and persists, airlines operating in threatened areas and passengers with travel plans during this period pay close attention to the development of tropical storms. This paper proposes a deep multimodal fusion and multitask trajectory prediction model that improves the reliability of typhoon trajectory prediction and reduces the number of flight cancellations. The deep multimodal fusion module deeply fuses the features output by multiple submodal fusion modules. The multitask generation module treats longitude and latitude as two related tasks and predicts them simultaneously. With more dependable data, problems can be analysed more rapidly and efficiently, enabling better decision-making with a proactive rather than reactive posture. When multiple modalities coexist, features can be extracted from all of them simultaneously so that each supplements the others' information. A case study of Typhoon Lichma, which swept China in 2019, demonstrates that the algorithm effectively reduces the number of unnecessary flight cancellations compared with existing flight scheduling. The algorithm can also assist the new generation of flight scheduling systems under extreme weather.
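Treating longitude and latitude as two related tasks usually means a shared representation feeding two output heads, trained with a combined loss. The sketch below shows that pattern with a single shared ReLU layer and mean squared error. The layer sizes, the equal task weights `w_lon` and `w_lat`, and all numbers are assumptions, not the paper's configuration.

```python
def relu(v):
    return max(0.0, v)


def predict(features, shared_w, head_lon, head_lat):
    """Shared hidden layer followed by two task heads (longitude, latitude)."""
    hidden = [relu(sum(f * w for f, w in zip(features, row))) for row in shared_w]
    lon = sum(h * w for h, w in zip(hidden, head_lon))
    lat = sum(h * w for h, w in zip(hidden, head_lat))
    return lon, lat


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(pred_lon, true_lon, pred_lat, true_lat, w_lon=0.5, w_lat=0.5):
    """Weighted sum of the two related regression losses."""
    return w_lon * mse(pred_lon, true_lon) + w_lat * mse(pred_lat, true_lat)


# Hypothetical predicted vs. observed track points (degrees).
loss = multitask_loss([120.0, 121.0], [120.5, 121.0], [28.0, 29.0], [28.0, 28.5])
print(loss)
```

Because both heads read the same hidden layer, gradients from the latitude error also shape the features used for longitude. This coupling is the usual motivation for predicting the two coordinates jointly.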
Abstract: Personality distinguishes individuals' patterns of feeling, thinking, and behaving. Predicting personality from short video sequences is an exciting research area in computer vision. Most existing research reports only preliminary results in extracting knowledge from the visual and audio (sound) modalities. To overcome this deficiency, we propose the Deep Bimodal Fusion (DBF) approach to predict five personality traits: agreeableness, extraversion, openness, conscientiousness, and neuroticism. In the proposed framework, a modified convolutional neural network (CNN), specifically the Descriptor Aggregator Model (DAN), is used for the visual modality. For the audio modality, the model extracts audio representations and feeds them to a long short-term memory (LSTM) network for greater efficiency. Moreover, modality-specific neural networks allow the framework to predict the traits independently for each modality. The two predictions are then combined with weighted fusion to produce the final prediction of each trait. The proposed approach attains a mean accuracy of 0.9183, averaged over the five personality traits, which is better than previously proposed frameworks.
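The weighted fusion of the two modality branches, and a mean accuracy averaged over the five traits, can be sketched as follows. Two things are assumptions, because the abstract names neither the dataset nor the exact metric. The first is the visual weight of 0.6. The second is scoring each trait as 1 minus the absolute error on a [0, 1] scale, as in the ChaLearn First Impressions benchmark. The branch outputs are placeholders.

```python
TRAITS = ("agreeableness", "extraversion", "openness",
          "conscientiousness", "neuroticism")


def weighted_fusion(visual, audio, w_visual=0.6):
    """Late fusion of per-trait scores from the two modality branches."""
    return {t: w_visual * visual[t] + (1.0 - w_visual) * audio[t] for t in TRAITS}


def mean_accuracy(pred, target):
    """Mean over traits of 1 - |pred - target|, for scores in [0, 1]."""
    return sum(1.0 - abs(pred[t] - target[t]) for t in TRAITS) / len(TRAITS)


visual_scores = {t: 0.6 for t in TRAITS}   # hypothetical CNN-branch output
audio_scores = {t: 0.4 for t in TRAITS}    # hypothetical LSTM-branch output
fused = weighted_fusion(visual_scores, audio_scores)
print(mean_accuracy(fused, {t: 0.6 for t in TRAITS}))
```

Under this metric, a mean accuracy of 0.9183 corresponds to an average absolute error of about 0.08 per trait.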
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3004104), the National Natural Science Foundation of China (Grant No. U2342204), the Innovation and Development Program of the China Meteorological Administration (Grant No. CXFZ2024J001), the Open Research Project of the Key Open Laboratory of Hydrology and Meteorology of the China Meteorological Administration (Grant No. 23SWQXZ010), the Science and Technology Plan Project of Zhejiang Province (Grant No. 2022C03150), the Open Research Fund Project of Anyang National Climate Observatory (Grant No. AYNCOF202401), and the Open Bidding for Selecting the Best Candidates Program (Grant No. CMAJBGS202318).
Abstract: Thunderstorm wind gusts are small in scale, typically occurring within a range of a few kilometers, so monitoring and forecasting them using only automatic weather stations is extremely challenging. It is therefore necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations. This paper introduces a new algorithm, the thunderstorm wind gust identification network (TGNet), which uses multimodal feature fusion to combine the temporal and spatial features of thunderstorm wind gust events. First, the shapelet transform extracts temporal features from the wind speeds recorded by automatic weather stations. This step aims to distinguish thunderstorm wind gusts from gusts caused by synoptic-scale systems or typhoons. Next, an encoder built on the U-shaped network (U-Net) and incorporating recurrent residual convolutional blocks (R2U-Net) extracts the corresponding spatial convective characteristics from satellite, radar, and lightning observations. Finally, a multimodal deep fusion module based on multi-head cross-attention incorporates the temporal wind speed features at each automatic weather station into the spatial features, producing a classification of thunderstorm wind gusts every 10 minutes. TGNet products are highly accurate, with a critical success index of 0.77. Compared with U-Net and R2U-Net, TGNet reduces the false alarm rate by 31.28% and 24.15%, respectively. The new algorithm provides gridded thunderstorm wind gust products with a spatial resolution of 0.01°, updated every 10 minutes. These finer and more accurate results can help improve the accuracy of operational warnings for thunderstorm wind gusts.
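The verification scores quoted above come from a 2 x 2 contingency table of hits (H), misses (M), and false alarms (F). The sketch assumes that "false alarm rate" here means the false alarm ratio F / (H + F), a common convention in forecast verification. It also assumes that "decreases by 31.28%" is a relative reduction. The counts are made up so that the CSI equals 0.77.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI = H / (H + M + F)."""
    return hits / (hits + misses + false_alarms)


def false_alarm_ratio(hits, false_alarms):
    """FAR = F / (H + F): fraction of 'gust' classifications that were wrong."""
    return false_alarms / (hits + false_alarms)


def relative_reduction_pct(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100.0


# Hypothetical contingency counts, chosen only so that CSI = 0.77.
h, m, f = 77, 13, 10
print(critical_success_index(h, m, f), false_alarm_ratio(h, f))
```

CSI ignores correct negatives, which suits rare events such as thunderstorm gusts, where "no gust" grid cells vastly outnumber gust cells.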
Funding: funded by the Major Projects of National Science and Technology “Large Oil and Gas Fields and CBM Development” (Grant No. 2016ZX05027).
Abstract: 1 Introduction. The Paleogene strata (at depths greater than 2500 m) in the Bohai Sea are complex (Xu Changgui, 2006), the reservoirs are deeply buried, reservoir prediction is difficult (LAI Weicheng, XU Changgui, 2012), and more
Abstract: Because they combine visible and infrared object information, multispectral data are a promising source for automatic maritime ship recognition. In this paper, to exploit both deep convolutional neural networks and multispectral data, we model the multispectral ship recognition task as a convolutional feature fusion problem and propose a feature fusion architecture called Hybrid Fusion. We fine-tune the VGG-16 model pre-trained on ImageNet on three-channel single-spectral images and four-channel multispectral images, and use existing regularization techniques to avoid overfitting. Hybrid Fusion is investigated together with three other feature fusion architectures. Each fusion architecture consists of two feature extraction branches, one for visible images and one for infrared images, in which the pre-trained and fine-tuned VGG-16 models serve as feature extractors. In each fusion architecture, the image features of the two branches are first extracted from the same layer or from different layers of the VGG-16 model. The features extracted from the two branches are then flattened and concatenated into a multispectral feature vector, which is finally fed into a classifier to perform ship recognition. Furthermore, based on these fusion architectures, we also evaluate the recognition performance of a feature vector normalization method and of three combinations of feature extractors. Experimental results on the visible and infrared ship (VAIS) dataset show that the best Hybrid Fusion configuration achieves 89.6% mean per-class recognition accuracy on daytime paired images and 64.9% on nighttime infrared images, outperforming the state-of-the-art method by 1.4% and 3.9%, respectively.
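The flatten-and-concatenate fusion step described above can be sketched as follows, with feature maps represented as nested Python lists (channels × height × width). The per-branch L2 normalization is an assumption standing in for the unspecified feature vector normalization method evaluated in the paper. The function names are hypothetical, and VGG-16 feature extraction and the final classifier are omitted.

```python
import math
from typing import List, Union

Nested = Union[float, int, list, tuple]


def flatten(feature_map: Nested) -> List[float]:
    """Recursively flatten a nested feature map (e.g. C x H x W) into one vector."""
    if isinstance(feature_map, (int, float)):
        return [float(feature_map)]
    out: List[float] = []
    for item in feature_map:
        out.extend(flatten(item))
    return out


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (assumed normalization; zero vectors are unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_multispectral(visible_feat: Nested, infrared_feat: Nested, normalize: bool = True) -> List[float]:
    """Flatten each branch's features, optionally normalize them, then concatenate
    visible first and infrared second into one multispectral feature vector."""
    v = flatten(visible_feat)
    ir = flatten(infrared_feat)
    if normalize:
        v, ir = l2_normalize(v), l2_normalize(ir)
    return v + ir
```

Normalizing each branch separately keeps one modality from dominating the concatenated vector merely because its activations are larger in magnitude.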
Abstract: In some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are correlated with one another; in addition, the scenes have stable prior features. Existing technologies, however, do not take full advantage of this information. To push object recognition in these applications beyond existing algorithms, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first uses YOLOv3 as the base algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images captured at different moments, and finally fuses temporal sequence information with scene prior information at the decision level, using a confidence fusion method and a temporal boundary processing method designed in this work. Experiments on public datasets and self-built industrial scene datasets show that, because of the expanded information sources, the quality of individual frames has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing multiple classes of information. Any object recognition algorithm that simultaneously outputs object class, location information, and recognition confidence can be integrated into this framework to improve performance.
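Decision-level fusion over a tracked object can be illustrated with a small sketch. The paper's own confidence fusion rule is not given here, so the rule below is a hypothetical stand-in. It averages each class's confidence over all frames of one DeepSort-style track, weights the average by a scene prior for that class, and renormalizes the scores. The function name, class labels, and prior values are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fuse_track_confidence(
    detections: List[Tuple[str, float]],
    scene_prior: Dict[str, float],
) -> Dict[str, float]:
    """Fuse per-frame (class_label, confidence) outputs for one tracked object.

    Hypothetical rule: mean confidence per class over the whole track (frames
    where the class was not predicted count as 0), multiplied by the scene
    prior for that class, then renormalized so the scores sum to 1.
    """
    if not detections:
        return {}
    sums: Dict[str, float] = defaultdict(float)
    for label, conf in detections:
        sums[label] += conf
    n = len(detections)
    scores = {label: (s / n) * scene_prior.get(label, 0.0) for label, s in sums.items()}
    total = sum(scores.values())
    if total == 0.0:
        return {label: 0.0 for label in scores}
    return {label: s / total for label, s in scores.items()}
```

For example, a track labeled "forklift" in two frames (0.9, 0.8) and "person" in one blurry frame (0.4), with equal priors, fuses to roughly 0.81 for "forklift". A single poor frame therefore no longer flips the final label.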
Abstract: To address the difficulty of fusing multi-mode sensor data from complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method was proposed. First, the multi-mode features extracted synchronously by the CCAE were stacked and fed into multi-channel convolution layers for fusion. Then, the fused data were passed to fully connected layers for compression and fed into a Softmax module for classification. Finally, the coupling loss function coefficients and the network parameters were optimized adaptively using the gray wolf optimization (GWO) algorithm. Experimental comparisons showed that the proposed ADCCAE fusion model outperformed existing models for multi-mode data fusion.
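The gray wolf optimization step mentioned above can be sketched in plain Python using the standard GWO update. The three best solutions (alpha, beta, delta) guide every wolf, and the coefficient `a` decays linearly from 2 toward 0, shifting the search from exploration to exploitation. This is a generic minimizer, not the ADCCAE code. In ADCCAE the objective would presumably be a validation loss over the coupling coefficients and network parameters, which is an assumption here, so a toy objective stands in for it. The function name and parameter defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_wolves: int = 20,
    n_iters: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with the standard grey wolf optimizer."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders: List[Tuple[float, List[float]]] = []

    for t in range(n_iters):
        # Rank the pack together with previous leaders; keep the three best.
        pool = leaders + [(objective(w), list(w)) for w in wolves]
        pool.sort(key=lambda p: p[0])
        leaders = pool[:3]
        positions = [p[1] for p in leaders]
        a = 2.0 - 2.0 * t / n_iters  # decreases linearly from 2 toward 0

        for i, x in enumerate(wolves):
            new_x = []
            for d in range(dim):
                total = 0.0
                for leader in positions:
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])  # distance to this leader
                    total += leader[d] - A * D
                # Average the three leader-guided moves and clip to the bounds.
                new_x.append(min(max(total / 3.0, lo), hi))
            wolves[i] = new_x

    pool = leaders + [(objective(w), list(w)) for w in wolves]
    best_score, best_pos = min(pool, key=lambda p: p[0])
    return best_pos, best_score
```

For example, minimizing the sphere function `sum(v * v for v in x)` over three dimensions in `(-5.0, 5.0)` drives the returned score close to zero. Seeding a private `random.Random` keeps runs reproducible.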