Biometric characteristics have played a vital role in security in recent years. Human gait classification in video sequences is an important biometric attribute used for security purposes. This work proposes a new framework for human gait classification in video sequences based on deep learning (DL) feature fusion and posterior probability-based moth flame optimization (MFO). In the first step, the video frames are resized and two pre-trained lightweight DL models, EfficientNetB0 and MobileNetV2, are fine-tuned. Both models were selected for their top-5 accuracy and small number of parameters. The models are then trained through deep transfer learning, and the extracted deep features are fused using a voting scheme. In the last step, the authors develop a posterior probability-based MFO feature selection algorithm to select the best features. The selected features are classified using several supervised learning methods. Experiments use the publicly available CASIA-B dataset. On this dataset, the authors selected six angles (0°, 18°, 90°, 108°, 162°, and 180°) and obtained average accuracies of 96.9%, 95.7%, 86.8%, 90.0%, 95.1%, and 99.7%, respectively. The results show comparable or improved accuracy and significantly lower computation time relative to recent state-of-the-art techniques.
Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments. However, because collected vibration signals are nonlinear and non-stationary, single-modal methods struggle to capture fault features fully. This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion. The method first employs the Hippopotamus Optimization Algorithm (HO) to optimize the number of modes in Variational Mode Decomposition (VMD) to achieve optimal modal decomposition performance. It combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) to extract temporal features from one-dimensional time-series signals. Meanwhile, the Markov Transition Field (MTF) is used to transform one-dimensional signals into two-dimensional images for spatial feature mining. Images generated from different parameter combinations are compared through visualization techniques to determine the optimal parameter configuration. A multi-modal network (GSTCN) is constructed by integrating Swin-Transformer and the Convolutional Block Attention Module (CBAM), where the attention module is used to enhance fault features. Finally, the fault features extracted from the different modalities are deeply fused and fed into a fully connected layer to complete fault classification. Experimental results show that the GSTCN model achieves an average diagnostic accuracy of 99.5% across three datasets, significantly outperforming the comparison methods. This demonstrates that the proposed model has high diagnostic precision and good generalization ability, providing an efficient and reliable solution for rolling bearing fault diagnosis.
Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but these still face challenges such as redundant features and computational cost. In this article, we propose a new deep learning method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused by Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracy on these datasets was 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
Image classification based on bag-of-words (BOW) has broad application prospects in pattern recognition, but its shortcomings, such as reliance on a single feature and low classification accuracy, are apparent. To deal with this problem, this paper proposes to combine two ingredients: (i) three mutually complementary features are adopted to describe the images, namely the pyramid histogram of words (PHOW), the pyramid histogram of color (PHOC), and the pyramid histogram of oriented gradients (PHOG); (ii) an adaptive feature-weight adjusted image categorization algorithm based on the SVM and decision-level fusion of multiple features is employed. Experiments carried out on the Caltech101 database confirm the validity of the proposed approach. The experimental results show that the classification accuracy of the proposed method is 7%-14% higher than that of traditional BOW methods. By fully using global, local, and spatial information, the algorithm describes the feature information of the image more completely and flexibly through multi-feature fusion and the pyramid structure formed by spatial multi-resolution decomposition of the image. As a result, classification accuracy improves significantly.
Regular inspection of bridge cracks is crucial to bridge maintenance and repair. Traditional manual crack detection methods are time-consuming, dangerous, and subjective. At the same time, existing mainstream vision-based automatic crack detection algorithms find it challenging to detect fine cracks and to balance detection accuracy and speed. Therefore, this paper proposes a new bridge crack segmentation method based on a parallel attention mechanism and multi-scale feature fusion, built on the DeeplabV3+ network framework. First, an improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+ network to replace the original Xception backbone and improve the atrous spatial pyramid pooling (ASPP) module, respectively, dramatically reducing the number of parameters in the network and accelerating the training and prediction speed of the model. Moreover, we introduce the parallel attention mechanism into the encoding and decoding stages. Attention to the crack regions is enhanced along both the channel and spatial dimensions, and the interference of various kinds of noise is significantly suppressed. Finally, we further improve the detection performance of the model for fine cracks by introducing a multi-scale feature fusion module. Our results are validated on a self-made dataset. The experiments show that our method is more accurate than other methods. Its intersection over union (IoU) and F1-score (F1) increase to 77.96% and 87.57%, respectively. In addition, the number of parameters is only 4.10 M, which is much smaller than that of the original network, and the frame rate increases to 15 frames per second (FPS). The results show that the proposed method meets the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.
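The IoU and F1-score quoted in the abstract above are standard overlap metrics for binary segmentation masks. As a minimal sketch of how they are typically computed (the function names and the toy masks are illustrative assumptions, not taken from the paper):

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    between two binary masks given as lists of rows of 0/1 values."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree perfectly


def f1_score(pred, truth):
    """F1 (Dice) score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 3x4 masks (made up for illustration): 1 marks a crack pixel.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]

print(confusion_counts(pred, truth))
print(iou(pred, truth), f1_score(pred, truth))
```

For a single pair of masks the two metrics are monotonically related (F1 = 2·IoU / (1 + IoU)), so F1 is never lower than IoU. Averages over a whole dataset, such as the figures reported above, need not satisfy that identity exactly.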
Real-time detection of driver fatigue status is of great significance for road traffic safety. This paper proposes a novel driver fatigue detection method that can detect the driver's fatigue status around the clock. The driver's face images were captured by a camera with a color lens and an infrared lens mounted above the dashboard. The landmarks of the driver's face were labeled and the eye area was segmented. Fatigue is detected by calculating the eye aspect ratio, the duration of eye closure, the blink frequency, and PERCLOS for both the color and infrared images. Based on the change in light intensity detected by a photosensitive device, the weight matrix of the color features and the infrared features was adjusted adaptively to reduce the impact of lighting on fatigue detection. Video samples of the driver's face were recorded in the test vehicle. After training the classification model, the results showed that the method detects driver fatigue with high accuracy in both daytime and nighttime.
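The eye-based cues named above can be illustrated with a short sketch. It assumes the common six-landmark eye aspect ratio (EAR) definition and a fixed closed-eye threshold. The threshold value, the landmark ordering, and the linear light-dependent weighting in `fuse_scores` are hypothetical choices for illustration, not the paper's actual weight matrix.

```python
import math


def eye_aspect_ratio(landmarks):
    """EAR from six (x, y) eye landmarks ordered p1..p6, where p1 and p4
    are the eye corners and (p2, p6), (p3, p5) are upper/lower lid pairs."""
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def perclos(ear_series, closed_threshold=0.2):
    """Fraction of frames in which the eye counts as closed.
    The 0.2 threshold is an assumed value, not taken from the paper."""
    if not ear_series:
        raise ValueError("ear_series must not be empty")
    closed = sum(1 for ear in ear_series if ear < closed_threshold)
    return closed / len(ear_series)


def fuse_scores(color_score, infrared_score, light_level):
    """Hypothetical adaptive weighting: trust the color channel more in
    bright light (light_level near 1) and the infrared channel in the dark."""
    w = min(max(light_level, 0.0), 1.0)
    return w * color_score + (1.0 - w) * infrared_score


# Made-up landmarks for an open eye and a short EAR time series.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(eye_aspect_ratio(open_eye))
print(perclos([0.30, 0.10, 0.15, 0.30]))
print(fuse_scores(0.8, 0.4, light_level=0.25))
```

Blink frequency and eye-closure duration can be derived from the same EAR series by counting and timing runs of consecutive below-threshold frames.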
In sucker rod well production, the dynamic liquid level is important for production efficiency and safety in the lifting process. It is influenced by multi-source data that need to be combined for real-time calculation of the dynamic liquid level. In this paper, the multi-source data are regarded as different views, including the load of the sucker rod and the liquid in the wellbore, the image of the dynamometer card, and production dynamics parameters. These views are fused by a multi-branch neural network with a special fusion layer. With this method, the features of the different views are extracted while accounting for their differences in modality and physical meaning. The extraction results, selected by multinomial sampling, then serve as the input of the fusion layer. During the fusion process, the availability of each view determines whether it is fused in the fusion layer. In this way, not only is the correlation between the views considered, but missing data are also handled automatically. The results show that the fusion of load and production features (the method proposed in this paper) performs best, with the lowest mean absolute error (MAE) of 39.63 m, followed by feature concatenation with an MAE of 42.47 m. Both perform better than any single view, and the lower MAE of feature fusion indicates stronger generalization ability. In contrast, the image feature contributes little to accuracy when fused with the other views, and as a single view it has the highest MAE. When data are missing in some view, multi-view feature fusion, unlike feature concatenation, does not make a large number of samples unusable. When the missing rate is 10%, 30%, 50%, and 80%, the proposed method reduces the MAE by 5.8, 7, 9.3, and 20.3 m, respectively. In general, the multi-view feature fusion method proposed in this paper clearly improves accuracy and handles missing data effectively, which helps provide technical support for real-time monitoring of the dynamic liquid level in oil fields.
To address the low robustness of trackers under significant appearance changes in complex backgrounds, a novel moving target tracking method based on weighted fusion of hierarchical deep features and correlation filters is proposed. First, multi-layer features are extracted by a deep model pre-trained on massive object recognition datasets. The linearly separable features of the Relu3-1, Relu4-1, and Relu5-4 layers of VGG-Net-19 are especially suitable for target tracking. Then, correlation filters over the hierarchical convolutional features are learned to generate their correlation response maps. Finally, a novel weight adjustment approach is presented to fuse the response maps. The position of the maximum value of the final response map gives the location of the target. Extensive experiments on object tracking benchmark datasets demonstrate high robustness and precision compared with several state-of-the-art trackers under different conditions.
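The final two steps of the tracker above, fusing per-layer response maps and taking the peak as the target location, can be sketched as follows. The abstract does not describe its weight adjustment rule, so this sketch simply assumes fixed, user-supplied weights. The function names and the toy maps are illustrative only.

```python
def fuse_response_maps(maps, weights):
    """Weighted sum of equally sized 2-D response maps.
    Weights are normalized so that they sum to one."""
    total = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for response, weight in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (weight / total) * response[r][c]
    return fused


def locate_peak(response):
    """Return (row, col) of the maximum response value.
    Ties resolve to the largest (row, col) pair."""
    _, row, col = max(
        (value, r, c)
        for r, line in enumerate(response)
        for c, value in enumerate(line)
    )
    return row, col


# Made-up 2x3 response maps standing in for three feature layers.
map_a = [[0.1, 0.2, 0.1], [0.3, 0.9, 0.2]]
map_b = [[0.2, 0.1, 0.8], [0.1, 0.4, 0.1]]
map_c = [[0.1, 0.1, 0.2], [0.2, 0.7, 0.1]]

fused = fuse_response_maps([map_a, map_b, map_c], weights=[1.0, 0.5, 0.25])
print(locate_peak(fused))
```

In this toy case the second map alone would place the target at a different cell, but the fused map follows the two layers that agree, which is the motivation for combining layers rather than trusting one.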
Signature verification is regarded as the most beneficial behavioral biometric feature in security and fraud protection. It is also a popular biometric authentication technology in forensic and commercial transactions due to its various advantages, including noninvasiveness, user-friendliness, and social and legal acceptability. According to the literature, extensive research has been conducted on signature verification systems in a variety of languages, including English, Hindi, Bangla, and Chinese. However, Arabic Offline Signature Verification (OSV) remains a challenging problem that researchers have investigated less, because the Arabic script is distinguished by changing letter shapes, diacritics, ligatures, and overlapping, which make verification more difficult. Recently, signature verification systems have shown promising results in distinguishing genuine signatures from forgeries; however, performance on skilled forgery detection is still unsatisfactory. Most existing methods require many learning samples to improve verification accuracy, which is a major drawback because the number of available signature samples is often limited in practical applications. This study addresses these issues by presenting an OSV system based on multi-feature fusion and discriminant feature selection using a genetic algorithm (GA). In contrast to existing methods, which use multiclass learning approaches, this study uses a one-class learning strategy to address imbalanced signature data in practical applications. The proposed approach is tested on three signature databases: SID-Arabic handwriting signatures, CEDAR (Center of Excellence for Document Analysis and Recognition), and UTSIG (University of Tehran Persian Signature). Experimental results show that the proposed system outperforms existing systems in reducing the False Acceptance Rate (FAR), False Rejection Rate (FRR), and Equal Error Rate (EER). The proposed system achieved a 5% improvement.
To explore the influence of fusing different features on recognition, this paper took electromyography (EMG) signals of the rectus femoris under different motions (walking, stepping, ramp walking, squatting, and sitting) as samples. Linear features, comprising time-domain features (variance (VAR) and root mean square (RMS)) and frequency-domain features (mean frequency (MF) and mean power frequency (MPF)), and nonlinear features (empirical mode decomposition (EMD)) were extracted from the samples. Two feature fusion algorithms, the series splicing method and the complex vector method, were designed and verified with a double-hidden-layer error back-propagation (BP) neural network. Results show that as the number of feature types and the complexity of the fusion increase, the recognition rate of actions from the EMG signal gradually improves. With the series splicing method, the time-domain + frequency-domain + empirical mode decomposition (TD+FD+EMD) combination gives the highest average recognition rate, 92.32%. With the complex vector method, this rate rises to 96.1%, and the variance of the BP system is also reduced.
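The two fusion schemes named above can be contrasted in a few lines. This sketch assumes that "series splicing" means plain concatenation and that the "complex vector method" follows the usual parallel-fusion convention of placing one feature set in the real part and the other in the imaginary part, zero-padding the shorter one. The abstract does not spell out either definition, so both are assumptions, as are the function names and the use of population variance.

```python
import math
import statistics


def time_domain_features(signal):
    """Variance (population) and root mean square of one signal window."""
    var = statistics.pvariance(signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return [var, rms]


def series_splice(features_a, features_b):
    """Serial fusion: concatenate the two feature vectors."""
    return list(features_a) + list(features_b)


def complex_vector_fuse(features_a, features_b):
    """Parallel fusion: real part from features_a, imaginary part from
    features_b; the shorter vector is zero-padded to the longer length."""
    n = max(len(features_a), len(features_b))
    a = list(features_a) + [0.0] * (n - len(features_a))
    b = list(features_b) + [0.0] * (n - len(features_b))
    return [complex(x, y) for x, y in zip(a, b)]


# Made-up EMG window and a made-up frequency-domain feature vector.
td = time_domain_features([3.0, 4.0])
fd = [55.0]

print(series_splice(td, fd))        # length grows with every feature set
print(complex_vector_fuse(td, fd))  # length stays at the longer input
```

The practical difference is dimensionality: serial fusion grows the input layer of the classifier with each added feature set, while complex fusion of two sets keeps the length of the longer one.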
Thunderstorm wind gusts are small in scale, typically occurring within a range of a few kilometers. It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations. Therefore, it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations. This paper introduces a new algorithm called the thunderstorm wind gust identification network (TGNet). It uses multimodal feature fusion to combine the temporal and spatial features of thunderstorm wind gust events. The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations, with the aim of distinguishing thunderstorm wind gusts from gusts caused by synoptic-scale systems or typhoons. Then, an encoder built on the U-shaped network (U-Net) and incorporating recurrent residual convolutional blocks (R2U-Net) is employed to extract the corresponding spatial convective characteristics from satellite, radar, and lightning observations. Finally, a multimodal deep fusion module based on multi-head cross-attention incorporates the temporal features of wind speed at each automatic weather station into the spatial features to classify thunderstorm wind gusts every 10 minutes. TGNet products have high accuracy, with a critical success index reaching 0.77. Compared with those of U-Net and R2U-Net, the false alarm rate of TGNet products decreases by 31.28% and 24.15%, respectively. The new algorithm provides gridded products of thunderstorm wind gusts with a spatial resolution of 0.01°, updated every 10 minutes. The results are finer and more accurate, thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.
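The critical success index (CSI) cited above is computed from a contingency table of hits, misses, and false alarms. The sketch below assumes the standard forecast-verification definitions and reads "false alarm rate" as the false alarm ratio, FA / (hits + FA). The abstract does not say which variant it uses, and the flattened grids are made up for illustration.

```python
def contingency_counts(predicted, observed):
    """Count hits, misses and false alarms over paired binary labels
    (for example, flattened grid cells for one time step)."""
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        if p and o:
            hits += 1
        elif o and not p:
            misses += 1
        elif p and not o:
            false_alarms += 1
    return hits, misses, false_alarms


def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms)."""
    total = hits + misses + false_alarms
    if total == 0:
        raise ValueError("no events were observed or predicted")
    return hits / total


def false_alarm_ratio(hits, false_alarms):
    """FAR = false alarms / (hits + false alarms)."""
    forecasts = hits + false_alarms
    if forecasts == 0:
        raise ValueError("no positive predictions")
    return false_alarms / forecasts


# Made-up flattened grids: 1 marks a thunderstorm-gust cell.
predicted = [1, 1, 0, 1, 0, 0, 1, 1]
observed = [1, 0, 0, 1, 1, 0, 1, 1]

hits, misses, fas = contingency_counts(predicted, observed)
print(critical_success_index(hits, misses, fas))
print(false_alarm_ratio(hits, fas))
```

CSI ignores correct negatives, which suits rare events such as gusts: a model that predicts "no gust" everywhere is not rewarded for the many cells where nothing happened.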
Solar cell defect detection is crucial for quality inspection of photovoltaic power generation modules. In the production process, defects occur infrequently and exhibit random shapes and sizes, which makes it challenging to collect defective samples. Additionally, the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions. This paper proposes a novel Lightweight Multiscale Feature Fusion network (LMFF) to address these challenges. The network comprises a feature extraction network, a multi-scale feature fusion module (MFF), and a segmentation network. Specifically, the feature extraction network produces multi-scale feature outputs, and the MFF fuses the multi-scale feature information effectively. To capture finer-grained multi-scale information from the fused features, we propose a multi-scale attention module (MSA) in the segmentation network to enhance the network's ability to detect small targets. Moreover, depthwise separable convolutions are used to construct depthwise separable residual blocks (DSR) that reduce the number of model parameters. Finally, to validate the defect segmentation and localization performance of the proposed method, we constructed three solar cell defect detection datasets: SolarCells, SolarCells-S, and PVEL-S. SolarCells and SolarCells-S are monocrystalline silicon datasets, and PVEL-S is a polycrystalline silicon dataset. Experimental results show that the IoU of our method on these three datasets reaches 68.5%, 51.0%, and 92.7%, respectively, and the F1-score reaches 81.3%, 67.5%, and 96.2%, respectively, which surpasses other commonly used methods and verifies the effectiveness of the LMFF network.
An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, which enriches the semantic and spatial information in the feature map and improves the target detection ability of the model. Second, a pyramid pooling module, multi atrous spatial pyramid pooling (MASPP), is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features, improving the model's handling of multi-scale objects. The experimental results show that the improved YOLOv8 model achieves a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher, respectively, than those of the original model. This shows that the detection and classification ability of the proposed model on multi-scale optical remote sensing targets has improved.
Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining precise class-specific representations at different scales is a key aspect of feature representation. However, existing methods often rely on a single-scale deep feature, neglecting shallow and deeper layer features, which makes it difficult to predict objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain precise class-specific representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of precise class-specific feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of the proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
In recent years, fungal diseases affecting grape crops have attracted significant attention. Currently, the assessment of black rot severity mainly depends on the ratio of lesion area to leaf surface area. However, segmenting leaf lesions effectively and accurately presents considerable challenges. Existing grape leaf lesion segmentation models have several limitations, such as a large number of parameters, long training durations, and limited precision in extracting small lesions and boundary details. To address these issues, we propose SFC_DeepLabv3+, an enhanced lesion segmentation method based on DeepLabv3+ that incorporates Strip Pooling, Content-Guided Fusion, and a Convolutional Block Attention Module. This approach replaces the original Xception backbone with the lightweight MobileNetv2, incorporates a lightweight convolutional block attention module, and introduces a content-guided feature fusion module to improve the detection accuracy of small lesions and blurred boundaries. Experimental results show that the enhanced model achieves a mean Intersection over Union (mIoU) of 90.98%, a mean Pixel Accuracy (mPA) of 94.33%, and a precision of 95.84%. These represent relative gains of 2.22%, 1.78%, and 0.89%, respectively, over the original model. Additionally, its complexity is significantly reduced without sacrificing performance: the parameter count is reduced to 6.27 M, a decrease of 88.5% compared to the original model, and the floating-point operation count drops from 83.62 to 29.00 GFLOPs, a reduction of 65.1%. Frames Per Second (FPS) increases from 63.7 to 74.3, an improvement of 16.7%. Compared to other models, the improved architecture shows faster convergence and superior segmentation accuracy, making it highly suitable for applications in resource-constrained environments.
A heart attack disrupts the normal flow of blood to the heart muscle, potentially causing severe damage or death if not treated promptly. It can lead to long-term health complications, reduce quality of life, and significantly impact daily activities and overall well-being. Despite the growing popularity of deep learning, several drawbacks persist, such as complexity and the limitations of single-model learning. In this paper, we introduce a residual learning-based feature fusion technique to achieve high accuracy in identifying abnormal cardiac rhythms from heart sounds. Combining MobileNet with DenseNet201 for feature fusion pairs MobileNet's lightweight, efficient architecture with DenseNet201's dense connections, resulting in enhanced feature extraction and improved model performance at reduced computational cost. To further enhance the fusion, we employed residual learning to optimize the hierarchical features of abnormal heart sounds during training. The experimental results demonstrate that the proposed fusion method achieved an accuracy of 95.67% on the benchmark PhysioNet-2016 Spectrogram dataset. To further validate the performance, we applied it to the BreakHis dataset at a magnification level of 100X. The results indicate that the model maintains robust performance on the second dataset, achieving an accuracy of 96.55%. This consistent performance makes it suitable for various applications.
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, which supports learning more effective alignment strategies. Extensive experiments on multiple public datasets show that, compared to existing state-of-the-art methods, MPSEA achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset. These results confirm the effectiveness of the proposed model.
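Hits@1 and MRR, the two metrics reported above, are both derived from the rank that each true counterpart entity receives among all candidates. A minimal sketch under the usual definitions follows. The similarity scores are made up, and ranking by strict comparison (so ties favor the true entity) is an assumption of this sketch, not something stated in the abstract.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate: one plus the number of
    candidates with a strictly higher similarity score."""
    target = scores[true_index]
    return 1 + sum(1 for s in scores if s > target)


def hits_at_k(ranks, k=1):
    """Fraction of queries whose true candidate is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Made-up similarity scores for four source entities against four
# candidates each; true_indices gives the correct candidate per query.
score_rows = [
    [0.9, 0.1, 0.3, 0.2],
    [0.4, 0.6, 0.8, 0.1],
    [0.2, 0.3, 0.7, 0.1],
    [0.9, 0.8, 0.7, 0.6],
]
true_indices = [0, 1, 2, 3]

ranks = [rank_of_true(row, i) for row, i in zip(score_rows, true_indices)]
print(ranks)
print(hits_at_k(ranks, k=1), mean_reciprocal_rank(ranks))
```

Hits@1 only credits exact top-1 matches, whereas MRR also gives partial credit to near misses, which is why the two figures are usually reported together.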
Face anti-spoofing has received considerable attention because it strengthens the security of face recognition systems. Face recognition is commonly used for authentication in surveillance applications. However, attackers try to compromise these systems with spoofing techniques, such as using photos or videos of users to gain access to services or information. Many existing face anti-spoofing methods struggle with new scenarios, especially when there are variations in background, lighting, and other environmental factors. Recent deep learning methods that use multiple modalities have proven effective in face anti-spoofing, surpassing single-modal methods. However, these approaches often generate many features, which can lead to problems with data dimensionality. In this study, we introduce a multimodal deep fusion network for face anti-spoofing that incorporates cross-axial attention and deep reinforcement learning techniques. This network operates at three patch levels and analyzes images from three modalities (RGB, IR, and depth). First, our design includes an axial attention network (XANet) model that extracts deeply hidden features from multimodal images. Next, we use a bidirectional fusion technique that attends in both directions to combine features from each modality effectively. We further improve feature optimization with the Enhanced Pity Beetle Optimization (EPBO) algorithm, which selects features to address data dimensionality problems. Moreover, the proposed model employs a hybrid federated reinforcement learning (FDDRL) approach to detect and classify face spoofing, achieving a better tradeoff between detection rates and false positive rates. We evaluated the proposed approach on publicly available datasets, including CASIA-SURF and GREATFASD-S, and obtained classification accuracies of 98.985% and 97.956%, respectively. In addition, the method outperforms other state-of-the-art methods in terms of precision, recall, and F-measure. Overall, the developed methodology improves the model's effectiveness in detecting various types of spoofing attempts.
Ransomware attacks pose a significant threat to critical infrastructures, demanding robust detection mechanisms. This study introduces a hybrid model that combines vision transformer (ViT) and one-dimensional convolutional neural network (1DCNN) architectures to enhance ransomware detection. To address a common challenge in ransomware detection, dataset class imbalance, the synthetic minority oversampling technique (SMOTE) is employed to generate synthetic samples for the minority class, thereby improving detection accuracy. Integrating ViT and 1DCNN through feature fusion enables the model to capture both global contextual and local sequential features, resulting in comprehensive ransomware classification. Tested on the UNSW-NB15 dataset, the proposed ViT-1DCNN model achieved 98% detection accuracy, with precision, recall, and F1-score surpassing conventional methods. This approach not only reduces false positives and false negatives but also offers scalability and robustness for real-world cybersecurity applications. The results demonstrate the model's potential as an effective tool for proactive ransomware detection, especially in environments where evolving threats require adaptable, high-accuracy solutions.
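The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It shows only the core idea (interpolating between a minority sample and one of its nearest minority neighbours). It is not the authors' implementation, and the function name `smote`, the parameters `k` and `seed`, and the toy points are assumptions made for illustration.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation.

    minority: list of equal-length numeric tuples (hypothetical toy data).
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_points, n_new=5)
```

Every synthetic point lies on a segment between two existing minority samples, which is why the technique adds minority data without simply duplicating it.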
Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audio-visual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) the audiovisual feature extraction module, which adopts a multi-stage fine-tuning strategy to provide high-quality unimodal features, laying a solid foundation for modality fusion; (2) the audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 dataset and the publicly available SSW60 dataset, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains, outperforming existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
Funding: King Saud University, Grant/Award Number: RSP2024R157.
Abstract: Biometric characteristics have played a vital role in security over the last few years. Human gait classification in video sequences is an important biometric attribute and is used for security purposes. A new framework for human gait classification in video sequences is proposed, based on deep learning (DL) fusion and posterior probability-based moth flame optimization (MFO). In the first step, the video frames are resized and used to fine-tune two pre-trained lightweight DL models, EfficientNetB0 and MobileNetV2. Both models are selected for their top-5 accuracy and small number of parameters. Both models are then trained through deep transfer learning, and the extracted deep features are fused using a voting scheme. In the last step, the authors develop a posterior probability-based MFO feature selection algorithm to select the best features. The selected features are classified using several supervised learning methods. The publicly available CASIA-B dataset is used for the experiments. On this dataset, the authors selected six angles (0°, 18°, 90°, 108°, 162°, and 180°) and obtained average accuracies of 96.9%, 95.7%, 86.8%, 90.0%, 95.1%, and 99.7%, respectively. The results demonstrate a comparable improvement in accuracy and a significant reduction in computational time compared with recent state-of-the-art techniques.
Funding: Funded by the Jilin Provincial Department of Science and Technology, grant number 20230101208JC.
Abstract: Fault diagnosis of rolling bearings is crucial for ensuring the stable operation of mechanical equipment and production safety in industrial environments. However, because collected vibration signals are nonlinear and non-stationary, single-modal methods struggle to capture fault features fully. This paper proposes a rolling bearing fault diagnosis method based on multi-modal information fusion. The method first employs the Hippopotamus Optimization Algorithm (HO) to optimize the number of modes in Variational Mode Decomposition (VMD) to achieve optimal modal decomposition performance. It combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) to extract temporal features from one-dimensional time-series signals. Meanwhile, the Markovian Transition Field (MTF) is used to transform one-dimensional signals into two-dimensional images for spatial feature mining. Using visualization techniques, the images generated from different parameter combinations are compared to determine the optimal parameter configuration. A multi-modal network (GSTCN) is constructed by integrating Swin-Transformer and the Convolutional Block Attention Module (CBAM), where the attention module is used to enhance fault features. Finally, the fault features extracted from the different modalities are deeply fused and fed into a fully connected layer to complete fault classification. Experimental results show that the GSTCN model achieves an average diagnostic accuracy of 99.5% across three datasets, significantly outperforming the compared methods. This demonstrates that the proposed model has high diagnostic precision and good generalization ability, providing an efficient and reliable solution for rolling bearing fault diagnosis.
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Abstract: Human Action Recognition (HAR) has been an active research topic in machine learning over the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and computational cost. In this article, we propose a new deep learning method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused by Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracies on these datasets were 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
Funding: Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61321002); Projects of Major International (Regional) Joint Research Program NSFC (61120106010); Beijing Education Committee Cooperation Building Foundation Project; Program for Changjiang Scholars and Innovative Research Team in University (IRT1208).
Abstract: Image classification based on bag-of-words (BOW) has broad application prospects in pattern recognition, but its shortcomings, such as reliance on a single feature and low classification accuracy, are apparent. To deal with this problem, this paper proposes to combine two ingredients: (i) three mutually complementary features are adopted to describe the images, namely the pyramid histogram of words (PHOW), the pyramid histogram of color (PHOC), and the pyramid histogram of oriented gradients (PHOG); (ii) an adaptive feature-weight adjusted image categorization algorithm based on the SVM and decision-level fusion of multiple features is employed. Experiments carried out on the Caltech101 database confirm the validity of the proposed approach. The experimental results show that the classification accuracy of the proposed method is 7%-14% higher than that of traditional BOW methods. By fully using global, local, and spatial information, the algorithm describes the feature information of the image more completely and flexibly through multi-feature fusion and the pyramid structure formed by spatial multi-resolution decomposition of the image. As a result, the classification accuracy improves significantly.
Funding: This work was supported by the High-Tech Industry Science and Technology Innovation Leading Plan Project of Hunan Province under Grant 2020GK2026 (author B.Y.), http://kjt.hunan.gov.cn/.
Abstract: Regular inspection of bridge cracks is crucial to bridge maintenance and repair. Traditional manual crack detection methods are time-consuming, dangerous, and subjective. At the same time, existing mainstream vision-based automatic crack detection algorithms find it challenging to detect fine cracks and to balance detection accuracy and speed. Therefore, this paper proposes a new bridge crack segmentation method based on a parallel attention mechanism and multi-scale feature fusion on top of the DeeplabV3+ network framework. First, an improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+ network to replace the original Xception backbone and to improve the atrous spatial pyramid pooling (ASPP) module, respectively, dramatically reducing the number of parameters in the network and accelerating the training and prediction speed of the model. Moreover, we introduce the parallel attention mechanism into the encoding and decoding stages. Attention to the crack regions is enhanced along both the channel and spatial dimensions, which significantly suppresses the interference of various noises. Finally, we further improve the detection performance of the model for fine cracks by introducing a multi-scale feature fusion module. Our results are validated on a self-made dataset. The experiments show that our method is more accurate than other methods. Its intersection over union (IoU) and F1-score (F1) increase to 77.96% and 87.57%, respectively. In addition, the number of parameters is only 4.10M, which is much smaller than that of the original network, and the frame rate increases to 15 frames per second (FPS). The results show that the proposed method meets the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.
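The two segmentation metrics reported in this abstract, IoU and F1, can be computed from pixel-level counts. The sketch below uses the standard definitions on tiny binary masks; the mask values and function names are made up for illustration, and it does not reproduce the paper's evaluation protocol (for example, how scores are averaged over images).

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    over two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def iou(tp, fp, fn):
    # Intersection over union of the predicted and true crack regions
    return tp / (tp + fp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    return 2 * tp / (2 * tp + fp + fn)

pred_mask = [[1, 1], [0, 0]]   # hypothetical predicted crack pixels
true_mask = [[1, 0], [1, 0]]   # hypothetical ground-truth crack pixels
tp, fp, fn = confusion_counts(pred_mask, true_mask)
print(iou(tp, fp, fn), f1_score(tp, fp, fn))
```

For a single mask the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU.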
Funding: This work was supported by the National Natural Science Foundation of China under grant number 61572038, received by J.Z. in 2015. URL: https://isisn.nsfc.gov.cn/egrantindex/funcindex/prjsearch-list
Abstract: Real-time detection of driver fatigue is of great significance for road traffic safety. This paper proposes a novel driver fatigue detection method that can detect the driver's fatigue status around the clock. The driver's face images are captured by a camera with a color lens and an infrared lens mounted above the dashboard. The landmarks of the driver's face are labeled and the eye area is segmented. Fatigue is detected by calculating the eye aspect ratio, the duration of eye closure, the blink frequency, and PERCLOS for both the color and infrared images. Based on the change in light intensity detected by a photosensitive device, the weight matrix of the color features and the infrared features is adjusted adaptively to reduce the impact of lighting on fatigue detection. Video samples of the driver's face were recorded in a test vehicle. After training the classification model, the results show that the method achieves high accuracy in driver fatigue detection in both daytime and nighttime.
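The quantities named in this abstract can be sketched as follows. The abstract does not give formulas, so this assumes the commonly used six-landmark eye aspect ratio, a PERCLOS defined as the fraction of frames in which the eye is closed, and a simple linear light-dependent weighting. The threshold value, the weighting rule, and all function names are illustrative assumptions, not the authors' method.

```python
import math

def eye_aspect_ratio(landmarks):
    """Eye aspect ratio from six (x, y) landmarks ordered p1..p6:
    p1/p4 are the eye corners, p2/p3 the upper lid, p6/p5 the lower lid."""
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = 2.0 * math.dist(p1, p4)
    return vertical / horizontal

def perclos(ear_series, closed_threshold=0.2):
    """Fraction of frames whose eye aspect ratio is below the threshold
    (threshold is an assumed value)."""
    closed = sum(1 for ear in ear_series if ear < closed_threshold)
    return closed / len(ear_series)

def fuse_scores(color_score, infrared_score, light_level):
    """Blend color and infrared fatigue scores; light_level in [0, 1].
    Assumed rule: brighter scenes trust the color channel more."""
    w = min(1.0, max(0.0, light_level))
    return w * color_score + (1.0 - w) * infrared_score

open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(eye_aspect_ratio(open_eye))
print(perclos([0.3, 0.1, 0.1, 0.3]))
print(fuse_scores(0.8, 0.4, light_level=0.25))
```

A low aspect ratio sustained over many frames raises PERCLOS, which is the kind of signal a fatigue classifier can threshold or learn from.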
Funding: Supported by the National Natural Science Foundation of China under Grants 52325402, 52274057, 52074340, and 51874335; the National Key R&D Program of China under Grant 2023YFB4104200; the Major Scientific and Technological Projects of CNOOC under Grant CCL2022RCPS0397RSN; the 111 Project under Grant B08028.
Abstract: In sucker rod well production, the dynamic liquid level is important for production efficiency and safety during lifting. It is influenced by multi-source data, which need to be combined to calculate the dynamic liquid level in real time. In this paper, the multi-source data are regarded as different views, including the load of the sucker rod and the liquid in the wellbore, the image of the dynamometer card, and the production dynamics parameters. These views are fused by a multi-branch neural network with a special fusion layer. With this method, the features of the different views are extracted while accounting for the differences in modality and physical meaning between them. The extraction results, selected by multinomial sampling, then serve as the input of the fusion layer. During fusion, the availability of each view determines whether it is fused in the fusion layer. In this way, the correlation between the views is considered and missing data are handled automatically. The results show that the fusion of load and production features (the method proposed in this paper) performs best, with the lowest mean absolute error (MAE) of 39.63 m, followed by feature concatenation with an MAE of 42.47 m. Both perform better than any single view, and the lower MAE of feature fusion indicates stronger generalization ability. In contrast, the image feature as a single view has the highest MAE and contributes little to accuracy when fused with the other views. When data are missing in some view, multi-view feature fusion, unlike feature concatenation, does not make a large number of samples unusable. At missing rates of 10%, 30%, 50%, and 80%, the proposed method reduces the MAE by 5.8, 7, 9.3, and 20.3 m, respectively. In general, the proposed multi-view feature fusion method clearly improves accuracy and handles missing data effectively, which helps provide technical support for real-time monitoring of the dynamic liquid level in oil fields.
Abstract: To address the low robustness of trackers under significant appearance changes in complex backgrounds, a novel moving target tracking method based on weighted fusion of hierarchical deep features and correlation filters is proposed. First, multi-layer features are extracted by a deep model pre-trained on massive object recognition datasets. The linearly separable features of the Relu3-1, Relu4-1, and Relu5-4 layers of VGG-Net-19 are especially suitable for target tracking. Then, correlation filters over the hierarchical convolutional features are learned to generate their correlation response maps. Finally, a novel weight adjustment approach is presented to fuse the response maps. The maximum of the final response map gives the location of the target. Extensive experiments on object tracking benchmark datasets demonstrate high robustness and recognition precision compared with several state-of-the-art trackers under different conditions.
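The last two steps of this abstract (fusing per-layer response maps with weights, then taking the maximum as the target location) can be sketched as below. The abstract does not state how the weights are adjusted, so the weights here are fixed, hypothetical inputs, and the small response maps are toy values rather than correlation filter outputs.

```python
def fuse_response_maps(maps, weights):
    """Weighted sum of equally sized 2-D response maps (nested lists).
    Weights are normalized to sum to 1; how they are chosen is left open."""
    total = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for response, weight in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (weight / total) * response[r][c]
    return fused

def peak_location(response):
    """Return (row, col) of the maximum response value."""
    _, row, col = max(
        (value, r, c)
        for r, line in enumerate(response)
        for c, value in enumerate(line)
    )
    return row, col

shallow_layer = [[0.0, 1.0], [0.0, 0.0]]  # toy response from a shallow layer
deep_layer = [[0.0, 0.0], [0.0, 3.0]]     # toy response from a deep layer
print(peak_location(fuse_response_maps([shallow_layer, deep_layer], [1, 1])))
print(peak_location(fuse_response_maps([shallow_layer, deep_layer], [9, 1])))
```

Changing the weights moves the fused peak between the layers' individual peaks, which is why the weighting scheme matters for tracking robustness.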
Abstract: Signature verification is regarded as the most beneficial behavioral biometric for security and fraud protection. It is also a popular biometric authentication technology in forensic and commercial transactions because of its advantages, including noninvasiveness, user-friendliness, and social and legal acceptability. According to the literature, extensive research has been conducted on signature verification systems in a variety of languages, including English, Hindi, Bangla, and Chinese. However, Arabic Offline Signature Verification (OSV) remains a challenging problem that has received less attention from researchers, because Arabic script is characterized by changing letter shapes, diacritics, ligatures, and overlapping, which make verification more difficult. Recently, signature verification systems have shown promising results in distinguishing genuine signatures from forgeries; however, performance on skilled forgery detection is still unsatisfactory. Most existing methods require many learning samples to improve verification accuracy, which is a major drawback because the number of available signature samples is often limited in practical applications. This study addresses these issues by presenting an OSV system based on multifeature fusion and discriminant feature selection using a genetic algorithm (GA). In contrast to existing methods, which use multiclass learning approaches, this study uses a one-class learning strategy to address imbalanced signature data in practical applications. The proposed approach is tested on three signature databases: SID-Arabic handwriting signatures, CEDAR (Center of Excellence for Document Analysis and Recognition), and UTSIG (University of Tehran Persian Signature). Experimental results show that the proposed system outperforms existing systems in terms of reducing the False Acceptance Rate (FAR), False Rejection Rate (FRR), and Equal Error Rate (EER). The proposed system achieved a 5% improvement.
Funding: Supported by the Aerospace Research Project of China under Grant No. 020202.
Abstract: To explore the influence of fusing different features on recognition, this paper took the electromyography (EMG) signals of the rectus femoris under different motions (walk, step, ramp, squat, and sitting) as samples and extracted linear features, namely time-domain features (variance (VAR) and root mean square (RMS)) and frequency-domain features (mean frequency (MF) and mean power frequency (MPF)), as well as nonlinear features (empirical mode decomposition (EMD)). Two feature fusion algorithms, the series splicing method and the complex vector method, were designed and verified with a double-hidden-layer error back-propagation (BP) neural network. Results show that as the number and complexity of fused feature types increase, the recognition rate of actions from the EMG signal gradually improves. With the series splicing method, the time-domain + frequency-domain + empirical mode decomposition (TD+FD+EMD) splicing has the highest recognition rate, with an average of 92.32%. The complex vector method raises this rate to 96.1% and also reduces the variance of the BP system.
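The two time-domain features and the two fusion schemes named in this abstract can be sketched in a few lines. The abstract does not define the fusion algorithms, so this assumes the usual readings: series splicing as plain concatenation, and the complex vector method as placing one feature vector in the real part and the other in the imaginary part (zero-padding the shorter one). All names and sample values are illustrative.

```python
import math

def variance(signal):
    """Population variance (VAR) of an EMG window."""
    mean = sum(signal) / len(signal)
    return sum((v - mean) ** 2 for v in signal) / len(signal)

def root_mean_square(signal):
    """Root mean square (RMS) of an EMG window."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def series_splice(features_a, features_b):
    """Serial fusion: concatenate the two feature vectors."""
    return list(features_a) + list(features_b)

def complex_vector_fuse(features_a, features_b):
    """Parallel fusion (assumed form): a + i*b, with zero-padding
    so both vectors have the same length."""
    n = max(len(features_a), len(features_b))
    a = list(features_a) + [0.0] * (n - len(features_a))
    b = list(features_b) + [0.0] * (n - len(features_b))
    return [complex(x, y) for x, y in zip(a, b)]

window = [3.0, -3.0, 3.0, -3.0]            # toy EMG samples
time_domain = [variance(window), root_mean_square(window)]
frequency_domain = [120.0]                  # hypothetical MF value in Hz
print(series_splice(time_domain, frequency_domain))
print(complex_vector_fuse(time_domain, frequency_domain))
```

Serial fusion grows the feature dimension with every added feature set, whereas the complex form keeps the dimension at the length of the longer vector.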
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3004104); the National Natural Science Foundation of China (Grant No. U2342204); the Innovation and Development Program of the China Meteorological Administration (Grant No. CXFZ2024J001); the Open Research Project of the Key Open Laboratory of Hydrology and Meteorology of the China Meteorological Administration (Grant No. 23SWQXZ010); the Science and Technology Plan Project of Zhejiang Province (Grant No. 2022C03150); the Open Research Fund Project of Anyang National Climate Observatory (Grant No. AYNCOF202401); the Open Bidding for Selecting the Best Candidates Program (Grant No. CMAJBGS202318).
Abstract: Thunderstorm wind gusts are small in scale, typically occurring within a range of a few kilometers. It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations. Therefore, it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations. This paper introduces a new algorithm, called the thunderstorm wind gust identification network (TGNet). It uses multimodal feature fusion to combine the temporal and spatial features of thunderstorm wind gust events. The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations, with the aim of distinguishing thunderstorm wind gusts from those caused by synoptic-scale systems or typhoons. Then, an encoder built on the U-shaped network (U-Net) and incorporating recurrent residual convolutional blocks (R2U-Net) is employed to extract the corresponding spatial convective characteristics from satellite, radar, and lightning observations. Finally, a multimodal deep fusion module based on multi-head cross-attention incorporates the temporal features of wind speed at each automatic weather station into the spatial features to obtain a classification of thunderstorm wind gusts every 10 minutes. TGNet products have high accuracy, with a critical success index reaching 0.77. Compared with those of U-Net and R2U-Net, the false alarm rate of TGNet products decreases by 31.28% and 24.15%, respectively. The new algorithm provides gridded products of thunderstorm wind gusts with a spatial resolution of 0.01°, updated every 10 minutes. The results are finer and more accurate, thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.
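The verification scores quoted in this abstract follow from a 2x2 contingency table of forecasts against observations. The sketch below uses the standard critical success index and assumes that "false alarm rate" means the false alarm ratio (false alarms divided by all positive forecasts), which the abstract does not spell out. The counts are invented for illustration.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms)."""
    return hits / (hits + misses + false_alarms)

def false_alarm_ratio(hits, false_alarms):
    """Share of positive forecasts that did not verify
    (assumed meaning of the abstract's 'false alarm rate')."""
    return false_alarms / (hits + false_alarms)

# Hypothetical grid-point counts for one evaluation period
hits, misses, false_alarms = 77, 13, 10
print(critical_success_index(hits, misses, false_alarms))
print(false_alarm_ratio(hits, false_alarms))
```

Correct negatives do not enter the CSI, which makes it a common choice for rare events such as gusts.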
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62463002, 62062021, and 62473033; in part by the Guiyang Scientific Plan Project [2023]48–11; in part by QKHZYD[2023]010, Guizhou Province Science and Technology Innovation Base Construction Project "Key Laboratory Construction of Intelligent Mountain Agricultural Equipment".
Abstract: Solar cell defect detection is crucial for quality inspection of photovoltaic power generation modules. In the production process, defects occur infrequently and exhibit random shapes and sizes, which makes it challenging to collect defective samples. Additionally, the complex surface background of polysilicon cell wafers complicates the accurate identification and localization of defective regions. This paper proposes a novel Lightweight Multiscale Feature Fusion network (LMFF) to address these challenges. The network comprises a feature extraction network, a multi-scale feature fusion module (MFF), and a segmentation network. Specifically, the feature extraction network produces multi-scale feature outputs, and the MFF module fuses the multi-scale feature information effectively. To capture finer-grained multi-scale information from the fused features, we propose a multi-scale attention module (MSA) in the segmentation network to enhance the network's ability to detect small targets. Moreover, depthwise separable convolutions are introduced to construct depthwise separable residual blocks (DSR) that reduce the number of model parameters. Finally, to validate the defect segmentation and localization performance of the proposed method, we constructed three solar cell defect detection datasets: SolarCells, SolarCells-S, and PVEL-S. SolarCells and SolarCells-S are monocrystalline silicon datasets, and PVEL-S is a polycrystalline silicon dataset. Experimental results show that the IoU of our method on these three datasets reaches 68.5%, 51.0%, and 92.7%, respectively, and the F1-score reaches 81.3%, 67.5%, and 96.2%, respectively, which surpasses other commonly used methods and verifies the effectiveness of our LMFF network.
Funding: Supported by the National Natural Science Foundation of China (No. 62241109); the Tianjin Science and Technology Commissioner Project (No. 20YDTPJC01110).
Abstract: An improved model based on you only look once version 8 (YOLOv8) is proposed to address the low detection accuracy caused by the diversity of object sizes in optical remote sensing images. First, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, which enriches the semantic and spatial information in the feature map and improves the target detection ability of the model. Second, a multi atrous spatial pyramid pooling (MASPP) module is designed, using atrous convolution and the feature pyramid structure to extract multi-scale features and improve the model's handling of multi-scale objects. The experimental results show that the improved YOLOv8 model reaches a detection accuracy of 92% and a mean average precision (mAP) of 87.9% on the DIOR dataset, which are 3.5% and 1.7% higher than those of the original model, respectively. This shows that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has improved.
Funding: Supported by the National Natural Science Foundation of China (62302167, 62477013); the Natural Science Foundation of Shanghai (No. 24ZR1456100); the Science and Technology Commission of Shanghai Municipality (No. 24DZ2305900); the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries (2211106).
Abstract: Multi-label image classification is a challenging task due to the diverse sizes and complex backgrounds of objects in images. Obtaining class-specific precise representations at different scales is a key aspect of feature representation. However, existing methods often rely on the single-scale deep feature, neglecting shallow and deeper layer features, which poses challenges when predicting objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain class-specific precise representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across different feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of class-specific precise feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of our proposed method. Our approach achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
Funding: Supported by the following grants: Zhejiang A&F University Research Development Fund (Talent Initiation Project No. 2021LFR048) and the 2023 University-Enterprise Joint Research Program (Grant No. LHYFZ2302) from the Modern Agricultural and Forestry Artificial Intelligence Industry Academy.
Abstract: In recent years, fungal diseases affecting grape crops have attracted significant attention. Currently, the assessment of black rot severity mainly depends on the ratio of lesion area to leaf surface area. However, effectively and accurately segmenting leaf lesions presents considerable challenges. Existing grape leaf lesion segmentation models have several limitations, such as a large number of parameters, long training durations, and limited precision in extracting small lesions and boundary details. To address these issues, we propose SFC_DeepLabv3+, an enhanced lesion segmentation method based on DeepLabv3+ that incorporates Strip Pooling, Content-Guided Fusion, and the Convolutional Block Attention Module. This approach replaces the original Xception backbone with the lightweight MobileNetv2, incorporates a lightweight convolutional block attention module, and introduces a content-guided feature fusion module to improve the detection accuracy of small lesions and blurred boundaries. Experimental results show that the enhanced model achieves a mean Intersection over Union (mIoU) of 90.98%, a mean Pixel Accuracy (mPA) of 94.33%, and a precision of 95.84%. These represent relative gains of 2.22%, 1.78%, and 0.89%, respectively, over the original model. Additionally, its complexity is significantly reduced without sacrificing performance: the parameter count is reduced to 6.27 M, a decrease of 88.5% compared to the original model, and the number of floating-point operations (GFLOPs) drops from 83.62 G to 29.00 G, a reduction of 65.1%. Frames Per Second (FPS) increases from 63.7 to 74.3, an improvement of 16.7%. Compared to other models, the improved architecture shows faster convergence and superior segmentation accuracy, making it highly suitable for applications in resource-constrained environments.
Abstract: A heart attack disrupts the normal flow of blood to the heart muscle, potentially causing severe damage or death if not treated promptly. It can lead to long-term health complications, reduce quality of life, and significantly impact daily activities and overall well-being. Despite the growing popularity of deep learning, several drawbacks persist, such as complexity and the limitations of single-model learning. In this paper, we introduce a residual learning-based feature fusion technique to achieve high accuracy in differentiating abnormal cardiac rhythms from heart sounds. Combining MobileNet with DenseNet201 for feature fusion leverages MobileNet's lightweight, efficient architecture together with DenseNet201's dense connections, resulting in enhanced feature extraction and improved model performance at reduced computational cost. To further enhance the fusion, we employ residual learning to optimize the hierarchical features of abnormal heart sounds during training. The experimental results demonstrate that the proposed fusion method achieves an accuracy of 95.67% on the benchmark PhysioNet-2016 Spectrogram dataset. To further validate its performance, we applied it to the BreakHis dataset at a magnification level of 100X. The results indicate that the model maintains robust performance on the second dataset, achieving an accuracy of 96.55%. This highlights its consistent performance, making it suitable for various applications.
Funding: Partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); partially supported by the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); partially supported by the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); partially supported by the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting learning more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
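The two ranking metrics quoted in this abstract, Hits@1 and MRR, are computed from the rank at which each source entity's true counterpart appears in the model's candidate list. The sketch below uses the standard definitions with made-up ranks. It is independent of the MPSEA model itself.

```python
def hits_at_k(ranks, k=1):
    """Fraction of test entities whose correct match is ranked within the top k.
    ranks: 1-based rank of the true counterpart for each entity."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)

example_ranks = [1, 2, 1, 4]  # hypothetical ranks for four test entities
print(hits_at_k(example_ranks, k=1))
print(mean_reciprocal_rank(example_ranks))
```

Hits@1 rewards only exact top-ranked matches, while MRR also gives partial credit when the correct entity is ranked near the top.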
Abstract: Face anti-spoofing has received a lot of attention because it plays a role in strengthening the security of face recognition systems. Face recognition is commonly used for authentication in surveillance applications. However, attackers try to compromise these systems with spoofing techniques, such as using photos or videos of users to gain access to services or information. Many existing face anti-spoofing methods have difficulty with new scenarios, especially when there are variations in background, lighting, and other environmental factors. Recent advancements in deep learning with multi-modality methods have shown their effectiveness in face anti-spoofing, surpassing single-modal methods. However, these approaches often generate many features that can lead to issues with data dimensionality. In this study, we introduce a multimodal deep fusion network for face anti-spoofing that incorporates cross-axial attention and deep reinforcement learning techniques. This network operates at three patch levels and analyzes images from three modalities (RGB, IR, and depth). Initially, our design includes an axial attention network (XANet) model that extracts deeply hidden features from multimodal images. Further, we use a bidirectional fusion technique that pays attention to both directions to combine features from each modality effectively. We further improve feature optimization by using the Enhanced Pity Beetle Optimization (EPBO) algorithm, which selects features to address data dimensionality problems. Moreover, our proposed model employs a hybrid federated reinforcement learning (FDDRL) approach to detect and classify face spoofing, achieving a better tradeoff between detection rates and false positive rates. We evaluated the proposed approach on publicly available datasets, including CASIA-SURF and GREATFASD-S, and achieved 98.985% and 97.956% classification accuracy, respectively. In addition, the proposed method outperforms other state-of-the-art methods in terms of precision, recall, and F-measure. Overall, the developed methodology boosts the effectiveness of our model in detecting various types of spoofing attempts.
Abstract: Ransomware attacks pose a significant threat to critical infrastructures, demanding robust detection mechanisms. This study introduces a hybrid model that combines vision transformer (ViT) and one-dimensional convolutional neural network (1DCNN) architectures to enhance ransomware detection capabilities. Addressing common challenges in ransomware detection, particularly dataset class imbalance, the synthetic minority oversampling technique (SMOTE) is employed to generate synthetic samples for the minority class, thereby improving detection accuracy. The integration of ViT and 1DCNN through feature fusion enables the model to capture both global contextual and local sequential features, resulting in comprehensive ransomware classification. Tested on the UNSW-NB15 dataset, the proposed ViT-1DCNN model achieved 98% detection accuracy with precision, recall, and F1-score metrics surpassing conventional methods. This approach not only reduces false positives and negatives but also offers scalability and robustness for real-world cybersecurity applications. The results demonstrate the model’s potential as an effective tool for proactive ransomware detection, especially in environments where evolving threats require adaptable and high-accuracy solutions.
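The SMOTE step mentioned above can be illustrated with a minimal sketch of its core idea: each synthetic minority sample is a random interpolation between a minority point and one of its nearest minority neighbours. The function name, the parameters (`k`, `n_new`, `seed`), and the toy data are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
import random


def smote_sample(minority, k=2, n_new=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    `minority` is a list of feature vectors (lists of floats) from the
    minority class; it must contain at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative only).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote_sample(toy_minority):
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training split only, so that no interpolated data leaks into evaluation.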
Funding: Supported by the Beijing Natural Science Foundation (No. 5252014), the Open Fund of The Key Laboratory of Urban Ecological Environment Simulation and Protection, Ministry of Ecology and Environment of the People's Republic of China (No. UEESP-202502), and the National Natural Science Foundation of China (Nos. 62303063 and 32371874).
Abstract: Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audio-visual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) the audiovisual feature extraction module, which adopts a multi-stage fine-tuning strategy to provide high-quality unimodal features, laying a solid foundation for modality fusion; (2) the audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 and the publicly available SSW60 datasets, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains, outperforming existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
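As a rough illustration of what a max pooling aggregation across modalities can look like, the sketch below takes the element-wise maximum of an audio and a visual feature vector. This is one common reading of the term and an assumption here: the abstract does not give the exact formulation used in MSFG-AVFNet, and the function name and feature values are hypothetical.

```python
def max_pool_fuse(audio_feat, visual_feat):
    """Fuse two equal-length feature vectors by element-wise maximum."""
    if len(audio_feat) != len(visual_feat):
        raise ValueError("feature vectors must have the same length")
    return [max(a, v) for a, v in zip(audio_feat, visual_feat)]


# Hypothetical 3-dimensional unimodal features (illustrative only).
audio = [0.2, 0.9, -1.0]
visual = [0.5, 0.1, 0.0]
print(max_pool_fuse(audio, visual))
```

Under this reading, each fused dimension keeps the stronger response of the two modalities, so a weak or noisy modality cannot suppress a confident one.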