Safety maintenance of power equipment is of great importance in power grids, where image-processing-based defect recognition is used to classify abnormal conditions during daily inspection. However, because defect images have blurred features, current defect recognition algorithms have poor fine-grained recognition ability. Visual attention can achieve fine-grained recognition through its ability to model long-range dependencies, but it introduces extra computational complexity, especially for multi-head attention in vision transformer structures. Under these circumstances, this paper proposes a self-reduction multi-head attention module that reduces computational complexity and can be easily combined with a Convolutional Neural Network (CNN). In this manner, local and global features can be computed simultaneously in the proposed structure, improving defect recognition performance. Specifically, the proposed self-reduction multi-head attention reduces redundant parameters, thereby alleviating the problem of limited computational resources. Experiments were conducted on a defect dataset collected from a substation. The results demonstrate the efficiency and superiority of the proposed method over other advanced algorithms.
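The abstract does not say how the "self-reduction" is computed. One common way to cut the quadratic cost of attention, used for example in spatial-reduction attention, is to pool the key/value tokens by a ratio r before scoring, so an N×N score matrix becomes N×(N/r). The sketch below assumes that approach, with single-head attention and 1-D average pooling for brevity; `reduce_tokens` and `reduced_attention` are hypothetical names, not the paper's code.

```python
import math

def reduce_tokens(tokens, r):
    """Average-pool consecutive groups of r tokens (assumes len(tokens) % r == 0)."""
    dim = len(tokens[0])
    pooled = []
    for start in range(0, len(tokens), r):
        group = tokens[start:start + r]
        pooled.append([sum(t[i] for t in group) / r for i in range(dim)])
    return pooled

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def reduced_attention(queries, keys, values, r):
    """Single-head scaled dot-product attention with keys/values pooled by ratio r."""
    k_red = reduce_tokens(keys, r)
    v_red = reduce_tokens(values, r)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_red]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, v_red))
                        for i in range(len(v_red[0]))])
    return outputs, weights

# 8 tokens of dimension 4; with r = 4 each query attends to 2 pooled tokens
# instead of 8, so the score matrix shrinks from 8x8 to 8x2.
tokens = [[math.sin(i + j) for j in range(4)] for i in range(8)]
out, w = reduced_attention(tokens, tokens, tokens, r=4)
print(len(out), len(w[0]))  # 8 2
```

Every query still produces an output, so the module keeps the input resolution while the score computation shrinks by a factor of r.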
The generation of high-quality, realistic faces has emerged as a key field of research in computer vision. This paper proposes a robust approach that combines a Super-Resolution Generative Adversarial Network (SRGAN) with a Pyramid Attention Module (PAM) to enhance the quality of deep face generation. The SRGAN framework is designed to improve the resolution of generated images, addressing common challenges such as blurriness and a lack of intricate detail. The Pyramid Attention Module complements this process by focusing on multi-scale feature extraction, enabling the network to capture finer details and complex facial features more effectively. The proposed method was trained and evaluated over 100 epochs on the CelebA dataset, showing consistent improvements in image quality and a marked decrease in generator and discriminator losses, which reflects the model's capacity to learn and synthesize high-quality images effectively given adequate computational resources. Experimental results show that the SRGAN model with the PAM module outperformed the other models, yielding aggregate discriminator losses of 0.055 for real images and 0.043 for fake images, and a generator loss of 10.58 after 100 epochs of training. The model also achieved a structural similarity index measure (SSIM) of 0.923, outperforming the other models considered in this study.
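The SSIM score reported above compares luminance, contrast, and structure between two images. As a rough illustration, the sketch computes SSIM over the whole image as one window, using the usual constants C1 = (0.01·L)² and C2 = (0.03·L)²; the standard metric instead averages this statistic over local Gaussian windows, so this simplification will not reproduce the paper's 0.923.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM computed over the whole image as a single window.

    x, y: equal-length flat lists of pixel intensities.
    """
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    luminance_num = 2 * mu_x * mu_y + c1
    structure_num = 2 * cov + c2
    return (luminance_num * structure_num) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

reference = [10, 50, 90, 130, 170, 210]
degraded = [12, 45, 95, 125, 180, 200]  # toy "generated" image
print(global_ssim(reference, reference), global_ssim(reference, degraded))
```

An image compared with itself scores 1; any change in mean, contrast, or structure lowers the score.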
With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support dermatologists in skin cancer detection. However, accurately segmenting melanomas in dermoscopic images remains challenging because of objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net framework for skin melanoma segmentation. In the proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB) designed to capture multi-scale spatial information. Third, to further enhance feature representation, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections into the second U-Net. To evaluate the proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU increased from 0.6406 to 0.6819 and the Dice coefficient from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments on the jaw cyst dataset from Quzhou People's Hospital further verified the outstanding segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool for supporting clinical decision-making in skin cancer detection and for advancing research in medical image analysis.
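IoU and Dice are the overlap measures behind the gains reported above. For a single image they can be computed as in this sketch; flat 0/1 lists stand in for 2-D masks, and how the paper averages over images is not stated.

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two flat binary masks (lists of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(pred)
    target_sum = sum(target)
    union = pred_sum + target_sum - inter
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_sum + target_sum)
    return iou, dice

pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(iou_and_dice(pred, target))  # (0.5, 0.6666666666666666)
```

For one pair of masks Dice = 2·IoU / (1 + IoU), so the two metrics rank predictions identically. Dataset-level averages taken over many images need not satisfy this identity exactly.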
Sarcasm detection is a complex and challenging task, particularly on Chinese social media, where sarcasm exhibits strong contextual dependencies and cultural specificity. To address the limitations of existing methods in capturing the implicit semantics and contextual associations of sarcastic expressions, this paper proposes an event-aware model for Chinese sarcasm detection that leverages a multi-head attention (MHA) mechanism and contrastive learning (CL) strategies. The proposed model employs a dual-path Bidirectional Encoder Representations from Transformers (BERT) encoder to process comment text and event context separately, and integrates an MHA mechanism to facilitate deep interaction between the two, thereby capturing multidimensional semantic associations. Additionally, a CL strategy is introduced to enhance feature representations, further improving the model's performance in handling class imbalance and complex contextual scenarios. The model achieves state-of-the-art performance on the Chinese sarcasm dataset, with an accuracy of 79.55%, an F1-score of 84.22%, and an area under the curve (AUC) of 84.35%.
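The abstract does not specify the contrastive objective. InfoNCE is a common choice: it pulls an anchor toward a positive sample and pushes it away from negatives. The sketch assumes cosine similarity and a temperature of 0.1; which samples the paper treats as positives (for example, same-label comments) is also an assumption.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log of the softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

anchor = [1.0, 0.2, 0.0]
negatives = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.5]]
close = info_nce(anchor, [0.9, 0.3, 0.1], negatives)  # positive near the anchor
far = info_nce(anchor, [0.0, 0.1, 1.0], negatives)    # positive far from the anchor
print(close < far)  # True
```

Minimizing this loss over a batch encourages comment representations of the same class to cluster, which is one plausible reason a CL term can help under class imbalance.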
Abnormal network traffic is a frequent security risk that requires a range of techniques to categorize and detect. Existing network traffic anomaly detection still faces two challenges. First, it cannot fully extract local and global features and lacks an effective mechanism to capture complex interactions between features. Second, enlarging the receptive field to obtain deeper feature representations usually relies on increasing network depth, which significantly increases computational resource consumption and degrades detection efficiency and performance. To address these issues, this paper first proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, traffic samples are converted into two-dimensional matrices. A module combining parallel dilated convolutions and residual learning (res-pdc) was designed to extract local and global traffic features at different scales. By using res-pdc modules with different dilation rates, the model captures spatial features at different scales and explores feature dependencies spanning wider regions without increasing computational resources. Second, to focus on and integrate information from different feature subspaces and to further enhance and extract interactions among features, multi-head attention is added to Res-PDC, yielding the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, comparisons with other machine learning and deep learning algorithms are conducted on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method effectively improves detection performance.
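The claim that dilation widens the receptive field without adding parameters can be checked directly: a k-tap kernel with dilation d spans k + (k − 1)(d − 1) positions but still has only k weights. The sketch is a 1-D simplification of the 2-D traffic matrices. Summing the branches and adding the input back as the residual is an assumed fusion rule; the paper may concatenate branches instead.

```python
def dilated_conv1d(x, kernel, dilation):
    """1-D dilated cross-correlation with zero 'same' padding (odd kernel length)."""
    centre = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - centre) * dilation
            if 0 <= pos < len(x):  # positions outside the input act as zeros
                acc += w * x[pos]
        out.append(acc)
    return out

def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def parallel_dilated_residual(x, kernels, dilations):
    """Sum parallel dilated branches and add the input back (residual path)."""
    out = list(x)  # identity / residual path
    for kernel, d in zip(kernels, dilations):
        branch = dilated_conv1d(x, kernel, d)
        out = [o + b for o, b in zip(out, branch)]
    return out

x = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
dilations = [1, 2, 4]
print([effective_kernel_size(3, d) for d in dilations])  # [3, 5, 9]
print(parallel_dilated_residual(x, [[0.25, 0.5, 0.25]] * 3, dilations))
```

Each branch uses the same three weights, yet the widest one sees nine positions, which is the efficiency argument made in the abstract.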
As the group-buying model shows significant progress in attracting new users, enhancing user engagement, and increasing platform profitability, providing personalized recommendations for group-buying users has emerged as a new challenge for recommendation systems. This paper introduces a group-buying recommendation model based on multi-head attention mechanisms and multi-task learning, termed the Multi-head Attention Mechanisms and Multi-task Learning Group-Buying Recommendation (MAMGBR) model, designed to optimize group-buying recommendations on e-commerce platforms. The core dataset of this study comes from the Chinese maternal and infant e-commerce platform “Beibei” and encompasses approximately 430,000 successful group-buying actions and over 120,000 users. The model focuses on two main tasks: recommending items to group organizers (Task I) and recommending participants for a given group-buying event (Task II). In evaluation, MAMGBR achieves an MRR@10 of 0.7696 on Task I, a 20.23% improvement over baseline models. In Task II, where complex interaction patterns prevail, MAMGBR uses auxiliary loss functions to model the multifaceted roles of users, items, and participants, leading to a 24.08% increase in MRR@100 under a 1:99 sample ratio. Compared with benchmark models such as NGCF and EATNN, MAMGBR's integration of multi-head attention mechanisms, expert networks, and gating mechanisms enables more accurate modeling of user preferences and social associations in group-buying scenarios, significantly enhancing recommendation accuracy and platform group-buying success rates.
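MRR@K, the metric used for both tasks, averages the reciprocal rank of the relevant item over all queries and gives zero credit when that item falls outside the top K. A minimal sketch, assuming one relevant item per query:

```python
def mrr_at_k(rankings, relevant, k):
    """Mean reciprocal rank truncated at k.

    rankings: list of ranked candidate-id lists, one per query.
    relevant: the single relevant id for each query.
    A query whose relevant item is not in the top k contributes 0.
    """
    total = 0.0
    for ranked, target in zip(rankings, relevant):
        top = ranked[:k]
        if target in top:
            total += 1.0 / (top.index(target) + 1)
    return total / len(rankings)

rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
relevant = ["a", "a", "a"]
print(mrr_at_k(rankings, relevant, k=2))  # 0.5
```

A 1:99 sample ratio typically means each list contains one held-out positive and 99 sampled negatives, in which case MRR@100 scores the position of the positive anywhere in the list.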
Coal dust explosions are severe safety accidents in coal mine production, posing significant threats to life and property. Predicting the maximum explosion pressure (Pm) of coal dust with deep learning models can effectively assess potential risks and provide a scientific basis for preventing coal dust explosions. In this study, a 20-L explosion sphere apparatus was used to test the maximum explosion pressure of coal dust at seven particle sizes and ten mass concentrations (Cdust), yielding a dataset of 70 experimental groups. Through Spearman correlation analysis and random forest feature selection, particle size (D10, D20, D50) and mass concentration (Cdust) were identified as the critical features among the ten initial parameters of the coal dust samples. On this basis, a hybrid Long Short-Term Memory (LSTM) network model incorporating a Multi-Head Attention Mechanism and the Sparrow Search Algorithm (SSA) was proposed to predict the maximum explosion pressure of coal dust. The results demonstrate that the SSA-LSTM-Multi-Head Attention model excels at this prediction task. On the training set, the model achieved a coefficient of determination (R²), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) of 0.9841, 0.0030, 0.0074, and 0.0049, respectively. On the testing set, these values were 0.9743, 0.0087, 0.0108, and 0.0069, respectively. Compared with artificial neural networks (ANN), random forest (RF), support vector machines (SVM), particle swarm optimized SVM (PSO-SVM), and a traditional single LSTM model, the SSA-LSTM-Multi-Head Attention model demonstrated superior generalization capability and prediction accuracy. The findings of this study not only advance the application of deep learning in coal dust explosion prediction but also provide robust technical support for the prevention and risk assessment of coal dust explosions.
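The four evaluation metrics reported above have standard definitions, sketched below on toy values. The function returns MAPE as a fraction rather than a percentage; whether the reported 0.0074 is a fraction is an assumption.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAPE as a fraction, MAE)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # y_true must be nonzero
    mae = sum(abs(e) for e in errors) / n
    return r2, rmse, mape, mae

print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

R² compares the residual error with the variance of the targets, while RMSE, MAPE, and MAE are scale-dependent, so they are only comparable across models evaluated on the same (possibly normalized) pressure values.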
Traffic flow prediction is a crucial element of intelligent transportation systems. However, accurate traffic flow prediction is challenging because traffic flow is highly nonlinear, complex, and dynamic. To address the difficulty of simultaneously capturing local and global dynamic spatiotemporal correlations in traffic flow, as well as the high time complexity of existing models, a multi-head flow attention-based local-global dynamic hypergraph convolution (MFA-LGDHC) prediction model is proposed. It consists of a multi-head flow attention (MHFA) mechanism, a graph convolution network (GCN), and local-global dynamic hypergraph convolution (LGHC). MHFA is used to extract the temporal dependency of traffic flow and reduce the time complexity of the model. GCN is employed to capture the spatial dependency of traffic flow. LGHC uses down-sampling convolution and isometric convolution to capture the local and global spatial dependencies of traffic flow, and dynamic hypergraph convolution is used to model the dynamic higher-order relationships of the traffic road network. Experimental results indicate that the MFA-LGDHC model outperforms current popular baseline models and exhibits good prediction performance.
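The abstract does not give the dynamic hypergraph construction. For orientation, the sketch builds the widely used HGNN propagation operator Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2) from a fixed incidence matrix H; a dynamic variant would rebuild H at each time step. The learnable feature transform and nonlinearity are omitted, and the sensor groupings are hypothetical.

```python
import math

def hypergraph_propagation(incidence, edge_weights=None):
    """Normalized HGNN operator Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    incidence: n_nodes x n_edges 0/1 matrix H (every node in at least one edge).
    edge_weights: optional hyperedge weights W (default 1).
    """
    n, m = len(incidence), len(incidence[0])
    w = edge_weights or [1.0] * m
    node_deg = [sum(w[e] * incidence[v][e] for e in range(m)) for v in range(n)]
    edge_deg = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(incidence[i][e] * w[e] * incidence[j][e] / edge_deg[e]
                    for e in range(m))
            P[i][j] = s / math.sqrt(node_deg[i] * node_deg[j])
    return P

def propagate(P, X):
    """One propagation step P @ X over node features X (n_nodes x channels)."""
    return [[sum(P[i][k] * X[k][c] for k in range(len(X)))
             for c in range(len(X[0]))] for i in range(len(P))]

# Four road sensors; hyperedge 0 groups sensors 0-2, hyperedge 1 groups 2-3.
H = [[1, 0], [1, 0], [1, 1], [0, 1]]
P = hypergraph_propagation(H)
print(propagate(P, [[1.0], [2.0], [3.0], [4.0]]))
```

Unlike an ordinary graph edge, one hyperedge can tie together any number of sensors, which is how a hypergraph expresses the higher-order relationships mentioned above.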
Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from a source domain into many diverse styles in a target domain. However, advanced approaches employ multiple generators to model different domain mappings, which leads to inefficient neural network training and mode collapse, limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, first, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
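How the SE mechanism is wired into the generator is not described in the abstract. The sketch shows the standard squeeze-and-excitation computation: global average pooling per channel, a bottleneck of two fully connected layers, and a sigmoid gate that rescales each channel. Biases are omitted and the weights are arbitrary toy values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squeeze_excitation(feature_maps, w1, w2):
    """Squeeze-and-excitation over C channels.

    feature_maps: list of C channels, each a flat list of spatial values.
    w1: (C // r) x C reduction weights; w2: C x (C // r) expansion weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields per-channel gates in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: reweight every spatial position of each channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)], gates

feats = [[1.0, 3.0], [0.0, 2.0], [4.0, 4.0], [2.0, 0.0]]  # C = 4, 2 positions
w1 = [[0.5, -0.5, 0.25, 0.0], [0.0, 0.5, -0.25, 0.5]]     # reduction ratio r = 2
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
scaled, gates = squeeze_excitation(feats, w1, w2)
print([round(g, 3) for g in gates])
```

The gates depend on global channel statistics, so SE lets the network emphasize channels that matter for the current domain code at very little parameter cost.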
Due to the lack of accurate data and complex parameterization, predicting groundwater depth is a challenge for numerical models. Machine learning can effectively address this issue and has proven useful for groundwater depth prediction in many areas. In this study, two new models are applied to predict groundwater depth in the Ningxia area, China. The two models combine an improved dung beetle optimizer (DBO) algorithm with two deep learning models: the Multi-head Attention-Convolution Neural Network-Long Short Term Memory network (MH-CNN-LSTM) and the Multi-head Attention-Convolution Neural Network-Gated Recurrent Unit (MH-CNN-GRU). The models with DBO show better prediction performance, with larger R (correlation coefficient) and RPD (residual prediction deviation) and lower RMSE (root-mean-square error). Compared with the models using the original DBO, the models with the improved DBO increase R and RPD by over 1.5% and decrease RMSE by over 1.8%, indicating better prediction results. In addition, the deep learning models outperform multiple linear regression, a traditional statistical model.
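R, RPD, and RMSE can be computed as below. RPD is defined here as the sample standard deviation of the observations divided by RMSE; some authors use the population standard deviation instead, and which convention the study uses is an assumption. The depths are toy values, not data from the study.

```python
import math
import statistics

def pearson_r(obs, pred):
    """Pearson correlation coefficient R between observations and predictions."""
    mo, mp = statistics.mean(obs), statistics.mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rpd(obs, pred):
    """Residual prediction deviation: sample SD of observations / RMSE."""
    return statistics.stdev(obs) / rmse(obs, pred)

# Toy groundwater depths in metres (illustrative values, not from the study).
observed = [2.1, 2.4, 2.9, 3.3, 3.8]
predicted = [2.0, 2.5, 2.8, 3.4, 3.6]
print(pearson_r(observed, predicted), rmse(observed, predicted), rpd(observed, predicted))
```

Because RPD divides by RMSE, a lower RMSE on the same observations always raises RPD, which is consistent with the two metrics moving in opposite directions in the results above.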
Due to time-varying topology and possible disturbances in a conflict environment, maintaining the mission performance of flying ad hoc networks (FANETs) remains challenging, which limits the application of Unmanned Aerial Vehicle (UAV) swarms in harsh environments. This paper proposes an intelligent framework that quickly recovers the cooperative coverage mission by aggregating the historical spatio-temporal network with an attention mechanism. A mission resilience metric combining connectivity and coverage status information is introduced to simplify the optimization model. A spatio-temporal node pooling method is proposed that captures the temporal network structure so that the location features of all nodes can be updated after destruction. Using the corresponding Laplacian matrix as a hyperparameter, a recovery algorithm based on a multi-head attention graph network is designed to achieve rapid recovery. Simulation results show that the proposed framework recovers connectivity and coverage more effectively than existing studies: compared with the state-of-the-art model, average connectivity and coverage improve by 17.92% and 16.96%, respectively. An ablation study further compares the contribution of each improvement. The proposed model can support resilient network design for real-time mission execution.
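Neither the mission resilience metric nor the exact way the Laplacian enters the attention network is specified in the abstract. The sketch only shows how a Laplacian L = D − A can be built from UAV positions, assuming a unit-disk communication model with a fixed range, plus a largest-connected-component share as one simple connectivity score (an illustrative choice, not the paper's metric).

```python
import math

def adjacency(positions, comm_range):
    """Unit-disk graph: UAVs within comm_range of each other share a link."""
    n = len(positions)
    return [[1 if i != j and math.dist(positions[i], positions[j]) <= comm_range else 0
             for j in range(n)] for i in range(n)]

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def largest_component_fraction(adj):
    """Share of nodes in the largest connected component (depth-first search)."""
    n, seen, best = len(adj), set(), 0
    for start in range(n):
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / n if n else 0.0

# Three UAVs in a chain plus one that has drifted out of range.
A = adjacency([(0, 0), (1, 0), (2, 0), (10, 0)], comm_range=1.5)
print(laplacian(A))  # [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 1, 0], [0, 0, 0, 0]]
print(largest_component_fraction(A))  # 0.75
```

Each row of L sums to zero, and an isolated UAV shows up as an all-zero row, so the matrix directly encodes which nodes a recovery step needs to reconnect.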
Rail surface damage is a critical concern for high-speed railway infrastructure, directly affecting train operational stability and safety. Existing methods are limited in accuracy and speed for small-sample, multi-category, and multi-scale target segmentation tasks. To address these challenges, this paper proposes Pyramid-MixNet, an intelligent segmentation model for high-speed rail surface damage that combines dataset construction and expansion with a feature-pyramid-based encoder-decoder network using multiple attention mechanisms. The encoder integrates Spatial Reduction Masked Multi-Head Attention (SRMMHA) to enhance global feature extraction while reducing the number of trainable parameters. The decoder incorporates Mix-Attention (MA), enabling multi-scale structural understanding and cross-scale token group correlation learning. Experimental results show that the proposed method achieves 62.17% average segmentation accuracy, an 80.28% damage Dice coefficient, and 56.83 FPS, meeting real-time detection requirements. The model's high accuracy and scene adaptability significantly improve the detection of small-scale and complex multi-scale rail damage, offering practical value for real-time monitoring in high-speed railway maintenance systems.
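The abstract does not say what SRMMHA masks. In general, masked attention sets the scores of hidden key positions to negative infinity before the softmax, so they receive exactly zero weight. The sketch shows that generic mechanism for one query; what the mask represents in Pyramid-MixNet is an open assumption.

```python
import math

def masked_attention_weights(query, keys, mask):
    """Scaled dot-product weights for one query; mask[j] = False hides key j."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) if keep else -math.inf
              for key, keep in zip(keys, mask)]
    m = max(scores)  # assumes at least one key is unmasked
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = masked_attention_weights([1.0, 0.5], keys, [True, False, True])
print(w[1])  # 0.0
```

The remaining weights still sum to one, so masking redistributes attention among the visible tokens rather than shrinking the output.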
Pest detection techniques help reduce the frequency and scale of pest outbreaks; however, their application in actual agricultural production remains challenging owing to interspecies similarity, multiple scales, and complex backgrounds. To address these problems, this study proposes the FD-YOLO pest detection model. FD-YOLO uses a Fully Connected Feature Pyramid Network (FC-FPN) instead of PANet in the neck, which adaptively fuses multi-scale information so that the model retains small-scale target features in deep layers, enhances large-scale target features in shallow layers, and improves the reuse of effective features. A dual self-attention (DSA) module is then embedded in the C3 module of the neck to capture dependencies in both the spatial and channel dimensions, effectively enhancing global features. We selected 16 types of pests from the IP102 pest dataset that widely damage field crops and used them as our dataset after data supplementation and augmentation. The experimental results show that FD-YOLO's mAP@0.5 improved by 6.8% compared with YOLOv5, reaching 82.6%, which is 19.1%–5% better than other state-of-the-art models. This method provides an effective new approach for detecting similar or multi-scale pests in field crops.
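The "@0.5" in mAP@0.5 is an IoU threshold: a predicted box counts as a correct detection of a ground-truth pest only if their intersection over union is at least 0.5 (and the class matches). The sketch covers just that matching test; a full mAP computation also sorts detections by confidence, builds precision-recall curves, and averages over classes.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hit criterion used by mAP@0.5 (class agreement is checked elsewhere)."""
    return box_iou(pred_box, gt_box) >= threshold

gt = (10, 10, 50, 50)
print(box_iou(gt, (20, 10, 60, 50)))           # 0.6
print(is_true_positive((30, 30, 70, 70), gt))  # False
```

Small pests make this threshold strict: a shift of a few pixels removes a large share of the overlap, which is why multi-scale feature fusion matters for the reported gains.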
This paper proposes an automated detection framework for transmission facilities using a feature attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The proposed framework has three key characteristics. First, virtual images of transmission facilities generated with StyleGAN2-ADA are co-trained with real images, enabling the neural network to learn varied features of transmission facilities and improving detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to extract features effectively from images and construct multi-dimensional feature maps, enabling precise object detection in various environments. Third, an effective bounding box optimization method called Scylla-IoU, which considers the intersection over union, center point distance, angle, and shape of the bounding box, is deployed in FAMSR-Net, enabling accurate detection of power facilities of various sizes. Extensive experiments demonstrate that FAMSR-Net outperforms other neural networks in detecting power facilities and achieves its highest detection accuracy when virtual images of the transmission facilities are co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because optical cameras are currently the most promising sensors for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
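The four factors listed for Scylla-IoU (IoU, center distance, angle, and shape) correspond to the terms of the SIoU loss as described in the original SIoU paper (Gevorgyan, 2022): loss = 1 − IoU + (distance cost + shape cost) / 2, where the angle cost modulates the distance cost. The sketch follows that formulation with θ = 4 assumed; it is not FAMSR-Net's actual code.

```python
import math

def siou_loss(pred, gt, theta=4.0):
    """SIoU loss for boxes given as (cx, cy, w, h)."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    # Plain IoU from the box edges.
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)
    # Angle cost: 0 when the centres are aligned on an axis, 1 at 45 degrees.
    sigma = math.hypot(gx - px, gy - py)
    sin_alpha = min(1.0, abs(gy - py) / sigma) if sigma > 0 else 0.0
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2
    # Distance cost, normalized by the smallest enclosing box and scaled by the angle.
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - angle
    rho_x, rho_y = ((gx - px) / cw) ** 2, ((gy - py) / ch) ** 2
    distance = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # Shape cost: relative width and height mismatch.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta
    return 1 - iou + (distance + shape) / 2

print(siou_loss((10, 10, 4, 4), (10, 10, 4, 4)))  # ~0 for identical boxes
print(siou_loss((12, 10, 4, 4), (10, 10, 4, 4)))  # IoU term plus a distance penalty
```

Compared with a plain 1 − IoU loss, the extra terms keep providing a gradient that pulls the predicted center and aspect ratio toward the target, which helps with facilities of very different sizes.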
Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities into a single image. One critical clinical application of medical image fusion is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image into low-rank and salient parts. Then, to exploit features at different scales, a multiscale mechanism that uses three filters of different sizes extracts features in the encoder network. A hybrid attention module is also introduced to capture more image detail. Finally, the fused images are reconstructed by the decoder network. Experiments on clinical brain computed tomography/magnetic resonance images show that the proposed multiscale hybrid attention fusion network outperforms other advanced fusion methods.
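The multiscale mechanism applies three filters of different sizes to the same input and keeps all responses. In MMIF-Net the filters are learned and their sizes are not given in the abstract; the sketch substitutes fixed 3×3, 5×5, and 7×7 mean filters (an assumption) to show how the three responses are stacked as channels.

```python
def box_filter(img, size):
    """Mean filter of odd size with edge-replicate padding on a 2-D list."""
    h, w, r = len(img), len(img[0]), size // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), h - 1)  # replicate the nearest edge pixel
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            row.append(acc / (size * size))
        out.append(row)
    return out

def multiscale_features(img, sizes=(3, 5, 7)):
    """Apply one filter per scale and stack the responses as channels."""
    return [box_filter(img, s) for s in sizes]

image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 9.0, 9.0, 0.0],
         [0.0, 9.0, 9.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
features = multiscale_features(image)
print([round(f[1][1], 3) for f in features])  # response at one pixel, per scale
```

Small kernels preserve fine edges while larger ones summarize wider context, so the encoder sees both when the channels are concatenated.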
Vehicle re-identification involves matching images of vehicles across different camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and minimal inter-class differences in the collected vehicle images, which makes re-identification more complex. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several key components. First, AG-GCN embeds a lightweight attention module within the ResNet-50 backbone to learn feature weights automatically, improving the global representation of vehicle features by highlighting salient features and suppressing irrelevant ones. Second, AG-GCN adopts a graph-based structure to encapsulate deep local features, and a graph convolutional network aggregates these features to learn the relationships among vehicle-related characteristics. The feature maps from the attention and graph-based branches are then fused for a more comprehensive representation of vehicle features. Finally, the framework measures and ranks feature similarities, enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability.
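The abstract does not describe how the local-feature graph is built. Assuming each node is a local region of the vehicle feature map and the regions are fully connected, one layer of the standard GCN (Kipf and Welling) computes ReLU(D^(-1/2)(A + I)D^(-1/2) X W), as sketched below with toy features and weights.

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features (including self-loops), then apply W and ReLU.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n))
            for c in range(len(features[0]))] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]

# Three hypothetical vehicle regions (e.g. front, body, rear), fully connected.
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
local_feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W = [[1.0, -0.5], [0.2, 1.0]]
print(gcn_layer(adj, local_feats, W))
```

After one layer every region's descriptor mixes in information from the others, which is how the graph branch can model relationships among vehicle parts.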
Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases, but existing fusion methods suffer from blurred texture details, low contrast, and an inability to fully extract fused image information. Therefore, a multimodal medical image fusion method based on mask optimization and a parallel attention mechanism is proposed to address these issues. First, the entire image is converted into a binary mask, and a contour feature map is constructed to maximize the contour feature information of the image, together with a triple-path network for extracting and optimizing image texture detail features. Second, a contrast enhancement module and a detail preservation module are proposed to enhance the overall brightness and texture details of the image. Next, a parallel attention mechanism is constructed from channel and spatial feature changes to fuse the images and enhance the salient information of the fused images. Finally, a decoupling network composed of residual networks is built to optimize the information between the fused image and the source images, reducing information loss in the fused image. Compared with nine advanced methods proposed in recent years, our method improves seven objective evaluation indicators by 6%–31%, indicating that it produces fusion results with clearer texture details, higher contrast, and smaller pixel differences between the fused and source images. It is superior to the other algorithms in both subjective and objective evaluations.
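In a parallel attention design, channel and spatial attention are computed side by side from the same input instead of one after the other. The sketch assumes the simplest gates (a sigmoid of each channel's mean and a sigmoid of the cross-channel mean at each pixel) and sums the two reweighted outputs; the learned layers inside each branch and the paper's exact combination rule are not given and are omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def parallel_attention(x):
    """x: C x H x W nested lists; channel and spatial branches run in parallel."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # Channel branch: one gate per channel from its global mean.
    ch_gate = [sigmoid(sum(sum(row) for row in x[k]) / (h * w)) for k in range(c)]
    # Spatial branch: one gate per pixel from the mean across channels.
    sp_gate = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
               for i in range(h)]
    # Combine: sum of the channel-weighted and spatially weighted features.
    return [[[x[k][i][j] * ch_gate[k] + x[k][i][j] * sp_gate[i][j]
              for j in range(w)] for i in range(h)] for k in range(c)]

feature_map = [[[0.2, 1.5], [0.0, 0.7]],
               [[1.0, 0.3], [0.4, 2.0]]]  # C = 2, H = W = 2
print(parallel_attention(feature_map))
```

Because both gates read the unmodified input, neither branch is biased by the other's output, which is the main difference from a sequential channel-then-spatial design.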
We propose a hierarchical multi-scale attention mechanism-based model to address the low accuracy of existing oceanic biological image classification methods and the inefficiency of manual classification. First, a hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving strong performance at relatively low cost. Second, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Third, a convolutional block attention module (CBAM) is introduced to enhance the model's perception of critical features and improve its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples and address class imbalance in the dataset, further improving performance. The model achieved 96.11% accuracy on the Nanji Islands intertidal marine organism dataset and 84.78% accuracy on the CIFAR-100 dataset, demonstrating strong generalization ability that meets the demands of oceanic biological image classification.
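Focal Loss rescales cross-entropy by (1 − p_t)^γ, where p_t is the predicted probability of the true class, so confidently correct (easy) samples contribute little and hard samples dominate training. An optional per-class weight α_t counters class imbalance. The sketch uses γ = 2, the value suggested in the original Focal Loss paper; the setting used in this study is not stated.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha: optional per-class weights (e.g. inverse class frequency).
    With gamma = 0 and no alpha it reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    a_t = alpha[target] if alpha is not None else 1.0
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = [4.0, 0.0, 0.0]  # confidently correct for class 0
hard = [0.5, 0.4, 0.0]  # barely correct for class 0
for name, z in (("easy", easy), ("hard", hard)):
    ce = focal_loss(z, 0, gamma=0.0)
    fl = focal_loss(z, 0, gamma=2.0)
    print(name, round(fl / ce, 3))  # fraction of the cross-entropy that survives
```

The easy sample keeps only a tiny fraction of its cross-entropy while the hard sample keeps a much larger share, which is the reweighting described above.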
This study examines the impact of short-term public opinion sentiment on the secondary market, focusing on the potential for such sentiment to cause dramatic stock price fluctuations and increase investment risk. Quantifying investment sentiment indicators and analyzing their persistent impact has been a complex and significant area of research. This paper proposes a structured multi-head attention stock index prediction method based on an adaptive public opinion sentiment vector. The method transforms numerous investor comments on social platforms over time into public opinion sentiment vectors that express complex sentiments. It then analyzes the continuous impact of these vectors on the market by aggregating public opinion data with a structured multi-head attention mechanism. The experimental results demonstrate that the public opinion sentiment vector provides more comprehensive feedback on market sentiment than traditional sentiment polarity analysis. Furthermore, the multi-head attention mechanism improves prediction accuracy by attending to each type of input information separately. The mean absolute percentage error (MAPE) of the proposed method is 0.463%, a reduction of 0.294% compared with the benchmark attention algorithm. In market backtesting, the return was 24.560%, an improvement of 8.202% over the benchmark algorithm. These results suggest that a trading strategy based on this method has the potential to improve trading profits.
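The abstract does not explain how the adaptive sentiment vector is built. As a rough illustration of why a vector carries more information than a single polarity score, the sketch assumes each comment has already been scored by some sentiment classifier as a probability distribution over several sentiment classes, and averages those distributions per trading day. The input format and the class set are hypothetical.

```python
from collections import defaultdict

def daily_sentiment_vectors(comments, n_classes):
    """Average per-comment sentiment distributions into one vector per day.

    comments: iterable of (day, probs), where probs is a length-n_classes
    probability vector from some sentiment classifier (hypothetical input).
    """
    sums = defaultdict(lambda: [0.0] * n_classes)
    counts = defaultdict(int)
    for day, probs in comments:
        acc = sums[day]
        for i, p in enumerate(probs):
            acc[i] += p
        counts[day] += 1
    return {day: [v / counts[day] for v in sums[day]] for day in sorted(sums)}

# Hypothetical classes: (bearish, neutral, bullish, fearful).
stream = [("2024-01-02", [0.7, 0.2, 0.1, 0.0]),
          ("2024-01-02", [0.1, 0.2, 0.1, 0.6]),
          ("2024-01-03", [0.0, 0.3, 0.7, 0.0])]
print(daily_sentiment_vectors(stream, 4))
```

A polarity score would collapse the first day to roughly "negative", whereas the averaged vector still shows that much of the negativity is fear rather than bearish conviction.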
Funding (power equipment defect recognition study): Supported in part by the Major Program of the National Natural Science Foundation of China under Grant 62127803.
Abstract: Safety maintenance of power equipment is of great importance in power grids, where image-processing-based defect recognition is expected to classify abnormal conditions during daily inspection. However, because defect images have blurred features, current defect recognition algorithms have poor fine-grained recognition ability. Visual attention can achieve fine-grained recognition through its ability to model long-range dependencies, but it introduces extra computational complexity, especially for multi-head attention in vision transformer structures. Under these circumstances, this paper proposes a self-reduction multi-head attention module that reduces computational complexity and can be easily combined with a Convolutional Neural Network (CNN). In this manner, local and global features can be computed simultaneously in the proposed structure, with the aim of improving defect recognition performance. Specifically, the proposed self-reduction multi-head attention reduces redundant parameters, thereby addressing the problem of limited computational resources. Experiments were conducted on a defect dataset collected from a substation, and the results demonstrated the efficiency and superiority of the proposed method over other advanced algorithms.
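The abstract does not say how the self-reduction module shrinks the computation. The sketch below assumes a spatial-reduction scheme, one common way to cut attention cost: keys and values are computed from tokens average-pooled in groups of `r`, so each head compares N queries against only N/r keys. The names `reduced_mha`, `pool_tokens`, and the ratio `r` are hypothetical and do not come from the paper; plain Python is used so the example runs without extra libraries.

```python
import math

# Hypothetical sketch: a spatial-reduction stand-in, NOT the paper's exact module.
# Keys/values come from tokens average-pooled in groups of r.


def matmul(a, b):
    """Multiply (n x k) by (k x m); matrices are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_tokens(x, r):
    """Average every r consecutive tokens: N x d -> (N // r) x d."""
    usable = len(x) - len(x) % r
    return [[sum(col) / r for col in zip(*x[i:i + r])] for i in range(0, usable, r)]


def reduced_mha(x, wq, wk, wv, num_heads, r):
    """x: N x d tokens; wq, wk, wv: d x d projections; r: reduction ratio."""
    d = len(x[0])
    dh = d // num_heads  # per-head width; assumes d is divisible by num_heads
    q = matmul(x, wq)  # queries keep full resolution
    reduced = pool_tokens(x, r)
    k = matmul(reduced, wk)  # only N // r keys ...
    v = matmul(reduced, wv)  # ... and N // r values
    out = [[0.0] * d for _ in x]
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # feature slice owned by head h
        for i, qi in enumerate(q):
            scores = [sum(qi[c] * kj[c] for c in cols) / math.sqrt(dh) for kj in k]
            weights = softmax(scores)
            for c in cols:
                out[i][c] = sum(w * vj[c] for w, vj in zip(weights, v))
    return out
```

With `r = 1` the function reduces to ordinary multi-head attention; with `r = 2` each head's score matrix shrinks from N x N to N x N/2, which is where the saving comes from.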
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (*MSIT) (No. 2018R1A5A7059549).
Abstract: High-quality, realistic face generation has emerged as a key field of research in computer vision. This paper proposes a robust approach that combines a Super-Resolution Generative Adversarial Network (SRGAN) with a Pyramid Attention Module (PAM) to enhance the quality of deep face generation. The SRGAN framework is designed to improve the resolution of generated images, addressing common challenges such as blurriness and a lack of intricate detail. The Pyramid Attention Module complements this process by focusing on multi-scale feature extraction, enabling the network to capture finer details and complex facial features more effectively. The proposed method was trained and evaluated over 100 epochs on the CelebA dataset, showing consistent improvements in image quality and a marked decrease in generator and discriminator losses, which reflects the model's capacity to learn and synthesize high-quality images effectively, given adequate computational resources. Experimental results show that the SRGAN model with the PAM module performed best, yielding aggregate discriminator losses of 0.055 for real images and 0.043 for fake images and a generator loss of 10.58 after 100 training epochs. The model achieved a structural similarity index measure (SSIM) of 0.923, outperforming the other models considered in this study.
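To make the reported SSIM of 0.923 concrete, the sketch below evaluates the SSIM formula once over two whole images flattened to lists of intensities. This is a simplification: standard SSIM averages the same expression over local (typically Gaussian-weighted) windows, and the paper's exact evaluation settings are not stated. The function name `global_ssim` is hypothetical; the constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with dynamic range L = 255 follow the usual defaults.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM computed once over two equal-length lists of pixel intensities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical images score exactly 1; a brightness shift lowers the luminance term, and an inverted structure drives the covariance term negative.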
Funding: Funded by the Zhejiang Basic Public Welfare Research Project, grant number LZY24E060001; supported by Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, 2023GH02), the University of Macao (MYRG2022-00271-FST), and the Science and Technology Development Fund (FDCT) of Macao (0032/2022/A).
Abstract: With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in skin cancer detection. However, accurately segmenting melanomas in dermoscopic images remains challenging because of objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. In our proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance feature representation, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net's encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To evaluate the proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU of our proposed model increased from 0.6406 to 0.6819 and the Dice coefficient from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments on the jaw cyst dataset from Quzhou People's Hospital further verified the model's outstanding segmentation performance. Together, these findings affirm the potential of our approach as a valuable tool for supporting clinical decision-making in skin cancer detection and for advancing research in medical image analysis.
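The IoU and Dice figures above are overlap ratios between a predicted mask and the ground-truth mask. A minimal sketch for flattened binary masks follows; the function name `iou_and_dice` is illustrative, and treating two empty masks as a perfect match is an assumed convention.

```python
def iou_and_dice(pred, target):
    """pred, target: flattened binary masks (sequences of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)  # |P ∩ T|
    union = sum(1 for p, t in zip(pred, target) if p or t)  # |P ∪ T|
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)  # |P| + |T|
    if union == 0:  # both masks empty: count as a perfect match
        return 1.0, 1.0
    return inter / union, 2 * inter / total
```

For a single mask pair the two scores are linked by Dice = 2 IoU / (1 + IoU); averages taken over a whole dataset need not satisfy this identity exactly.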
Funding: Granted by the Qin Xin Talents Cultivation Program (No. QXTCP C202115), Beijing Information Science & Technology University; the Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing Fund (No. GJJ-23); and the National Social Science Foundation, China (No. 21BTQ079).
Abstract: Sarcasm detection is a complex and challenging task, particularly on Chinese social media, where sarcasm exhibits strong contextual dependencies and cultural specificity. To address the limitations of existing methods in capturing the implicit semantics and contextual associations of sarcastic expressions, this paper proposes an event-aware model for Chinese sarcasm detection that leverages a multi-head attention (MHA) mechanism and contrastive learning (CL) strategies. The proposed model employs a dual-path Bidirectional Encoder Representations from Transformers (BERT) encoder to process comment text and event context separately and integrates an MHA mechanism to enable deep interactions between the two, thereby capturing multidimensional semantic associations. Additionally, a CL strategy is introduced to enhance feature representation, further improving the model's performance under class imbalance and in complex contextual scenarios. The model achieves state-of-the-art performance on the Chinese sarcasm dataset, with significant improvements in accuracy (79.55%), F1-score (84.22%), and area under the curve (AUC, 84.35%).
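The abstract does not specify which contrastive objective the CL strategy uses. As an assumption, the sketch below shows InfoNCE, a widely used contrastive loss: it is the negative log-softmax probability that an anchor embedding picks its positive over a set of negatives, with cosine similarity scaled by a temperature. The names `info_nce` and `cosine` are illustrative.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]
```

Minimizing this loss pulls an anchor toward its positive pair and pushes it away from negatives; a lower temperature sharpens the penalty on hard negatives.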
Funding: Supported by the Xiamen Science and Technology Subsidy Project (No. 2023CXY0318).
Abstract: Abnormal network traffic is a frequent security risk, and a range of techniques is needed to categorize and detect it. Existing network traffic anomaly detection still faces several challenges: it cannot fully extract local and global features, and it lacks effective mechanisms to capture complex interactions between features. Additionally, enlarging the receptive field to obtain deeper feature representations typically relies on increasing network depth, which significantly increases computational resource consumption and affects detection efficiency and performance. To address these issues, this paper first proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, traffic samples are converted into two-dimensional matrices. A module combining parallel dilated convolutions and residual learning (res-pdc) is designed to extract local and global traffic features at different scales. By using res-pdc modules with different dilation rates, the model effectively captures spatial features at different scales and explores feature dependencies spanning wider regions without increasing computational resources. Second, to focus on and integrate information in different feature subspaces and to further enhance and extract the interactions among features, multi-head attention is added to Res-PDC, resulting in the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, the model is compared with other machine learning and deep learning algorithms on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method effectively improves detection performance.
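The claim that dilation widens the receptive field without extra cost can be checked directly: a k x k kernel with dilation d still has k^2 weights but spans k + (k - 1)(d - 1) cells per side. The sketch below assumes a single-channel 2D input, zero "same" padding, parallel branches that are summed and passed through ReLU, and an identity shortcut. It is an illustrative reading of "parallel dilated convolution plus residual learning", not the paper's exact res-pdc module, and all function names are hypothetical.

```python
def effective_kernel(k, d):
    """Span (per side) covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x, kernel, dilation):
    """Single-channel cross-correlation with zero 'same' padding."""
    h, w, k = len(x), len(x[0]), len(kernel)
    half = (k // 2) * dilation
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii, jj = i - half + a * dilation, j - half + b * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # outside = zero padding
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def res_pdc_block(x, kernels, dilations):
    """Parallel dilated branches (one kernel each), summed, ReLU, plus identity."""
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    h, w = len(x), len(x[0])
    return [
        [x[i][j] + max(0.0, sum(br[i][j] for br in branches)) for j in range(w)]
        for i in range(h)
    ]
```

With 3 x 3 kernels and dilations (1, 2, 3), the branches see 3 x 3, 5 x 5, and 7 x 7 neighborhoods while each keeps only nine weights.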
Funding: Supported by the Key Research and Development Program of Heilongjiang Province (No. 2022ZX01A35).
Abstract: As the group-buying model shows significant progress in attracting new users, enhancing user engagement, and increasing platform profitability, providing personalized recommendations for group-buying users has emerged as a new challenge for recommendation systems. This paper introduces a group-buying recommendation model based on multi-head attention mechanisms and multi-task learning, termed the Multi-head Attention Mechanisms and Multi-task Learning Group-Buying Recommendation (MAMGBR) model, designed specifically to optimize group-buying recommendations on e-commerce platforms. The core dataset of this study comes from the Chinese maternal and infant e-commerce platform "Beibei" and encompasses approximately 430,000 successful group-buying actions and over 120,000 users. The model focuses on two main tasks: recommending items for group organizers (Task I) and recommending participants for a given group-buying event (Task II). In model evaluation, MAMGBR achieves an MRR@10 of 0.7696 for Task I, a 20.23% improvement over baseline models. Furthermore, in Task II, where complex interaction patterns prevail, MAMGBR uses auxiliary loss functions to effectively model the multifaceted roles of users, items, and participants, leading to a 24.08% increase in MRR@100 under a 1:99 sample ratio. Experimental results show that, compared with benchmark models such as NGCF and EATNN, MAMGBR's integration of multi-head attention mechanisms, expert networks, and gating mechanisms enables more accurate modeling of user preferences and social associations in group-buying scenarios, significantly enhancing recommendation accuracy and platform group-buying success rates.
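MRR@K, the metric reported for both tasks, averages the reciprocal rank of the true item when it appears in the top K of a ranked list and counts 0 otherwise. A minimal sketch follows; `mrr_at_k` is an illustrative name. Under a 1:99 sampling protocol, each ranked list would typically hold the true item plus 99 sampled negatives, although the exact protocol is not detailed in the abstract.

```python
def mrr_at_k(rankings, targets, k):
    """Mean reciprocal rank of each target within the top-k of its ranking."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        for pos, item in enumerate(ranking[:k], start=1):
            if item == target:
                total += 1.0 / pos
                break  # only the first hit counts
    return total / len(rankings)
```

For example, hits at ranks 1 and 2 plus one miss give (1 + 0.5 + 0) / 3 = 0.5.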
Funding: Funded by the Research on Intelligent Mining Geological Model and Ventilation Model for Extremely Thin Coal Seam in Heilongjiang Province, China (2021ZXJ02A03); the Demonstration of Intelligent Mining for Comprehensive Mining Face in Extremely Thin Coal Seam in Heilongjiang Province, China (2021ZXJ02A04); and the Natural Science Foundation of Heilongjiang Province, China (LH2024E112).
Abstract: Coal dust explosions are severe safety accidents in coal mine production, posing significant threats to life and property. Predicting the maximum explosion pressure (Pm) of coal dust with deep learning models can effectively assess potential risks and provide a scientific basis for preventing coal dust explosions. In this study, a 20-L explosion sphere apparatus was used to test the maximum explosion pressure of coal dust at seven particle sizes and ten mass concentrations (Cdust), resulting in a dataset of 70 experimental groups. Using Spearman correlation analysis and random forest feature selection, particle size (D10, D20, D50) and mass concentration (Cdust) were identified as the critical feature parameters among the ten initial parameters of the coal dust samples. On this basis, a hybrid Long Short-Term Memory (LSTM) network model incorporating a Multi-Head Attention Mechanism and the Sparrow Search Algorithm (SSA) was proposed to predict the maximum explosion pressure of coal dust. The results demonstrate that the SSA-LSTM-Multi-Head Attention model excels at this task. On the training set, the model achieved a coefficient of determination (R²), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) of 0.9841, 0.0030, 0.0074, and 0.0049, respectively; on the testing set, these values were 0.9743, 0.0087, 0.0108, and 0.0069, respectively. Compared with artificial neural networks (ANN), random forest (RF), support vector machines (SVM), particle-swarm-optimized SVM (PSO-SVM), and the traditional single-model LSTM, the SSA-LSTM-Multi-Head Attention model demonstrated superior generalization capability and prediction accuracy. The findings of this study not only advance the application of deep learning to coal dust explosion prediction but also provide robust technical support for the prevention and risk assessment of coal dust explosions.
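The four evaluation metrics quoted above have standard definitions, sketched below for paired lists of observed and predicted values. The function name `regression_metrics` is illustrative. MAPE is returned as a fraction (multiply by 100 for percent); whether the paper reports it as a fraction or a percentage is not stated, so that reading is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """R², RMSE, MAPE (as a fraction), and MAE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # undefined if any t == 0
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return {"R2": 1 - ss_res / ss_tot, "RMSE": rmse, "MAPE": mape, "MAE": mae}
```

RMSE weights large errors more heavily than MAE, so a gap between the two (as between training and testing above) signals a few larger misses.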
Funding: Supported by the Key R&D Program of Gansu Province (No. 23YFGA0063), the Key Talent Project of Gansu Province (Nos. 2024RCXM57, 2024RCXM22), and the Major Science and Technology Special Program of Gansu Province (No. 25ZYJA037).
Abstract: Traffic flow prediction is a crucial element of intelligent transportation systems. However, accurate traffic flow prediction is quite challenging because of the highly nonlinear, complex, and dynamic characteristics of traffic flow. To address the difficulty of simultaneously capturing local and global dynamic spatiotemporal correlations in traffic flow, as well as the high time complexity of existing models, a multi-head flow attention-based local-global dynamic hypergraph convolution (MFA-LGDHC) prediction model is proposed. It consists of a multi-head flow attention (MHFA) mechanism, a graph convolution network (GCN), and local-global dynamic hypergraph convolution (LGHC). MHFA is used to extract the temporal dependency of traffic flow and reduce the time complexity of the model. GCN is employed to capture the spatial dependency of traffic flow. LGHC uses down-sampling convolution and isometric convolution to capture the local and global spatial dependencies of traffic flow, and dynamic hypergraph convolution is used to model the dynamic higher-order relationships of the traffic road network. Experimental results indicate that the MFA-LGDHC model outperforms current popular baseline models and exhibits good prediction performance.
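The abstract does not give the LGHC equations. For reference only, the standard hypergraph convolution layer popularized by HGNN is shown below; whether MFA-LGDHC uses this exact normalization is an assumption. Here H is the vertex-hyperedge incidence matrix, W a diagonal matrix of hyperedge weights, D_v and D_e the vertex and hyperedge degree matrices, X^(l) the node features, and Θ^(l) a learnable projection. A "dynamic" variant would typically rebuild H from the current features at each time step.

```latex
X^{(l+1)} = \sigma\!\left( D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2} X^{(l)} \Theta^{(l)} \right),
\qquad
(D_v)_{ii} = \sum_{e} W_{ee} H_{ie},
\qquad
(D_e)_{ee} = \sum_{i} H_{ie}
```

Because one hyperedge can join many road segments at once, this layer can model higher-order relations that a pairwise GCN adjacency matrix cannot.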
Funding: The National Natural Science Foundation of China (No. 61976080), the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y), the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008), and the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced existing approaches employ a multi-generator mechanism to model different domain mappings, which leads to inefficient neural network training and mode collapse and, in turn, limits the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, first, a domain code is introduced to explicitly control the different generation tasks. Second, a squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing techniques. Overall, the experimental results show that the proposed method is versatile and scalable.
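The squeeze-and-excitation mechanism mentioned above reweights feature channels in three steps: global average pooling per channel, a bottleneck of two fully connected layers ending in a sigmoid, and per-channel scaling. The sketch below follows that standard structure with biases omitted for brevity; the weights `w1` and `w2` are hypothetical inputs, and how the paper places SE inside its generator is not shown.

```python
import math


def se_block(channels, w1, w2):
    """channels: C maps of H x W; w1: (C/r) x C; w2: C x (C/r). Biases omitted."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives per-channel gates.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(channels, gates)]
    return scaled, gates
```

Each gate lies in (0, 1), so SE can only suppress or preserve channels relative to one another. It never changes spatial layout, which makes it cheap to insert into existing blocks.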
Funding: Supported by the National Natural Science Foundation of China [grant numbers 42088101 and 42375048].
Abstract: Owing to the lack of accurate data and complex parameterization, predicting groundwater depth is a challenge for numerical models. Machine learning can effectively address this issue and has proven useful for predicting groundwater depth in many areas. In this study, two new models are applied to predict groundwater depth in the Ningxia area, China. The two models combine an improved dung beetle optimizer (DBO) algorithm with two deep learning models: the Multi-head Attention-Convolution Neural Network-Long Short Term Memory network (MH-CNN-LSTM) and the Multi-head Attention-Convolution Neural Network-Gated Recurrent Unit (MH-CNN-GRU). The models with DBO show better prediction performance, with higher R (correlation coefficient) and RPD (residual prediction deviation) and lower RMSE (root-mean-square error). Compared with the models using the original DBO, the models with the improved DBO increase R and RPD by over 1.5% and decrease RMSE by over 1.8%, indicating better prediction results. In addition, the deep learning models outperform multiple linear regression, a traditional statistical model.
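The three metrics used here answer different questions: R measures how well predictions track the observations linearly, RMSE measures absolute error, and RPD is commonly defined as the standard deviation of the observations divided by the RMSE. The sketch below uses that common RPD definition with a population standard deviation; conventions vary (some authors use the sample SD), so the exact form used in the study is an assumption. The function name is illustrative.

```python
import math


def r_rmse_rpd(obs, pred):
    """Pearson R, RMSE, and RPD = SD(observed) / RMSE."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    r = cov / (so * sp)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    sd_obs = so / math.sqrt(n)  # population standard deviation of observations
    return r, rmse, sd_obs / rmse
```

A constant bias shows why both kinds of metric are reported: predictions shifted by 0.5 everywhere keep R = 1 yet still incur an RMSE of 0.5.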
Funding: The National Natural Science Foundation of China (NNSFC) (Grant Nos. 72001213 and 72301292), the National Social Science Fund of China (Grant No. 19BGL297), and the Basic Research Program of Natural Science in Shaanxi Province (Grant No. 2021JQ-369).
Abstract: Because of time-varying topology and possible disturbances in a conflict environment, maintaining the mission performance of flying ad hoc networks (FANETs) remains challenging, which limits the application of Unmanned Aerial Vehicle (UAV) swarms in harsh environments. This paper proposes an intelligent framework that quickly recovers the cooperative coverage mission by aggregating the historical spatio-temporal network with an attention mechanism. A mission resilience metric is introduced in conjunction with connectivity and coverage status information to simplify the optimization model. A spatio-temporal node pooling method is proposed to ensure that all node location features can be updated after destruction by capturing the temporal network structure. With the corresponding Laplacian matrix as a hyperparameter, a recovery algorithm based on a multi-head attention graph network is designed to achieve rapid recovery. Simulation results show that the proposed framework recovers connectivity and coverage more effectively than existing studies: average connectivity and coverage are improved by 17.92% and 16.96%, respectively, compared with the state-of-the-art model. Furthermore, an ablation study compares the contributions of each improvement. The proposed model can support resilient network design for real-time mission execution.
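Two graph quantities underlie this framework: the Laplacian L = D - A of the swarm's communication graph, and some measure of how connected the surviving nodes remain after destruction. The sketch below computes the Laplacian and, as a hypothetical stand-in for the paper's connectivity term, the size of the largest connected component of surviving UAVs as a fraction of the original swarm. The paper's actual resilience metric is not given in the abstract.

```python
from collections import deque


def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]


def largest_component_fraction(adj, alive):
    """Hypothetical connectivity score: largest connected group of surviving nodes / n."""
    n = len(adj)
    seen, best = set(), 0
    for s in range(n):
        if not alive[s] or s in seen:
            continue
        size, queue = 0, deque([s])
        seen.add(s)
        while queue:  # breadth-first search over surviving neighbors
            u = queue.popleft()
            size += 1
            for v in range(n):
                if adj[u][v] and alive[v] and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / n
```

On a four-node chain, losing the second node splits the swarm into groups of one and two, so the score drops from 1.0 to 0.5.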
Funding: Supported in part by the National Natural Science Foundation of China under Grant 6226070954 and the Jiangxi Provincial Key R&D Programme under Grant 20244BBG73002.
Abstract: Rail surface damage is a critical concern for high-speed railway infrastructure, directly affecting train operational stability and safety. Existing methods are limited in accuracy and speed for small-sample, multi-category, and multi-scale target segmentation tasks. To address these challenges, this paper proposes Pyramid-MixNet, an intelligent segmentation model for high-speed rail surface damage that combines dataset construction and expansion with a feature pyramid-based encoder-decoder network using multiple attention mechanisms. The encoding network integrates Spatial Reduction Masked Multi-Head Attention (SRMMHA) to enhance global feature extraction while reducing trainable parameters. The decoding network incorporates Mix-Attention (MA), enabling multi-scale structural understanding and cross-scale token group correlation learning. Experimental results demonstrate that the proposed method achieves 62.17% average segmentation accuracy, an 80.28% damage Dice coefficient, and 56.83 FPS, meeting real-time detection requirements. The model's high accuracy and scene adaptability significantly improve the detection of small-scale and complex multi-scale rail damage, offering practical value for real-time monitoring in high-speed railway maintenance systems.
Funding: Funded by the Liaoning Provincial Department of Education Project, award number JYTMS20230418.
Abstract: Pest detection techniques help reduce the frequency and scale of pest outbreaks; however, applying them in actual agricultural production remains challenging because of interspecies similarity, multi-scale targets, and complex backgrounds. To address these problems, this study proposes FD-YOLO, a pest target detection model. FD-YOLO uses a Fully Connected Feature Pyramid Network (FC-FPN) instead of PANet in the neck, which adaptively fuses multi-scale information so that the model retains small-scale target features in deep layers, enhances large-scale target features in shallow layers, and improves the reuse of effective features. A dual self-attention (DSA) module is then embedded in the C3 module of the neck to capture dependencies in both the spatial and channel dimensions, effectively enhancing global features. We selected 16 types of pests from the IP102 pest dataset that widely damage field crops and, after data supplementation and augmentation, used them as our dataset. The experimental results show that FD-YOLO's mAP@0.5 improved by 6.8% over YOLOv5, reaching 82.6%, and was 5%–19.1% higher than that of other state-of-the-art models. This method provides an effective new approach for detecting similar or multi-scale pests in field crops.
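The "@0.5" in mAP@0.5 is an IoU threshold: a predicted box counts as a true positive only if its intersection-over-union with a ground-truth box is at least 0.5. The sketch below shows just that matching test, assuming corner-format boxes `(x1, y1, x2, y2)`. A full mAP computation additionally sorts detections by confidence, matches each ground truth at most once, and integrates precision over recall per class; that part is omitted here.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Matching rule behind mAP@0.5 (confidence ranking not shown)."""
    return box_iou(pred_box, gt_box) >= threshold
```

Two 2 x 2 boxes offset by one unit diagonally overlap in a single unit square, giving IoU = 1/7, well below the 0.5 threshold.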
Funding: Supported by the Korea Electric Power Corporation (R22TA14, Development of Drone System for Diagnosis of Porcelain Insulators in Overhead Transmission Lines), the National Fire Agency of Korea (RS-2024-00408270, Fire Hazard Analysis and Fire Safety Standards Development for Transportation and Storage Stage of Reuse Battery), and the Ministry of the Interior and Safety of Korea (RS-2024-00408982, Development of Intelligent Fire Detection and Sprinkler Facility Technology Reflecting the Characteristics of Logistics Facilities).
Abstract: This paper proposes an automated detection framework for transmission facilities using a feature-attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The framework has three key characteristics. First, virtual images of transmission facilities generated using StyleGAN2-ADA are co-trained with real images, enabling the neural network to learn various features of transmission facilities and improving detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps, enabling precise object detection in various environments. Third, an effective bounding box optimization method called Scylla-IoU, which considers the intersection over union, center point distance, angle, and shape of the bounding box, is deployed in FAMSR-Net, enabling accurate detection of power facilities of various sizes. Extensive experiments demonstrated that FAMSR-Net outperforms other neural networks in detecting power facilities, and it achieved the highest detection accuracy when virtual images of transmission facilities were co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
Funding: Supported by the Qingdao Huanghai University School-Level Scientific Research Project (2023KJ14), the Undergraduate Teaching Reform Research Project of Shandong Provincial Department of Education (M2022328), the National Natural Science Foundation of China (Grant 42472324), and the Qingdao Postdoctoral Foundation (Grant QDBSH202402049).
Abstract: Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities to represent the fused information in a single image. One critical clinical application of medical image fusion is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image into low-rank and salient parts. Then, to exploit features at different scales, a multiscale mechanism that uses three filters of different sizes extracts features in the encoding network. In addition, a hybrid attention module is introduced to capture more image details. Finally, the fused images are reconstructed by the decoding network. Experiments were conducted on clinical brain computed tomography/magnetic resonance images. The experimental results show that the proposed method performs better than other advanced fusion methods.
Funding: Funded by the National Natural Science Foundation of China (grant number: 62172292).
Abstract: Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only small inter-class differences in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework that integrates several pivotal components. First, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, improving the global representation of vehicle features by highlighting salient features and suppressing extraneous ones. Next, AG-GCN adopts a graph-based structure to encapsulate deep local features, and a graph convolutional network aggregates these features to model the relationships among vehicle-related characteristics. Subsequently, feature maps from the attention branch and the graph-based branch are fused for a more comprehensive representation of vehicle features. The framework then measures feature similarities and ranks them, enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability.
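The final retrieval step, "measures feature similarities and ranks them", can be illustrated with cosine-similarity ranking of a gallery against a query descriptor. The fusion rule below (L2-normalize each branch, then concatenate) is a hypothetical choice for illustration, since the abstract does not say how the attention and graph branches are combined. All function names are illustrative.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fuse(attention_feat, graph_feat):
    """Hypothetical fusion: L2-normalize each branch, then concatenate."""
    return l2_normalize(attention_feat) + l2_normalize(graph_feat)


def rank_gallery(query, gallery):
    """Gallery indices sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

Normalizing each branch before concatenation keeps one branch from dominating the similarity simply because its features have a larger magnitude.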
Funding: Supported by the Gansu Natural Science Foundation Programme (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and Gansu Provincial Education, Science and Technology Innovation and Industry (No. 2021CYZC-04).
Abstract: Medical image fusion technology is crucial for improving the detection accuracy and treatment efficiency of diseases, but existing fusion methods suffer from blurred texture details, low contrast, and an inability to fully extract image information for fusion. Therefore, a multimodal medical image fusion method based on mask optimization and a parallel attention mechanism is proposed to address these issues. First, the method converts the entire image into a binary mask and constructs a contour feature map to maximize the contour information of the image, along with a triple-path network for extracting and optimizing texture detail features. Second, a contrast enhancement module and a detail preservation module are proposed to enhance the overall brightness and texture details of the image. Next, a parallel attention mechanism is constructed from changes in channel features and spatial features to fuse the images and enhance the salient information of the fused images. Finally, a decoupling network composed of residual networks is set up to optimize the information between the fused image and the source image, reducing information loss in the fused image. Compared with nine advanced methods proposed in recent years, our method improves seven objective evaluation indicators by 6%–31%, indicating that it obtains fusion results with clearer texture details, higher contrast, and smaller pixel differences between the fused and source images. It is superior to the other compared algorithms on both subjective and objective indicators.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61806107 and 61702135).
Abstract: We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. First, a hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at relatively low cost. Second, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, a convolutional block attention module (CBAM) is introduced to enhance the model's perception of critical features, improving its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples and address the class imbalance in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of the Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating strong generalization ability that meets the demands of oceanic biological image classification.
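Focal Loss rescales cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the true class. Confident, easy samples therefore contribute little, and training focuses on hard or minority-class samples; an optional per-class weight α can further rebalance classes. A minimal single-sample sketch follows. The parameter values used in the paper are not given, so γ = 2 here is only the commonly used default.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample; probs are class probabilities, target the true index."""
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]  # optional per-class weight
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and no α this reduces to ordinary cross-entropy. With γ = 2, an easy sample at p_t = 0.9 keeps only 1% of its cross-entropy, while a hard one at p_t = 0.1 keeps 81%.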
Funding: Funded by the Major Humanities and Social Sciences Research Projects in Zhejiang Higher Education Institutions, grant number 2023QN082, awarded to Cheng Zhao. The National Natural Science Foundation of China also provided funding, grant number 61902349, awarded to Cheng Zhao.
Abstract: This study examines the impact of short-term public opinion sentiment on the secondary market, focusing on the potential for such sentiment to cause dramatic stock price fluctuations and increase investment risk. Quantifying investment sentiment indicators and persistently analyzing their impact has been a complex and significant area of research. In this paper, a structured multi-head attention stock index prediction method based on an adaptive public opinion sentiment vector is proposed. The proposed method transforms numerous investor comments posted on social platforms over time into public opinion sentiment vectors that express complex sentiments. It then analyzes the continuous impact of these vectors on the market using aggregation techniques and public opinion data through a structured multi-head attention mechanism. The experimental results demonstrate that the public opinion sentiment vector provides more comprehensive feedback on market sentiment than traditional sentiment polarity analysis. Furthermore, the multi-head attention mechanism is shown to improve prediction accuracy by letting attention converge on each type of input information separately. The mean absolute percentage error (MAPE) of the proposed method is 0.463%, a reduction of 0.294% compared with the benchmark attention algorithm. Additionally, market backtesting indicates a return of 24.560%, an improvement of 8.202% over the benchmark algorithm. These results suggest that a market trading strategy based on this method has the potential to improve trading profits.