To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which efficiently fuses features of different scales. In addition, an intersection over union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with improvements of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50% to 95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3% and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 generalizes well.
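The abstract above names MPDIoU but does not give its formula. As a minimal sketch, assuming the commonly published definition (IoU minus the squared distances between the two boxes' top-left and bottom-right corners, each normalized by the squared image diagonal), the loss can be written as follows. The function name and the `(x1, y1, x2, y2)` box layout are illustrative assumptions, not taken from the F-YOLOv8 paper.

```python
def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Return 1 - MPDIoU for two boxes given as (x1, y1, x2, y2).

    Assumed definition: MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2),
    where d1 and d2 are the top-left and bottom-right corner distances
    and (w, h) is the input image size.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners

    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

Unlike plain IoU, the corner-distance terms still change when the boxes do not overlap, which is one plausible reading of the faster convergence the abstract reports.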
To accurately diagnose misfire faults in automotive engines, we propose a channel attention convolutional model, specifically the Squeeze-and-Excitation Network (SENET), for classifying engine vibration signals and precisely pinpointing misfire faults. In the experiment, we established a total of 11 distinct states, encompassing the engine’s normal state, single-cylinder misfire faults, and dual-cylinder misfire faults for different cylinders. Data were collected with a highly sensitive acceleration signal collector at a sampling rate of 20,840 Hz. The collected data were divided into training and testing sets by experimental group to ensure generalization and prevent overlap between the two sets. With a vibration acceleration sequence of 1000 time steps (approximately 50 ms) as input, the SENET model achieved a misfire fault detection accuracy of 99.8%. For comparison, we also trained and tested several commonly used models, including Long Short-Term Memory (LSTM), Transformer, and Multi-Scale Residual Networks (MSRESNET), which yielded accuracy rates of 84%, 79%, and 95%, respectively. This underscores the superior accuracy of the SENET model in detecting engine misfire faults. Furthermore, the F1 scores for each recognition class in the SENET model surpassed 0.98, outperforming the baseline models. Our analysis indicated that the misclassified samples in the LSTM and Transformer models’ predictions were primarily due to intra-class misidentifications between single-cylinder and dual-cylinder misfire scenarios. To investigate further, we visualized the features extracted by the LSTM and SENET models using t-distributed Stochastic Neighbor Embedding (t-SNE). The findings revealed that, in the LSTM model, data points of the same type tended to cluster together but with significant overlap between clusters. Conversely, in the SENET model, data points of different types were more widely and evenly dispersed, demonstrating its effectiveness in distinguishing between fault types.
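A squeeze-and-excitation block of the kind the abstract above relies on can be sketched for a multi-channel 1D signal as follows. This is an illustrative toy, not the authors' network: the weight matrices `w1` and `w2` are hypothetical placeholders, and biases and the surrounding convolutions are omitted.

```python
import math


def se_block(x, w1, w2):
    """Apply squeeze-and-excitation to a signal.

    x  : list of C channels, each a list of T samples
    w1 : r x C weight matrix (channel reduction), hypothetical values
    w2 : C x r weight matrix (channel expansion), hypothetical values
    Returns the rescaled signal and the per-channel gates.
    """
    # Squeeze: global average over time for each channel.
    z = [sum(ch) / len(ch) for ch in x]

    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
         for row in w2]

    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[gate * v for v in ch] for gate, ch in zip(s, x)]
    return scaled, s
```

With all-zero weights every gate is `sigmoid(0) = 0.5`, so each channel is simply halved; trained weights would instead emphasize the channels most informative for the fault classes.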
Hypersonic Glide Vehicles (HGVs) are advanced aircraft that can achieve extremely high speeds (generally above Mach 5) and high maneuverability within the Earth's atmosphere. HGV trajectory prediction is crucial for effective defense planning and interception strategies. In recent years, HGV trajectory prediction methods based on deep learning have shown great potential to enhance prediction accuracy and efficiency. However, it is still challenging to balance prediction performance against the computation cost of deep learning trajectory prediction models. To solve this problem, we propose a new deep learning framework (FECA-LSMN) for efficient HGV trajectory prediction. The model first uses a Frequency Enhanced Channel Attention (FECA) module to facilitate the fusion of different HGV trajectory features, and then employs a Light Sampling-oriented Multi-Layer Perceptron Network (LSMN), built from simple MLP-based structures, to extract long- and short-term HGV trajectory features for accurate trajectory prediction. We also employ a data normalization method called reversible instance normalization (RevIN) to enhance the prediction accuracy and training stability of the network. Compared to other popular trajectory prediction models based on LSTM, GRU and Transformer, our FECA-LSMN model achieves leading or comparable performance in terms of RMSE, MAE and MAPE while running notably faster. The ablation experiments show that incorporating the FECA module significantly improves the prediction performance of the network, and that RevIN outperforms traditional min-max normalization.
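The idea behind reversible instance normalization can be illustrated with a short sketch: each input series is normalized with its own mean and standard deviation, and the same statistics are used to map the model output back to the original scale. This simplified version assumes a single univariate series and omits the learnable affine parameters that RevIN normally includes; the function names are illustrative.

```python
import math


def revin_normalize(series, eps=1e-5):
    """Normalize one series with its own statistics and return them."""
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var + eps)  # eps avoids division by zero
    return [(v - mean) / std for v in series], (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the instance."""
    mean, std = stats
    return [v * std + mean for v in values]
```

In use, the network would sit between the two calls: `normalized, stats = revin_normalize(history)`, then a prediction on `normalized`, then `revin_denormalize(prediction, stats)`.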
To address the stochasticity and uncertainty of distributed photovoltaic (PV) power output, this paper presents a distributed PV ultra-short-term power forecasting method based on Variational Mode Decomposition (VMD) and a channel attention mechanism. First, Pearson’s correlation coefficient is used to select the meteorological factors that have a high impact on historical power. Second, the distributed PV power data are decomposed by VMD into relatively smooth power series with different fluctuation patterns. Finally, the reconstructed distributed PV power and the other features are input into the combined CNN-SENet-BiLSTM model. In this model, the convolutional neural network (CNN) and the channel attention mechanism dynamically adjust the weights while capturing the spatial features of the input data, improving the discriminative ability of key features. The extracted features are then fed into the bidirectional long short-term memory network (BiLSTM) to capture the time-series features, and the final output is the prediction result. The method is verified on a dataset from a distributed PV power station in the Northwest region of China. The results show that, compared with other prediction methods, the proposed method has higher prediction accuracy, which helps to increase the share of distributed PV connected to the grid and supports the safe and stable operation of the power grid.
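The first step of the method above, screening meteorological factors by Pearson's correlation with historical power, can be sketched as follows. The feature names and the 0.5 threshold are hypothetical; the abstract does not state which factors or cutoff were used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (sx * sy)


def select_features(features, power, threshold=0.5):
    """Keep features whose |r| with power meets the (assumed) threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, power)) >= threshold]
```

Taking the absolute value keeps strongly negative correlations as well, since a factor that moves opposite to power is just as predictive as one that moves with it.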
To address the complex model structures and large numbers of training parameters in facial expression recognition algorithms, we propose a residual network structure with a multi-head channel attention (MCA) module. Transfer learning is used to pre-train the convolutional layer parameters and mitigate the overfitting caused by an insufficient number of training samples. The designed MCA module is integrated into the ResNet18 backbone network. The attention mechanism highlights important information and suppresses irrelevant information by assigning different coefficients or weights, and the multi-head structure focuses more on the local features of the images, which improves the efficiency of facial expression recognition. Experimental results demonstrate that the proposed model achieves excellent recognition results on the Fer2013, CK+ and Jaffe datasets, with accuracy rates of 72.7%, 98.8% and 93.33%, respectively.
In many existing multi-view gait recognition methods based on images or video sequences, gait sequences are superimposed to synthesize images and construct energy-like templates. However, information may be lost when compositing images and capturing EMG signals, and factors such as period detection may introduce errors and reduce recognition accuracy. To address these problems, a multi-view gait recognition method using a deep convolutional neural network and a channel attention mechanism is proposed. First, the sliding time window method is used to capture EMG signals. Then, the back-propagation learning algorithm is used to train each convolution layer, which improves the learning ability of the convolutional neural network. Finally, the channel attention mechanism is integrated into the neural network, which improves its ability to express gait features, and a classifier is used to classify gait. Experimental results on two public datasets, OULP and CASIA-B, show that the recognition rate of the proposed method reaches 88.44% and 97.25%, respectively. Comparative experiments show that the proposed method achieves better recognition than several other recent convolutional neural network methods. Therefore, the combination of a convolutional neural network and a channel attention mechanism is of great value for gait recognition.
Intrusion detection systems (IDS) are essential in cybersecurity because they protect networks from a wide range of online threats. The goal of this research is to meet the urgent need for small-footprint, highly adaptable Network Intrusion Detection Systems (NIDS) that can identify anomalies. The study uses the NSL-KDD dataset, a sizable collection comprising 43 variables, including the labels “attack” and “level.” It proposes a novel approach to intrusion detection based on the combination of channel attention and convolutional neural networks (CNN). The dataset makes it easier to conduct a thorough assessment of the suggested intrusion detection strategy. The primary goal of this work is to maintain operating efficiency while improving detection accuracy. A typical NIDS examines both risky and normal behavior using a variety of techniques. On the NSL-KDD dataset, our CNN-based approach achieves a 99.728% accuracy rate when paired with channel attention. Compared to previous approaches such as ensemble learning, CNN, restricted Boltzmann machines (RBM), ANN, hybrid auto-encoders with CNN, MCNN, and adaptive algorithms, our solution significantly improves intrusion detection performance. The results highlight the effectiveness of the suggested method in improving intrusion detection precision. Subsequent efforts will focus on strengthening and expanding the approach in order to counteract growing cyberthreats and adjust to changing network circumstances.
Convolutional neural networks (CNNs) have shown great potential for image super-resolution (SR). However, most existing CNNs only reconstruct images in the spatial domain, resulting in insufficient high-frequency details in the reconstructed images. To address this issue, a channel attention based wavelet cascaded network for image super-resolution (CWSR) is proposed. Specifically, a second-order channel attention (SOCA) mechanism is incorporated into the network, and covariance matrix normalization is utilized to explore interdependencies between channel-wise features. Then, to boost the quality of residual features, a non-local module is adopted to further improve the global information integration ability of the network. Finally, taking the image loss in both the spatial and wavelet domains into account, a dual-constrained loss function is proposed to optimize the network. Experimental results illustrate that CWSR outperforms several state-of-the-art methods in terms of both visual quality and quantitative metrics.
In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and it has become more challenging to improve SISR performance. In contrast, reference image guided super-resolution (RefSR) is an effective strategy to boost super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. There are two technical contributions in this paper. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter by a weighted average of the spatial features instead of global averaging. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which takes advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity).
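For readers unfamiliar with the PSNR metric cited above, a minimal sketch of its standard definition, 10·log10(MAX² / MSE), is given below. It assumes 8-bit grayscale images stored as nested lists; published SR evaluations often compute PSNR on a cropped luminance channel, which this toy version does not attempt.

```python
import math


def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref_pixels = [p for row in reference for p in row]
    test_pixels = [p for row in test for p in row]
    mse = sum((a - b) ** 2
              for a, b in zip(ref_pixels, test_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference; a uniform error of one gray level on an 8-bit image gives roughly 48 dB.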
Image deraining is a highly ill-posed problem. Although significant progress has been made through the use of deep convolutional neural networks, the problem remains challenging, especially for detail restoration and generalization to real rain images. In this paper, we propose a deep residual channel attention network (DeRCAN) for deraining. The channel attention mechanism is able to capture the inherent properties of the feature space and thus facilitates more accurate estimation of structures and details for image deraining. In addition, we propose an unsupervised learning approach, based on the proposed network, to better handle real rain images. Extensive qualitative and quantitative evaluation results on both synthetic and real-world images demonstrate that the proposed DeRCAN performs favorably against state-of-the-art methods.
Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have predominantly focused on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors’ approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
An end-to-end channel attention and pixel attention network (CP-Net) is proposed in this paper to produce dehazed images directly. The CP-Net structure contains three critical components. The first is the double attention (DA) module, which consists of channel attention (CA) and pixel attention (PA). Different channel features contain different levels of important information, and CA gives more weight to relevant information, so the network can learn more useful information. Meanwhile, haze is unevenly distributed over different pixels, and PA filters out haze with varying weights for different pixels. The module sums the outputs of the two attention branches to further improve the feature representation, which contributes to a better dehazing result. The second component is the basic block structure, which combines local residual learning with the DA module. Local residual learning transfers the feature information in the shallow part of the network to the deep part through multiple local residual connections and enhances the expressive ability of CP-Net. Third, CP-Net mainly uses its core component, the DA module, to automatically assign different weights to different features and achieve a satisfactory dehazing effect. Experimental results on synthetic datasets and real hazy images indicate that CP-Net surpasses many state-of-the-art single image dehazing methods both quantitatively and qualitatively.
Detecting Alzheimer’s disease is essential for patient care, as an accurate diagnosis influences treatment options. Classifying dementia from non-dementia in brain MRIs is challenging due to features such as hippocampal atrophy, while manual diagnosis is susceptible to error. Optimal computer-aided diagnosis (CAD) systems are essential for improving accuracy and reducing misclassification risks. This study proposes an optimized ensemble method (CEOE-Net) that begins with the selection of pre-trained models, including DenseNet121, ResNet50V2, and ResNet152V2, for unique feature extraction. Each selected model is enhanced with a channel attention (CA) block to improve the feature extraction process. In addition, this study applies the short-time Fourier transform (STFT) with each individual model for hierarchical feature extraction before making final predictions in classifying MRI images of demented and non-demented individuals, treating these models as the backbones of the ensemble. STFT highlights subtle differences in brain structure and activity, particularly when combined with CA mechanisms that emphasize relevant features by converting spatial data into the frequency domain. The predictions generated by these models are then processed by the Chaotic Evolution Optimization (CEO) algorithm, which determines the optimal weight for each backbone model to maximize its contribution. The CEO optimizer explores the weight distribution to find the most effective combination of model predictions, thus significantly improving overall ensemble performance. This study utilized three datasets for validation: two private clinical brain MRI datasets (OSASIS and ADNI) to test the proposed model’s effectiveness. Image augmentation techniques were also employed to enhance dataset diversity and improve classification performance. The proposed CEOE-Net outperforms conventional baseline models and existing methods, showing its effectiveness as a clinical tool for the accurate classification of dementia and non-dementia MRI brain images, as well as autistic and non-autistic facial features. It achieved consistent accuracies of 93.44% on OSASIS and 81.94% on ADNI.
In remote sensing imagery, approximately 67% of the data are affected by cloud cover, significantly increasing the difficulty of image classification, recognition, and other downstream interpretation tasks. To address the randomness of cloud distribution and the non-uniformity of cloud thickness, we propose a coarse-to-fine thin cloud removal architecture. In the coarse-level declouding network, we introduce a multi-scale attention mechanism, pyramid nonlocal attention (PNA). By integrating global context with local detail information, it specifically addresses image quality degradation caused by the uncertainty in cloud distribution. During the fine-level declouding stage, we focus on the impact of cloud thickness on declouding results, which primarily manifests as insufficient detail information. A carefully designed residual dense module significantly enhances the extraction and utilization of feature details. Thus, our approach restores lost local texture features on top of the coarse-level results, substantially improving declouding quality. To evaluate the effectiveness of our cloud removal technology and attention mechanism, we conducted comprehensive analyses on publicly available datasets. Results demonstrate that our method achieves state-of-the-art performance compared with a wide range of techniques.
Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates a convolutional block attention module, extracts high-level, discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
This research developed a hybrid position-channel network (named PCNet) by incorporating newly designed channel and position attention modules into U-Net to alleviate the crack discontinuity problem in the channel and spatial dimensions. In PCNet, U-Net is used as a baseline to extract informative spatial and channel-wise features from shield tunnel lining crack images. A channel and a position attention module are designed and embedded after each convolution layer of U-Net to model the feature interdependencies in the channel and spatial dimensions. These attention modules allow U-Net to adaptively integrate local crack features with their global dependencies. Experiments were conducted on a dataset of images from Shanghai metro shield tunnels. The results validate the effectiveness of the designed channel and position attention modules, since they individually increase balanced accuracy (BA) by 11.25% and 12.95%, intersection over union (IoU) by 10.79% and 11.83%, and F1 score by 9.96% and 10.63%, respectively. In comparison with state-of-the-art models (i.e. LinkNet, PSPNet, U-Net, PANet, and Mask R-CNN) on the testing dataset, the proposed PCNet improves BA, IoU, and F1 score owing to the channel and position attention modules. These evaluation metrics indicate that the proposed PCNet delivers refined crack segmentation with improved performance and is a practicable approach to segmenting shield tunnel lining cracks in field practice.
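The three segmentation metrics reported above (BA, IoU, F1) have standard definitions in terms of pixel-level true and false positives and negatives. A minimal sketch for binary masks, flattened to lists of 0 and 1, is shown below; how the PCNet study aggregates the metrics across images is not stated in the abstract, so this covers a single mask pair only.

```python
def segmentation_metrics(pred, truth):
    """Balanced accuracy, IoU and F1 for flattened binary masks (0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    # Guard each ratio against an empty denominator.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall on crack pixels
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # recall on background
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    return {"BA": (tpr + tnr) / 2, "IoU": iou, "F1": f1}
```

Balanced accuracy is useful for crack images because crack pixels are a small minority, so plain accuracy would be dominated by the background class.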
End-to-end separation algorithms, which perform well in speech separation, have not been used effectively in music separation. Moreover, since music signals are often dual-channel data with a high sampling rate, how to model long-sequence data and make rational use of the information shared between channels is also an urgent problem. To solve these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions include the following: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as main melody and tone. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively on the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that, after the introduction of channel attention, the performance of the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).
Object detection has made a significant leap forward in recent years. However, the detection of small objects remains difficult for various reasons: the objects are very small and are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of small objects by overcoming information loss during downsampling. The multi-convolutional attention block (MCAB), made up of channel attention and a spatial attention module (SAM), has been designed to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant performance gains, reaching 97.75% for KITTI and 88.97% for PASCAL VOC. The findings indicate that MCBAN is considerably more effective for small object detection than other existing approaches.
Structured illumination microscopy (SIM) is a popular and powerful super-resolution (SR) technique in biomedical research. However, the conventional reconstruction algorithm for SIM relies heavily on accurate prior knowledge of the illumination patterns and on the signal-to-noise ratio (SNR) of the raw images. To obtain high-quality SR images, several raw images need to be captured at a high fluorescence level, which further restricts SIM’s temporal resolution and its applications. Deep learning (DL) is a data-driven technology that has been used to expand the limits of optical microscopy. In this study, we propose a deep neural network based on multi-level wavelet and attention mechanism (MWAM) for SIM. Our results show that the MWAM network can extract the high-frequency information contained in SIM raw images and accurately integrate it into the output image, resulting in superior SR images compared to those generated using wide-field images as input data. We also demonstrate that the number of SIM raw images can be reduced to three, with one image in each illumination orientation, to achieve the optimal tradeoff between temporal and spatial resolution. Furthermore, our MWAM network exhibits superior reconstruction ability on low-SNR images compared to conventional SIM algorithms. We have also analyzed the adaptability of this network on other biological samples and successfully applied the pretrained model to other SIM systems.
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration is applied, leading to the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model shows a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frames/s.
Funding: Supported by the National Natural Science Foundation of China (No. 62103298).
Abstract: To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved You Only Look Once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, a lightweight bidirectional feature pyramid network (L-BiFPN) is proposed as an improved feature pyramid network that efficiently fuses features of different scales. In addition, an intersection-over-union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with improvements of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50%-95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3%, and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 generalizes well.
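Note: the abstract above names MPDIoU as the bounding box regression loss but does not spell it out. The following is a minimal sketch, assuming the commonly cited formulation in which the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared image diagonal, are subtracted from the IoU. The function name and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Sketch of an MPDIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2; img_w and img_h are the input image size.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners (top-left, bottom-right).
    d1 = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    d2 = (px2 - tx2) ** 2 + (py2 - ty2) ** 2
    norm = img_w ** 2 + img_h ** 2  # squared image diagonal

    mpdiou = iou - d1 / norm - d2 / norm
    return 1.0 - mpdiou


# Identical boxes give zero loss; disjoint boxes still receive a
# distance-dependent penalty, unlike a plain IoU loss.
print(mpdiou_loss((0, 0, 10, 10), (0, 0, 10, 10), 100, 100))
print(mpdiou_loss((0, 0, 10, 10), (50, 50, 60, 60), 100, 100))
```

Because the corner-distance terms remain non-zero when the boxes do not overlap, the loss still varies with box position in that case, which is one plausible reading of the faster convergence the abstract reports.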
Funding: Yongxian Huang was supported by Projects of the Guangzhou Science and Technology Plan (2023A04J0409).
Abstract: To accurately diagnose misfire faults in automotive engines, we propose a channel attention convolutional model, specifically the Squeeze-and-Excitation Network (SENET), for classifying engine vibration signals and precisely pinpointing misfire faults. In the experiment, we established a total of 11 distinct states, encompassing the engine's normal state, single-cylinder misfire faults, and dual-cylinder misfire faults for different cylinders. Data were collected by a highly sensitive acceleration signal collector with a sampling rate of 20,840 Hz. The collected data were divided into training and testing sets based on different experimental groups to ensure generalization and prevent overlap between the two sets. The results revealed that, with a vibration acceleration sequence of 1000 time steps (approximately 50 ms) as input, the SENET model achieved a misfire fault detection accuracy of 99.8%. For comparison, we also trained and tested several commonly used models, including Long Short-Term Memory (LSTM), Transformer, and Multi-Scale Residual Networks (MSRESNET), yielding accuracy rates of 84%, 79%, and 95%, respectively. This underscores the superior accuracy of the SENET model in detecting engine misfire faults. Furthermore, the F1 scores for each type of recognition in the SENET model surpassed 0.98, outperforming the baseline models. Our analysis indicated that the misclassified samples in the predictions of the LSTM and Transformer models were primarily due to intra-class misidentifications between single-cylinder and dual-cylinder misfire scenarios. To delve deeper, we conducted a visual analysis of the features extracted by the LSTM and SENET models using t-distributed Stochastic Neighbor Embedding (t-SNE). The findings revealed that, in the LSTM model, data points of the same type tended to cluster together with significant overlap. Conversely, in the SENET model, data points of various types were more widely and evenly dispersed, demonstrating its effectiveness in distinguishing between different fault types.
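Note: the squeeze-and-excitation mechanism referred to above can be illustrated on a multi-channel 1-D signal. This is a minimal sketch in plain Python, not the authors' network: the weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters, and biases are omitted for brevity.

```python
import math


def se_channel_attention(x, w1, w2):
    """Squeeze-and-excitation over channels of a 1-D feature map.

    x  : list of C channels, each a list of T samples
    w1 : (C // r) x C weight matrix of the reduction layer (hypothetical values)
    w2 : C x (C // r) weight matrix of the expansion layer (hypothetical values)
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over time for each channel.
    squeezed = [sum(ch) / len(ch) for ch in x]

    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return scaled, gates


features = [[1.0, 1.0], [2.0, 2.0]]          # 2 channels, 2 time steps
out, gates = se_channel_attention(features, [[0.0, 0.0]], [[0.0], [0.0]])
print(gates)  # all-zero weights leave every gate at sigmoid(0) = 0.5
```

In a trained model the gates differ per channel, so informative channels are amplified relative to the others.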
Abstract: Hypersonic Glide Vehicles (HGVs) are advanced aircraft that can achieve extremely high speeds (generally over Mach 5) and high maneuverability within the Earth's atmosphere. HGV trajectory prediction is crucial for effective defense planning and interception strategies. In recent years, HGV trajectory prediction methods based on deep learning have shown great potential to significantly enhance prediction accuracy and efficiency. However, it is still challenging to strike a balance between improving prediction performance and reducing the computation cost of deep learning trajectory prediction models. To solve this problem, we propose a new deep learning framework (FECA-LSMN) for efficient HGV trajectory prediction. The model first uses a Frequency Enhanced Channel Attention (FECA) module to facilitate the fusion of different HGV trajectory features, and then employs a Light Sampling-oriented Multi-Layer Perceptron Network (LSMN) based on simple MLP structures to extract long- and short-term HGV trajectory features for accurate trajectory prediction. We also employ a data normalization method called reversible instance normalization (RevIN) to enhance the prediction accuracy and training stability of the network. Compared to other popular trajectory prediction models based on LSTM, GRU, and Transformer, our FECA-LSMN model achieves leading or comparable performance in terms of RMSE, MAE, and MAPE metrics while demonstrating notably faster computation time. The ablation experiments show that incorporating the FECA module significantly improves the prediction performance of the network. The RevIN data normalization technique also outperforms traditional min-max normalization.
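Note: reversible instance normalization, as mentioned above, normalizes each input sequence by its own statistics and restores those statistics on the model output. The sketch below shows only that core idea; the learnable affine parameters that RevIN usually includes are omitted, and the function names are illustrative.

```python
import math


def revin_normalize(series, eps=1e-5):
    """Normalize one sequence by its own mean and standard deviation.

    Returns the normalized values and the statistics needed to undo the step.
    """
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var + eps)
    return [(v - mean) / std for v in series], (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the input sequence."""
    mean, std = stats
    return [v * std + mean for v in values]


altitude = [30.0, 29.5, 29.1, 28.4]            # made-up trajectory samples
normed, stats = revin_normalize(altitude)
restored = revin_denormalize(normed, stats)    # stands in for a model forecast
print(restored)
```

Unlike min-max normalization with dataset-wide bounds, the statistics here are computed per instance, so a shift in the level of an individual trajectory is removed before the network sees it and reapplied afterwards.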
Funding: Supported by the Inner Mongolia Power Company 2024 Staff Innovation Studio Innovation Project "Research on Cluster Output Prediction and Group Control Technology for County-Wide Distributed Photovoltaic Construction".
Abstract: In response to the stochasticity and uncertainty in the power output of distributed photovoltaic (PV) power generation, this paper presents an ultra-short-term power forecasting method for distributed PV based on Variational Mode Decomposition (VMD) and a channel attention mechanism. First, Pearson's correlation coefficient is used to select the meteorological factors that have a high impact on historical power. Second, the distributed PV power data are decomposed by VMD into relatively smooth power series with different fluctuation patterns. Finally, the reconstructed distributed PV power and the other features are input into the combined CNN-SENet-BiLSTM model. In this model, the convolutional neural network (CNN) and the channel attention mechanism dynamically adjust the weights while capturing the spatial features of the input data, improving the discriminative ability of key features. The extracted features are then fed into the bidirectional long short-term memory network (BiLSTM) to capture the time-series features, and the final output is the prediction result. The method is verified using a dataset from a distributed PV power station in the northwest region of China. The results show that, compared with other prediction methods, the proposed method has higher prediction accuracy, which helps to increase the share of distributed PV connected to the grid and supports the safe and stable operation of the power grid.
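Note: the first step described above, filtering meteorological factors by Pearson's correlation with historical power, might look like the sketch below. The 0.5 threshold, the feature names, and the sample values are all made up for illustration; the abstract does not state which threshold or factors were used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / (dev_x * dev_y)


def select_features(features, power, threshold=0.5):
    """Keep features whose absolute correlation with power meets the threshold."""
    return [
        name for name, values in features.items()
        if abs(pearson(values, power)) >= threshold
    ]


power = [2.0, 4.0, 6.0, 8.0]                    # hypothetical PV output
weather = {
    "irradiance": [1.0, 2.0, 3.0, 4.0],         # strongly positive
    "humidity": [4.0, 3.0, 2.0, 1.0],           # strongly negative
    "wind_dir": [1.0, -1.0, 1.0, -1.0],         # weak relationship
}
print(select_features(weather, power))          # ['irradiance', 'humidity']
```

The absolute value is used so that strongly negative relationships are retained as well as positive ones.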
Funding: Funded by Anhui Province Quality Engineering Project No. 2021jyxm0801; the Natural Science Foundation of Anhui University of Chinese Medicine under Grant Nos. 2020zrzd18 and 2019zrzd11; Humanity Social Science Foundation Grants 2021rwzd20 and 2020rwzd07; and Anhui University of Chinese Medicine Quality Engineering Projects No. 2021zlgc046.
Abstract: To address the complex model structures and large numbers of training parameters in facial expression recognition algorithms, we propose a residual network structure with a multi-head channel attention (MCA) module. Transfer learning is used to pre-train the convolutional layer parameters and mitigate the overfitting caused by an insufficient number of training samples. The designed MCA module is integrated into the ResNet18 backbone network. The attention mechanism highlights important information and suppresses irrelevant information by assigning different coefficients or weights, and the multi-head structure focuses more on the local features of the images, which improves the efficiency of facial expression recognition. Experimental results demonstrate that the proposed model achieves excellent recognition results on the Fer2013, CK+, and Jaffe datasets, with accuracy rates of 72.7%, 98.8%, and 93.33%, respectively.
Funding: This work was supported by the Natural Science Foundation of China (No. 61902133), the Fujian Natural Science Foundation project (No. 2018J05106), and Xiamen Collaborative Innovation projects of Produces study grinds (3502Z20173046).
Abstract: In many existing multi-view gait recognition methods based on images or video sequences, gait sequences are usually superimposed and synthesized into images to construct energy-like templates. However, information may be lost while compositing images and capturing EMG signals, and factors such as period detection may introduce errors and affect recognition accuracy. To better solve these problems, a multi-view gait recognition method using a deep convolutional neural network and a channel attention mechanism is proposed. First, the sliding time window method is used to capture EMG signals. Then, the back-propagation learning algorithm is used to train each convolution layer, which improves the learning ability of the convolutional neural network. Finally, the channel attention mechanism is integrated into the neural network to improve its ability to express gait features, and a classifier is used to classify gait. Experimental results on two public datasets, OULP and CASIA-B, show that the proposed method achieves recognition rates of 88.44% and 97.25%, respectively. Comparative experiments show that the proposed method recognizes gait better than several other recent convolutional neural network methods. Therefore, the combination of a convolutional neural network and a channel attention mechanism is of great value for gait recognition.
Funding: The authors would like to thank Princess Nourah bint Abdulrahman University for funding this project through the Researchers Supporting Project (PNURSP2023R319). This research was funded by Prince Sultan University, Riyadh, Saudi Arabia.
Abstract: Intrusion detection systems (IDS) are essential in the field of cybersecurity because they protect networks from a wide range of online threats. The goal of this research is to meet the urgent need for small-footprint, highly adaptable Network Intrusion Detection Systems (NIDS) that can identify anomalies. The study uses the NSL-KDD dataset, a sizable collection comprising 43 variables, including the labels "attack" and "level". It proposes a novel approach to intrusion detection based on the combination of channel attention and convolutional neural networks (CNN). The dataset makes it easier to conduct a thorough assessment of the suggested intrusion detection strategy. The primary goal of this work is to maintain operating efficiency while improving detection accuracy. A typical NIDS examines both risky and normal behavior using a variety of techniques. On the NSL-KDD dataset, our CNN-based approach achieves a 99.728% accuracy rate when paired with channel attention. Compared to previous approaches such as ensemble learning, CNN, restricted Boltzmann machine (RBM), ANN, hybrid auto-encoders with CNN, MCNN, and adaptive algorithms, our solution significantly improves intrusion detection performance. The results highlight the effectiveness of the suggested method in improving intrusion detection precision. Subsequent efforts will focus on strengthening and expanding the approach in order to counteract growing cyberthreats and adjust to changing network circumstances.
Funding: Supported by the National Natural Science Foundation of China (No. 61901183); Fundamental Research Funds for the Central Universities (No. ZQN921); Natural Science Foundation of Fujian Province Science and Technology Department (No. 2021H6037); Key Project of Quanzhou Science and Technology Plan (No. 2021C008R); Natural Science Foundation of Fujian Province (No. 2019J01010561); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province 2019 (No. JAT191080); Science and Technology Bureau of Quanzhou (No. 2017G046).
Abstract: Convolutional neural networks (CNNs) have shown great potential for image super-resolution (SR). However, most existing CNNs only reconstruct images in the spatial domain, resulting in insufficient high-frequency detail in the reconstructed images. To address this issue, a channel attention based wavelet cascaded network for image super-resolution (CWSR) is proposed. Specifically, a second-order channel attention (SOCA) mechanism is incorporated into the network, and covariance matrix normalization is used to explore interdependencies between channel-wise features. Then, to boost the quality of residual features, a non-local module is adopted to further improve the global information integration ability of the network. Finally, taking the image loss in both the spatial and wavelet domains into account, a dual-constrained loss function is proposed to optimize the network. Experimental results illustrate that CWSR outperforms several state-of-the-art methods in terms of both visual quality and quantitative metrics.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61672378, 61771339, and 61520106002.
Abstract: In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and improving SISR performance has become more challenging. In contrast, reference image guided super-resolution (RefSR) is an effective strategy to boost super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter by a weighted average of the spatial features instead of global averaging. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which takes advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM).
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018AAA0102001 and the Fundamental Research Funds for the Central Universities of China under Grant No. 30920041109.
Abstract: Image deraining is a highly ill-posed problem. Although significant progress has been made through the use of deep convolutional neural networks, the problem remains challenging, especially for detail restoration and generalization to real rain images. In this paper, we propose a deep residual channel attention network (DeRCAN) for deraining. The channel attention mechanism is able to capture the inherent properties of the feature space and thus facilitates more accurate estimation of structures and details for image deraining. In addition, we propose an unsupervised learning approach, based on the proposed network, to better handle real rain images. Extensive qualitative and quantitative evaluation results on both synthetic and real-world images demonstrate that the proposed DeRCAN performs favorably against state-of-the-art methods.
Funding: Shenzhen Institute of Artificial Intelligence and Robotics for Society, Grant/Award Number: AC01202201003-02; GuangDong Basic and Applied Basic Research Foundation, Grant/Award Number: 2024A1515010252; Longgang District Shenzhen's "Ten Action Plan" for Supporting Innovation Projects, Grant/Award Number: LGKCSDPT2024002.
Abstract: Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have predominantly focused on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Abstract: An end-to-end channel attention and pixel attention network (CP-Net) is proposed to produce dehazed images directly. The CP-Net structure contains three critical components. First, the double attention (DA) module consists of channel attention (CA) and pixel attention (PA). Different channel features contain information of different importance, and CA gives more weight to relevant information so that the network can learn more useful information. Meanwhile, haze is unevenly distributed across pixels, and PA filters out haze with varying weights for different pixels. The module sums the outputs of the two attention modules to further improve the feature representation, which contributes to a better dehazing result. Second, local residual learning and the DA module constitute another important component, the basic block structure. Local residual learning transfers feature information from the shallow part of the network to the deep part through multiple local residual connections and enhances the expressive ability of CP-Net. Third, CP-Net mainly uses its core component, the DA module, to automatically assign different weights to different features and achieve a satisfactory dehazing effect. Experimental results on synthetic datasets and real hazy images indicate that CP-Net surpasses many state-of-the-art single image dehazing methods both quantitatively and qualitatively.
Funding: Supported in part by the Science and Technology Major Special Project Fund of Changsha (No. kh2401010); in part by the High-Performance Computing Center of Central South University; by the National Natural Science Foundation of China (Grant Nos. 82022024, 31970572); the Science and Technology Innovation Program of Hunan Province (2021RC4018, 2021RC5027); the Innovation-Driven Project of Central South University (Grant No. 2020CX003); and NIH grants U01 MH122591, 1U01MH116489, 1R01MH110920, and R01MH126459.
Abstract: Detecting Alzheimer's disease is essential for patient care, as an accurate diagnosis influences treatment options. Classifying dementia from non-dementia in brain MRIs is challenging due to features such as hippocampal atrophy, while manual diagnosis is susceptible to error. Optimal computer-aided diagnosis (CAD) systems are essential for improving accuracy and reducing misclassification risks. This study proposes an optimized ensemble method (CEOE-Net) that begins with the selection of pre-trained models, including DenseNet121, ResNet50V2, and ResNet152V2, for unique feature extraction. Each selected model is enhanced with a channel attention (CA) block to improve the feature extraction process. In addition, this study applies the short-time Fourier transform (STFT) with each individual model for hierarchical feature extraction before making final predictions when classifying MRI images of demented and non-demented individuals, treating these models as the backbones of the ensemble method. By converting spatial data into the frequency domain, STFT highlights subtle differences in brain structure and activity, particularly when combined with CA mechanisms that emphasize relevant features. The predictions generated by these models are then processed by the Chaotic Evolution Optimization (CEO) algorithm, which determines the optimal set of weights for the backbone models to maximize their contribution. The CEO optimizer explores the weight distribution to find the most effective combination of model predictions, thus significantly improving overall ensemble performance. This study utilized three datasets for validation: two private clinical brain MRI datasets (OSASIS and ADNI) to test the proposed model's effectiveness. Image augmentation techniques were also employed to enhance dataset diversity and improve classification performance. The proposed CEOE-Net outperforms conventional baseline models and existing methods, showing its effectiveness as a clinical tool for the accurate classification of dementia and non-dementia MRI brain images, as well as autistic and non-autistic facial features. It achieved consistent accuracies of 93.44% on OSASIS and 81.94% on ADNI.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2572025BR14) and the China Energy Digital Intelligence Technology Development (Beijing) Co., Ltd. Science and Technology Innovation Project (No. YA2024001500).
Abstract: In remote sensing imagery, approximately 67% of the data are affected by cloud cover, significantly increasing the difficulty of image classification, recognition, and other downstream interpretation tasks. To address the randomness of cloud distribution and the non-uniformity of cloud thickness, we propose a coarse-to-fine thin cloud removal architecture. In the coarse-level declouding network, we introduce a multi-scale attention mechanism, pyramid nonlocal attention (PNA). By integrating global context with local detail information, it specifically addresses the image quality degradation caused by the uncertainty in cloud distribution. In the fine-level declouding stage, we focus on the impact of cloud thickness on declouding results, which manifests primarily as insufficient detail information. A residual dense module enhances the extraction and utilization of feature details. Thus, our approach restores lost local texture features on top of the coarse-level results, substantially improving declouding quality. To evaluate the effectiveness of our cloud removal technique and attention mechanism, we conducted comprehensive analyses on publicly available datasets. Results demonstrate that our method achieves state-of-the-art performance compared with a wide range of techniques.
Funding: The work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Almost all existing deep learning methods rely on a large amount of annotated data, so they are unsuitable for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates a convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate resulting from suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset shows that the proposed method is superior to state-of-the-art few-shot learning approaches.
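Note: the prototype comparison described above can be illustrated with a small sketch. It assumes the standard prototypical-network rule (each class prototype is the mean of its support embeddings, and a query takes the label of the nearest prototype under Euclidean distance); the 2-D embeddings and class names are made up, and in the paper the embeddings would come from the attention-based feature extractor.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


support_set = {                                   # hypothetical 2-D embeddings
    "smoke": [[1.0, 1.0], [3.0, 1.0]],
    "no_smoke": [[-2.0, 0.0], [-2.0, -2.0]],
}
protos = class_prototypes(support_set)
print(classify([1.5, 0.5], protos))               # smoke
```

Because only class means are stored, a new class can be added from a handful of support images without retraining a classifier head.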
Funding: Support from the Ministry of Science and Technology of the People's Republic of China (Grant No. 2021YFB2600804), the Open Research Project Programme of the State Key Laboratory of Internet of Things for Smart City (University of Macao) (Grant No. SKL-IoTSC(UM)-2021-2023/ORPF/A19/2022), and the General Research Fund (GRF) project (Grant No. 15214722) from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region Government of China is gratefully acknowledged.
Abstract: This research developed a hybrid position-channel network (named PCNet) by incorporating newly designed channel and position attention modules into U-Net to alleviate the crack discontinuity problem in the channel and spatial dimensions. In PCNet, U-Net is used as the baseline to extract informative spatial and channel-wise features from shield tunnel lining crack images. A channel attention module and a position attention module are designed and embedded after each convolution layer of U-Net to model the feature interdependencies in the channel and spatial dimensions. These attention modules allow U-Net to adaptively integrate local crack features with their global dependencies. Experiments were conducted on a dataset built from images of Shanghai metro shield tunnels. The results validate the effectiveness of the designed channel and position attention modules: individually, they increase balanced accuracy (BA) by 11.25% and 12.95%, intersection over union (IoU) by 10.79% and 11.83%, and F1 score by 9.96% and 10.63%, respectively. In comparison with state-of-the-art models (i.e., LinkNet, PSPNet, U-Net, PANet, and Mask R-CNN) on the testing dataset, the proposed PCNet achieves higher BA, IoU, and F1 score owing to the channel and position attention modules. These evaluation metrics indicate that PCNet delivers refined crack segmentation with improved performance and is a practicable approach to segmenting shield tunnel lining cracks in field practice.
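Note: the three metrics reported above can be computed from pixel-level confusion counts. The sketch below uses the usual textbook definitions for a binary crack/background segmentation; whether the paper computes them in exactly this way is an assumption, and the counts in the example are made up.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Balanced accuracy, IoU, and F1 for the positive (crack) class.

    tp, fp, fn, tn are pixel counts; the sketch assumes none of the
    denominators below is zero.
    """
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)

    balanced_accuracy = (recall + specificity) / 2
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"BA": balanced_accuracy, "IoU": iou, "F1": f1}


# Crack pixels are rare, so plain accuracy would be dominated by background.
print(segmentation_metrics(tp=8, fp=2, fn=2, tn=88))
```

Balanced accuracy averages the per-class recall, which is why it is a common choice when the crack class occupies only a small fraction of each image.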
Funding: National Natural Science Foundation of China, Grant/Award Number: 62071039; Beijing Natural Science Foundation, Grant/Award Number: L223033.
Abstract: End-to-end separation algorithms, which perform well in the field of speech separation, have not been used effectively in music separation. Moreover, since music signals are often dual-channel data with a high sampling rate, how to model long-sequence data and make rational use of the relevant information between channels is also an urgent problem. To solve these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions are as follows: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as the main melody and tone. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively to the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of channel attention, the performance of the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency (T-F) domain.
Funding: Funded by Yayasan UTP FRG (YUTP-FRG), grant number 015LC0-280, and the Computer and Information Science Department of Universiti Teknologi PETRONAS.
Abstract: Object detection has made a significant leap forward in recent years. However, the detection of small objects remains difficult for various reasons: such objects are very small and are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects by overcoming the information loss during the downsampling process. The multi-convolutional attention block (MCAB), which is made up of channel attention and a spatial attention module (SAM), has been designed to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant gains in performance, reaching 97.75% on KITTI and 88.97% on PASCAL VOC. The findings indicate that MCBAN is considerably more effective for small object detection than other existing approaches.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62005307 and 61975228).
Abstract: Structured illumination microscopy (SIM) is a popular and powerful super-resolution (SR) technique in biomedical research. However, the conventional reconstruction algorithm for SIM relies heavily on accurate prior knowledge of the illumination patterns and on the signal-to-noise ratio (SNR) of the raw images. To obtain high-quality SR images, several raw images need to be captured at a high fluorescence level, which further restricts SIM's temporal resolution and its applications. Deep learning (DL) is a data-driven technology that has been used to expand the limits of optical microscopy. In this study, we propose a deep neural network based on multi-level wavelet and attention mechanism (MWAM) for SIM. Our results show that the MWAM network can extract the high-frequency information contained in SIM raw images and accurately integrate it into the output image, resulting in superior SR images compared to those generated using wide-field images as input data. We also demonstrate that the number of SIM raw images can be reduced to three, with one image in each illumination orientation, to achieve the optimal tradeoff between temporal and spatial resolution. Furthermore, our MWAM network exhibits superior reconstruction ability on low-SNR images compared to conventional SIM algorithms. We have also analyzed the adaptability of this network to other biological samples and successfully applied the pretrained model to other SIM systems.
Funding: Supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd (No. SHTL-22-28), the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund (No. L231002), and the Major Project of China State Railway Group Co., Ltd. (No. K2023T003).
Abstract: The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address challenges such as low efficiency, suboptimal detection accuracy, and slow detection speed inherent in conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving rise to the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration is applied, culminating in the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model exhibits a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frames/s.
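Note: the DIoU_NMS step mentioned above can be sketched as a greedy suppression loop that uses distance-IoU instead of plain IoU as the overlap score. This is a minimal illustration assuming the common definition of DIoU (IoU minus the squared centre distance divided by the squared diagonal of the smallest enclosing box); the 0.5 threshold and the sample boxes are made up and do not come from the paper.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between box centres.
    centre = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    diag = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - centre / diag if diag > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it is not too close to any already kept box.
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
print(diou_nms(boxes, scores=[0.9, 0.8, 0.7]))   # [0, 2]
```

Because the centre-distance term lowers the overlap score of boxes whose centres are far apart, two heavily overlapping detections of distinct nearby objects are less likely to suppress each other than under plain IoU.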