Microseismic (MS) monitoring is an effective technique to detect mining-induced rock fractures. However, recognizing grouting-induced signals is challenging due to complex geological conditions in deep rock plates. Therefore, a hybrid model (WM-ResNet50) integrating data enhancement, a deep convolutional neural network (CNN), and convolutional block attention modules (CBAM) was proposed. Firstly, an MS system was established at the Xieqiao coal mine in Anhui Province, China. MS waveforms and injection parameters were acquired during grouting. Secondly, signals were categorized based on time-frequency characteristics to build a dataset, which was divided into training, validation, and test sets at a ratio of 4:1:1. Subsequently, the performance of WM-ResNet50 was evaluated based on indices such as individual precision, total accuracy, recall, and loss function. The results indicated that WM-ResNet50 achieved an average recognition accuracy of 94.38%, surpassing that of a simple CNN (90.04%), ResNet18 (91.72%), and ResNet50 (92.48%). Finally, WM-ResNet50 was applied to monitor the whole process in laboratory tests and field cases. Both results affirmed the feasibility and effectiveness of MS inversion in predicting actual slurry diffusion ranges within deep rock layers. By comparison, it was revealed that the MS sources classified by WM-ResNet50 matched grouting records well. A solution to address insufficient diffusion under long-borehole grouting was proposed. WM-ResNet50's accuracy was validated through in-situ coring and XRD analysis of cement-based hydration products. This study provides a beneficial reference for similar rock-signal processing and in-field grouting practices.
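The 4:1:1 training/validation/test split described above can be sketched in a few lines of Python. The shuffle seed and the 600-sample count below are illustrative assumptions, not values from the paper.

```python
import random

def split_4_1_1(samples, seed=42):
    """Shuffle a sample list and split it into train/val/test at a 4:1:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 4 // 6   # first 4 of 6 parts
    n_val = n * 5 // 6     # next 1 of 6 parts
    return items[:n_train], items[n_train:n_val], items[n_val:]

train, val, test = split_4_1_1(range(600))
print(len(train), len(val), len(test))  # 400 100 100
```

The integer-division boundaries guarantee the three subsets are disjoint and cover every sample, regardless of whether the count divides evenly by six.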
WiFi-based human activity recognition (HAR) provides a non-intrusive approach for ubiquitous monitoring; however, achieving both high accuracy and robustness simultaneously remains a significant challenge. This paper proposes a Convolutional Neural Network with Enhanced Convolutional Block Attention Module (CNN-ECBAM) framework. The approach systematically converts raw Channel State Information (CSI) into pseudo-color images, effectively preserving essential signal characteristics for deep neural network processing. The core innovation is an Enhanced Convolutional Block Attention Module (ECBAM), tailored to CSI data characteristics, which integrates Efficient Channel Attention (ECA) and Multi-Scale Spatial Attention (MSSA). By employing learnable adaptive fusion weights, it achieves dynamic synergy between channel and spatial features, enabling the network to capture highly discriminative spatiotemporal patterns. The ECBAM module is integrated into a unified Convolutional Neural Network (CNN) to form the overall CNN-ECBAM model. Experimental results on the UT-HAR and NTU-Fi_HAR datasets demonstrate that CNN-ECBAM achieves competitive recognition accuracy and outperforms mainstream baseline models. Specifically, it attains 99.20% accuracy on UT-HAR (surpassing ResNet-18 at 98.60%) and 100% accuracy on NTU-Fi_HAR (exceeding GAF-CNN at 99.62%). These results validate the effectiveness of the proposed method for high-precision and reliable WiFi-based HAR.
Mango is a plant with high economic value in the agricultural industry; thus, it is necessary to maximize the productivity of the mango plant, which can be done by applying artificial intelligence. In this study, a lightweight object detection model is developed that can detect mango plant conditions based on disease potential, serving as an early-detection warning system that helps increase agricultural productivity. The proposed lightweight model integrates YOLOv7-Tiny with a newly proposed module, the C2S module. The C2S module consists of three sub-modules: the convolutional block attention module (CBAM), the coordinate attention (CA) module, and the squeeze-and-excitation (SE) module. The dataset comprises eight classes: seven disease-condition classes and one healthy-condition class. The experimental results show that the proposed lightweight model achieves the optimal results, increasing mAP50 by 13.15% compared with the original YOLOv7-Tiny model, while its mAP50:95 is also the highest among the compared models, including YOLOv3-Tiny, YOLOv4-Tiny, YOLOv5, and YOLOv7-Tiny. The advantage of the proposed lightweight model is its adaptability to constrained environments, such as edge computing systems. The proposed model can support a robust, precise, and convenient precision-agriculture system for the user.
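One of the C2S sub-modules named above, the squeeze-and-excitation (SE) block, reduces to a handful of array operations. This NumPy sketch uses random weights and an assumed reduction ratio of 4; it illustrates the mechanism only, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-excitation: global average pool per channel,
    two fully connected layers (bottleneck), sigmoid gate, rescale."""
    # x: (C, H, W); w1: (C//r, C); w2: (C, C//r)
    s = x.mean(axis=(1, 2))          # squeeze -> (C,)
    z = np.maximum(w1 @ s, 0.0)      # excitation FC1 + ReLU -> (C//r,)
    g = sigmoid(w2 @ z)              # FC2 + sigmoid gate in (0, 1) -> (C,)
    return x * g[:, None, None]      # channel-wise rescaling

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4              # illustrative sizes, not from the paper
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the gate is a per-channel sigmoid, the block can only attenuate or pass each channel; it never changes the feature-map shape.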
The generation of high-quality, realistic faces has emerged as a key field of research in computer vision. This paper proposes a robust approach that combines a Super-Resolution Generative Adversarial Network (SRGAN) with a Pyramid Attention Module (PAM) to enhance the quality of deep face generation. The SRGAN framework is designed to improve the resolution of generated images, addressing common challenges such as blurriness and a lack of intricate details. The Pyramid Attention Module complements the process by focusing on multi-scale feature extraction, enabling the network to capture finer details and complex facial features more effectively. The proposed method was trained and evaluated over 100 epochs on the CelebA dataset, demonstrating consistent improvements in image quality and a marked decrease in generator and discriminator losses, reflecting the model's capacity to learn and synthesize high-quality images effectively, given adequate computational resources. Experimental outcomes demonstrate that the SRGAN model with the PAM module outperforms the alternatives, yielding an aggregate discriminator loss of 0.055 for real images and 0.043 for fake images, and a generator loss of 10.58 after training for 100 epochs. The model yielded a structural similarity index measure of 0.923, outperforming the other models considered in the current study.
With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in skin cancer detection. However, significant challenges remain in accurately segmenting melanomas in dermoscopic images because of objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. In our proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks and incorporate a modified receptive field block (MRFB), designed to capture multi-scale spatial information. Third, to further enhance feature representation, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of our proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU value of the proposed model increased from 0.6406 to 0.6819 and the Dice coefficient from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU value improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments conducted on the jaw cyst dataset from Quzhou People's Hospital further verified the outstanding segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool for supporting clinical decision-making in skin cancer detection, as well as for advancing research in medical image analysis.
Aim: To diagnose COVID-19 more efficiently and more accurately, this study proposed a novel attention network for COVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed to avoid overfitting. Then, a convolutional block attention module (CBAM) was integrated into our model, whose structure was fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracies of our ANC method on the two datasets are 96.32% ± 1.06% and 96.00% ± 1.03%, respectively. Conclusions: The proposed ANC method is superior to nine state-of-the-art approaches.
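CBAM, which recurs throughout these abstracts, applies a channel gate followed by a spatial gate. The NumPy sketch below captures that structure under stated simplifications: random weights, an assumed bottleneck of 2 units, and two scalar mixing weights standing in for CBAM's 7x7 spatial convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, w1, w2, w_sp):
    """Channel-then-spatial attention in the spirit of CBAM:
    avg- and max-pooled channel descriptors share one MLP, and the
    spatial map mixes channel-wise avg/max with two scalar weights
    (a simplified stand-in for CBAM's 7x7 convolution)."""
    # x: (C, H, W)
    avg_c, max_c = x.mean(axis=(1, 2)), x.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    mc = sigmoid(mlp(avg_c) + mlp(max_c))            # channel gate, (C,)
    x = x * mc[:, None, None]
    avg_s, max_s = x.mean(axis=0), x.max(axis=0)     # (H, W) maps
    ms = sigmoid(w_sp[0] * avg_s + w_sp[1] * max_s)  # spatial gate, (H, W)
    return x * ms[None, :, :]

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 6, 6))
w1, w2 = rng.standard_normal((2, 8)), rng.standard_normal((8, 2))
y = cbam(x, w1, w2, w_sp=(0.7, 0.3))
print(y.shape)  # (8, 6, 6)
```

Both gates are sigmoids, so the module re-weights features between 0 and 1 while preserving the tensor shape, which is why it can be dropped into an existing backbone.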
Deep learning technology is widely used in computer vision. Generally, a large amount of data is used to train the model weights in deep learning, so as to obtain a model with higher accuracy. However, massive data and complex model structures require more computing resources. Since people generally can only carry and use mobile and portable devices in application scenarios, neural networks face limitations in terms of computing resources, size, and power consumption. Therefore, the efficient lightweight model MobileNet is used as the basic network in this study for optimization. First, the accuracy of the MobileNet model is improved by adding methods such as the convolutional block attention module (CBAM) and dilated convolution. Then, the MobileNet model is compressed using pruning and weight-quantization algorithms based on weight magnitude. Afterwards, methods such as Python crawlers and data augmentation are employed to create a garbage-classification dataset. Based on the above model optimization strategy, the garbage-classification mobile application is deployed on mobile phones and Raspberry Pis, making the garbage-classification task more convenient to complete.
The infrastructure and construction of roads are crucial for the economic and social development of a region, but traffic-related challenges such as accidents and congestion persist. Artificial Intelligence (AI) and Machine Learning (ML) have been used in road infrastructure and construction, particularly with Internet of Things (IoT) devices. Object detection in computer vision also plays a key role in improving road infrastructure and addressing traffic-related problems. This study uses You Only Look Once version 7 (YOLOv7) with the Convolutional Block Attention Module (CBAM), a highly optimized object-detection combination, to detect and identify traffic signs, and analyzes effective combinations of adaptive optimizers, namely Adaptive Moment estimation (Adam), Root Mean Squared Propagation (RMSprop), and Stochastic Gradient Descent (SGD), with YOLOv7. Using a portion of the German traffic-sign data for training, the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy. The model proposed in this study not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems. The study results showed an impressive accuracy of 99.7% when using a batch size of 8 and the Adam optimizer. This high level of accuracy demonstrates the effectiveness of the proposed model for the image-classification task of traffic sign recognition.
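Since the abstract above compares Adam, RMSprop, and SGD, the Adam update rule itself is worth seeing in isolation. This minimal sketch applies it to a toy quadratic; the learning rate, step count, and starting point are illustrative assumptions, not values from the study.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    and its square, bias correction, then the parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)        # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# minimize f(x) = (x - 3)^2 starting from x = 10; the gradient is 2(x - 3)
theta, m, v = 10.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * (theta - 3.0), m, v, t)
print(round(theta, 2))
```

The per-coordinate denominator sqrt(v_hat) is what makes Adam adaptive: large recent gradients shrink the effective step, which is the behavior the study weighs against plain SGD and RMSprop.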
Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available commonly employ a multi-generator mechanism to model the different domain mappings, which results in inefficient training of neural networks and mode collapse, in turn limiting the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, the domain code is first introduced to explicitly control the different generation tasks. Second, the paper brings in the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image-translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
In massive multiple-input multiple-output (MIMO) systems utilizing frequency division duplexing, optimizing system performance requires user equipment (UE) to compress downlink channel state information (CSI) and transmit it to the base station (BS). As the number of antennas increases, there is a significant rise in the overhead related to CSI feedback, posing considerable challenges to the precise acquisition of CSI by the BS. Existing approaches to CSI feedback utilizing deep learning techniques face challenges such as significant feedback overhead and limited precision in the reconstruction process. This study presents a novel lightweight CSI feedback framework known as the dual attention neural network (DANet). Within the DANet architecture, a dual attention module (DAM) is designed to enhance the network's performance. This DAM includes both channel attention blocks and spatial attention blocks. The channel attention blocks direct the model's focus toward channel features rich in information content while suppressing less significant features; this enables the extraction of temporal correlations within the CSI matrix. The spatial attention block aids in extracting the correlation between the delay domain and the angle domain in the CSI matrix. By enhancing neural network performance, the DAM reduces information dispersion while enhancing the representation of global interactions. Simulation results demonstrate that DANet exhibits superior normalized mean square error and cosine similarity with comparable complexity compared to existing advanced CSI feedback methods.
As one of the key technologies of intelligent vehicles, traffic sign detection is still a challenging task because of the tiny size of its target objects. To address the challenge, we present a novel detection network improved from yolo-v3 for tiny traffic signs with high precision in real time. First, a visual multi-scale attention module (MSAM), a lightweight yet effective module, is devised to fuse the multi-scale feature maps with channel weights and spatial masks. It increases the representation power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we effectively exploit fine-grained features of tiny objects from the shallower layers by modifying the backbone Darknet-53 and adding one prediction head to yolo-v3. Finally, a receptive field block is added into the neck of the network to broaden the receptive field. Experiments prove the effectiveness of our network in both quantitative and qualitative aspects. The mAP@0.5 of our network reaches 0.965 and its detection speed is 55.56 FPS for 512 × 512 images on the challenging Tsinghua-Tencent 100K (TT100K) dataset.
Self-supervised monocular depth estimation has been widely investigated and applied in previous works. However, existing methods suffer from texture-copy, depth drift, and incomplete structure. It is difficult for normal CNN networks to completely understand the relationship between an object and its surrounding environment. Moreover, it is hard to design the depth smoothness loss to balance depth smoothness and sharpness. To address these issues, we propose a coarse-to-fine method with a normalized convolutional block attention module (NCBAM). In the coarse estimation stage, we incorporate the NCBAM into the depth and pose networks to overcome the texture-copy and depth-drift problems. Then, we use a new network to refine the coarse depth guided by the color image and produce a structure-preserving depth result in the refinement stage. Our method produces results competitive with state-of-the-art methods. Comprehensive experiments prove the effectiveness of our two-stage method using the NCBAM.
The non-destructive recognition of coated seeds is crucial for advancing studies in coating theory. Currently, the recognition of coated seeds relies heavily on manual visual inspection and machine-vision detection. However, these methods pose challenges such as high misclassification rates, low recognition efficiency, and elevated labor intensity. In response to these challenges, this study leveraged deep learning techniques to develop a coated-seed recognition model named YOLO-Coated Seeds Recognition (YOLO-CSR). The experiment mainly included the following steps: First, a seed coating machine was set up to coat red clover seeds, producing three types of coated red clover seeds. Subsequently, images of the three types of coated seeds were collected to construct a coated-seed image dataset. Then, YOLOv5s was built, incorporating the Convolutional Block Attention Module (CBAM) into the model's backbone to enhance its ability to learn features of coated seeds. Finally, the training results of YOLO-CSR were compared with those of other classical recognition models. The experimental results showed that YOLO-CSR achieved the best recognition performance on the self-built coated-seed image dataset. The average precision (AP) for recognizing the three types of coated seeds reached 98.43%, 97.91%, and 97.26%, with a mean average precision@0.5 (mAP@0.5) of 97.87%. Compared to YOLOv5, YOLO-CSR showed a 1.18% improvement in mAP@0.5. Additionally, YOLO-CSR has a model size of only 14.9 MB, an average recognition time (ART) of 10.1 ms, and a frame rate of 99 FPS. The experimental results prove that YOLO-CSR can accurately, efficiently, and rapidly recognize coated red clover seeds. The findings of this study provide technical support for the non-destructive recognition of spherical coated seeds.
Pest detection techniques help reduce the frequency and scale of pest outbreaks; however, their application in actual agricultural production is still challenging owing to interspecies similarity, multi-scale targets, and the background complexity of pest images. To address these problems, this study proposes the FD-YOLO pest target detection model. The FD-YOLO model uses a Fully Connected Feature Pyramid Network (FC-FPN) instead of a PANet in the neck, which can adaptively fuse multi-scale information so that the model retains small-scale target features in the deep layers, enhances large-scale target features in the shallow layers, and improves the reuse of effective features. A dual self-attention module (DSA) is then embedded in the C3 module of the neck, which captures the dependencies between information in both the spatial and channel dimensions, effectively enhancing global features. We selected 16 types of pests that widely damage field crops from the IP102 pest dataset, which served as our dataset after data supplementation and enhancement. The experimental results showed that FD-YOLO's mAP@0.5 improved by 6.8% over YOLOv5, reaching 82.6%, and surpassed other state-of-the-art models by 5%–19.1%. This method provides an effective new approach for detecting similar or multi-scale pests in field crops.
This paper proposes an automated detection framework for transmission facilities using a feature-attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The proposed framework exhibits three key characteristics. First, virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images. This enables the neural network to learn various features of transmission facilities to improve detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps, enabling the neural network to perform precise object detection in various environments. Third, an effective bounding-box optimization method called Scylla-IoU is deployed in FAMSR-Net, considering the intersection over union, center-point distance, angle, and shape of the bounding box. This enables accurate detection of power facilities of various sizes. Extensive experiments demonstrated that FAMSR-Net outperforms other neural networks in detecting power facilities. FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. First, the hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at relatively low cost. Second, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module (CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples to address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of the Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating strong generalization ability that meets the demands of oceanic biological image classification.
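The Focal Loss used above for class imbalance is compact enough to show directly. This is a sketch of the standard binary form with the commonly used gamma = 2 and alpha = 0.25 defaults; the probabilities below are illustrative, not from the paper.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy down-weighted by (1 - p_t)^gamma,
    so confidently classified (easy) examples contribute little."""
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    a_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing factor
    return float(-(a_t * (1 - p_t) ** gamma * np.log(p_t)).mean())

# a well-classified positive contributes far less than a hard one
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
print(easy < hard)  # True
```

With gamma = 0 the modulating factor vanishes and the expression reduces to alpha-weighted cross-entropy, which is the knob the paper tunes to emphasize rare categories.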
Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only minimal inter-class similarity in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. Initially, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the global representation of vehicle features by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features; a graph convolutional network then amalgamates these features to understand the relationships among vehicle-related characteristics. Subsequently, we amalgamate feature maps from both the attention and graph-based branches for a more comprehensive representation of vehicle features. The framework then gauges feature similarities and ranks them, enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
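The graph-convolutional aggregation that AG-GCN relies on can be illustrated with the standard normalized propagation rule D^(-1/2)(A + I)D^(-1/2) X W. The 4-node ring graph and identity weight matrix below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: add self-loops, symmetrically
    normalize the adjacency, aggregate neighbor features, then apply
    a linear map followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    norm_adj = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ X @ W, 0.0)

# 4-node ring graph; on a regular graph, constant node features are
# preserved by the normalized aggregation (a quick sanity check)
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
out = gcn_layer(A, np.ones((4, 2)), np.eye(2))
print(out.shape)  # (4, 2)
```

Each output row mixes a node's own features with its neighbors', which is how the network learns relationships among local vehicle characteristics rather than treating them independently.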
Accurate vehicle detection is essential for autonomous driving, traffic monitoring, and intelligent transportation systems. This paper presents an enhanced YOLOv8n model that incorporates the Ghost Module, the Convolutional Block Attention Module (CBAM), and Deformable Convolutional Networks v2 (DCNv2). The Ghost Module streamlines feature generation to reduce redundancy, CBAM applies channel and spatial attention to improve feature focus, and DCNv2 enables adaptability to geometric variations in vehicle shapes. These components work together to improve both accuracy and computational efficiency. Evaluated on the KITTI dataset, the proposed model achieves 95.4% mAP@0.5 (an 8.97% gain over the standard YOLOv8n), along with 96.2% precision, 93.7% recall, and a 94.93% F1-score. Comparative analysis with seven state-of-the-art detectors demonstrates consistent superiority in key performance metrics. An ablation study is also conducted to quantify the individual and combined contributions of the Ghost Module, CBAM, and DCNv2, highlighting their effectiveness in improving detection performance. By addressing feature redundancy, attention refinement, and spatial adaptability, the proposed model offers a robust and scalable solution for vehicle detection across diverse traffic scenarios.
Multimodal image fusion plays an important role in image analysis and its applications. Multimodal medical image fusion combines contrast features from two or more input imaging modalities to represent the fused information in a single image. One of the critical clinical applications of medical image fusion is fusing anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image to obtain its low-rank and salient parts. Then, to utilize features at different scales, we add a multiscale mechanism that uses three filters of different sizes to extract features in the encoding network. A hybrid attention module is also introduced to capture more image details. Finally, the fused images are reconstructed by the decoding network. We conducted experiments with clinical brain computed tomography/magnetic resonance images. The experimental results show that the proposed multiscale hybrid-attention-based multimodal medical image fusion network outperforms other advanced fusion methods.
Almost all existing deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates a convolutional block attention module, can extract high-level and discriminative features and further decrease the false alarm rate resulting from suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of support images and the features of query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset confirm that the proposed method is superior to state-of-the-art few-shot learning approaches.
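The prototype-distance comparison described above is the core of any prototypical network and fits in a few lines. This NumPy sketch uses toy 2-D embeddings as an assumption; a real model would compare CNN feature vectors.

```python
import numpy as np

def prototypes(support, labels, n_classes):
    """Class prototype = mean of the support embeddings of that class."""
    return np.stack([support[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query, protos):
    """Assign each query embedding to the nearest prototype
    (squared Euclidean distance)."""
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# toy 2-D embeddings: class 0 near the origin, class 1 near (5, 5)
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, 2)
print(classify(np.array([[0., 0.5], [5., 5.5]]), protos))  # [0 1]
```

Because classification is just a nearest-mean lookup in embedding space, new classes can be added from a handful of support images without retraining, which is what makes the approach suitable for the limited-data smoke setting.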
Funding: National Natural Science Foundation of China (Nos. 52204089, 52374082); Young Elite Scientists Sponsorship Program (No. 2023QNRC001) by the China Association for Science and Technology (CAST).
Funding: Supported by the Anhui Provincial Engineering Research Center for Sports and Health Information Monitoring Technology (KF2023012).
Abstract: WiFi-based human activity recognition (HAR) provides a non-intrusive approach for ubiquitous monitoring; however, achieving both high accuracy and robustness simultaneously remains a significant challenge. This paper proposes a Convolutional Neural Network with Enhanced Convolutional Block Attention Module (CNN-ECBAM) framework. The approach systematically converts raw Channel State Information (CSI) into pseudo-color images, effectively preserving essential signal characteristics for deep neural network processing. The core innovation is an Enhanced Convolutional Block Attention Module (ECBAM), tailored to CSI data characteristics, which integrates Efficient Channel Attention (ECA) and Multi-Scale Spatial Attention (MSSA). By employing learnable adaptive fusion weights, it achieves dynamic synergy between channel and spatial features, enabling the network to capture highly discriminative spatiotemporal patterns. The ECBAM module is integrated into a unified Convolutional Neural Network (CNN) to form the overall CNN-ECBAM model. Experimental results on the UT-HAR and NTU-Fi_HAR datasets demonstrate that CNN-ECBAM achieves competitive recognition accuracy and outperforms mainstream baseline models. Specifically, it attains 99.20% accuracy on UT-HAR (surpassing ResNet-18 at 98.60%) and 100% accuracy on NTU-Fi_HAR (exceeding GAF-CNN at 99.62%). These results validate the effectiveness of the proposed method for high-precision and reliable WiFi-based HAR.
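The abstract above states only that "learnable adaptive fusion weights" combine the channel and spatial branches of ECBAM; the exact fusion form is not given. One plausible sketch, assuming a softmax over two trainable logits (the variable `alpha` below is hypothetical), is:

```python
import numpy as np

def adaptive_fusion(chan_feat, spat_feat, alpha):
    """Fuse two attention-refined feature maps with learnable weights.

    alpha is a length-2 logit vector (a trainable parameter in a real
    model); softmax keeps the fusion weights positive and summing to 1.
    The fusion form itself is an assumption -- the abstract only says
    that learnable adaptive weights combine the two branches.
    """
    w = np.exp(alpha) / np.exp(alpha).sum()
    return w[0] * chan_feat + w[1] * spat_feat

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 8, 8))  # channel-attention branch output
b = rng.standard_normal((4, 8, 8))  # spatial-attention branch output
fused = adaptive_fusion(a, b, np.array([0.0, 0.0]))
print(np.allclose(fused, 0.5 * (a + b)))  # True -- equal logits give equal weights
```

During training the logits would be updated by backpropagation, letting the network shift emphasis between the two attention branches per task.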
Funding: Supported by the National Science and Technology Council (NSTC), Taiwan, Grant No. NSTC 113-2221-E-167-023.
Abstract: Mango is a plant with high economic value in the agricultural industry; thus, it is necessary to maximize the productivity of the mango plant, which can be done by implementing artificial intelligence. In this study, a lightweight object detection model is developed that can detect mango plant conditions based on disease potential, serving as an early warning system that helps increase agricultural productivity. The proposed lightweight model integrates YOLOv7-Tiny with a proposed module, namely the C2S module. The C2S module consists of three sub-modules: the convolutional block attention module (CBAM), the coordinate attention (CA) module, and the squeeze-and-excitation (SE) module. The dataset comprises eight classes, including seven classes of disease conditions and one class of healthy conditions. The experimental results show that the proposed lightweight model achieves the best results, with a 13.15% improvement in mAP50 over the original YOLOv7-Tiny model. Its mAP50:95 is also the highest among the compared models, including YOLOv3-Tiny, YOLOv4-Tiny, YOLOv5, and YOLOv7-Tiny. An advantage of the proposed lightweight model is its adaptability to constrained environments, such as edge computing systems. The proposed model can support a robust, precise, and convenient precision agriculture system for the user.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1A5A7059549).
Abstract: High-quality, realistic face generation has emerged as a key field of research in computer vision. This paper proposes a robust approach that combines a Super-Resolution Generative Adversarial Network (SRGAN) with a Pyramid Attention Module (PAM) to enhance the quality of deep face generation. The SRGAN framework is designed to improve the resolution of generated images, addressing common challenges such as blurriness and a lack of intricate details. The Pyramid Attention Module complements the process by focusing on multi-scale feature extraction, enabling the network to capture finer details and complex facial features more effectively. The proposed method was trained and evaluated over 100 epochs on the CelebA dataset, demonstrating consistent improvements in image quality and a marked decrease in generator and discriminator losses, reflecting the model's capacity to learn and synthesize high-quality images effectively, given adequate computational resources. Experimental outcomes demonstrate that the SRGAN model with the PAM module outperformed the alternatives, yielding an aggregate discriminator loss of 0.055 for real images, 0.043 for fake images, and a generator loss of 10.58 after training for 100 epochs. The model yielded a structural similarity index measure of 0.923, outperforming the other models considered in the current study.
Funding: Funded by the Zhejiang Basic Public Welfare Research Project (grant number LZY24E060001), Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, 2023GH02), the University of Macao (MYRG2022-00271-FST), and the Science and Technology Development Fund (FDCT) of Macao (0032/2022/A).
Abstract: With the continuous development of artificial intelligence and machine learning techniques, effective methods have emerged to support the work of dermatologists in the field of skin cancer detection. However, significant challenges remain in accurately segmenting melanomas in dermoscopic images due to objects that can interfere with human observation, such as bubbles and scales. To address these challenges, we propose a dual U-Net network framework for skin melanoma segmentation. In our proposed architecture, we introduce several innovative components that aim to enhance the performance and capabilities of the traditional U-Net. First, we establish a novel framework that links two simplified U-Nets, enabling more comprehensive information exchange and feature integration throughout the network. Second, after cascading the second U-Net, we introduce a skip connection between the decoder and encoder networks, and incorporate a modified receptive field block (MRFB), which is designed to capture multi-scale spatial information. Third, to further enhance the feature representation capabilities, we add a multi-path convolution block attention module (MCBAM) to the first two layers of the first U-Net encoder, and integrate a new squeeze-and-excitation (SE) mechanism with residual connections in the second U-Net. To illustrate the performance of our proposed model, we conducted comprehensive experiments on widely recognized skin datasets. On the ISIC-2017 dataset, the IoU value of our proposed model increased from 0.6406 to 0.6819 and the Dice coefficient increased from 0.7625 to 0.8023. On the ISIC-2018 dataset, the IoU value of the proposed model also improved from 0.7138 to 0.7709, while the Dice coefficient increased from 0.8285 to 0.8665. Furthermore, generalization experiments conducted on the jaw cyst dataset from Quzhou People's Hospital further verified the outstanding segmentation performance of the proposed model. These findings collectively affirm the potential of our approach as a valuable tool in supporting clinical decision-making in the field of skin cancer detection, as well as advancing research in medical image analysis.
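The abstract above mentions a squeeze-and-excitation (SE) mechanism with residual connections but gives no layer sizes. A minimal NumPy sketch of a residual SE operation on a (C, H, W) feature map follows; the bottleneck ratio and all shapes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def se_residual(feat, w_down, w_up):
    """Squeeze-and-excitation with a residual connection on (C, H, W) input.

    Channel descriptors are squeezed by global average pooling, passed
    through a bottleneck MLP (w_down: (C//r, C), w_up: (C, C//r)), and
    used to rescale channels; the input is then added back, as in the
    residual SE variant the abstract mentions.
    """
    c = feat.shape[0]
    s = feat.reshape(c, -1).mean(axis=1)                               # squeeze
    e = 1.0 / (1.0 + np.exp(-(w_up @ np.maximum(w_down @ s, 0))))      # excite
    return feat + feat * e[:, None, None]                              # rescale + residual

rng = np.random.default_rng(2)
c, r = 8, 4
y = se_residual(rng.standard_normal((c, 8, 8)),
                rng.standard_normal((c // r, c)) * 0.1,
                rng.standard_normal((c, c // r)) * 0.1)
print(y.shape)  # (8, 8, 8)
```

The residual term keeps the original signal intact even when the excitation gates are near zero, which stabilizes training in deeper encoders.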
Funding: This paper is partially supported by the Open Fund for the Jiangsu Key Laboratory of Advanced Manufacturing Technology (HGAMTL-1703), the Guangxi Key Laboratory of Trusted Software (kx201901), the Fundamental Research Funds for the Central Universities (CDLS-2020-03), the Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education, the Royal Society International Exchanges Cost Share Award, UK (RP202G0230), the Medical Research Council Confidence in Concept Award, UK (MC_PC_17171), the Hope Foundation for Cancer Research, UK (RM60G0680), and the British Heart Foundation Accelerator Award, UK.
Abstract: Aim: To diagnose COVID-19 more efficiently and more correctly, this study proposed a novel attention network for COVID-19 (ANC). Methods: Two datasets were used in this study. An 18-way data augmentation was proposed to avoid overfitting. Then, the convolutional block attention module (CBAM) was integrated into our model, the structure of which was fine-tuned. Finally, Grad-CAM was used to provide an explainable diagnosis. Results: The accuracies of our ANC method on the two datasets are 96.32% ± 1.06% and 96.00% ± 1.03%, respectively. Conclusions: The proposed ANC method is superior to 9 state-of-the-art approaches.
Abstract: Deep learning technology is widely used in computer vision. Generally, a large amount of data is used to train the model weights in deep learning so as to obtain a model with higher accuracy. However, massive data and complex model structures require more computing resources. Since people generally carry and use mobile and portable devices in application scenarios, neural networks face limitations in computing resources, size, and power consumption. Therefore, the efficient lightweight model MobileNet is used as the basic network in this study for optimization. First, the accuracy of the MobileNet model is improved by adding methods such as the convolutional block attention module (CBAM) and dilated convolution. Then, the MobileNet model is compressed using pruning and weight quantization algorithms based on weight magnitude. Afterwards, methods such as Python crawlers and data augmentation are employed to create a garbage classification dataset. Based on the above model optimization strategy, the garbage classification application is deployed on mobile phones and Raspberry Pis, making the garbage classification task more convenient to complete.
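The abstract above describes compression by magnitude-based pruning and weight quantization. A minimal NumPy sketch of both steps follows; the sparsity level and 8-bit affine scheme are assumptions for illustration, not the study's exact configuration.

```python
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Zero out (at least) the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_uint8(w):
    """Uniform affine quantization of float weights to 8-bit integers."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 255.0 or 1.0            # guard against constant weights
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo                         # dequantize with q * scale + lo

rng = np.random.default_rng(3)
w = rng.standard_normal(1000)
pruned = prune_by_magnitude(w, 0.5)
q, scale, lo = quantize_uint8(pruned)
print((pruned == 0).mean() >= 0.5, q.dtype)  # True uint8
```

Pruning creates sparsity that shrinks the stored model, while quantization replaces 32-bit floats with 8-bit codes plus a scale and offset, roughly a 4x reduction in weight storage.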
Abstract: The infrastructure and construction of roads are crucial for the economic and social development of a region, but traffic-related challenges like accidents and congestion persist. Artificial Intelligence (AI) and Machine Learning (ML) have been used in road infrastructure and construction, particularly with Internet of Things (IoT) devices. Object detection in computer vision also plays a key role in improving road infrastructure and addressing traffic-related problems. This study uses You Only Look Once version 7 (YOLOv7) with the Convolutional Block Attention Module (CBAM), a highly optimized object-detection approach, to detect and identify traffic signs, and analyzes effective combinations of YOLOv7 with adaptive optimizers such as Adaptive Moment estimation (Adam), Root Mean Squared Propagation (RMSprop), and Stochastic Gradient Descent (SGD). Using a portion of the German traffic sign data for training, the study investigates the feasibility of adopting smaller datasets while maintaining high accuracy. The model proposed in this study not only improves traffic safety by detecting traffic signs but also has the potential to contribute to the rapid development of autonomous vehicle systems. The results showed an impressive accuracy of 99.7% when using a batch size of 8 and the Adam optimizer. This high level of accuracy demonstrates the effectiveness of the proposed model for the image classification task of traffic sign recognition.
Funding: The National Natural Science Foundation of China (No. 61976080), the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y), the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008), and the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available employ a multi-generator mechanism to model different domain mappings, which results in inefficient training of neural networks and mode collapse, leading to poor generation of image diversity. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, this paper first introduces a domain code to explicitly control the different generation tasks. Second, it brings in the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. This paper performs qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets, demonstrating the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
Funding: National Natural Science Foundation of China (12005108).
Abstract: In massive multiple-input multiple-output (MIMO) systems utilizing frequency division duplexing, optimizing system performance requires user equipment (UE) to compress downlink channel state information (CSI) and transmit it to the base station (BS). As the number of antennas increases, there is a significant rise in the overhead related to CSI feedback, posing considerable challenges to the precise acquisition of CSI by the BS. Existing approaches to CSI feedback utilizing deep learning techniques face challenges such as significant feedback overhead and limited precision in the reconstruction process. This study presents a novel lightweight CSI feedback framework known as the dual attention neural network (DANet). Within the DANet architecture, a dual attention module (DAM) is designed to enhance the network's performance. This DAM includes both channel attention blocks and spatial attention blocks. The channel attention blocks direct the model's focus toward channel features rich in information content while simultaneously suppressing less significant features. This approach enables the extraction of temporal correlations within the CSI matrix. The spatial attention block aids in extracting the correlation between the delay domain and the angle domain in the CSI matrix. By enhancing neural network performance, the DAM reduces information dispersion while enhancing the representation of global interactions. Simulation results demonstrate that DANet exhibits superior normalized mean square error and cosine similarity with comparable complexity compared to existing advanced CSI feedback methods.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFB2101100 and 2019YFB2101600), the National Natural Science Foundation of China (Grant No. 62176016), the Guizhou Province Science and Technology Project: Research and Demonstration of Science and Technology Big Data Mining Technology Based on Knowledge Graph (Qiankehe [2021] General 382), the Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 92046015), and the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education (Grant No. KZ202010025047).
Abstract: As one of the key technologies of intelligent vehicles, traffic sign detection is still a challenging task because of the tiny size of its target objects. To address the challenge, we present a novel detection network improved from YOLOv3 for tiny traffic signs, with high precision in real time. First, a visual multi-scale attention module (MSAM), a lightweight yet effective module, is devised to fuse the multi-scale feature maps with channel weights and spatial masks. It increases the representation power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we effectively exploit fine-grained features of tiny objects from the shallower layers by modifying the backbone Darknet-53 and adding one prediction head to YOLOv3. Finally, a receptive field block is added into the neck of the network to broaden the receptive field. Experiments prove the effectiveness of our network in both quantitative and qualitative aspects. The mAP@0.5 of our network reaches 0.965 and its detection speed is 55.56 FPS for 512 × 512 images on the challenging Tsinghua-Tencent 100K (TT100K) dataset.
Funding: Partially supported by the Key Technological Innovation Projects of Hubei Province (2018AAA062), the National Natural Science Foundation of China (61972298), and the Wuhan University-Huawei GeoInformatics Innovation Lab.
Abstract: Self-supervised monocular depth estimation has been widely investigated and applied in previous works. However, existing methods suffer from texture-copy, depth drift, and incomplete structure. It is difficult for normal CNN networks to completely understand the relationship between an object and its surrounding environment. Moreover, it is hard to design a depth smoothness loss that balances depth smoothness and sharpness. To address these issues, we propose a coarse-to-fine method with a normalized convolutional block attention module (NCBAM). In the coarse estimation stage, we incorporate the NCBAM into the depth and pose networks to overcome the texture-copy and depth drift problems. Then, we use a new network to refine the coarse depth guided by the color image and produce a structure-preserving depth result in the refinement stage. Our method produces results competitive with state-of-the-art methods. Comprehensive experiments prove the effectiveness of our two-stage method using the NCBAM.
Funding: Funded by the National Key Research and Development Program of China (Grant No. 2022YFF1302300), the Key R&D and Achievement Transformation Plan Project of Inner Mongolia (Grant No. 2023YFDZ0006), the Program for Improving the Scientific Research Ability of Youth Teachers of Inner Mongolia Agricultural University (Grant No. BR220128), and the Research Program of Science and Technology at Universities of Inner Mongolia Autonomous Region (Grant No. NJZZ23046).
Abstract: The non-destructive recognition of coated seeds is crucial for advancing studies in coating theory. Currently, the recognition of coated seeds heavily relies on manual visual inspection and machine vision detection. However, these methods pose challenges such as high misclassification rates, low recognition efficiency, and elevated labor intensity. In response to these challenges, this study leveraged deep learning techniques to develop a coated seed recognition model named YOLO-Coated Seeds Recognition (YOLO-CSR). The experiments in this study mainly included the following steps: First, a seed coating machine was set up to coat red clover seeds, resulting in three types of coated red clover seeds. Subsequently, by collecting images of the three types of coated seeds, a coated seed image dataset was constructed. Then, a YOLOv5s-based model was built, incorporating the Convolutional Block Attention Module (CBAM) into the model's backbone to enhance its ability to learn features of coated seeds. Finally, the training results of YOLO-CSR were compared with those of other classical recognition models. The experimental results showed that YOLO-CSR achieved the best recognition performance on the self-built coated seed image dataset. The average precision (AP) for recognizing the three types of coated seeds reached 98.43%, 97.91%, and 97.26%, with a mean average precision@0.5 (mAP@0.5) of 97.87%. Compared to YOLOv5, YOLO-CSR showed a 1.18% improvement in mAP@0.5. Additionally, YOLO-CSR has a model size of only 14.9 MB, with an average recognition time (ART) of 10.1 ms and a frame rate of 99 frames per second (FPS). The experimental results prove that YOLO-CSR can accurately, efficiently, and rapidly recognize coated red clover seeds. The findings of this study provide technical support for the non-destructive recognition of spherical coated seeds.
Funding: Funded by a Liaoning Provincial Department of Education Project, Award number JYTMS20230418.
Abstract: Pest detection techniques are helpful in reducing the frequency and scale of pest outbreaks; however, their application in actual agricultural production remains challenging owing to the interspecies similarity, multi-scale nature, and complex backgrounds of pests. To address these problems, this study proposes an FD-YOLO pest target detection model. The FD-YOLO model uses a Fully Connected Feature Pyramid Network (FC-FPN) instead of a PANet in the neck, which can adaptively fuse multi-scale information so that the model retains small-scale target features in the deep layers, enhances large-scale target features in the shallow layers, and improves the reuse of effective features. A dual self-attention module (DSA) is then embedded in the C3 module of the neck, which captures the dependencies between information in both the spatial and channel dimensions, effectively enhancing global features. We selected 16 types of pests that widely damage field crops from the IP102 pest dataset, which served as our dataset after data supplementation and enhancement. The experimental results showed that FD-YOLO's mAP@0.5 improved by 6.8% compared to YOLOv5, reaching 82.6%, and was 19.1%–5% better than other state-of-the-art models. This method provides an effective new approach for detecting similar or multi-scale pests in field crops.
Funding: Supported by the Korea Electric Power Corporation (R22TA14, Development of Drone System for Diagnosis of Porcelain Insulators in Overhead Transmission Lines), the National Fire Agency of Korea (RS-2024-00408270, Fire Hazard Analysis and Fire Safety Standards Development for Transportation and Storage Stage of Reuse Battery), and the Ministry of the Interior and Safety of Korea (RS-2024-00408982, Development of Intelligent Fire Detection and Sprinkler Facility Technology Reflecting the Characteristics of Logistics Facilities).
Abstract: This paper proposes an automated detection framework for transmission facilities using a feature-attention multi-scale robustness network (FAMSR-Net) with high-fidelity virtual images. The proposed framework exhibits three key characteristics. First, virtual images of the transmission facilities generated using StyleGAN2-ADA are co-trained with real images. This enables the neural network to learn various features of transmission facilities to improve the detection performance. Second, the convolutional block attention module is deployed in FAMSR-Net to effectively extract features from images and construct multi-dimensional feature maps, enabling the neural network to perform precise object detection in various environments. Third, an effective bounding box optimization method called Scylla-IoU is deployed in FAMSR-Net, considering the intersection over union, center point distance, angle, and shape of the bounding box. This enables accurate detection of power facilities of various sizes. Extensive experiments demonstrated that FAMSR-Net outperforms other neural networks in detecting power facilities. FAMSR-Net also achieved the highest detection accuracy when virtual images of the transmission facilities were co-trained in the training phase. The proposed framework is effective for the scheduled operation and maintenance of transmission facilities because an optical camera is currently the most promising tool for unmanned aerial vehicles. This ultimately contributes to improved inspection efficiency, reduced maintenance risks, and more reliable power delivery across extensive transmission facilities.
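The Scylla-IoU loss mentioned in the abstract above augments plain intersection over union with center-distance, angle, and shape costs. Only the base IoU quantity is sketched below; the additional cost terms are omitted, so this is background rather than the paper's full loss.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).

    Scylla-IoU builds on this base quantity by adding center-distance,
    angle, and shape penalty terms; only the plain IoU core is shown.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (1/7)
```

IoU alone gives zero gradient for non-overlapping boxes, which is precisely why penalty-augmented variants such as Scylla-IoU are preferred as regression losses.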
Funding: Supported by the National Natural Science Foundation of China (Nos. 61806107 and 61702135).
Abstract: We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy and inefficient manual classification of existing oceanic biological image classification methods. First, the hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Second, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module (CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples to address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of the Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating a generalization ability strong enough to meet the demands of oceanic biological image classification.
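Focal Loss, cited in the abstract above for handling class imbalance, down-weights easy examples so training concentrates on hard ones. A minimal binary-case NumPy sketch follows; the gamma and alpha defaults are the commonly used values from the original focal loss formulation, not necessarily those of this study.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss on predicted probabilities p and labels y in {0, 1}.

    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    examples, so hard, misclassified ones dominate the gradient -- the
    class-imbalance remedy the abstract cites.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    a_t = np.where(y == 1, alpha, 1 - alpha)    # class-balancing weight
    return -(a_t * (1 - p_t) ** gamma * np.log(p_t)).mean()

easy = focal_loss(np.array([0.95]), np.array([1]))  # confident, correct
hard = focal_loss(np.array([0.30]), np.array([1]))  # misclassified
print(easy < hard)  # True -- the confident example contributes far less loss
```

With gamma = 0 and alpha = 0.5 the expression reduces to (half of) the ordinary balanced cross-entropy, which makes the focusing effect of gamma easy to verify.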
Funding: Funded by the National Natural Science Foundation of China (grant number 62172292).
Abstract: Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and minimal inter-class discrepancy in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. Initially, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the global representation of vehicle features by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features, and a graph convolutional network then amalgamates these features to model the relationships among vehicle-related characteristics. Subsequently, we amalgamate the feature maps from both the attention and graph-based branches for a more comprehensive representation of vehicle features. The framework then gauges feature similarities and ranks them, thus enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
Abstract: Accurate vehicle detection is essential for autonomous driving, traffic monitoring, and intelligent transportation systems. This paper presents an enhanced YOLOv8n model that incorporates the Ghost Module, the Convolutional Block Attention Module (CBAM), and Deformable Convolutional Networks v2 (DCNv2). The Ghost Module streamlines feature generation to reduce redundancy, CBAM applies channel and spatial attention to improve feature focus, and DCNv2 enables adaptability to geometric variations in vehicle shapes. These components work together to improve both accuracy and computational efficiency. Evaluated on the KITTI dataset, the proposed model achieves 95.4% mAP@0.5, an 8.97% gain over standard YOLOv8n, along with 96.2% precision, 93.7% recall, and a 94.93% F1-score. Comparative analysis with seven state-of-the-art detectors demonstrates consistent superiority in key performance metrics. An ablation study is also conducted to quantify the individual and combined contributions of the Ghost Module, CBAM, and DCNv2, highlighting their effectiveness in improving detection performance. By addressing feature redundancy, attention refinement, and spatial adaptability, the proposed model offers a robust and scalable solution for vehicle detection across diverse traffic scenarios.
Funding: Supported by the Qingdao Huanghai University School-Level Scientific Research Project (2023KJ14), the Undergraduate Teaching Reform Research Project of the Shandong Provincial Department of Education (M2022328), the National Natural Science Foundation of China under Grant (42472324), and the Qingdao Postdoctoral Foundation under Grant (QDBSH202402049).
Abstract: Multimodal image fusion plays an important role in image analysis and applications. Multimodal medical image fusion helps combine contrast features from two or more input imaging modalities to represent the fused information in a single image. One of the critical clinical applications of medical image fusion is to fuse anatomical and functional modalities for rapid diagnosis of malignant tissues. This paper proposes a multimodal medical image fusion network (MMIF-Net) based on multiscale hybrid attention. The method first decomposes the original image to obtain the low-rank and salient parts. Then, to utilize features at different scales, we add a multiscale mechanism that uses three filters of different sizes to extract features in the encoding network. A hybrid attention module is also introduced to obtain more image details. Finally, the fused images are reconstructed by the decoding network. We conducted experiments with clinical images from brain computed tomography/magnetic resonance. The experimental results show that the proposed multiscale hybrid attention fusion method performs better than other advanced fusion methods.
Funding: The work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Most existing deep learning methods rely on a large amount of annotated data, so they are inappropriate for forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates a convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate resulting from suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting issue caused by limited smoke images, and the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset testify that the proposed method is superior to state-of-the-art few-shot learning approaches.
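The prototype-distance comparison that the abstract above describes is the core of a prototypical network. A minimal NumPy sketch follows; the 2-D embeddings, class count, and Euclidean metric are illustrative assumptions (the actual embeddings would come from the CBAM-based feature extractor).

```python
import numpy as np

def prototypical_classify(support, support_labels, query):
    """Classify query embeddings by nearest class prototype.

    Prototypes are the mean embeddings of each class's support examples;
    each query takes the label of its closest prototype under Euclidean
    distance, matching the comparison the abstract describes.
    """
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    # Pairwise distances: (n_query, n_class)
    d = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Two classes with two support embeddings each (hypothetical 2-D features).
support = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
query = np.array([[0.1, 0.1], [4.8, 5.2]])
print(prototypical_classify(support, labels, query))  # [0 1]
```

Because classification needs only support-set means and distances, no classifier weights are retrained at test time, which is what makes the approach suitable for the few-shot, limited-data regime.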