Audio-visual scene classification(AVSC)poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals,coupled with the complex spatial patterns of objects and text...Audio-visual scene classification(AVSC)poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals,coupled with the complex spatial patterns of objects and textures found in visual images.The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures,inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data.The authors present a feature pyramid attention network(FPANet)for audio-visual scene understanding,which extracts semantically significant characteristics from audio-visual data.The authors’approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module(FPAM).A dimension alignment(DA)strategy is employed to align feature maps from multiple layers,a pyramid spatial attention(PSA)to spatially locate essential regions,and a pyramid channel attention(PCA)to pinpoint significant temporal frames.Experiments on visual scene classification(VSC),audio scene classification(ASC),and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art(SOTA)approaches,with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%.Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.展开更多
Due to their high mechanical compliance and excellent biocompatibility,conductive hydrogels exhibit significant potential for applications in flexible electronics.However,as the demand for high sensitivity,superior me...Due to their high mechanical compliance and excellent biocompatibility,conductive hydrogels exhibit significant potential for applications in flexible electronics.However,as the demand for high sensitivity,superior mechanical properties,and strong adhesion performance continues to grow,many conventional fabrication methods remain complex and costly.Herein,we propose a simple and efficient strategy to construct an entangled network hydrogel through a liquid-metal-induced cross-linking reaction,hydrogel demonstrates outstanding properties,including exceptional stretchability(1643%),high tensile strength(366.54 kPa),toughness(350.2 kJ m^(−3)),and relatively low mechanical hysteresis.The hydrogel exhibits long-term stable reusable adhesion(104 kPa),enabling conformal and stable adhesion to human skin.This capability allows it to effectively capture high-quality epidermal electrophysiological signals with high signal-to-noise ratio(25.2 dB)and low impedance(310 ohms).Furthermore,by integrating advanced machine learning algorithms,achieving an attention classification accuracy of 91.38%,which will significantly impact fields like education,healthcare,and artificial intelligence.展开更多
Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to ...Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.展开更多
Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global featu...Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.展开更多
In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hi...In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.展开更多
Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm f...Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm for infrared images,F-YOLOv8,is proposed.First,a spatial-to-depth network replaces the traditional backbone network's strided convolution or pooling layer.At the same time,it combines with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information;then an improved feature pyramid network of lightweight bidirectional feature pyramid network(L-BiFPN)is proposed,which can efficiently fuse features of different scales.In addition,a loss function of insertion of union based on the minimum point distance(MPDIoU)is introduced for bounding box regression,which obtains faster convergence speed and more accurate regression results.Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time with 3%and 2.2%enhancement in mean average precision at 50%IoU(mAP50)and mean average precision at 50%—95%IoU(mAP50-95),respectively,and 38.1%,37.3%and 16.9%reduction in the number of model parameters,the model weight,and floating-point operations per second(FLOPs),respectively.To further demonstrate the detection capability of the improved algorithm,it is tested on the public dataset PASCAL VOC,and the results show that F-YOLO has excellent generalized detection performance.展开更多
Infrared imaging technology has been widely adopted in various fields,such as military reconnaissance,medical diagnosis,and security monitoring,due to its excellent ability to penetrate smoke and fog.However,the preva...Infrared imaging technology has been widely adopted in various fields,such as military reconnaissance,medical diagnosis,and security monitoring,due to its excellent ability to penetrate smoke and fog.However,the prevalent low resolution of infrared images severely limits the accurate interpretation of their contents.In addition,deploying super-resolution models on resource-constrained devices faces significant challenges.To address these issues,this study proposes a lightweight super-resolution network for infrared images based on an adaptive attention mechanism.The network’s dynamic weighting module automatically adjusts the weights of the attention and nonattention branch outputs based on the network’s characteristics at different levels.Among them,the attention branch is further subdivided into pixel attention and brightness-texture attention,which are specialized for extracting the most informative features in infrared images.Meanwhile,the non-attention branch supplements the extraction of those neglected features to enhance the comprehensiveness of the features.Through ablation experiments,we verify the effectiveness of the proposed module.Finally,through experiments on two datasets,FLIR and Thermal101,qualitative and quantitative results demonstrate that the model can effectively recover high-frequency details of infrared images and significantly improve image resolution.In detail,compared with the suboptimal method,we have reduced the number of parameters by 30%and improved the model performance.When the scale factor is 2,the peak signal-tonoise ratio of the test datasets FLIR and Thermal101 is improved by 0.09 and 0.15 dB,respectively.When the scale factor is 4,it is improved by 0.05 and 0.09 dB,respectively.In addition,due to the lightweight design of the network structure,it has a low computational cost.It is suitable for deployment on edge devices,thus effectively enhancing the sensing performance of infrared imaging devices.展开更多
The growing incidence of cyberattacks necessitates a robust and effective Intrusion Detection Systems(IDS)for enhanced network security.While conventional IDSs can be unsuitable for detecting different and emerging at...The growing incidence of cyberattacks necessitates a robust and effective Intrusion Detection Systems(IDS)for enhanced network security.While conventional IDSs can be unsuitable for detecting different and emerging attacks,there is a demand for better techniques to improve detection reliability.This study introduces a new method,the Deep Adaptive Multi-Layer Attention Network(DAMLAN),to boost the result of intrusion detection on network data.Due to its multi-scale attention mechanisms and graph features,DAMLAN aims to address both known and unknown intrusions.The real-world NSL-KDD dataset,a popular choice among IDS researchers,is used to assess the proposed model.There are 67,343 normal samples and 58,630 intrusion attacks in the training set,12,833 normal samples,and 9711 intrusion attacks in the test set.Thus,the proposed DAMLAN method is more effective than the standard models due to the consideration of patterns by the attention layers.The experimental performance of the proposed model demonstrates that it achieves 99.26%training accuracy and 90.68%testing accuracy,with precision reaching 98.54%on the training set and 96.64%on the testing set.The recall and F1 scores again support the model with training set values of 99.90%and 99.21%and testing set values of 86.65%and 91.37%.These results provide a strong basis for the claims made regarding the model’s potential to identify intrusion attacks and affirm its relatively strong overall performance,irrespective of type.Future work would employ more attempts to extend the scalability and applicability of DAMLAN for real-time use in intrusion detection systems.展开更多
Precise traffic flow forecasting is essential for mitigating urban traffic congestion.However,it is difficult for existing methods to adequately capture the dynamic spatio-temporal characteristics and multiscale tempo...Precise traffic flow forecasting is essential for mitigating urban traffic congestion.However,it is difficult for existing methods to adequately capture the dynamic spatio-temporal characteristics and multiscale temporal dependencies of traffic flow.A traffic flow prediction model with multiscale temporal awareness and graph diffusion attention networks(MT-GDAN)is proposed to address these issues.Specifically,a graph diffusion attention module is constructed,which dynamically adjusts and calculates the weights of neighboring nodes in the graph structure using a random graph attention network(GAT)and captures the spatial characteristics of hidden nodes through an adaptive adjacency matrix,thus better exploiting the dynamic spatio-temporal properties of traffic flow.Secondly,a multiscale isometric convolutional network and bi-level routing attention are used to construct a multiscale temporal awareness module.The former extracts local information of traffic flow segments by convolution with different sizes of convolution kernels and then introduces isometric convolution to obtain the global temporal relationship between local features of traffic flow segments;the latter filters irrelevant spatio-temporal features at a coarse regional level and focuses locally on key points to more accurately capture the multiscale temporal dependencies of traffic flows.Experimental results reveal that the MT-GDAN model surpasses the mainstream baseline model in terms of forecasting accuracy and exhibits good prediction performance.展开更多
Rapid,accurate seed classification of soybean varieties is needed for product quality control.We describe a hyperspectral image-based deep-learning model called Dual Attention Feature Fusion Networks(DAFFnet),which se...Rapid,accurate seed classification of soybean varieties is needed for product quality control.We describe a hyperspectral image-based deep-learning model called Dual Attention Feature Fusion Networks(DAFFnet),which sequentially applies 3D Convolutional Neural Network(CNN)and 2D CNN.A fusion attention mechanism module in 2D CNN permits the model to capture local and global feature information by combining with Convolution Block Attention Module(CBAM)and Mobile Vision Transformer(MViT),outperforming conventional hyperspectral image classification models in seed classification.展开更多
Microseismic monitoring technology is widely used in tunnel and coal mine safety production.For signals generated by ultra-weak microseismic events,traditional sensors encounter limitations in terms of detection sensi...Microseismic monitoring technology is widely used in tunnel and coal mine safety production.For signals generated by ultra-weak microseismic events,traditional sensors encounter limitations in terms of detection sensitivity.Given the complex engineering environment,automatic multi-classification of microseismic data is highly required.In this study,we use acceleration sensors to collect signals and combine the improved Visual Geometry Group with a convolutional block attention module to obtain a new network structure,termed CNN_BAM,for automatic classification and identification of microseismic events.We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model.Results show that the CNN_BAM model exhibits good feature extraction ability,achieving a recognition accuracy of 99.29%,surpassing all its counterparts.The stability and accuracy of the classification algorithm improve remarkably.In addition,through fine-tuning and migration to the Pan Ⅱ Mine Project,the network demonstrates reliable generalization performance.This outcome reflects its adaptability across different projects and promising application prospects.展开更多
Dear Editor,This letter proposes the graph tensor alliance attention network(GT-A^(2)T)to represent a dynamic graph(DG)precisely.Its main idea includes 1)Establishing a unified spatio-temporal message propagation fram...Dear Editor,This letter proposes the graph tensor alliance attention network(GT-A^(2)T)to represent a dynamic graph(DG)precisely.Its main idea includes 1)Establishing a unified spatio-temporal message propagation framework on a DG via the tensor product for capturing the complex cohesive spatio-temporal interdependencies precisely and 2)Acquiring the alliance attention scores by node features and favorable high-order structural correlations.展开更多
Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the...Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network(SGCCN),a novel approach that harnesses the power of spectral band graph convolutions for capturing long-range relationships,utilizes local perception of attention-enhanced multi-level convolutions for local spatial feature and employs a dynamic attention mechanism to enhance feature extraction.The SGCCN integrates spectral and spatial features through a self-attention fusion network,significantly improving classification accuracy and efficiency.The proposed method outperforms existing techniques,demonstrating its effectiveness in handling the challenges associated with HSI data.展开更多
In remote sensing imagery,approximately 67%of the data are affected by cloud cover,significantly increasing the difficulty of image classification,recognition,and other downstream interpretation tasks.To effectively a...In remote sensing imagery,approximately 67%of the data are affected by cloud cover,significantly increasing the difficulty of image classification,recognition,and other downstream interpretation tasks.To effectively address the randomness of cloud distribution and the non-uniformity of cloud thickness,we propose a coarse-to-fine thin cloud removal architecture based on the observations of the random distribution and uneven thickness of cloud.In the coarse-level declouding network,we innovatively introduce a multi-scale attention mechanism,i.e.,pyramid nonlocal attention(PNA).By integrating global context with local detail information,it specifically addresses image quality degradation caused by the uncertainty in cloud distribution.During the fine-level declouding stage,we focus on the impact of cloud thickness on declouding results(primarily manifested as insufficient detail information).Through a carefully designed residual dense module,we significantly enhance the extraction and utilization of feature details.Thus,our approach precisely restores lost local texture features on top of coarse-level results,achieving a substantial leap in declouding quality.To evaluate the effectiveness of our cloud removal technology and attention mechanism,we conducted comprehensive analyses on publicly available datasets.Results demonstrate that our method achieves state-of-the-art performance across a wide range of techniques.展开更多
Pest detection techniques are helpful in reducing the frequency and scale of pest outbreaks;however,their application in the actual agricultural production process is still challenging owing to the problems of intersp...Pest detection techniques are helpful in reducing the frequency and scale of pest outbreaks;however,their application in the actual agricultural production process is still challenging owing to the problems of interspecies similarity,multi-scale,and background complexity of pests.To address these problems,this study proposes an FD-YOLO pest target detection model.The FD-YOLO model uses a Fully Connected Feature Pyramid Network(FC-FPN)instead of a PANet in the neck,which can adaptively fuse multi-scale information so that the model can retain small-scale target features in the deep layer,enhance large-scale target features in the shallow layer,and enhance the multiplexing of effective features.A dual self-attention module(DSA)is then embedded in the C3 module of the neck,which captures the dependencies between the information in both spatial and channel dimensions,effectively enhancing global features.We selected 16 types of pests that widely damage field crops in the IP102 pest dataset,which were used as our dataset after data supplementation and enhancement.The experimental results showed that FD-YOLO’s mAP@0.5 improved by 6.8%compared to YOLOv5,reaching 82.6%and 19.1%–5%better than other state-of-the-art models.This method provides an effective new approach for detecting similar or multiscale pests in field crops.展开更多
Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract loc...Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract local and global features, as well as the lack of effective mechanisms to capture complex interactions between features;Additionally, when increasing the receptive field to obtain deeper feature representations, the reliance on increasing network depth leads to a significant increase in computational resource consumption, affecting the efficiency and performance of detection. Based on these issues, firstly, this paper proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, the traffic samples are converted into two-dimensional matrix. A module combining parallel dilated convolutions and residual learning (res-pdc) was designed to extract local and global features of traffic at different scales. By utilizing res-pdc modules with different dilation rates, we can effectively capture spatial features at different scales and explore feature dependencies spanning wider regions without increasing computational resources. Secondly, to focus and integrate the information in different feature subspaces, further enhance and extract the interactions among the features, multi-head attention is added to Res-PDC, resulting in the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, comparisons with other machine learning and deep learning algorithms are conducted on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method in this paper can effectively improve the detection performance.展开更多
Counterfeit agricultural products pose a significant challenge to global food security and economic stability, necessitating advanced detection mechanisms to ensure authenticity and quality. To address this pressing i...Counterfeit agricultural products pose a significant challenge to global food security and economic stability, necessitating advanced detection mechanisms to ensure authenticity and quality. To address this pressing issue, we introduce iGFruit, an innovative model designed to enhance the detection of counterfeit agricultural products by integrating multimodal data processing. Our approach utilizes both image and text data for comprehensive feature extraction, employing advanced backbone models such as Vision Transformer (ViT), Normalizer-Free Network (NFNet), and Bidirectional Encoder Representations from Transformers (BERT). These extracted features are fused and processed using a Graph Attention Network (GAT) to capture intricate relationships within the multimodal data. The resulting fused representation is subsequently classified to detect counterfeit products with high precision. We validate the effectiveness of iGFruit through extensive experiments on two datasets: the publicly available MIT-States dataset and the proprietary TLU-States dataset, achieving state-of-the-art performance on both benchmarks. Specifically, iGFruit demonstrates an improvement of over 3% in average accuracy compared to baseline models, all while maintaining computational efficiency during inference. This work underscores the necessity and innovativeness of integrating graph-based feature learning to tackle the critical issue of counterfeit agricultural product detection.展开更多
Landslide hazard detection is a prevalent problem in remote sensing studies,particularly with the technological advancement of computer vision.With the continuous and exceptional growth of the computational environmen...Landslide hazard detection is a prevalent problem in remote sensing studies,particularly with the technological advancement of computer vision.With the continuous and exceptional growth of the computational environment,the manual and partially automated procedure of landslide detection from remotely sensed images has shifted toward automatic methods with deep learning.Furthermore,attention models,driven by human visual procedures,have become vital in natural hazard-related studies.Hence,this paper proposes an enhanced YOLOv5(You Only Look Once version 5)network for improved satellite-based landslide detection,embedded with two popular attention modules:CBAM(Convolutional Block Attention Module)and ECA(Efficient Channel Attention).These attention mechanisms are incorporated into the backbone and neck of the YOLOv5 architecture,distinctly,and evaluated across three YOLOv5 variants:nano(n),small(s),and medium(m).The experiments use opensource satellite images from three distinct regions with complex terrain.The standard metrics,including F-score,precision,recall,and mean average precision(mAP),are computed for quantitative assessment.The YOLOv5n+CBAM demonstrates the most optimal results with an F-score of 77.2%,confirming its effectiveness.The suggested attention-driven architecture augments detection accuracy,supporting post-landslide event assessment and recovery.展开更多
Anomaly fluctuations in operating conditions, catalyst wear, crushing, and the deterioration of feedstock properties in fluid catalytic cracking (FCC) units can disrupt the normal circulating fluidization process of t...Anomaly fluctuations in operating conditions, catalyst wear, crushing, and the deterioration of feedstock properties in fluid catalytic cracking (FCC) units can disrupt the normal circulating fluidization process of the catalyst. Although several effective models have been proposed in previous research to address anomaly detection in chemical processes, most fail to adequately capture the spatial-temporal dependencies of multi-source, mixed-frequency information. In this study, an innovative multi-source mixed-frequency information fusion framework based on a spatial-temporal graph attention network (MIF-STGAT) is proposed to investigate the causes of FCC regenerator catalyst loss anomalies for guide onsite operational management, enhancing the long-term stability of FCC unit operations. First, a reconstruction-based dual-encoder-decoder framework is developed to facilitate the acquisition of mixed-frequency features and information fusion during the FCC regenerator catalyst loss process. Subsequently, a graph attention network and a multilayer long short-term memory network with a differential structure are integrated into the reconstruction-based dual-encoder-shared-decoder framework to capture the dynamic fluctuations and critical features associated with anomalies. Experimental results from the Chinese FCC industrial process demonstrate that MIF-STGAT achieves excellent accuracy and interpretability for anomaly detection.展开更多
基金Shenzhen Institute of Artificial Intelligence and Robotics for Society,Grant/Award Number:AC01202201003-02GuangDong Basic and Applied Basic Research Foundation,Grant/Award Number:2024A1515010252Longgang District Shenzhen's“Ten Action Plan”for Supporting Innovation Projects,Grant/Award Number:LGKCSDPT2024002。
文摘Audio-visual scene classification(AVSC)poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals,coupled with the complex spatial patterns of objects and textures found in visual images.The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures,inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data.The authors present a feature pyramid attention network(FPANet)for audio-visual scene understanding,which extracts semantically significant characteristics from audio-visual data.The authors’approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module(FPAM).A dimension alignment(DA)strategy is employed to align feature maps from multiple layers,a pyramid spatial attention(PSA)to spatially locate essential regions,and a pyramid channel attention(PCA)to pinpoint significant temporal frames.Experiments on visual scene classification(VSC),audio scene classification(ASC),and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art(SOTA)approaches,with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%.Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
基金supported by the National Key Research&Development Program of China(grant no.2022YFC3500503)the National Natural Science Foundation of China(grant nos.62227807,12374171,12004034,62402041)+2 种基金the Beijing Institute of Technology Research Fund Program for Young Scholars,Chinathe Fundamental Research Funds for the Central Universities(grant nos.2024CX06060)Beijing Youth Talent Lifting Project.
文摘Due to their high mechanical compliance and excellent biocompatibility,conductive hydrogels exhibit significant potential for applications in flexible electronics.However,as the demand for high sensitivity,superior mechanical properties,and strong adhesion performance continues to grow,many conventional fabrication methods remain complex and costly.Herein,we propose a simple and efficient strategy to construct an entangled network hydrogel through a liquid-metal-induced cross-linking reaction,hydrogel demonstrates outstanding properties,including exceptional stretchability(1643%),high tensile strength(366.54 kPa),toughness(350.2 kJ m^(−3)),and relatively low mechanical hysteresis.The hydrogel exhibits long-term stable reusable adhesion(104 kPa),enabling conformal and stable adhesion to human skin.This capability allows it to effectively capture high-quality epidermal electrophysiological signals with high signal-to-noise ratio(25.2 dB)and low impedance(310 ohms).Furthermore,by integrating advanced machine learning algorithms,achieving an attention classification accuracy of 91.38%,which will significantly impact fields like education,healthcare,and artificial intelligence.
基金supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China(Grant Nos.2023AH040149 and 2024AH051915)the Anhui Provincial Natural Science Foundation(Grant No.2208085MF168)+1 种基金the Science and Technology Innovation Tackle Plan Project of Maanshan(Grant No.2024RGZN001)the Scientific Research Fund Project of Anhui Medical University(Grant No.2023xkj122).
文摘Convolutional neural networks(CNNs)-based medical image segmentation technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities.However,due to the inability to effectively capture global information from images,CNNs can easily lead to loss of contours and textures in segmentation results.Notice that the transformer model can effectively capture the properties of long-range dependencies in the image,and furthermore,combining the CNN and the transformer can effectively extract local details and global contextual features of the image.Motivated by this,we propose a multi-branch and multi-scale attention network(M2ANet)for medical image segmentation,whose architecture consists of three components.Specifically,in the first component,we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce information loss caused by downsampling.In the second component,we apply residual block to the well-known convolutional block attention module to enhance the network’s ability to recognize important features of images and alleviate the phenomenon of gradient vanishing.In the third component,we design a multi-scale feature fusion module,in which we adopt adaptive average pooling and position encoding to enhance contextual features,and then multi-head attention is introduced to further enrich feature representation.Finally,we validate the effectiveness and feasibility of the proposed M2ANet method through comparative experiments on four benchmark medical image segmentation datasets,particularly in the context of preserving contours and textures.
基金National Key Research and Development Program of China,Grant/Award Number:2018YFE0206900China Postdoctoral Science Foundation,Grant/Award Number:2023M731204+2 种基金The Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration,Grant/Award Number:SMDTKL-2023-1-01The Hubei Province Key Research and Development Project,Grant/Award Number:2023BCB007CAAI-Huawei MindSpore Open Fund。
文摘Convolutional neural network(CNN)with the encoder-decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature.The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy.In this work,a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels.The core components of the proposed method include the cross window self-attention based transformer(CWST)and multi-scale local enhanced(MLE)modules.The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows.The MLE module selectively fuses features by computing the voxel attention between different branch features,and uses convolution to strengthen the dense local information.The experiments on the prostate,atrium,and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient,Intersection over Union,95%Hausdorff distance and average symmetric surface distance.
基金Supported by the National Natural Science Foundation of China(61601176)。
文摘In this paper,we propose hierarchical attention dual network(DNet)for fine-grained image classification.The DNet can randomly select pairs of inputs from the dataset and compare the differences between them through hierarchical attention feature learning,which are used simultaneously to remove noise and retain salient features.In the loss function,it considers the losses of difference in paired images according to the intra-variance and inter-variance.In addition,we also collect the disaster scene dataset from remote sensing images and apply the proposed method to disaster scene classification,which contains complex scenes and multiple types of disasters.Compared to other methods,experimental results show that the DNet with hierarchical attention is robust to different datasets and performs better.
基金supported by the National Natural Science Foundation of China(No.62103298)。
文摘Aiming at the problems of low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes,an improved you only look once version 8(YOLOv8)object detection algorithm for infrared images,F-YOLOv8,is proposed.First,a spatial-to-depth network replaces the traditional backbone network's strided convolution or pooling layer.At the same time,it combines with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information;then an improved feature pyramid network of lightweight bidirectional feature pyramid network(L-BiFPN)is proposed,which can efficiently fuse features of different scales.In addition,a loss function of insertion of union based on the minimum point distance(MPDIoU)is introduced for bounding box regression,which obtains faster convergence speed and more accurate regression results.Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time with 3%and 2.2%enhancement in mean average precision at 50%IoU(mAP50)and mean average precision at 50%—95%IoU(mAP50-95),respectively,and 38.1%,37.3%and 16.9%reduction in the number of model parameters,the model weight,and floating-point operations per second(FLOPs),respectively.To further demonstrate the detection capability of the improved algorithm,it is tested on the public dataset PASCAL VOC,and the results show that F-YOLO has excellent generalized detection performance.
基金funded in part by theHenan ProvinceKeyR&DProgramProject,“Research and Application Demonstration of Class Ⅱ Superlattice Medium Wave High Temperature Infrared Detector Technology”under Grant No.231111210400.
文摘Infrared imaging technology has been widely adopted in various fields,such as military reconnaissance,medical diagnosis,and security monitoring,due to its excellent ability to penetrate smoke and fog.However,the prevalent low resolution of infrared images severely limits the accurate interpretation of their contents.In addition,deploying super-resolution models on resource-constrained devices faces significant challenges.To address these issues,this study proposes a lightweight super-resolution network for infrared images based on an adaptive attention mechanism.The network’s dynamic weighting module automatically adjusts the weights of the attention and nonattention branch outputs based on the network’s characteristics at different levels.Among them,the attention branch is further subdivided into pixel attention and brightness-texture attention,which are specialized for extracting the most informative features in infrared images.Meanwhile,the non-attention branch supplements the extraction of those neglected features to enhance the comprehensiveness of the features.Through ablation experiments,we verify the effectiveness of the proposed module.Finally,through experiments on two datasets,FLIR and Thermal101,qualitative and quantitative results demonstrate that the model can effectively recover high-frequency details of infrared images and significantly improve image resolution.In detail,compared with the suboptimal method,we have reduced the number of parameters by 30%and improved the model performance.When the scale factor is 2,the peak signal-tonoise ratio of the test datasets FLIR and Thermal101 is improved by 0.09 and 0.15 dB,respectively.When the scale factor is 4,it is improved by 0.05 and 0.09 dB,respectively.In addition,due to the lightweight design of the network structure,it has a low computational cost.It is suitable for deployment on edge devices,thus effectively enhancing the sensing performance of infrared imaging devices.
基金Nourah bint Abdulrahman University for funding this project through the Researchers Supporting Project(PNURSP2025R319)Riyadh,Saudi Arabia and Prince Sultan University for covering the article processing charges(APC)associated with this publication.Special acknowledgement to Automated Systems&Soft Computing Lab(ASSCL),Prince Sultan University,Riyadh,Saudi Arabia.
文摘The growing incidence of cyberattacks necessitates a robust and effective Intrusion Detection Systems(IDS)for enhanced network security.While conventional IDSs can be unsuitable for detecting different and emerging attacks,there is a demand for better techniques to improve detection reliability.This study introduces a new method,the Deep Adaptive Multi-Layer Attention Network(DAMLAN),to boost the result of intrusion detection on network data.Due to its multi-scale attention mechanisms and graph features,DAMLAN aims to address both known and unknown intrusions.The real-world NSL-KDD dataset,a popular choice among IDS researchers,is used to assess the proposed model.There are 67,343 normal samples and 58,630 intrusion attacks in the training set,12,833 normal samples,and 9711 intrusion attacks in the test set.Thus,the proposed DAMLAN method is more effective than the standard models due to the consideration of patterns by the attention layers.The experimental performance of the proposed model demonstrates that it achieves 99.26%training accuracy and 90.68%testing accuracy,with precision reaching 98.54%on the training set and 96.64%on the testing set.The recall and F1 scores again support the model with training set values of 99.90%and 99.21%and testing set values of 86.65%and 91.37%.These results provide a strong basis for the claims made regarding the model’s potential to identify intrusion attacks and affirm its relatively strong overall performance,irrespective of type.Future work would employ more attempts to extend the scalability and applicability of DAMLAN for real-time use in intrusion detection systems.
基金Supported by the by Key R&D Program of Gansu Province(No.23YFGA0063)the Key Talent Project of Gansu Province(No.2024RCXM57,2024RCXM22)the Major Science and Technology Special Program of Gansu Province(No.25ZYJA037).
文摘Precise traffic flow forecasting is essential for mitigating urban traffic congestion.However,it is difficult for existing methods to adequately capture the dynamic spatio-temporal characteristics and multiscale temporal dependencies of traffic flow.A traffic flow prediction model with multiscale temporal awareness and graph diffusion attention networks(MT-GDAN)is proposed to address these issues.Specifically,a graph diffusion attention module is constructed,which dynamically adjusts and calculates the weights of neighboring nodes in the graph structure using a random graph attention network(GAT)and captures the spatial characteristics of hidden nodes through an adaptive adjacency matrix,thus better exploiting the dynamic spatio-temporal properties of traffic flow.Secondly,a multiscale isometric convolutional network and bi-level routing attention are used to construct a multiscale temporal awareness module.The former extracts local information of traffic flow segments by convolution with different sizes of convolution kernels and then introduces isometric convolution to obtain the global temporal relationship between local features of traffic flow segments;the latter filters irrelevant spatio-temporal features at a coarse regional level and focuses locally on key points to more accurately capture the multiscale temporal dependencies of traffic flows.Experimental results reveal that the MT-GDAN model surpasses the mainstream baseline model in terms of forecasting accuracy and exhibits good prediction performance.
基金supported by Natural Science Foundation of Heilongjiang Province of China(SS2021C005)Province Key Research and Development Program of Heilongjiang Province of China(GZ20220121)the Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences.
文摘Rapid,accurate seed classification of soybean varieties is needed for product quality control.We describe a hyperspectral image-based deep-learning model called Dual Attention Feature Fusion Networks(DAFFnet),which sequentially applies 3D Convolutional Neural Network(CNN)and 2D CNN.A fusion attention mechanism module in 2D CNN permits the model to capture local and global feature information by combining with Convolution Block Attention Module(CBAM)and Mobile Vision Transformer(MViT),outperforming conventional hyperspectral image classification models in seed classification.
基金supported by the Key Research and Development Plan of Anhui Province(202104a05020059)the Excellent Scientific Research and Innovation Team of Anhui Province(2022AH010003)support from Hefei Comprehensive National Science Center is highly appreciated.
文摘Microseismic monitoring technology is widely used in tunnel and coal mine safety production.For signals generated by ultra-weak microseismic events,traditional sensors encounter limitations in terms of detection sensitivity.Given the complex engineering environment,automatic multi-classification of microseismic data is highly required.In this study,we use acceleration sensors to collect signals and combine the improved Visual Geometry Group with a convolutional block attention module to obtain a new network structure,termed CNN_BAM,for automatic classification and identification of microseismic events.We use the dataset collected from the Hanjiang-to-Weihe River Diversion Project to train and validate the network model.Results show that the CNN_BAM model exhibits good feature extraction ability,achieving a recognition accuracy of 99.29%,surpassing all its counterparts.The stability and accuracy of the classification algorithm improve remarkably.In addition,through fine-tuning and migration to the Pan Ⅱ Mine Project,the network demonstrates reliable generalization performance.This outcome reflects its adaptability across different projects and promising application prospects.
基金supported in part by the National Natural Science Foundation of China(62372385).
文摘Dear Editor,This letter proposes the graph tensor alliance attention network(GT-A^(2)T)to represent a dynamic graph(DG)precisely.Its main idea includes 1)Establishing a unified spatio-temporal message propagation framework on a DG via the tensor product for capturing the complex cohesive spatio-temporal interdependencies precisely and 2)Acquiring the alliance attention scores by node features and favorable high-order structural correlations.
基金supported in part by the National Natural Science Foundations of China(No.61801214)the Postgraduate Research Practice Innovation Program of NUAA(No.xcxjh20231504)。
文摘Hyperspectral image(HSI)classification is crucial for numerous remote sensing applications.Traditional deep learning methods may miss pixel relationships and context,leading to inefficiencies.This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network(SGCCN),a novel approach that harnesses the power of spectral band graph convolutions for capturing long-range relationships,utilizes local perception of attention-enhanced multi-level convolutions for local spatial feature and employs a dynamic attention mechanism to enhance feature extraction.The SGCCN integrates spectral and spatial features through a self-attention fusion network,significantly improving classification accuracy and efficiency.The proposed method outperforms existing techniques,demonstrating its effectiveness in handling the challenges associated with HSI data.
基金supported by the Fundamental Research Funds for the Central Universities(No.2572025BR14)the China Energy Digital Intelligence Technology Development(Beijing)Co.,Ltd.Science and Technology Innovation Project(No.YA2024001500).
文摘In remote sensing imagery,approximately 67%of the data are affected by cloud cover,significantly increasing the difficulty of image classification,recognition,and other downstream interpretation tasks.To effectively address the randomness of cloud distribution and the non-uniformity of cloud thickness,we propose a coarse-to-fine thin cloud removal architecture based on the observations of the random distribution and uneven thickness of cloud.In the coarse-level declouding network,we innovatively introduce a multi-scale attention mechanism,i.e.,pyramid nonlocal attention(PNA).By integrating global context with local detail information,it specifically addresses image quality degradation caused by the uncertainty in cloud distribution.During the fine-level declouding stage,we focus on the impact of cloud thickness on declouding results(primarily manifested as insufficient detail information).Through a carefully designed residual dense module,we significantly enhance the extraction and utilization of feature details.Thus,our approach precisely restores lost local texture features on top of coarse-level results,achieving a substantial leap in declouding quality.To evaluate the effectiveness of our cloud removal technology and attention mechanism,we conducted comprehensive analyses on publicly available datasets.Results demonstrate that our method achieves state-of-the-art performance across a wide range of techniques.
基金funded by Liaoning Provincial Department of Education Project,Award number JYTMS20230418.
文摘Pest detection techniques are helpful in reducing the frequency and scale of pest outbreaks;however,their application in the actual agricultural production process is still challenging owing to the problems of interspecies similarity,multi-scale,and background complexity of pests.To address these problems,this study proposes an FD-YOLO pest target detection model.The FD-YOLO model uses a Fully Connected Feature Pyramid Network(FC-FPN)instead of a PANet in the neck,which can adaptively fuse multi-scale information so that the model can retain small-scale target features in the deep layer,enhance large-scale target features in the shallow layer,and enhance the multiplexing of effective features.A dual self-attention module(DSA)is then embedded in the C3 module of the neck,which captures the dependencies between the information in both spatial and channel dimensions,effectively enhancing global features.We selected 16 types of pests that widely damage field crops in the IP102 pest dataset,which were used as our dataset after data supplementation and enhancement.The experimental results showed that FD-YOLO’s mAP@0.5 improved by 6.8%compared to YOLOv5,reaching 82.6%and 19.1%–5%better than other state-of-the-art models.This method provides an effective new approach for detecting similar or multiscale pests in field crops.
基金supported by the Xiamen Science and Technology Subsidy Project(No.2023CXY0318).
文摘Abnormal network traffic, as a frequent security risk, requires a series of techniques to categorize and detect it. Existing network traffic anomaly detection still faces challenges: the inability to fully extract local and global features, as well as the lack of effective mechanisms to capture complex interactions between features;Additionally, when increasing the receptive field to obtain deeper feature representations, the reliance on increasing network depth leads to a significant increase in computational resource consumption, affecting the efficiency and performance of detection. Based on these issues, firstly, this paper proposes a network traffic anomaly detection model based on parallel dilated convolution and residual learning (Res-PDC). To better explore the interactive relationships between features, the traffic samples are converted into two-dimensional matrix. A module combining parallel dilated convolutions and residual learning (res-pdc) was designed to extract local and global features of traffic at different scales. By utilizing res-pdc modules with different dilation rates, we can effectively capture spatial features at different scales and explore feature dependencies spanning wider regions without increasing computational resources. Secondly, to focus and integrate the information in different feature subspaces, further enhance and extract the interactions among the features, multi-head attention is added to Res-PDC, resulting in the final model: multi-head attention enhanced parallel dilated convolution and residual learning (MHA-Res-PDC) for network traffic anomaly detection. Finally, comparisons with other machine learning and deep learning algorithms are conducted on the NSL-KDD and CIC-IDS-2018 datasets. The experimental results demonstrate that the proposed method in this paper can effectively improve the detection performance.
文摘Counterfeit agricultural products pose a significant challenge to global food security and economic stability, necessitating advanced detection mechanisms to ensure authenticity and quality. To address this pressing issue, we introduce iGFruit, an innovative model designed to enhance the detection of counterfeit agricultural products by integrating multimodal data processing. Our approach utilizes both image and text data for comprehensive feature extraction, employing advanced backbone models such as Vision Transformer (ViT), Normalizer-Free Network (NFNet), and Bidirectional Encoder Representations from Transformers (BERT). These extracted features are fused and processed using a Graph Attention Network (GAT) to capture intricate relationships within the multimodal data. The resulting fused representation is subsequently classified to detect counterfeit products with high precision. We validate the effectiveness of iGFruit through extensive experiments on two datasets: the publicly available MIT-States dataset and the proprietary TLU-States dataset, achieving state-of-the-art performance on both benchmarks. Specifically, iGFruit demonstrates an improvement of over 3% in average accuracy compared to baseline models, all while maintaining computational efficiency during inference. This work underscores the necessity and innovativeness of integrating graph-based feature learning to tackle the critical issue of counterfeit agricultural product detection.
基金supported by the Department of Science and Technology,Science and Engineering Research Board,New Delhi,India,under Grant No.EEQ/2022/000812.
文摘Landslide hazard detection is a prevalent problem in remote sensing studies,particularly with the technological advancement of computer vision.With the continuous and exceptional growth of the computational environment,the manual and partially automated procedure of landslide detection from remotely sensed images has shifted toward automatic methods with deep learning.Furthermore,attention models,driven by human visual procedures,have become vital in natural hazard-related studies.Hence,this paper proposes an enhanced YOLOv5(You Only Look Once version 5)network for improved satellite-based landslide detection,embedded with two popular attention modules:CBAM(Convolutional Block Attention Module)and ECA(Efficient Channel Attention).These attention mechanisms are incorporated into the backbone and neck of the YOLOv5 architecture,distinctly,and evaluated across three YOLOv5 variants:nano(n),small(s),and medium(m).The experiments use opensource satellite images from three distinct regions with complex terrain.The standard metrics,including F-score,precision,recall,and mean average precision(mAP),are computed for quantitative assessment.The YOLOv5n+CBAM demonstrates the most optimal results with an F-score of 77.2%,confirming its effectiveness.The suggested attention-driven architecture augments detection accuracy,supporting post-landslide event assessment and recovery.
基金supported by the Innovative Research Group Project of the National Natural Science Foundation of China(22021004)Sinopec Major Science and Technology Projects(321123-1).
文摘Anomaly fluctuations in operating conditions, catalyst wear, crushing, and the deterioration of feedstock properties in fluid catalytic cracking (FCC) units can disrupt the normal circulating fluidization process of the catalyst. Although several effective models have been proposed in previous research to address anomaly detection in chemical processes, most fail to adequately capture the spatial-temporal dependencies of multi-source, mixed-frequency information. In this study, an innovative multi-source mixed-frequency information fusion framework based on a spatial-temporal graph attention network (MIF-STGAT) is proposed to investigate the causes of FCC regenerator catalyst loss anomalies for guide onsite operational management, enhancing the long-term stability of FCC unit operations. First, a reconstruction-based dual-encoder-decoder framework is developed to facilitate the acquisition of mixed-frequency features and information fusion during the FCC regenerator catalyst loss process. Subsequently, a graph attention network and a multilayer long short-term memory network with a differential structure are integrated into the reconstruction-based dual-encoder-shared-decoder framework to capture the dynamic fluctuations and critical features associated with anomalies. Experimental results from the Chinese FCC industrial process demonstrate that MIF-STGAT achieves excellent accuracy and interpretability for anomaly detection.