Recently, ship detection technology has been applied extensively in the marine security monitoring field. However, accurate marine ship detection remains challenging because of varying scales, slightly occluded objects, uneven illumination, and sea clutter. To address these issues, we propose a novel ship detection approach, the Twin Feature Pyramid Network and Data Augmentation (TFPN-DA), which mainly consists of three modules. First, to eliminate the negative effects of slightly occluded objects and uneven illumination, we propose the Spatial Attention within the Twin Feature Pyramid Network (SA-TFPN) method, which uses spatial attention to reconstruct the feature pyramid. Second, the ROI Feature Module (ROIFM) is introduced into the SA-TFPN to enhance specific crucial details from multi-scale features for object regression and classification. Additionally, data augmentation strategies, such as spatial affine transformation and noise processing, are developed to optimize the data sample distribution. A self-constructed dataset is used to train the detection model, and experiments conducted on this dataset demonstrate the effectiveness of our model.
The rapid development and widespread adoption of Internet technology have significantly increased Internet traffic, highlighting the growing importance of network security. Intrusion Detection Systems (IDS) are essential for safeguarding network integrity. To address the low accuracy of existing intrusion detection models in identifying network attacks, this paper proposes an intrusion detection method based on the fusion of a Spatial Attention mechanism and a Residual Neural Network (SA-ResNet). Residual connections effectively capture local features in the data, while the spatial attention mechanism extracts the global dependency relationships of intrusion features. This strengthens the model's focus on the global features of intrusions and improves the accuracy of intrusion recognition. The proposed model was experimentally verified on the NSL-KDD dataset. The experimental results show that the intrusion recognition accuracy of the SA-ResNet-based method reaches 99.86%, and its overall accuracy is 0.41% higher than that of traditional Convolutional Neural Network (CNN) models.
To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module which, compared with the Attenuated Spatial Pyramid module, can increase the receptive field and enhance the information. Finally, it adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results. The model effectively improves the segmentation accuracy on small data sets. Experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIOU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIOU by about 12.2%.
To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved You Only Look Once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed to efficiently fuse features of different scales. In addition, an intersection-over-union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with gains of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50% to 95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3%, and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public dataset PASCAL VOC, and the results show that F-YOLOv8 has excellent generalized detection performance.
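The space-to-depth step in the abstract above replaces strided convolution or pooling by moving small spatial blocks into the channel axis, so resolution drops without discarding any pixel. A minimal NumPy sketch of that rearrangement follows. The function name `space_to_depth`, the (C, H, W) layout, and the block size of 2 are assumptions chosen for illustration, not details taken from the F-YOLOv8 paper.

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * block**2, H // block, W // block)."""
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by the block size")
    # Split each spatial axis into (coarse position, offset inside the block).
    x = x.reshape(c, h // block, block, w // block, block)
    # Move the two in-block offsets in front of the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    # Fold offsets and channels together: every pixel survives as a channel entry.
    return x.reshape(c * block * block, h // block, w // block)

feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
downsampled = space_to_depth(feature_map)
print(downsampled.shape)  # (8, 2, 2)
```

Unlike a stride-2 convolution, which only evaluates every second position, this keeps all input values and leaves it to the following convolution to mix them.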
Because bounding-box annotations are unavailable, most weakly supervised object detection methods transform the detection problem into a classification problem over candidate regions, which makes weakly supervised detectors prone to locating only the most salient and highly discriminative local areas of objects. We propose a weakly supervised method that combines attention and erasure mechanisms. The method uses attention maps to search for the more discriminative areas within candidate regions and then uses an erasure mechanism to erase those areas, forcing the model to strengthen its learning of features in less discriminative areas. To improve the localization ability of the detector, we cascade the weakly supervised detection network with a fully supervised detection network and train the two jointly through multi-task learning. Based on the validation trials, the category mean average precision (mAP) and the correct localization (CorLoc) on the two datasets, i.e., VOC2007 and VOC2012, are 55.2% and 53.8%, respectively. In terms of mAP and CorLoc, this approach significantly outperforms previous approaches, which creates opportunities for further investigation into weakly supervised object detection algorithms.
Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have predominantly focused on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model's detection speed while continuously improving cross-modal feature extraction and fusion remains a challenge. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model's detection efficiency. The model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing its focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model's parameter count and computational load through channel-wise and point-wise convolutions. The proposed model was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
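The parameter saving from depthwise separable convolution mentioned above can be checked with simple arithmetic: a standard k x k convolution learns one k x k filter per input/output channel pair, whereas the separable version learns one k x k filter per input channel (channel-wise step) plus a 1 x 1 mixing layer (point-wise step). The sketch below counts weights only and ignores biases. The layer size of 256 channels with a 3 x 3 kernel is an assumed example, not a figure from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # One k x k filter for every (input channel, output channel) pair.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    depthwise = k * k * c_in   # channel-wise: one k x k filter per input channel
    pointwise = c_in * c_out   # point-wise: 1 x 1 convolution mixing channels
    return depthwise + pointwise

std = standard_conv_params(256, 256, 3)
sep = depthwise_separable_params(256, 256, 3)
print(std, sep, round(sep / std, 3))  # 589824 67840 0.115
```

The ratio equals 1/c_out + 1/k^2, so for 3 x 3 kernels the separable layer needs roughly one ninth of the weights once the channel count is large.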
Convolutional neural networks have a weak ability to explicitly learn spatial invariance, and occlusion and background interference can cause the loss of discriminative features in pedestrian re-identification tasks. To address these problems, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the misalignment of pedestrian spatial features. Then, the network is divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. The global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results are better than those of the compared methods, which verifies the effectiveness of the proposed method.
Working memory is a core cognitive function that supports goal-directed behavior and complex thought. We developed a spatial working memory and attention test on paired symbols (SWAPS), which has proved to be a useful and valid tool for spatial working memory and attention studies in the fields of cognitive psychology, education, and psychiatry. The repeated administration of working memory capacity tests is common in clinical and research settings. Studies suggest that repeating cognitive tests may improve performance scores, a phenomenon known as retest effects. The systematic investigation of retest effects in SWAPS is critical for interpreting scientific results, but it has not yet been fully carried out. To address this, we recruited 77 college students aged 18–21 years and used SWAPS comprising 72 trials with different memory loads, learning times, and delay spans. We repeated the test once a week for five weeks to investigate the retest effects of SWAPS. There were significant retest effects in the first two tests: the accuracy of the SWAPS tests increased significantly and then stabilized. These findings provide useful information for researchers to appropriately use or interpret repeated working memory tests. Further experiments are needed to clarify the factors that mediate the retest effects and to identify the cognitive mechanism that influences them.
Top-down attention mechanisms require the selection of specific objects or locations; however, the brain mechanism involved when attention is allocated across different modalities is not well understood. The aim of this study was to use functional magnetic resonance imaging to define the neural mechanisms underlying divided and selective spatial attention. A concurrent audiovisual stimulus was used, and subjects were prompted to focus on a visual, auditory, or audiovisual stimulus in a Posner paradigm. Our behavioral results confirmed better performance under selective attention than under divided attention. We found differences in the activation level of the frontoparietal network, the visual/auditory cortex, the putamen, and the salience network under different attention conditions. We further used Granger causality (GC) to explore effective connectivity differences between tasks. Differences in GC connectivity between visual and auditory selective tasks reflected the visual dominance effect under spatial attention. In addition, our results supported the role of the putamen in redistributing attention and the functional separation of the salience network. In summary, we explored the audiovisual top-down allocation of attention and observed differences in neural mechanisms under endogenous attention modes, which revealed differences in cross-modal expression in visual and auditory attention under attentional modulation.
The separation of individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to stick together, and occlusion by objects such as pigpen structures can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive, and prone to labeling errors. There is therefore an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily migrated to a group-raised environment. To solve these problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks were proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNext50 as the encoder and UNet as the decoder as the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IOU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics are improved by 0.13% and 0.22%, respectively. Experiments that introduce channel attention and spatial attention separately show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IOU metrics by 0.16% and 0.30%, respectively. Furthermore, heatmaps of the features at different decoder layers after adding different attention information show that the boundary of the pig image segmentation becomes clearer as the number of layers increases. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is verified in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to the field of group-raised pigs and can provide a reference for its pre-segmentation.
With the metaverse being the development direction of the next-generation Internet, the popularity of intelligent devices, and the maturity of various emerging technologies, more and more intelligent devices try to connect to the Internet, which poses a major threat to the management and security protection of network equipment. At present, the mainstream method of network equipment identification in the metaverse is to obtain the network traffic data generated during device communication, extract the device features through analysis and processing, and identify the device based on a variety of learning algorithms. Such methods often require manual participation, and they have difficulty capturing the small differences between similar devices, leading to identification errors. Therefore, we propose a deep learning device recognition method based on a spatial attention mechanism. First, we extract the required feature fields from the acquired network traffic data. Then, we normalize the data and convert it into grayscale images. After that, we add a spatial attention mechanism to a CNN and an MLP, respectively, to increase the difference between similar network devices and further improve the recognition accuracy. Finally, we identify devices based on the deep learning model. A large number of experiments were carried out on 31 types of network devices, such as web cameras, wireless routers, and smartwatches. The results show that, compared with recognition based only on the deep learning model, the accuracy of the proposed method based on the spatial attention mechanism increases by 0.8% and 2.0% under the CNN and MLP models, respectively. The method proposed in this paper is significantly superior to the existing method of device-type recognition based only on a deep learning model.
Object detection has made a significant leap forward in recent years. However, the detection of small objects remains difficult for various reasons: the objects are very small, and they are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been utilized to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects by overcoming the information loss that occurs during downsampling. The multi-convolutional attention block (MCAB), made up of channel attention and a spatial attention module (SAM), has been crafted to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant gains in performance, with 97.75% for KITTI and 88.97% for PASCAL VOC. The findings indicate that MCBAN is considerably more effective in small object detection than other existing approaches.
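Since a spatial attention module recurs throughout these abstracts, a small sketch may help show what such a module computes. The version below follows the widely used CBAM-style recipe: pool across channels with mean and max, convolve the two pooled maps into one logit map, apply a sigmoid, and rescale every channel by the result. It is an illustration only. The abstract does not specify MCBAN's exact block, and the function name, the 3 x 3 kernel size, and the random (untrained) kernel are assumptions.

```python
import numpy as np

def spatial_attention(x: np.ndarray, kernel: np.ndarray):
    """x: (C, H, W) features; kernel: (2, k, k) weights with odd k.

    Returns the reweighted features and the (H, W) attention map.
    """
    # Summarise each spatial position across channels in two ways.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    # Plain "same" convolution (cross-correlation) over the two pooled maps.
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    attention = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    return x * attention, attention             # broadcast over channels

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 6))
weights = rng.normal(scale=0.1, size=(2, 3, 3))  # stands in for learned weights
reweighted, attention_map = spatial_attention(features, weights)
print(reweighted.shape, attention_map.shape)  # (4, 6, 6) (6, 6)
```

In a trained network the kernel is learned, so positions that matter for the task receive attention values near 1 and background positions are scaled down.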
The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address the low efficiency, suboptimal detection accuracy, and slow detection speed of conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration techniques are employed, leading to the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model shows a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving a mean average precision (mAP) of 95.0%. Furthermore, the inference speed on edge devices reaches 15 frame/s.
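DIoU-based non-maximum suppression, as named in the abstract above, differs from ordinary NMS in the overlap score: the squared distance between box centres, normalised by the squared diagonal of the smallest enclosing box, is subtracted from IoU. Two boxes with high IoU but well-separated centres are therefore less likely to suppress each other. A minimal NumPy sketch under assumed conventions follows: boxes are (x1, y1, x2, y2), the threshold is 0.5, and the function names are illustrative rather than taken from the paper or from YOLOv5.

```python
import numpy as np

def diou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """DIoU between one box and an (N, 4) array of boxes, all (x1, y1, x2, y2)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter + 1e-9)
    # Squared distance between box centres.
    rho2 = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 \
         + ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    ex1 = np.minimum(box[0], boxes[:, 0])
    ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2])
    ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - rho2 / (c2 + 1e-9)

def diou_nms(boxes: np.ndarray, scores: np.ndarray, threshold: float = 0.5) -> list:
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        if rest.size == 0:
            break
        # Drop every remaining box whose DIoU with the kept box is too high.
        order = rest[diou(boxes[best], boxes[rest]) <= threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(diou_nms(boxes, scores))  # [0, 2]
```

In the example the second box overlaps the first with a DIoU of about 0.67 and is suppressed, while the distant third box is kept.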
Almost all existing deep learning methods rely on a large amount of annotated data, so they are ill-suited to forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates the convolutional block attention module, extracts high-level and discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of the support images and the features of the query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset shows that the proposed method is superior to state-of-the-art few-shot learning approaches.
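The prototype comparison described above can be sketched in a few lines: each class prototype is the mean of that class's support features, and a query takes the label of the nearest prototype. The sketch assumes the features have already been produced by some embedding network and uses squared Euclidean distance, the usual choice for prototypical networks. The toy 2-D features and function names are made up for illustration.

```python
import numpy as np

def class_prototypes(support_features: np.ndarray, support_labels: np.ndarray):
    """Mean support feature per class. Returns (classes, prototypes)."""
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_features[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, prototypes

def classify_queries(query_features: np.ndarray, classes: np.ndarray,
                     prototypes: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every query to every prototype: (Q, K).
    diff = query_features[:, None, :] - prototypes[None, :, :]
    distances = (diff ** 2).sum(axis=-1)
    return classes[distances.argmin(axis=1)]

# Toy 2-way, 2-shot episode with hand-made features (0 = no smoke, 1 = smoke).
support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[1.0, 1.0], [9.0, 9.0]])

classes, prototypes = class_prototypes(support, labels)
print(classify_queries(queries, classes, prototypes))  # [0 1]
```

Because the classifier has no trainable weights of its own, only the embedding network needs to be learned, which is what makes the approach attractive when labeled smoke images are scarce.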
Studies have shown that spatial attention remarkably affects the trial-to-trial response variability shared between neurons. Difficulty in the attentional task adjusts how much concentration we maintain on what is currently important and what is filtered out as irrelevant sensory information. However, how task difficulty mediates the interactions between neurons with separated receptive fields (RFs) that are attended to or attended away from is still not clear. We examined spike count correlations between single-unit activities recorded simultaneously in the primary visual cortex (V1) while monkeys performed a spatial attention task with two levels of difficulty. The RFs of the two recorded neurons were non-overlapping, which allowed us to study fluctuations in the correlated responses between competing visual inputs when the focus of attention was allocated to the RF of one neuron. As difficulty in the spatial attention task increased, spike count correlations either decreased to become negative between neuronal pairs, implying competition among them, with one neuron (or none) exhibiting attentional enhancement of firing rate, or increased to become positive, suggesting inter-neuronal cooperation, with one of the pair showing attentional suppression of spiking responses. In addition, the modulation of spike count correlations by task difficulty was independent of the attended location. These findings provide evidence that task difficulty affects the functional interactions between different neuronal pools in V1 when selective attention resolves spatial competition.
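The spike count correlation used in the study above is, in its basic form, the Pearson correlation between the trial-by-trial spike counts of two simultaneously recorded neurons. The sketch below computes it from z-scored counts. The counts are invented for illustration, and any per-condition normalisation the authors may have applied is not reproduced here.

```python
import numpy as np

def spike_count_correlation(counts_a, counts_b) -> float:
    """Pearson correlation of trial-wise spike counts from two neurons."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Both neurons need one count per shared trial")
    # z-score each neuron across trials, then average the products.
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(np.mean(za * zb))

# Hypothetical counts over eight trials of the same stimulus condition.
neuron_1 = [12, 15, 9, 14, 11, 16, 10, 13]
neuron_2 = [7, 5, 9, 6, 8, 4, 9, 6]
print(spike_count_correlation(neuron_1, neuron_2))
```

A negative value, as in this made-up pair, is the pattern the abstract interprets as competition between the two neurons; a positive value would indicate shared fluctuations.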
Image inpainting based on deep learning has been greatly improved. The original purpose of image inpainting was to repair broken photos, such as inpainting artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, detection and localization of image inpainting operations are essential. Recent research shows that the high-pass filtering fully convolutional network (HPFCN), applied to image inpainting detection, achieves good results. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention fully convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Visual question answering (VQA) has attracted more and more attention in computer vision and natural language processing. Scholars are committed to studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and the attention mechanism is a sensible way to solve this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause the loss of original features, a small portion of the image features is added back as compensation. For the attention mechanism on text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation improve the accuracy of the multimodal low-rank bilinear pooling network by 6.1%.
Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or spatial information and cannot make full use of both to further improve SISR performance. The present work addresses this problem by proposing a mixed attention densely residual network architecture that can make full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to enable the algorithm to fuse channel attention with Laplacian spatial attention effectively, and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method performs comparably to other state-of-the-art methods.
Sparse rewards pose significant challenges in deep reinforcement learning, as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating “small goals” within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial position functions to determine the value of each episode for experience replay. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.
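To make the hindsight idea concrete, the sketch below shows plain HER relabeling with the "final" strategy: a failed episode is stored a second time with the goal replaced by the position actually reached at the end, so that at least the last transition earns a success reward. This illustrates the baseline only and does not include SPAHER's spatial-position prioritization, which the abstract does not detail. The transition fields, the 0/-1 reward convention, and the tolerance are assumptions.

```python
import numpy as np

def sparse_reward(achieved, goal, tolerance: float = 0.05) -> float:
    """0 when the achieved position is within tolerance of the goal, else -1."""
    distance = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return 0.0 if distance <= tolerance else -1.0

def relabel_with_final_goal(episode: list, tolerance: float = 0.05) -> list:
    """Copy an episode, pretending the finally achieved position was the goal."""
    hindsight_goal = episode[-1]["achieved_goal"]
    return [
        dict(step, goal=hindsight_goal,
             reward=sparse_reward(step["achieved_goal"], hindsight_goal, tolerance))
        for step in episode
    ]

# Hypothetical failed episode: the gripper never reaches the goal at (5, 5).
original_goal = (5.0, 5.0)
episode = [
    {"achieved_goal": pos, "goal": original_goal,
     "reward": sparse_reward(pos, original_goal)}
    for pos in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
]
relabeled = relabel_with_final_goal(episode)
print([step["reward"] for step in episode])    # [-1.0, -1.0, -1.0]
print([step["reward"] for step in relabeled])  # [-1.0, -1.0, 0.0]
```

A prioritized variant would then decide which of the stored episodes to sample more often; plain HER samples them uniformly.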
Abstract: Recently, ship detection technology has been applied extensively in marine security monitoring. However, accurate marine ship detection remains challenging due to factors such as varying scales, slightly occluded objects, uneven illumination, and sea clutter. To address these issues, we propose a novel ship detection approach, the Twin Feature Pyramid Network and Data Augmentation (TFPN-DA), which mainly consists of three modules. First, to mitigate the negative effects of slightly occluded objects and uneven illumination, we propose the Spatial Attention within the Twin Feature Pyramid Network (SA-TFPN) method, which uses spatial attention to reconstruct the feature pyramid. Second, the ROI Feature Module (ROIFM) is introduced into the SA-TFPN to enhance specific crucial details from multi-scale features for object regression and classification. Additionally, data augmentation strategies, such as spatial affine transformation and noise processing, are developed to optimize the data sample distribution. A self-constructed dataset is used to train the detection model, and experiments conducted on the dataset demonstrate the effectiveness of our model.
Funding: Supported by the National Natural Science Foundation of China (62473341), the Key Research and Development Special Project of Henan Province (221111210500), and the Key Research and Development Special Project of Henan Province (242102211071, 242102210142, 232102211053).
Abstract: The rapid development and widespread adoption of Internet technology have significantly increased Internet traffic, highlighting the growing importance of network security. Intrusion Detection Systems (IDS) are essential for safeguarding network integrity. To address the low accuracy of existing intrusion detection models in identifying network attacks, this paper proposes an intrusion detection method based on the fusion of a Spatial Attention mechanism and a Residual Neural Network (SA-ResNet). Residual connections effectively capture local features in the data, while the spatial attention mechanism extracts the global dependency relationships of intrusion features. This strengthens the model's focus on the global features of intrusions and improves the accuracy of intrusion recognition. The proposed model was experimentally verified on the NSL-KDD dataset. The experimental results show that the intrusion recognition accuracy of the SA-ResNet-based method reaches 99.86%, and its overall accuracy is 0.41% higher than that of traditional Convolutional Neural Network (CNN) models.
Funding: Ministry of Science and Technology Basic Resources Survey Special Project, Grant/Award Number: 2019FY100900; High-level Hospital Construction Project, Grant/Award Number: DFJH2019015; National Natural Science Foundation of China, Grant/Award Number: 61871021; Guangdong Natural Science Foundation, Grant/Award Number: 2019A1515011676; Beijing Key Laboratory of Robotics Bionic and Functional Research.
Abstract: To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction. In comparison with the Attenuated Spatial Pyramid module, it proposes an Attentive Pyramid Spatial Attention (APSA) module, which increases the receptive field and enhances the information. Finally, it adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results. The model effectively improves the segmentation accuracy on small data sets. The experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves better segmentation quality and accuracy, with an mIoU of 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIoU by about 12.2%.
Funding: Supported by the National Natural Science Foundation of China (No. 62103298).
Abstract: To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a spatial-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which efficiently fuses features of different scales. In addition, an intersection over union loss based on the minimum point distance (MPDIoU) is introduced for bounding box regression, yielding faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with improvements of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50%-95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3% and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 has excellent generalized detection performance.
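As a minimal sketch of the space-to-depth idea mentioned in the abstract above, the following rearranges one feature map into several lower-resolution maps instead of discarding pixels, as a strided convolution or pooling layer would. The function name `space_to_depth`, the block size of 2, and the single-channel nested-list input are illustrative assumptions, not details taken from the paper.

```python
def space_to_depth(feature_map, block=2):
    """Split an H x W map into block*block maps of size (H/block) x (W/block).

    Every input pixel survives in exactly one output map, so resolution is
    reduced without losing information (hypothetical helper for illustration).
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    channels = []
    for dy in range(block):
        for dx in range(block):
            # Take every block-th pixel, starting at offset (dy, dx).
            channels.append([row[dx::block] for row in feature_map[dy::block]])
    return channels


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
sub_maps = space_to_depth(fmap)  # four 2 x 2 maps
```

In a real network the resulting maps would be stacked along the channel axis and followed by a non-strided convolution; that part is omitted here.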
Funding: Supported by the National Natural Science Foundation of China (No. 61871182, 61773160), the Natural Science Foundation of Hebei Province of China (No. F2021502013), the Fundamental Research Funds for the Central Universities (No. 2020MS153, 2021PT018), and the National Natural Science Foundation of China (No. 62371188).
Abstract: Due to the lack of bounding box annotations, most weakly supervised target detection methods transform object detection into a classification problem over candidate regions, so that weakly supervised detectors tend to locate only the salient, highly discriminative local areas of objects. We propose a weakly supervised method that combines attention and erasure mechanisms. The method uses attention maps to search for the more discriminative areas within candidate regions and then uses an erasure mechanism to erase those areas, forcing the model to strengthen its learning of features in less discriminative areas. To improve the localization ability of the detector, we cascade the weakly supervised target detection network with a fully supervised target detection network and train the two jointly through multi-task learning. In the validation trials, the category mean average precision (mAP) and the correct localization (CorLoc) on the two datasets, i.e., VOC2007 and VOC2012, are 55.2% and 53.8%, respectively. In terms of mAP and CorLoc, this approach significantly outperforms previous approaches, which creates opportunities for further investigation into weakly supervised target identification algorithms.
Funding: Shenzhen Institute of Artificial Intelligence and Robotics for Society, Grant/Award Number: AC01202201003-02; GuangDong Basic and Applied Basic Research Foundation, Grant/Award Number: 2024A1515010252; Longgang District Shenzhen's "Ten Action Plan" for Supporting Innovation Projects, Grant/Award Number: LGKCSDPT2024002.
Abstract: Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have focused predominantly on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Funding: Supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006, 232102210044, 232102211017, 232102210055 and 222102210214; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao [2021] No. 489-29; and the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
Abstract: Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, maintaining the model's detection speed while continuously improving cross-modal feature extraction and fusion remains a challenge. We have devised a deep learning network model for cross-modal pedestrian detection based on ResNet50, aiming to focus on more reliable features and enhance the model's detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model's focus on different spatial positions and sharing the weighted feature data across modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model's parameter count and computational load through channel-wise and point-wise convolutions. The proposed network model was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the proposed multispectral pedestrian detection technology.
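The parameter saving from depthwise separable convolution mentioned in the abstract above can be checked with simple arithmetic. The sketch below compares weight counts for a standard convolution and for a channel-wise (depthwise) plus point-wise pair; the layer sizes are illustrative assumptions, biases are ignored, and none of the numbers come from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)     # 73728
separable = separable_conv_params(3, 64, 128)   # 8768
ratio = separable / standard                    # equals 1/c_out + 1/kernel**2
```

For this hypothetical layer the separable form needs roughly 12% of the weights of the standard form, which is the kind of reduction such lightweight modules rely on.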
Funding: The Foshan Science and Technology Innovation Team Project (No. FS0AA-KJ919-4402-0060) and the National Natural Science Foundation of China (No. 62263018).
Abstract: Convolutional neural networks have a weak ability to explicitly learn spatial invariance, and occlusion and background interference cause a probabilistic loss of discriminative features in pedestrian re-identification tasks. To address these problems, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the misalignment of pedestrian spatial features. Then the network is divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. The global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses an overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses a differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID and CUHK03. The experimental results are better than those of the compared methods, which verifies the effectiveness of the proposed method.
Funding: The National Natural Science Foundation of China (No. 91632103), the Shanghai Education Commission Research and Innovation Program (No. 2019-01-07-00-02-E00037), the Program of Shanghai Subject Chief Scientist (No. 17XD1401700), the Higher Education Disciplinary Innovation Program, and the "Eastern Scholar" Project.
Abstract: Working memory is a core cognitive function that supports goal-directed behavior and complex thought. We developed a spatial working memory and attention test on paired symbols (SWAPS), which has proved to be a useful and valid tool for spatial working memory and attention studies in the fields of cognitive psychology, education, and psychiatry. The repeated administration of working memory capacity tests is common in clinical and research settings. Studies suggest that repeated cognitive tests may improve performance scores, a phenomenon known as retest effects. The systematic investigation of retest effects in SWAPS is critical for interpreting scientific results, but it has not yet been fully carried out. To address this, we recruited 77 college students aged 18-21 years and used SWAPS comprising 72 trials with different memory loads, learning times, and delay spans. We repeated the test once a week for five weeks to investigate the retest effects of SWAPS. There were significant retest effects in the first two tests: the accuracy of the SWAPS tests increased significantly and then stabilized. These findings provide useful information for researchers to appropriately use or interpret repeated working memory tests. Further experiments are still needed to clarify the factors that mediate the retest effects and to identify the cognitive mechanism that influences them.
Funding: The study was supported by the National Natural Science Foundation of China (Grant Nos. 62171300, 61727807).
Abstract: Top-down attention mechanisms require the selection of specific objects or locations; however, the brain mechanism involved when attention is allocated across different modalities is not well understood. The aim of this study was to use functional magnetic resonance imaging to define the neural mechanisms underlying divided and selective spatial attention. A concurrent audiovisual stimulus was used, and subjects were prompted to focus on a visual, auditory or audiovisual stimulus in a Posner paradigm. Our behavioral results confirmed the better performance of selective attention compared to divided attention. We found differences in the activation level of the frontoparietal network, visual/auditory cortex, the putamen and the salience network under different attention conditions. We further used Granger causality (GC) to explore effective connectivity differences between tasks. Differences in GC connectivity between visual and auditory selective tasks reflected the visual dominance effect under spatial attention. In addition, our results supported the role of the putamen in redistributing attention and the functional separation of the salience network. In summary, we explored the audiovisual top-down allocation of attention and observed differences in neural mechanisms under endogenous attention modes, which revealed the differences in cross-modal expression in visual and auditory attention under attentional modulation.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31671571), the Shanxi Province Basic Research Program Project (Free Exploration) (No. 20210302124523, 20210302123408, 202103021224149, and 202103021223141), and the Youth Agricultural Science and Technology Innovation Fund of Shanxi Agricultural University (Grant No. 2019027).
Abstract: The separation of individual pigs from pigpen scenes is crucial for precision farming, and technology based on convolutional neural networks can provide a low-cost, non-contact, non-invasive method of pig image segmentation. However, two factors limit the development of this field. On the one hand, individual pigs tend to stick together, and occlusion by objects such as pigpen structures can easily cause the model to misjudge. On the other hand, manual labeling of group-raised pig data is time-consuming, labor-intensive, and prone to labeling errors. Therefore, there is an urgent need for an individual pig image segmentation model that performs well in individual scenarios and can be easily migrated to a group-raised environment. To solve the above problems, taking individual pigs as research objects, an individual pig image segmentation dataset containing 2066 images was constructed, and a series of algorithms based on fully convolutional networks were proposed to solve the pig image segmentation problem. To capture long-range dependencies and weaken background information such as pigpens while enhancing the information of individual parts of pigs, channel and spatial attention blocks were introduced into the best-performing decoders, UNet and LinkNet. Experiments show that, with ResNeXt50 as the encoder and UNet as the decoder in the basic model, adding both attention blocks at the same time achieves 98.30% and 96.71% on the F1 and IoU metrics, respectively. Compared with the model that adds the channel attention block alone, the two metrics are improved by 0.13% and 0.22%, respectively. The experiments introducing channel and spatial attention separately show that spatial attention is more effective than channel attention. Taking VGG16-LinkNet as an example, compared with channel attention, spatial attention improves the F1 and IoU metrics by 0.16% and 0.30%, respectively. Furthermore, the feature heatmaps of different decoder layers after adding different attention information show that, as the number of layers increases, the boundary of the pig image segmentation becomes clearer. To verify the effectiveness of the individual pig image segmentation model in group-raised scenes, the transfer performance of the model is verified in three scenarios: high separation, deep adhesion, and pigpen occlusion. The experiments show that the segmentation results obtained by adding attention information, especially the simultaneous fusion of channel and spatial attention blocks, are more refined and complete. The attention-based individual pig image segmentation model can be effectively transferred to group-raised pig scenes and can provide a reference for their pre-segmentation.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB3102900), the National Natural Science Foundation of China (No. U1804263, 62172435 and 62002386), and the Zhongyuan Science and Technology Innovation Leading Talent Project, China (No. 214200510019).
Abstract: With the metaverse being the development direction of the next-generation Internet, the popularity of intelligent devices, and the maturity of various emerging technologies, more and more intelligent devices connect to the Internet, which poses a major threat to the management and security protection of network equipment. At present, the mainstream method of network equipment identification in the metaverse is to obtain the network traffic data generated during device communication, extract the device features through analysis and processing, and identify the device using a variety of learning algorithms. Such methods often require manual participation and have difficulty capturing the small differences between similar devices, leading to identification errors. Therefore, we propose a deep learning device recognition method based on a spatial attention mechanism. First, we extract the required feature fields from the acquired network traffic data. Then, we normalize the data and convert it into grayscale images. After that, we add a spatial attention mechanism to a CNN and an MLP, respectively, to increase the difference between similar network devices and further improve the recognition accuracy. Finally, we identify devices with the deep learning model. Extensive experiments were carried out on 31 types of network devices, such as web cameras, wireless routers, and smartwatches. The results show that, under the CNN and MLP models, the accuracy of the proposed recognition method based on the spatial attention mechanism is 0.8% and 2.0% higher, respectively, than that of the recognition method based only on the deep learning model. The proposed method is significantly superior to the existing device-type recognition method based only on a deep learning model.
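The normalization and grayscale conversion steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the min-max scaling to the 0-255 range, the zero padding to a square image, and the helper name `to_grayscale_image` are hypothetical choices, not the procedure reported in the paper.

```python
import math


def to_grayscale_image(features):
    """Min-max scale numeric features to 0..255 and pack them into a square image.

    Hypothetical helper: the source only states that the data are normalized
    and converted into grayscale images, not how.
    """
    lo, hi = min(features), max(features)
    span = hi - lo
    # A constant feature vector maps to all-zero pixels to avoid dividing by zero.
    pixels = [0 if span == 0 else round(255 * (v - lo) / span) for v in features]
    side = math.ceil(math.sqrt(len(pixels)))
    pixels += [0] * (side * side - len(pixels))  # zero padding to fill the square
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Five made-up traffic feature values become a 3 x 3 grayscale image.
image = to_grayscale_image([0, 2, 10, 4, 6])
```

The resulting two-dimensional array is what a CNN with a spatial attention layer would take as input in this kind of pipeline.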
Funding: Funded by Yayasan UTP FRG (YUTP-FRG), grant number 015LC0-280, and the Computer and Information Science Department of Universiti Teknologi PETRONAS.
Abstract: Object detection has made a significant leap forward in recent years. However, the detection of small objects remains difficult for various reasons: such objects are very small and are susceptible to missed detection due to background noise. Additionally, small object information is degraded by downsampling operations. Deep learning-based detection methods have been used to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects by overcoming the information loss that occurs during downsampling. The multi-convolutional attention block (MCAB), which is made up of a channel attention module and a spatial attention module (SAM), has been designed to detect small objects with higher precision. We carried out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and followed a step-wise process to analyze the results. The experimental results demonstrate significant gains in performance, with 97.75% for KITTI and 88.97% for PASCAL VOC. The findings of this study indicate that MCBAN is considerably more effective in small object detection than other existing approaches.
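To make the notion of a spatial attention module concrete, here is a heavily simplified sketch in plain Python. It pools each pixel across channels (average and maximum), turns the two pooled values into a weight with a sigmoid, and rescales the features at that pixel. The per-pixel weighted sum with hand-set weights stands in for the learned convolution that modules of this kind normally apply to the pooled maps, so this is an assumption-laden illustration and not the SAM described in the paper.

```python
import math


def spatial_attention(feature, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight a C x H x W feature (nested lists) with a per-pixel attention map.

    w_avg, w_max and bias are hypothetical fixed weights; a trained module
    would learn a convolution over the two pooled maps instead.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    attention = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [feature[c][y][x] for c in range(channels)]
            avg_pool = sum(values) / channels  # channel-wise average pooling
            max_pool = max(values)             # channel-wise max pooling
            logit = w_avg * avg_pool + w_max * max_pool + bias
            attention[y][x] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    reweighted = [
        [[feature[c][y][x] * attention[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return attention, reweighted


# Two channels, one row, two pixels: the second pixel has strong activations.
feature = [[[0.0, 2.0]], [[0.0, 4.0]]]
attention, reweighted = spatial_attention(feature)
```

Pixels with strong activations across channels receive weights close to 1, while weakly activated pixels are damped, which is the behavior a spatial attention map is meant to provide.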
Funding: Supported in part by the Science and Technology Innovation Project of CHN Energy Shuo Huang Railway Development Company Ltd (No. SHTL-22-28), the Beijing Natural Science Foundation Fengtai Urban Rail Transit Frontier Research Joint Fund (No. L231002), and the Major Project of China State Railway Group Co., Ltd. (No. K2023T003).
Abstract: The detection of foreign object intrusion is crucial for ensuring the safety of railway operations. To address challenges such as low efficiency, suboptimal detection accuracy, and slow detection speed inherent in conventional comprehensive video monitoring systems for railways, a railway foreign object intrusion recognition and detection system is designed and implemented using edge computing and deep learning technologies. To raise detection accuracy, the convolutional block attention module (CBAM), which includes spatial and channel attention modules, is integrated into the YOLOv5 model, giving rise to the CBAM-YOLOv5 model. Furthermore, the distance intersection-over-union non-maximum suppression (DIoU_NMS) algorithm is employed in place of the weighted non-maximum suppression algorithm, resulting in improved detection performance for intrusive targets. To accelerate detection, the model is pruned based on the batch normalization (BN) layer, and TensorRT inference acceleration techniques are employed, culminating in the successful deployment of the algorithm on edge devices. The CBAM-YOLOv5 model exhibits a 2.1% improvement in detection accuracy when evaluated on a self-constructed railway dataset, achieving 95.0% mean average precision (mAP). Furthermore, the inference speed on edge devices reaches 15 frames/s.
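The DIoU-based non-maximum suppression mentioned in the abstract above can be sketched in a few lines. DIoU subtracts from the ordinary IoU the squared distance between box centers, normalized by the squared diagonal of the smallest box enclosing both. The box format `(x1, y1, x2, y2)`, the threshold of 0.5, and the function names are assumptions for illustration and do not reflect the paper's implementation.

```python
def diou(a, b):
    """DIoU of two boxes (x1, y1, x2, y2) with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centers.
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(a[2], b[2]) - min(a[0], b[0])
    enc_h = max(a[3], b[3]) - min(a[1], b[1])
    return iou - (dx * dx + dy * dy) / (enc_w * enc_w + enc_h * enc_h)


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes whose DIoU with the kept box exceeds the threshold.
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = diou_nms(boxes, scores)  # the near-duplicate second box is suppressed
```

Because the center-distance penalty lowers the score of boxes whose centers are far apart, two overlapping boxes that belong to different nearby objects are less likely to suppress each other than with plain IoU.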
Funding: The work was supported by the National Key R&D Program of China (Grant No. 2020YFC1511601) and the Fundamental Research Funds for the Central Universities (Grant No. 2019SHFWLC01).
Abstract: Almost all existing deep learning methods rely on a large amount of annotated data, so they are ill-suited to forest fire smoke detection with limited data. In this paper, a novel hybrid attention-based few-shot learning method, named Attention-Based Prototypical Network, is proposed for forest fire smoke detection. Specifically, the feature extraction network, which incorporates the convolutional block attention module, extracts high-level, discriminative features and further decreases the false alarm rate caused by suspected smoke areas. Moreover, we design a meta-learning module to alleviate the overfitting caused by limited smoke images; the meta-learning network achieves effective detection by comparing the distance between the class prototypes of support images and the features of query images. A series of experiments on forest fire smoke datasets and the miniImageNet dataset show that the proposed method is superior to state-of-the-art few-shot learning approaches.
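The prototype comparison described in the abstract above follows the general prototypical-network recipe: average the support embeddings of each class into a prototype, then assign a query to the nearest prototype. The sketch below assumes embeddings are already computed and uses Euclidean distance; the class labels, vectors, and function names are made up for illustration.

```python
import math


def class_prototype(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def classify(query, support):
    """Assign the query embedding to the class with the nearest prototype.

    `support` maps a class label to its support embeddings (hypothetical data;
    in practice the embeddings would come from the feature extraction network).
    """
    prototypes = {label: class_prototype(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


support = {
    "smoke": [[1.0, 1.0], [3.0, 1.0]],          # prototype [2.0, 1.0]
    "background": [[-2.0, -2.0], [-2.0, 0.0]],  # prototype [-2.0, -1.0]
}
prediction = classify([1.5, 0.5], support)
```

With only a handful of support images per class, the classifier has no per-class weights to fit, which is one reason this scheme is commonly used when labeled data are scarce.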
Funding: This work was supported by the National Natural Science Foundation of China (61773259, 31471081, 61773256, 62073221, and 61971280).
Abstract: Studies have shown that spatial attention remarkably affects the trial-to-trial response variability shared between neurons. Difficulty in the attentional task adjusts how much concentration we maintain on what is currently important and what is filtered out as irrelevant sensory information. However, how task difficulty mediates the interactions between neurons with separated receptive fields (RFs) that are attended to or attended away from is still not clear. We examined spike count correlations between single-unit activities recorded simultaneously in the primary visual cortex (V1) while monkeys performed a spatial attention task with two levels of difficulty. Moreover, the RFs of the two recorded neurons were non-overlapping, allowing us to study fluctuations in the correlated responses between competing visual inputs when the focus of attention was allocated to the RF of one neuron. As task difficulty increased, spike count correlations either decreased to become negative between neuronal pairs, implying competition between them, with one neuron (or none) exhibiting attentional enhancement of firing rate, or increased to become positive, suggesting inter-neuronal cooperation, with one of the pair showing attentional suppression of spiking responses. In addition, the modulation of spike count correlations by task difficulty was independent of the attended locations. These findings provide evidence that task difficulty affects the functional interactions between different neuronal pools in V1 when selective attention resolves the spatial competition.
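Spike count correlation, the measure examined in the abstract above, is commonly computed as the Pearson correlation between the trial-by-trial spike counts of two simultaneously recorded neurons. The sketch below assumes that definition; the spike counts are made-up numbers and the function name is hypothetical.

```python
import math


def spike_count_correlation(counts_a, counts_b):
    """Pearson correlation between two neurons' spike counts across trials."""
    n = len(counts_a)
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(counts_a, counts_b))
    var_a = sum((a - mean_a) ** 2 for a in counts_a)
    var_b = sum((b - mean_b) ** 2 for b in counts_b)
    return cov / math.sqrt(var_a * var_b)


# Made-up spike counts over four trials of the same stimulus condition.
neuron_a = [2, 4, 6, 8]
cooperative = spike_count_correlation(neuron_a, [1, 2, 3, 4])  # counts rise together
competitive = spike_count_correlation(neuron_a, [4, 3, 2, 1])  # counts move oppositely
```

A positive value means the two neurons tend to fire more or less together from trial to trial, while a negative value means one tends to fire more when the other fires less.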
Funding: Supported by the National Natural Science Foundation of China under Grant 62172059, 61972057 and 62072055; Hunan Provincial Natural Science Foundations of China under Grant 2020JJ4626; Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; and Postgraduate Scientific Research Innovation Project of Hunan Province under Grant CX20210811.
Abstract: Image inpainting based on deep learning has improved greatly. The original purpose of image inpainting was to repair damaged photos, such as inpainting artifacts. However, it may also be used for malicious operations, such as destroying evidence. Therefore, the detection and localization of image inpainting operations are essential. Recent research shows that a high-pass filtering full convolutional network (HPFCN) applied to image inpainting detection achieves good results. However, such methods do not consider the spatial location and channel information of the feature map. To address these shortcomings, we introduce squeeze-and-excitation (SE) blocks and propose a high-pass filter attention full convolutional network (HPACN). In feature extraction, we apply concurrent spatial and channel attention (scSE) to enhance feature extraction and obtain more information. Channel attention (cSE) is introduced in upsampling to enhance detection and localization. The experimental results show that the proposed method achieves an improvement on ImageNet.
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers study how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and the attention mechanism is an effective way to address this problem. However, using a single attention mechanism may lead to incomplete coverage of features. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism may cause the loss of original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation improve the accuracy of the multimodal low-rank bilinear pooling network by 6.1%.
Funding: This work was supported in part by the Natural Science Foundation of China under Grant 62063004 and 61762033, in part by the Hainan Provincial Natural Science Foundation of China under Grant 2019RC018 and 619QN246, and by the Postdoctoral Science Foundation under Grant 2020TQ0293.
Abstract: Recent applications of convolutional neural networks (CNNs) in single image super-resolution (SISR) have achieved unprecedented performance. However, existing CNN-based SISR network designs mostly consider only channel or spatial information and cannot make full use of both to improve SISR performance further. The present work addresses this problem by proposing a mixed attention densely residual network architecture that makes full and simultaneous use of both channel and spatial information. Specifically, we propose a residual-in-dense network structure composed of dense connections between multiple dense residual groups to form a very deep network. This structure allows each dense residual group to apply a local residual skip connection and enables the cascading of multiple residual blocks to reuse previous features. A mixed attention module is inserted into each dense residual group to enable the algorithm to fuse channel attention with Laplacian spatial attention effectively and thereby focus more adaptively on valuable feature learning. The qualitative and quantitative results of extensive experiments demonstrate that the proposed method performs comparably to other state-of-the-art methods.
Funding: Supported by the Natural Science Foundation of Shaanxi Province, China (No. 2022JQ-661), the Project of Science and Technology Development Plan in Hangzhou, China (No. 202202B38), and the Xidian-FIAS International Joint Research Center, China.
Abstract: Sparse rewards pose significant challenges in deep reinforcement learning, as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating "small goals" within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial position functions to determine the value of each episode for experience replay. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in the challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.