Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have predominantly focused on extracting features from diverse neural network structures, neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors' approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.
Objective: To report the development, validation, and findings of the Multi-dimensional Attention Rating Scale (MARS), a self-report tool designed to evaluate attention levels along six dimensions. Methods: The MARS was developed based on Classical Test Theory (CTT). A total of 202 highly educated healthy adult participants were recruited for reliability and validity tests. Reliability was measured using Cronbach's alpha and test-retest reliability. Structural validity was explored using principal component analysis. Criterion validity was analyzed by correlating MARS scores with the Toronto Hospital Alertness Test (THAT), the Attentional Control Scale (ACS), and the Attention Network Test (ANT). Results: The MARS comprises 12 items spanning six distinct dimensions of attention: focused attention, sustained attention, shifting attention, selective attention, divided attention, and response inhibition. As assessed by six experts, the content validation index (CVI) was 0.95, the Cronbach's alpha for the MARS was 0.78, and the test-retest reliability was 0.81. Four factors were identified (cumulative variance contribution rate 68.79%). The total score of MARS was correlated positively with THAT (r=0.60, P<0.01) and ACS (r=0.78, P<0.01) and negatively with ANT's reaction time for alerting (r=−0.31, P=0.049). Conclusion: The MARS can reliably and validly assess six-dimension attention levels in real-world settings and is expected to be a new tool for assessing multi-dimensional attention impairments in different mental disorders.
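The reliability figure reported above rests on Cronbach's alpha. As a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), the snippet below computes it for a small response matrix. The function name and the demo data are hypothetical and are not taken from the MARS study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents x 3 items that move together perfectly.
demo = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(demo)
```

Because the three demo items are perfectly correlated, alpha reaches its maximum here; real questionnaire data would normally give a lower value.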
To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which can efficiently fuse features of different scales. In addition, an intersection-over-union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with improvements of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50%-95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3%, and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public dataset PASCAL VOC, and the results show that F-YOLOv8 has excellent generalized detection performance.
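The space-to-depth step mentioned above can be illustrated with a small sketch: each block x block patch of a feature map is moved into the channel dimension, so resolution drops without discarding any values (unlike strided convolution or pooling). This is a generic illustration on nested lists, not the F-YOLOv8 implementation; the function name and the channel ordering are assumptions.

```python
def space_to_depth(x, block=2):
    """Rearrange x of shape [C][H][W] into [C*block*block][H//block][W//block]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    out = []
    # One output channel per (row offset, column offset, input channel).
    for dy in range(block):
        for dx in range(block):
            for ch in range(c):
                out.append([[x[ch][i][j] for j in range(dx, w, block)]
                            for i in range(dy, h, block)])
    return out


# Hypothetical 1-channel 4x4 feature map.
fmap = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]
packed = space_to_depth(fmap)  # 4 channels, each 2x2
```

All 16 input values are still present in the output, only redistributed over four half-resolution channels.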
We propose a hierarchical multi-scale attention mechanism-based model in response to the low accuracy of existing oceanic biological image classification methods and the inefficiency of manual classification. Firstly, the hierarchical efficient multi-scale attention (H-EMA) module is designed for lightweight feature extraction, achieving outstanding performance at a relatively low cost. Secondly, an improved EfficientNetV2 block is used to better integrate information from different scales and enhance inter-layer message passing. Furthermore, introducing the convolutional block attention module (CBAM) enhances the model's perception of critical features, optimizing its generalization ability. Lastly, Focal Loss is introduced to adjust the weights of hard samples and address the issue of imbalanced categories in the dataset, further improving the model's performance. The model achieved 96.11% accuracy on the intertidal marine organism dataset of Nanji Islands and 84.78% accuracy on the CIFAR-100 dataset, demonstrating its strong generalization ability to meet the demands of oceanic biological image classification.
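To make the role of Focal Loss concrete, here is a minimal sketch of its binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The abstract's task is multi-class, so this is a simplified illustration only; the function name is hypothetical, and the defaults gamma=2.0 and alpha=0.25 are commonly used values, not settings reported in the source.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident, correct prediction
hard = focal_loss(0.6, 1)  # less confident prediction
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is a quick way to sanity-check an implementation.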
Background: Raising a child with attention deficit hyperactivity disorder (ADHD) is a key challenge for the primary caregiver. This systematic review aims to identify major burdens facing the primary caregiver of a child with ADHD. Methods: The electronic databases CINAHL, PubMed, and Google Scholar were searched for studies published in English from 2017 to 2022 assessing the challenges facing caregivers of a child with ADHD. The Johns Hopkins Nursing Evidence-Based Practice Model was used to assess the quality and risk of bias of studies identified for inclusion. Articles were synthesized by evaluating principal themes of burden to caregivers, stress of caregivers, and effectiveness of intervention programs. Results: Eleven articles with a total of 2426 participants were included in this review. Findings revealed that caregivers of children with ADHD have a poor quality of life and high stress levels. Supportive parenting programs can be effective for improving coping and adaptation mechanisms with children with ADHD. However, few interventional studies were identified, increasing the potential for bias. No meta-analysis was conducted. Conclusion: Caregivers of children with ADHD can benefit from strategies to improve their quality of life and reduce their stress levels. Targeted parenting programs can make a positive difference in the well-being of caregivers and children with ADHD. Additional research is needed to address the evidence-based effectiveness of parenting support programs.
The Informer model leverages its innovative ProbSparse self-attention mechanism to demonstrate significant performance advantages in long-sequence time-series forecasting tasks. However, when confronted with time-series data exhibiting multi-scale characteristics and substantial noise, the model's attention mechanism reveals inherent limitations. Specifically, the model is susceptible to interference from local noise or irrelevant patterns, leading to diminished focus on globally critical information and consequently impairing forecasting accuracy. To address this challenge, this study proposes an enhanced architecture that integrates a Gated Attention mechanism into the original Informer framework. This mechanism employs learnable gating functions to dynamically and selectively weight crucial temporal segments and discriminative feature dimensions within the input sequence. This adaptive weighting strategy is designed to suppress noise interference while amplifying the capture of core dynamic patterns. Consequently, it strengthens the model's capability to represent complex temporal dynamics and improves its predictive performance.
Dynamic sign language recognition holds significant importance, particularly with the application of deep learning to address its complexity. However, existing methods face several challenges. Firstly, recognizing dynamic sign language requires identifying keyframes that best represent the signs, and missing these keyframes reduces accuracy. Secondly, some methods do not focus enough on hand regions, which are small within the overall frame, leading to information loss. To address these challenges, we propose a novel Video Transformer Attention-based Network (VTAN) for dynamic sign language recognition. Our approach prioritizes informative frames and hand regions. To tackle the first issue, we designed a keyframe extraction module enhanced by a convolutional autoencoder, which selects information-rich frames and eliminates redundant ones from the video sequences. For the second issue, we developed a soft attention-based transformer module that emphasizes extracting features from hand regions, ensuring that the network pays more attention to hand information within sequences. This dual-focus approach improves dynamic sign language recognition by addressing the key challenges of identifying critical frames and emphasizing hand regions. Experimental results on two public benchmark datasets demonstrate the effectiveness of our network, which outperforms most typical methods in sign language recognition tasks.
Hyperspectral image (HSI) classification is crucial for numerous remote sensing applications. Traditional deep learning methods may miss pixel relationships and context, leading to inefficiencies. This paper introduces the spectral band graph convolutional and attention-enhanced CNN joint network (SGCCN), a novel approach that uses spectral band graph convolutions to capture long-range relationships, uses the local perception of attention-enhanced multi-level convolutions to extract local spatial features, and employs a dynamic attention mechanism to enhance feature extraction. The SGCCN integrates spectral and spatial features through a self-attention fusion network, significantly improving classification accuracy and efficiency. The proposed method outperforms existing techniques, demonstrating its effectiveness in handling the challenges associated with HSI data.
Rail surface damage is a critical concern for high-speed railway infrastructure, directly affecting train operational stability and safety. Existing methods face limitations in accuracy and speed for small-sample, multi-category, and multi-scale target segmentation tasks. To address these challenges, this paper proposes Pyramid-MixNet, an intelligent segmentation model for high-speed rail surface damage, leveraging dataset construction and expansion alongside a feature pyramid-based encoder-decoder network with multi-attention mechanisms. The encoding network integrates Spatial Reduction Masked Multi-Head Attention (SRMMHA) to enhance global feature extraction while reducing trainable parameters. The decoding network incorporates Mix-Attention (MA), enabling multi-scale structural understanding and cross-scale token group correlation learning. Experimental results demonstrate that the proposed method achieves 62.17% average segmentation accuracy, an 80.28% Damage Dice Coefficient, and 56.83 FPS, meeting real-time detection requirements. The model's high accuracy and scene adaptability significantly improve the detection of small-scale and complex multi-scale rail damage, offering practical value for real-time monitoring in high-speed railway maintenance systems.
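The Dice coefficient cited above measures the overlap between a predicted mask and a ground-truth mask as 2|A∩B| / (|A| + |B|). The sketch below shows the generic binary form on flattened masks; how the paper's "Damage Dice Coefficient" aggregates over classes or images is not stated in the abstract, so the function name and the empty-mask convention here are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two equally sized flat sequences of 0/1 mask labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as damage in both masks.
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2 * inter / size


# Hypothetical 4-pixel masks sharing one damaged pixel.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```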
The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationships among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a type of boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which combines local attention maps extracted by MA-Net with non-local techniques to explore spatial relationships among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets illustrate the promising performance of our framework.
This study investigated how components of threat-related attentional biases are associated with levels of sense of control. Using a spatial-cueing paradigm, 36 college students with a high sense of control (females=22, mean age=19.44, SD=1.36) and 35 with a low sense of control (females=15, mean age=19.77, SD=1.40) were assigned to tasks featuring different cue-target intervals (i.e., 50 and 800 ms). The student participants completed the Control Sense Scale, the GAD-7 Anxiety Scale, and the PHQ-9 Patient Health Questionnaire. Data from the spatial-cueing task were used to test for differences in attentional biases toward threat images between the two groups. A repeated measures ANOVA indicated that both groups exhibited attentional avoidance under the 50 ms interval condition. However, individuals in the low sense of control group (LSC Group) demonstrated stronger avoidance than those in the high sense of control group (HSC Group). The current study did not find any attentional bias components under the 800 ms interval condition. The findings provide preliminary evidence for a new vigilance-avoidance model for further study, with a view to developing interventions targeting negative emotional disorders based on individuals' sense of control.
To improve small object detection and trajectory estimation from an aerial moving perspective, we propose the Aerial View Attention-PRB (AVA-PRB) model. AVA-PRB integrates two attention mechanisms, Coordinate Attention (CA) and the Convolutional Block Attention Module (CBAM), to enhance detection accuracy. Additionally, Shape-IoU is employed as the loss function to refine localization precision. Our model further incorporates an adaptive feature fusion mechanism, which optimizes multi-scale object representation, ensuring robust tracking in complex aerial environments. We evaluate the performance of AVA-PRB on two benchmark datasets: Aerial Person Detection and VisDrone2019-Det. The model achieves 60.9% mAP@0.5 on the Aerial Person Detection dataset and 51.2% mAP@0.5 on VisDrone2019-Det, demonstrating its effectiveness in aerial object detection. Beyond detection, we propose a novel trajectory estimation method that improves movement path prediction under aerial motion. Experimental results indicate that our approach reduces path deviation by up to 64%, effectively mitigating errors caused by rapid camera movements and background variations. By optimizing feature extraction and enhancing spatial-temporal coherence, our method significantly improves object tracking under aerial moving perspectives. This research addresses the limitations of fixed-camera tracking, enhancing flexibility and accuracy in aerial tracking applications. The proposed approach has broad potential for real-world applications, including surveillance, traffic monitoring, and environmental observation.
Due to the lack of accurate data and complex parameterization, the prediction of groundwater depth is a challenge for numerical models. Machine learning can effectively address this issue and has been proven useful in the prediction of groundwater depth in many areas. In this study, two new models are applied to the prediction of groundwater depth in the Ningxia area, China. The two models combine the improved dung beetle optimizer (DBO) algorithm with two deep learning models: the Multi-head Attention-Convolution Neural Network-Long Short Term Memory network (MH-CNN-LSTM) and the Multi-head Attention-Convolution Neural Network-Gated Recurrent Unit (MH-CNN-GRU). The models with DBO show better prediction performance, with larger R (correlation coefficient) and RPD (residual prediction deviation), and lower RMSE (root-mean-square error). Compared with the models with the original DBO, the R and RPD of models with the improved DBO increase by over 1.5%, and the RMSE decreases by over 1.8%, indicating better prediction results. In addition, compared with the multiple linear regression model, a traditional statistical model, deep learning models have better prediction performance.
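The three evaluation metrics named above can be computed as in the following minimal sketch. R is taken to be the Pearson correlation coefficient, and RPD is computed as the sample standard deviation of the observations divided by the RMSE, which is the usual definition; the source does not state its exact formulas, so both are assumptions, as are the function names and demo values.

```python
import math
from statistics import mean, stdev


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs)
                           * sum((p - mp) ** 2 for p in pred))


def rpd(obs, pred):
    """Standard deviation of observations divided by RMSE (assumed definition)."""
    return stdev(obs) / rmse(obs, pred)


# Hypothetical groundwater depths (m): predictions biased by a constant 0.5 m.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
```

The demo shows why the metrics are reported together: a constant bias leaves R at its maximum while RMSE and RPD still register the error.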
Funding (FPANet study): Shenzhen Institute of Artificial Intelligence and Robotics for Society, Grant/Award Number: AC01202201003-02; GuangDong Basic and Applied Basic Research Foundation, Grant/Award Number: 2024A1515010252; Longgang District Shenzhen's "Ten Action Plan" for Supporting Innovation Projects, Grant/Award Number: LGKCSDPT2024002.
Funding (F-YOLOv8 study): Supported by the National Natural Science Foundation of China (No. 62103298).
Funding (H-EMA oceanic image classification study): Supported by the National Natural Science Foundation of China (Nos. 61806107 and 61702135).
Funding (VTAN study): Supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and the Jiangxi Provincial Key Laboratory of Virtual Reality under Grant No. 2024SSY03151.
Funding (SGCCN study): Supported in part by the National Natural Science Foundation of China (No. 61801214) and the Postgraduate Research Practice Innovation Program of NUAA (No. xcxjh20231504).
Funding: Supported in part by the National Natural Science Foundation of China under Grant 6226070954 and the Jiangxi Provincial Key R&D Programme under Grant 20244BBG73002.
Abstract: Rail surface damage is a critical issue for high-speed railway infrastructure, directly affecting train operational stability and safety. Existing methods face limitations in accuracy and speed for small-sample, multi-category, and multi-scale target segmentation tasks. To address these challenges, this paper proposes Pyramid-MixNet, an intelligent segmentation model for high-speed rail surface damage that combines dataset construction and expansion with a feature pyramid-based encoder-decoder network using multi-attention mechanisms. The encoding network integrates Spatial Reduction Masked Multi-Head Attention (SRMMHA) to enhance global feature extraction while reducing trainable parameters. The decoding network incorporates Mix-Attention (MA), enabling multi-scale structural understanding and cross-scale token group correlation learning. Experimental results demonstrate that the proposed method achieves 62.17% average segmentation accuracy, an 80.28% Damage Dice Coefficient, and 56.83 FPS, meeting real-time detection requirements. The model's high accuracy and scene adaptability significantly improve the detection of small-scale and complex multi-scale rail damage, offering practical value for real-time monitoring in high-speed railway maintenance systems.
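As a reading aid for the Dice figure quoted above, the following is a minimal sketch of the standard Dice coefficient, 2|P ∩ T| / (|P| + |T|), on binary masks. It is not the authors' evaluation code: how the paper aggregates its "Damage Dice Coefficient" across classes or images is not stated in the abstract, and the function name `dice_coefficient`, the flat-list mask format, and the sample masks are all illustrative assumptions.

```python
def dice_coefficient(pred, target):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked as damage in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention chosen here (an assumption): two empty masks match perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks: 1 = damage pixel, 0 = background.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(pred, target)
print(f"Dice = {score:.4f}")
```

Here two pixels overlap and each mask contains three damage pixels, so the score is 2·2 / (3 + 3) = 4/6.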
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62171232) and the Priority Academic Program Development of Jiangsu Higher Education Institutions, China.
Abstract: The Fine-grained Image Recognition (FGIR) task is dedicated to distinguishing similar sub-categories that belong to the same super-category, such as bird species and car types. To highlight visual differences, existing FGIR works often follow two steps: discriminative sub-region localization and local feature representation. However, these works pay less attention to global context information. They neglect the fact that subtle visual differences in challenging scenarios can be highlighted by exploiting the spatial relationships among different sub-regions from a global viewpoint. Therefore, in this paper, we consider both global and local information for FGIR, and propose a collaborative teacher-student strategy to reinforce and unify the two types of information. Our framework is implemented mainly with convolutional neural networks and is referred to as the Teacher-Student Based Attention Convolutional Neural Network (T-S-ACNN). For fine-grained local information, we choose the classic Multi-Attention Network (MA-Net) as our baseline, and propose a boundary constraint to further reduce background noise in the local attention maps. In this way, the discriminative sub-regions tend to appear in the area occupied by fine-grained objects, leading to more accurate sub-region localization. For fine-grained global information, we design a graph convolution based Global Attention Network (GA-Net), which combines the local attention maps extracted by MA-Net with non-local techniques to explore the spatial relationships among sub-regions. Finally, we develop a collaborative teacher-student strategy to adaptively determine the attended roles and optimization modes, so as to enhance the cooperative reinforcement of MA-Net and GA-Net. Extensive experiments on the CUB-200-2011, Stanford Cars and FGVC Aircraft datasets illustrate the promising performance of our framework.
Funding: Supported by the Philosophy and Social Science Fund for Young Scholars of Guangdong Province (GD23YXL06), Humanities and Social Sciences of Jiaying University (2023SKY01), the General Project of Philosophy and Social Sciences Planning Fund of Guangdong Province (GD24XXL06), and Humanities and Social Sciences of Jiaying University (2023SKY02).
Abstract: This study investigated how components of threat-related attentional biases are associated with levels of sense of control. Using a spatial-cueing paradigm, 36 college students with a high sense of control (females = 22, mean age = 19.44, SD = 1.36) and 35 with a low sense of control (females = 15, mean age = 19.77, SD = 1.40) were assigned to a task featuring different cue-target intervals (i.e., 50 and 800 ms). The student participants completed the Control Sense Scale, the GAD-7 Anxiety Scale, and the PHQ-9 Patient Health Questionnaire. The spatial-cueing task was used to test for differences in attentional biases toward threat images between the two groups. A repeated measures ANOVA indicated that both groups exhibited attentional avoidance under the 50 ms interval condition. However, individuals in the low sense of control group (LSC Group) showed stronger avoidance than those in the high sense of control group (HSC Group). The study did not find any attentional bias components under the 800 ms interval condition. The findings provide preliminary evidence for a new vigilance-avoidance model, which further studies could use to develop interventions targeting negative emotional disorders based on individuals' sense of control.
Funding: Funded by the National Science and Technology Council (NSTC), Taiwan, under grant numbers NSTC 113-2634-F-A49-007 and NSTC 112-2634-F-A49-007.
Abstract: To improve small object detection and trajectory estimation from an aerial moving perspective, we propose the Aerial View Attention-PRB (AVA-PRB) model. AVA-PRB integrates two attention mechanisms, Coordinate Attention (CA) and the Convolutional Block Attention Module (CBAM), to enhance detection accuracy. Additionally, Shape-IoU is employed as the loss function to refine localization precision. Our model further incorporates an adaptive feature fusion mechanism, which optimizes multi-scale object representation, ensuring robust tracking in complex aerial environments. We evaluate the performance of AVA-PRB on two benchmark datasets: Aerial Person Detection and VisDrone2019-Det. The model achieves 60.9% mAP@0.5 on the Aerial Person Detection dataset and 51.2% mAP@0.5 on VisDrone2019-Det, demonstrating its effectiveness in aerial object detection. Beyond detection, we propose a novel trajectory estimation method that improves movement path prediction under aerial motion. Experimental results indicate that our approach reduces path deviation by up to 64%, effectively mitigating errors caused by rapid camera movements and background variations. By optimizing feature extraction and enhancing spatial-temporal coherence, our method significantly improves object tracking under aerial moving perspectives. This research addresses the limitations of fixed-camera tracking, enhancing flexibility and accuracy in aerial tracking applications. The proposed approach has broad potential for real-world applications, including surveillance, traffic monitoring, and environmental observation.
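The "@0.5" in mAP@0.5 refers to an intersection-over-union (IoU) threshold of 0.5 for counting a predicted box as a match. The sketch below illustrates only that plain IoU test. It does not implement Shape-IoU, which the abstract names as the loss function, nor the full mAP computation; the function names, the `(x1, y1, x2, y2)` box format, and the sample boxes are assumptions made for illustration.

```python
def box_iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return box_iou(pred_box, gt_box) >= threshold


# Hypothetical boxes in pixel coordinates.
gt = (0.0, 0.0, 10.0, 10.0)
loose = (5.0, 0.0, 15.0, 10.0)   # overlaps half of the ground-truth box
tight = (2.0, 0.0, 12.0, 10.0)   # overlaps most of the ground-truth box
print(box_iou(gt, loose), matches_at_threshold(gt, loose))
print(box_iou(gt, tight), matches_at_threshold(gt, tight))
```

The loose box has IoU 50/150 and fails the 0.5 threshold, while the tight box has IoU 80/120 and passes. For small aerial targets, a shift of a few pixels can move a box across this threshold, which is one reason localization refinement matters for this metric.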
Funding: Supported by the National Natural Science Foundation of China [grant numbers 42088101 and 42375048].
Abstract: Due to the lack of accurate data and complex parameterization, the prediction of groundwater depth is a challenge for numerical models. Machine learning can effectively address this issue and has proven useful in predicting groundwater depth in many areas. In this study, two new models are applied to the prediction of groundwater depth in the Ningxia area, China. The two models combine the improved dung beetle optimizer (DBO) algorithm with two deep learning models: the Multi-head Attention-Convolution Neural Network-Long Short-Term Memory network (MH-CNN-LSTM) and the Multi-head Attention-Convolution Neural Network-Gated Recurrent Unit (MH-CNN-GRU). The models with DBO show better prediction performance, with larger R (correlation coefficient) and RPD (residual prediction deviation), and lower RMSE (root-mean-square error). Compared with the models using the original DBO, the R and RPD of the models using the improved DBO increase by over 1.5%, and the RMSE decreases by over 1.8%, indicating better prediction results. In addition, compared with the multiple linear regression model, a traditional statistical model, the deep learning models have better prediction performance.
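The three evaluation metrics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: R is taken to be the Pearson correlation coefficient, RPD is computed with the commonly used definition (standard deviation of the observations divided by RMSE, here with the sample standard deviation), and the depth values are made up. The abstract does not specify which variants the paper uses.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def rpd(obs, pred):
    """RPD as sample standard deviation of observations over RMSE (assumed definition)."""
    mo = sum(obs) / len(obs)
    sd = math.sqrt(sum((o - mo) ** 2 for o in obs) / (len(obs) - 1))
    return sd / rmse(obs, pred)


# Hypothetical groundwater depths in metres: observed vs. model output.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted), pearson_r(observed, predicted), rpd(observed, predicted))
```

In this example the prediction has a constant 0.5 offset, so R is 1 even though RMSE is 0.5. Correlation alone does not capture bias, which is why it is useful to report it alongside RMSE and RPD.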