随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法...随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。展开更多
To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM...To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.展开更多
Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contempo...Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contemporary enterprises typically operate 200+interconnected systems,with research indicating that 52% of organizations manage three or more enterprise content management systems,creating information silos that reduce operational efficiency by up to 35%.While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision,their systematic application to business information systems remains largely unexplored.This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System(HABIS)framework that applies multi-level attention mechanisms to enterprise environments.We provide a comprehensive mathematical formulation of the framework,analyze its computational complexity,and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in crosssystem query latency compared to legacy ERP modules and 70% improvement in prediction accuracy over baseline methods.The theoretical framework introduces four hierarchical attention levels:system-level attention for dynamic weighting of business systems,process-level attention for business process prioritization,data-level attention for critical information selection,and temporal attention for time-sensitive pattern recognition.Our complexity analysis demonstrates that the framework achieves O(n log n)computational complexity for attention computation,making it scalable to large enterprise environments including retail supply chains with 200+system-scale deployments.The proof-of-concept implementation validates the theoretical framework’s feasibility withMSE loss of 0.439 and response times of 0.000120 s per query,demonstrating its potential for addressing key challenges in business information systems.This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.展开更多
Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the b...Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.展开更多
Graph Federated Learning(GFL)has shown great potential in privacy protection and distributed intelligence through distributed collaborative training of graph-structured data without sharing raw information.However,exi...Graph Federated Learning(GFL)has shown great potential in privacy protection and distributed intelligence through distributed collaborative training of graph-structured data without sharing raw information.However,existing GFL approaches often lack the capability for comprehensive feature extraction and adaptive optimization,particularly in non-independent and identically distributed(NON-IID)scenarios where balancing global structural understanding and local node-level detail remains a challenge.To this end,this paper proposes a novel framework called GFL-SAR(Graph Federated Collaborative Learning Framework Based on Structural Amplification and Attention Refinement),which enhances the representation learning capability of graph data through a dual-branch collaborative design.Specifically,we propose the Structural Insight Amplifier(SIA),which utilizes an improved Graph Convolutional Network(GCN)to strengthen structural awareness and improve modeling of topological patterns.In parallel,we propose the Attentive Relational Refiner(ARR),which employs an enhanced Graph Attention Network(GAT)to perform fine-grained modeling of node relationships and neighborhood features,thereby improving the expressiveness of local interactions and preserving critical contextual information.GFL-SAR effectively integrates multi-scale features from every branch via feature fusion and federated optimization,thereby addressing existing GFL limitations in structural modeling and feature representation.Experiments on standard benchmark datasets including Cora,Citeseer,Polblogs,and Cora_ML demonstrate that GFL-SAR achieves superior performance in classification accuracy,convergence speed,and robustness compared to existing methods,confirming its effectiveness and generalizability in GFL tasks.展开更多
Clock synchronization has important applications in multi-agent collaboration(such as drone light shows,intelligent transportation systems,and game AI),group decision-making,and emergency rescue operations.Synchroniza...Clock synchronization has important applications in multi-agent collaboration(such as drone light shows,intelligent transportation systems,and game AI),group decision-making,and emergency rescue operations.Synchronization method based on pulse-coupled oscillators(PCOs)provides an effective solution for clock synchronization in wireless networks.However,the existing clock synchronization algorithms in multi-agent ad hoc networks are difficult to meet the requirements of high precision and high stability of synchronization clock in group cooperation.Hence,this paper constructs a network model,named DAUNet(unsupervised neural network based on dual attention),to enhance clock synchronization accuracy in multi-agent wireless ad hoc networks.Specifically,we design an unsupervised distributed neural network framework as the backbone,building upon classical PCO-based synchronization methods.This framework resolves issues such as prolonged time synchronization message exchange between nodes,difficulties in centralized node coordination,and challenges in distributed training.Furthermore,we introduce a dual-attention mechanism as the core module of DAUNet.By integrating a Multi-Head Attention module and a Gated Attention module,the model significantly improves information extraction capabilities while reducing computational complexity,effectively mitigating synchronization inaccuracies and instability in multi-agent ad hoc networks.To evaluate the effectiveness of the proposed model,comparative experiments and ablation studies were conducted against classical methods and existing deep learning models.The research results show that,compared with the deep learning networks based on DASA and LSTM,DAUNet can reduce the mean normalized phase difference(NPD)by 1 to 2 orders of magnitude.Compared with the attention models based on additive attention and self-attention mechanisms,the performance of DAUNet has improved by more than ten times.This study demonstrates DAUNet’s potential in advancing multi-agent ad hoc networking technologies.展开更多
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimo...Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimodal sensor fusion,often struggle with noisy data and demand high-performance GPUs,leading to sensor misalignment and performance degradation.This paper introduces an Enhanced Channel Attention BEV(ECABEV),a novel approach designed to address the challenges under insufficient GPU memory conditions.ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism,which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features.Furthermore,an improved fusion approach is proposed to efficiently merge categorical data across modalities.To reduce computational overhead,a bilinear interpolation layer normalizationmethod is devised to ensure spatial feature fidelity.Moreover,a scalable crossentropy loss function is further designed to handle the imbalanced classes with less computational efficiency sacrifice.Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961,using a lightweight ViT-B/14 backbone and lower resolution(224×224).Our approach highlights its cost-effectiveness and practical applicability,even on low-end devices.The code is publicly available at:https://github.com/YYF-CQU/ECABEV.git.展开更多
Single Image Super-Resolution(SISR)seeks to reconstruct high-resolution(HR)images from lowresolution(LR)inputs,thereby enhancing visual fidelity and the perception of fine details.While Transformer-based models—such ...Single Image Super-Resolution(SISR)seeks to reconstruct high-resolution(HR)images from lowresolution(LR)inputs,thereby enhancing visual fidelity and the perception of fine details.While Transformer-based models—such as SwinIR,Restormer,and HAT—have recently achieved impressive results in super-resolution tasks by capturing global contextual information,these methods often suffer from substantial computational and memory overhead,which limits their deployment on resource-constrained edge devices.To address these challenges,we propose a novel lightweight super-resolution network,termed Binary Attention-Guided Information Distillation(BAID),which integrates frequency-aware modeling with a binary attention mechanism to significantly reduce computational complexity and parameter count whilemaintaining strong reconstruction performance.The network combines a high–low frequency decoupling strategy with a local–global attention sharing mechanism,enabling efficient compression of redundant computations through binary attention guidance.At the core of the architecture lies the Attention-Guided Distillation Block(AGDB),which retains the strengths of the information distillation framework while introducing a sparse binary attention module to enhance both inference efficiency and feature representation.Extensive×4 superresolution experiments on four standard benchmarks—Set5,Set14,BSD100,and Urban100—demonstrate that BAID achieves Peak Signal-to-Noise Ratio(PSNR)values of 32.13,28.51,27.47,and 26.15,respectively,with only 1.22 million parameters and 26.1 G Floating-Point Operations(FLOPs),outperforming other state-of-the-art lightweight methods such as Information Multi-Distillation Network(IMDN)and Residual Feature Distillation Network(RFDN).These results highlight the proposed model’s ability to deliver high-quality image reconstruction while offering strong deployment efficiency,making it well-suited for image restoration tasks in resource-limited environments.展开更多
The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly fa...The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.展开更多
Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced li...Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced lightweight detection model based on the YOLOv11n framework.The proposed model introduces the Bi-level Routing Attention(BRAttention)mechanism to enhance defect feature extraction,enabling more detailed feature representation.It proposes Deep Progressive Cross-Scale Fusion Neck(DPCSFNeck)to better capture smallscale defects and incorporates a Multi-Scale Dilated Residual(MSDR)module to strengthen multi-scale feature representation.Furthermore,a Shared Detail-Enhanced Lightweight Head(SDELHead)is employed to reduce the risk of gradient explosion during training.Experimental results demonstrate that FD-YOLO achieves superior detection accuracy and Lightweight performance compared to the baseline YOLOv11n.展开更多
Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet th...Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.展开更多
Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this pa...Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this paper,we propose a Semantic-Guided Parallax Attention Stereo Matching Network(SGPASMnet)that can be trained in unsupervised manner,building upon the Parallax Attention Stereo Matching Network(PASMnet).Our approach leverages unsupervised learning to address the scarcity of ground truth disparity in stereo matching datasets,facilitating robust training across diverse scene-specific datasets and enhancing generalization.SGPASMnet incorporates two novel components:a Cross-Scale Feature Interaction(CSFI)block and semantic feature augmentation using a pre-trained semantic segmentation model,SegFormer,seamlessly embedded into the parallax attention mechanism.The CSFI block enables effective fusion ofmulti-scale features,integrating coarse and fine details to enhance disparity estimation accuracy.Semantic features,extracted by SegFormer,enrich the parallax attention mechanism by providing high-level scene context,significantly improving performance in ambiguous regions.Our model unifies these enhancements within a cohesive architecture,comprising semantic feature extraction,an hourglass network,a semantic-guided cascaded parallax attentionmodule,outputmodule,and a disparity refinement network.Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate compared to the original PASMnet,highlighting the effectiveness of our enhancements in handling complex scenes.By harnessing unsupervised learning without ground truth disparity needed,SGPASMnet offers a scalable and robust solution for accurate stereo matching,with superior generalization across varied real-world applications.展开更多
The Vertical Total Electron Content(VTEC)of the ionosphere is a crucial parameter for describing the distribution and dynamic changes within the ionosphere.The study utilizes Dual Hybrid Attentional UNet(DHA-UNet)mode...The Vertical Total Electron Content(VTEC)of the ionosphere is a crucial parameter for describing the distribution and dynamic changes within the ionosphere.The study utilizes Dual Hybrid Attentional UNet(DHA-UNet)model to achieve higher forecasting performance for global VTEC predictions under the condition of data acquisition delays.Initially,this study uses the first Hybrid Attentional UNet(HA-UNet)model to predict the intermediate missing data.The missing data are caused by delays in data processing,making the Global Ionosphere Map(GIM)for the current day unavailable.Subsequently,the predicted results from the first HA-UNet model are concatenated with the input data to serve as the input data for the second HA-UNet model,yielding the final prediction results.The performance of DHA-UNet model is then evaluated under varying solar and geomagnetic activity conditions.Evaluation results demonstrate that the DHA-UNet model exhibits higher forecasting accuracy and stability compared to commonly used temporal and spatiotemporal forecasting models.Compared to CODG VTEC,the DHA-UNet model achieves Mean Absolute Error(MAE)values of 2.60 TECU,3.07 TECU,3.78 TECU,and 6.45TECU during quiet,weak,moderate,and strong geomagnetic storm periods,respectively,in years of high solar activity.In years of low solar activity,the model achieves MAE values of 1.00 TECU,1.15 TECU,and 1.54 TECU during quiet,weak,and moderate geomagnetic storm periods,respectively.Even during strong geomagnetic storms,55%of the residuals from the DHA-UNet model fall within the-5.0 TECU to 5.0 TECU range,surpassing other commonly used models.Compared to the C1PG forecasting product,the DHA-UNet model shows particularly notable improvements in accuracy during the spring and winter seasons,as well as in mid-to high-latitude regions.展开更多
Transformer-based neural-network quantum states(NNQS)have shown great promise in representing quantum manybody ground states,offering high flexibility and accuracy.However,the interpretability of such models remains l...Transformer-based neural-network quantum states(NNQS)have shown great promise in representing quantum manybody ground states,offering high flexibility and accuracy.However,the interpretability of such models remains limited,especially in terms of connecting network components to physically meaningful quantities.We propose that the attention mechanism—a central module in transformer architectures—explicitly models the conditional information flow between orbitals.Intuitively,as the transformer learns to predict orbital configurations by optimizing an energy functional,it approximates the conditional probability distribution p(xn|x_(1),...,x_(n-1)),implicitly encoding conditional mutual information(CMI)among orbitals.This suggests a natural correspondence between attention maps and CMI structures in quantum systems.To probe this idea,we compare weighted attention scores from trained transformer wavefunction ansatze with CMI matrices across several representative small molecules.In most cases,we observe a positive rank-level correlation(Kendall's tau)between attention and CMI,suggesting that the learned attention can reflect physically relevant orbital dependencies.This study provides a quantitative link between transformer attention and conditional mutual information in the NNQS setting.Our results provide a step toward explainable deep learning in quantum chemistry,pointing to opportunities in interpreting attention as a proxy for physical correlations.展开更多
文摘随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。
基金supported by the National Natural Science Foundation of China under Grant No.12204062the Natural Science Foundation of Shandong Province under Grant No.ZR2022MF330。
文摘To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
文摘Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contemporary enterprises typically operate 200+interconnected systems,with research indicating that 52% of organizations manage three or more enterprise content management systems,creating information silos that reduce operational efficiency by up to 35%.While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision,their systematic application to business information systems remains largely unexplored.This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System(HABIS)framework that applies multi-level attention mechanisms to enterprise environments.We provide a comprehensive mathematical formulation of the framework,analyze its computational complexity,and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in crosssystem query latency compared to legacy ERP modules and 70% improvement in prediction accuracy over baseline methods.The theoretical framework introduces four hierarchical attention levels:system-level attention for dynamic weighting of business systems,process-level attention for business process prioritization,data-level attention for critical information selection,and temporal attention for time-sensitive pattern recognition.Our complexity analysis demonstrates that the framework achieves O(n log n)computational complexity for attention computation,making it scalable to large enterprise environments including retail supply chains with 200+system-scale deployments.The proof-of-concept implementation validates the theoretical framework’s feasibility withMSE loss of 0.439 and response times of 0.000120 s per query,demonstrating its potential for addressing key challenges in business information systems.This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.
基金supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)under the Metaverse Support Program to Nurture the Best Talents(IITP-2024-RS-2023-00254529)grant funded by the Korea government(MSIT).
文摘Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
基金supported by National Natural Science Foundation of China(62466045)Inner Mongolia Natural Science Foundation Project(2021LHMS06003)Inner Mongolia University Basic Research Business Fee Project(114).
文摘Graph Federated Learning(GFL)has shown great potential in privacy protection and distributed intelligence through distributed collaborative training of graph-structured data without sharing raw information.However,existing GFL approaches often lack the capability for comprehensive feature extraction and adaptive optimization,particularly in non-independent and identically distributed(NON-IID)scenarios where balancing global structural understanding and local node-level detail remains a challenge.To this end,this paper proposes a novel framework called GFL-SAR(Graph Federated Collaborative Learning Framework Based on Structural Amplification and Attention Refinement),which enhances the representation learning capability of graph data through a dual-branch collaborative design.Specifically,we propose the Structural Insight Amplifier(SIA),which utilizes an improved Graph Convolutional Network(GCN)to strengthen structural awareness and improve modeling of topological patterns.In parallel,we propose the Attentive Relational Refiner(ARR),which employs an enhanced Graph Attention Network(GAT)to perform fine-grained modeling of node relationships and neighborhood features,thereby improving the expressiveness of local interactions and preserving critical contextual information.GFL-SAR effectively integrates multi-scale features from every branch via feature fusion and federated optimization,thereby addressing existing GFL limitations in structural modeling and feature representation.Experiments on standard benchmark datasets including Cora,Citeseer,Polblogs,and Cora_ML demonstrate that GFL-SAR achieves superior performance in classification accuracy,convergence speed,and robustness compared to existing methods,confirming its effectiveness and generalizability in GFL tasks.
文摘Clock synchronization has important applications in multi-agent collaboration(such as drone light shows,intelligent transportation systems,and game AI),group decision-making,and emergency rescue operations.Synchronization method based on pulse-coupled oscillators(PCOs)provides an effective solution for clock synchronization in wireless networks.However,the existing clock synchronization algorithms in multi-agent ad hoc networks are difficult to meet the requirements of high precision and high stability of synchronization clock in group cooperation.Hence,this paper constructs a network model,named DAUNet(unsupervised neural network based on dual attention),to enhance clock synchronization accuracy in multi-agent wireless ad hoc networks.Specifically,we design an unsupervised distributed neural network framework as the backbone,building upon classical PCO-based synchronization methods.This framework resolves issues such as prolonged time synchronization message exchange between nodes,difficulties in centralized node coordination,and challenges in distributed training.Furthermore,we introduce a dual-attention mechanism as the core module of DAUNet.By integrating a Multi-Head Attention module and a Gated Attention module,the model significantly improves information extraction capabilities while reducing computational complexity,effectively mitigating synchronization inaccuracies and instability in multi-agent ad hoc networks.To evaluate the effectiveness of the proposed model,comparative experiments and ablation studies were conducted against classical methods and existing deep learning models.The research results show that,compared with the deep learning networks based on DASA and LSTM,DAUNet can reduce the mean normalized phase difference(NPD)by 1 to 2 orders of magnitude.Compared with the attention models based on additive attention and self-attention mechanisms,the performance of DAUNet has improved by more than ten times.This study demonstrates DAUNet’s potential in advancing multi-agent ad hoc networking technologies.
基金funded by the National Natural Science Foundation of China,grant number 62262045the Fundamental Research Funds for the Central Universities,grant number 2023CDJYGRH-YB11the Open Funding of SUGON Industrial Control and Security Center,grant number CUIT-SICSC-2025-03.
文摘Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimodal sensor fusion,often struggle with noisy data and demand high-performance GPUs,leading to sensor misalignment and performance degradation.This paper introduces an Enhanced Channel Attention BEV(ECABEV),a novel approach designed to address the challenges under insufficient GPU memory conditions.ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism,which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features.Furthermore,an improved fusion approach is proposed to efficiently merge categorical data across modalities.To reduce computational overhead,a bilinear interpolation layer normalizationmethod is devised to ensure spatial feature fidelity.Moreover,a scalable crossentropy loss function is further designed to handle the imbalanced classes with less computational efficiency sacrifice.Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961,using a lightweight ViT-B/14 backbone and lower resolution(224×224).Our approach highlights its cost-effectiveness and practical applicability,even on low-end devices.The code is publicly available at:https://github.com/YYF-CQU/ECABEV.git.
基金funded by Project of Sichuan Provincial Department of Science and Technology under 2025JDKP0150the Fundamental Research Funds for the Central Universities under 25CAFUC03093.
文摘Single Image Super-Resolution(SISR)seeks to reconstruct high-resolution(HR)images from lowresolution(LR)inputs,thereby enhancing visual fidelity and the perception of fine details.While Transformer-based models—such as SwinIR,Restormer,and HAT—have recently achieved impressive results in super-resolution tasks by capturing global contextual information,these methods often suffer from substantial computational and memory overhead,which limits their deployment on resource-constrained edge devices.To address these challenges,we propose a novel lightweight super-resolution network,termed Binary Attention-Guided Information Distillation(BAID),which integrates frequency-aware modeling with a binary attention mechanism to significantly reduce computational complexity and parameter count whilemaintaining strong reconstruction performance.The network combines a high–low frequency decoupling strategy with a local–global attention sharing mechanism,enabling efficient compression of redundant computations through binary attention guidance.At the core of the architecture lies the Attention-Guided Distillation Block(AGDB),which retains the strengths of the information distillation framework while introducing a sparse binary attention module to enhance both inference efficiency and feature representation.Extensive×4 superresolution experiments on four standard benchmarks—Set5,Set14,BSD100,and Urban100—demonstrate that BAID achieves Peak Signal-to-Noise Ratio(PSNR)values of 32.13,28.51,27.47,and 26.15,respectively,with only 1.22 million parameters and 26.1 G Floating-Point Operations(FLOPs),outperforming other state-of-the-art lightweight methods such as Information Multi-Distillation Network(IMDN)and Residual Feature Distillation Network(RFDN).These results highlight the proposed model’s ability to deliver high-quality image reconstruction while offering strong deployment efficiency,making it well-suited for image restoration tasks in resource-limited environments.
基金supported by the Knowledge Innovation Program of Wuhan-Shuguang Project(Grant No.2023010201020443)the School-Level Scientific Research Project Funding Program of Jianghan University(Grant No.2022XKZX33)the Natural Science Foundation of Hubei Province(Grant No.2024AFB466).
文摘The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.
基金financially supported by the Fujian Provincial Department of Science and Technology,the Collaborative Innovation Platform Project for Key Technologies of Smart Warehousing and Logistics Systems in the Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone(No.2025E3024).
文摘Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced lightweight detection model based on the YOLOv11n framework.The proposed model introduces the Bi-level Routing Attention(BRAttention)mechanism to enhance defect feature extraction,enabling more detailed feature representation.It proposes Deep Progressive Cross-Scale Fusion Neck(DPCSFNeck)to better capture smallscale defects and incorporates a Multi-Scale Dilated Residual(MSDR)module to strengthen multi-scale feature representation.Furthermore,a Shared Detail-Enhanced Lightweight Head(SDELHead)is employed to reduce the risk of gradient explosion during training.Experimental results demonstrate that FD-YOLO achieves superior detection accuracy and Lightweight performance compared to the baseline YOLOv11n.
基金Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station(Grant No.2024120011003081)Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas(Grant No.2024120011001866)。
文摘Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
基金supported by the National Natural Science Foundation of China,No.62301497the Science and Technology Research Program of Henan,No.252102211024the Key Research and Development Program of Henan,No.231111212000.
文摘Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this paper,we propose a Semantic-Guided Parallax Attention Stereo Matching Network(SGPASMnet)that can be trained in unsupervised manner,building upon the Parallax Attention Stereo Matching Network(PASMnet).Our approach leverages unsupervised learning to address the scarcity of ground truth disparity in stereo matching datasets,facilitating robust training across diverse scene-specific datasets and enhancing generalization.SGPASMnet incorporates two novel components:a Cross-Scale Feature Interaction(CSFI)block and semantic feature augmentation using a pre-trained semantic segmentation model,SegFormer,seamlessly embedded into the parallax attention mechanism.The CSFI block enables effective fusion ofmulti-scale features,integrating coarse and fine details to enhance disparity estimation accuracy.Semantic features,extracted by SegFormer,enrich the parallax attention mechanism by providing high-level scene context,significantly improving performance in ambiguous regions.Our model unifies these enhancements within a cohesive architecture,comprising semantic feature extraction,an hourglass network,a semantic-guided cascaded parallax attentionmodule,outputmodule,and a disparity refinement network.Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate compared to the original PASMnet,highlighting the effectiveness of our enhancements in handling complex scenes.By harnessing unsupervised learning without ground truth disparity needed,SGPASMnet offers a scalable and robust solution for accurate stereo matching,with superior generalization across varied real-world applications.
基金funded by the National Key R&D Program of China(No.2022YFB3904402)the National Natural Science Foundation of China(Nos.42474037 and U2233217)。
文摘The Vertical Total Electron Content(VTEC)of the ionosphere is a crucial parameter for describing the distribution and dynamic changes within the ionosphere.The study utilizes Dual Hybrid Attentional UNet(DHA-UNet)model to achieve higher forecasting performance for global VTEC predictions under the condition of data acquisition delays.Initially,this study uses the first Hybrid Attentional UNet(HA-UNet)model to predict the intermediate missing data.The missing data are caused by delays in data processing,making the Global Ionosphere Map(GIM)for the current day unavailable.Subsequently,the predicted results from the first HA-UNet model are concatenated with the input data to serve as the input data for the second HA-UNet model,yielding the final prediction results.The performance of DHA-UNet model is then evaluated under varying solar and geomagnetic activity conditions.Evaluation results demonstrate that the DHA-UNet model exhibits higher forecasting accuracy and stability compared to commonly used temporal and spatiotemporal forecasting models.Compared to CODG VTEC,the DHA-UNet model achieves Mean Absolute Error(MAE)values of 2.60 TECU,3.07 TECU,3.78 TECU,and 6.45TECU during quiet,weak,moderate,and strong geomagnetic storm periods,respectively,in years of high solar activity.In years of low solar activity,the model achieves MAE values of 1.00 TECU,1.15 TECU,and 1.54 TECU during quiet,weak,and moderate geomagnetic storm periods,respectively.Even during strong geomagnetic storms,55%of the residuals from the DHA-UNet model fall within the-5.0 TECU to 5.0 TECU range,surpassing other commonly used models.Compared to the C1PG forecasting product,the DHA-UNet model shows particularly notable improvements in accuracy during the spring and winter seasons,as well as in mid-to high-latitude regions.
基金supported by the National Natural Science Foundation of China(Grant No.T2222026)the CAS Project for Young Scientists in Basic Research(Grant No.YSBR-034)the Robotic AIScientist Platform of the Chinese Academy of Sciences。
文摘Transformer-based neural-network quantum states(NNQS)have shown great promise in representing quantum manybody ground states,offering high flexibility and accuracy.However,the interpretability of such models remains limited,especially in terms of connecting network components to physically meaningful quantities.We propose that the attention mechanism—a central module in transformer architectures—explicitly models the conditional information flow between orbitals.Intuitively,as the transformer learns to predict orbital configurations by optimizing an energy functional,it approximates the conditional probability distribution p(xn|x_(1),...,x_(n-1)),implicitly encoding conditional mutual information(CMI)among orbitals.This suggests a natural correspondence between attention maps and CMI structures in quantum systems.To probe this idea,we compare weighted attention scores from trained transformer wavefunction ansatze with CMI matrices across several representative small molecules.In most cases,we observe a positive rank-level correlation(Kendall's tau)between attention and CMI,suggesting that the learned attention can reflect physically relevant orbital dependencies.This study provides a quantitative link between transformer attention and conditional mutual information in the NNQS setting.Our results provide a step toward explainable deep learning in quantum chemistry,pointing to opportunities in interpreting attention as a proxy for physical correlations.