1,991 articles found
Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism
1
Authors: Jun Li, Chunyan Liang, Zhiguo Liu, Fengpei Ge. Computers, Materials & Continua, 2026, No. 3, pp. 2015-2039 (25 pages)
To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights. The SA technique promotes feature interaction through channel shuffling, which helps the model learn richer and more discriminative emotional features. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the weighted accuracy by 5.42% and 4.54%, and the unweighted accuracy by 3.37% and 3.85% on the IEMOCAP and RAVDESS datasets, respectively. These improvements were confirmed to be statistically significant by independent samples t-tests, further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
Keywords: Speech emotion recognition; adaptive acoustic mixup enhancement; improved coordinate attention; shuffle attention; attention mechanism; deep learning
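The abstract above describes AAM as mixup-style interpolation between speech samples with different emotions. As a rough illustration only, the sketch below shows plain mixup on two feature vectors and their one-hot labels. The Beta-distributed coefficient, the `alpha` value, and the function name `mixup` are assumptions made for illustration; the paper's sample selection strategy and dynamic coefficient schedule are not given in the abstract and are not reproduced here.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.4, rng=random):
    """Blend two equal-length feature vectors and their one-hot labels.

    Standard mixup (an assumption here, not the paper's exact AAM rule):
    lam ~ Beta(alpha, alpha), mixed = lam * a + (1 - lam) * b.
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("inputs must have matching lengths")
    lam = rng.betavariate(alpha, alpha)  # interpolation coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Hypothetical two-dimensional acoustic features for two emotion classes.
rng = random.Random(0)
x_mix, y_mix, lam = mixup([1.0, 2.0], [3.0, 6.0], [1.0, 0.0], [0.0, 1.0], rng=rng)
```

Because the labels are mixed with the same coefficient as the features, the soft label still sums to one.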
Interference-Resistant Identification of Mining-Induced Fractures in Roadway Surrounding-Rock Boreholes Based on Attention U²-Net
2
Authors: 单鹏飞, 康佳星, 来兴平, 代晶晶, 许慧聪, 李杰宇, 惠聪. Journal of China Coal Society (《煤炭学报》, PKU Core), 2026, No. 2, pp. 1052-1067 (16 pages)
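The evolution of mining-induced fractures is one of the key bases for quantifying the dynamic manifestations of roadway surrounding rock. To reduce the interference of uneven illumination and noise on borehole imaging of the surrounding rock, and the adverse effects of blurred edges and variable morphology on the identification of in-hole mining-induced fractures, an interference-resistant identification method for mining-induced fractures in roadway surrounding-rock boreholes based on Attention U²-Net is proposed. Self-developed holographic sensing equipment for the state of roadway surrounding rock acquires high-resolution images of mining-induced fractures in the boreholes in real time around the clock. Several augmentation techniques, including noise injection, histogram equalization adjustment, color perturbation of the V channel in HSV, and three-dimensional projection of fracture grayscale, are combined to improve the environmental generalization of the image data under non-ideal imaging conditions. Single-channel attention (SE, ECA), spatial attention (CBAM), global multi-channel attention (DANet), and combined attention (CBAM+ECA) are fused into the baseline U²-Net to strengthen the perception and extraction of fractures captured in non-ideal environments, such as low-visibility fractures. In the training stage, a deeply supervised composite loss function (Dice+BCE) is attached to the six network outputs of the baseline U²-Net, promoting stable training and fast convergence of both the baseline U²-Net and the Attention U²-Net models and thereby alleviating vanishing gradients and discontinuity for small-target fractures. The identification experiments show that the Attention U²-Net model raises IoU to 83.1%, reaches an F1 of 92.6%, and lowers E_MA to 0.052; compared with the baseline models U-Net and U²-Net, it converges 21 and 10 epochs earlier during training, and its F1 is 8.4% and 4.0% higher, respectively. The Attention U²-Net model converges faster in training and is stronger at fracture edge detection, slender fracture extraction, and complex texture segmentation, providing reliable technical support for accurately analyzing the evolution of mining-induced fractures in surrounding-rock boreholes and the dynamic manifestations of roadway surrounding rock.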
Keywords: mining-induced fractures; loss function; attention mechanism; Attention U²-Net; CBAM+ECA
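The abstract above mentions a deeply supervised composite loss (Dice+BCE) attached to six network outputs. A minimal pure-Python sketch of that idea follows, assuming flattened probability maps, equal weighting of the two terms and of the six outputs, and a smoothing constant of 1. These choices and all function names are illustrative assumptions, not details taken from the paper.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (sum(pred) + sum(target) + s)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * overlap + smooth) / (sum(pred) + sum(target) + smooth)


def deep_supervised_loss(outputs, target):
    """Sum Dice + BCE over every supervised output (equal weights assumed)."""
    return sum(dice_loss(o, target) + bce_loss(o, target) for o in outputs)


# Hypothetical 4-pixel mask and six identical side outputs.
target = [1.0, 0.0, 1.0, 0.0]
side_outputs = [[0.9, 0.1, 0.8, 0.2]] * 6
total = deep_supervised_loss(side_outputs, target)
```

The Dice term responds to region overlap, which matters for thin fractures that occupy few pixels, while the BCE term supplies per-pixel gradients.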
SwinHCAD: A Robust Multi-Modality Segmentation Model for Brain Tumors Using Transformer and Channel-Wise Attention
3
Authors: Seyong Jin, Muhammad Fayaz, L. Minh Dang, Hyoung-Kyu Song, Hyeonjoon Moon. Computers, Materials & Continua, 2026, No. 1, pp. 511-533 (23 pages)
Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics. While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information, existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors. To address these challenges and maximize the performance of brain tumor segmentation, this research introduces a novel SwinUNETR-based model by integrating a new decoder block, the Hierarchical Channel-wise Attention Decoder (HCAD), into a powerful SwinUNETR encoder. The HCAD decoder block utilizes hierarchical features and channel-specific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase. Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions (WT, TC, and ET) compared to baseline models. In particular, the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block. The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
Keywords: attention mechanism; brain tumor segmentation; channel-wise attention decoder; deep learning; medical imaging; MRI; Transformer; U-Net
A Hierarchical Attention Framework for Business Information Systems: Theoretical Foundation and Proof-of-Concept Implementation
4
Authors: Sabina-Cristiana Necula, Napoleon-Alexandru Sireteanu. Computers, Materials & Continua, 2026, No. 2, pp. 2055-2088 (34 pages)
Modern business information systems face significant challenges in managing heterogeneous data sources, integrating disparate systems, and providing real-time decision support in complex enterprise environments. Contemporary enterprises typically operate 200+ interconnected systems, with research indicating that 52% of organizations manage three or more enterprise content management systems, creating information silos that reduce operational efficiency by up to 35%. While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision, their systematic application to business information systems remains largely unexplored. This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System (HABIS) framework that applies multi-level attention mechanisms to enterprise environments. We provide a comprehensive mathematical formulation of the framework, analyze its computational complexity, and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in cross-system query latency compared to legacy ERP modules and a 70% improvement in prediction accuracy over baseline methods. The theoretical framework introduces four hierarchical attention levels: system-level attention for dynamic weighting of business systems, process-level attention for business process prioritization, data-level attention for critical information selection, and temporal attention for time-sensitive pattern recognition. Our complexity analysis demonstrates that the framework achieves O(n log n) computational complexity for attention computation, making it scalable to large enterprise environments including retail supply chains with 200+ system-scale deployments. The proof-of-concept implementation validates the theoretical framework's feasibility with an MSE loss of 0.439 and response times of 0.000120 s per query, demonstrating its potential for addressing key challenges in business information systems. This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.
Keywords: attention mechanisms; business information systems; theoretical framework; enterprise architecture; complex systems; hierarchical attention
Interactive Dynamic Graph Convolution with Temporal Attention for Traffic Flow Forecasting
5
Authors: Zitong Zhao, Zixuan Zhang, Zhenxing Niu. Computers, Materials & Continua, 2026, No. 1, pp. 1049-1064 (16 pages)
Reliable traffic flow prediction is crucial for mitigating urban congestion. This paper proposes the Attention-based spatiotemporal Interactive Dynamic Graph Convolutional Network (AIDGCN), a novel architecture integrating an Interactive Dynamic Graph Convolution Network (IDGCN) with Temporal Multi-Head Trend-Aware Attention. Its core innovation lies in IDGCN, which splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs, and a novel attention mechanism incorporating convolutional operations to capture essential local traffic trends, addressing a critical gap in standard attention for continuous data. For 15- and 60-min forecasting on METR-LA, AIDGCN achieves MAEs of 0.75% and 0.39%, and RMSEs of 1.32% and 0.14%, respectively. In the 60-min long-term forecasting of the PEMS-BAY dataset, AIDGCN outperforms the MRA-BGCN method by 6.28%, 4.93%, and 7.17% in terms of MAE, RMSE, and MAPE, respectively. Experimental results demonstrate the superiority of the proposed model over state-of-the-art methods.
Keywords: Traffic flow prediction; interactive dynamic graph convolution; graph convolution; temporal multi-head trend-aware attention; self-attention mechanism
Ultra-Short-Term Photovoltaic Power Generation Forecasting Method Based on a TCN-BiLSTM-Attention Model
6
Authors: 刘凯伦, 孙广玲, 陆小锋. Industrial Control Computer (《工业控制计算机》), 2026, No. 1, pp. 122-124 (3 pages)
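As the share of photovoltaic (PV) power generation in the global energy system keeps rising, ultra-short-term forecasting of PV power generation is essential for power system dispatch and safe operation. However, PV output is affected by many factors and shows marked randomness and volatility. To address this, an ultra-short-term PV power generation forecasting method based on a TCN-BiLSTM-Attention model is proposed. First, key features are screened by Pearson correlation analysis, outliers are detected with the isolation forest algorithm, and data preprocessing is completed with linear interpolation and standardization. Next, a Temporal Convolutional Network (TCN) extracts temporal features, a Bidirectional Long Short-Term Memory (BiLSTM) network captures forward and backward temporal dependencies, and an attention mechanism at the output focuses on the features of key time steps. Finally, comparative experiments on the Desert Knowledge Australia Solar Centre (DKASC) dataset show that, compared with conventional LSTM and BiLSTM models, the proposed TCN-BiLSTM-Attention model has some advantage in prediction accuracy and stability.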
Keywords: TCN; BiLSTM; attention; ultra-short-term power generation forecasting
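The preprocessing described in the abstract above (Pearson correlation screening followed by standardization) can be sketched in a few lines. The feature names, the sample values, and the 0.5 threshold below are hypothetical and chosen only to make the example runnable; the abstract does not state which features or cutoff the authors used, and the isolation forest and interpolation steps are omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined for constant columns


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |r| with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


def standardize(xs):
    """Z-score standardization: zero mean, unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


# Hypothetical readings: one informative feature, one weakly related one.
power = [1.0, 2.0, 3.0, 4.0]
features = {"irradiance": [100.0, 200.0, 300.0, 400.0],
            "noise": [5.0, 1.0, 4.0, 2.0]}
kept = select_features(features, power)
scaled = standardize(features["irradiance"])
```

Screening before standardization keeps the scaling step limited to the columns that actually feed the model.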
Rolling Bearing Fault Diagnosis Based on CNN-Transformer-Cross Attention
7
Authors: 郑文超, 张梅. Coal Mine Machinery (《煤矿机械》), 2026, No. 4, pp. 188-192 (5 pages)
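Rolling bearings are core components of coal mining machinery; a fault can readily lead to downtime and safety risks. A fault diagnosis method is proposed that combines the fast Fourier transform (FFT), a convolutional neural network (CNN), Transformer, and Cross Attention. The method first extracts frequency features with the FFT, then combines the local feature extraction of the CNN, the global modeling of the Transformer, and the information fusion of Cross Attention to improve the model's recognition ability and identify rolling bearing faults accurately. Experimental results show that the method reaches a fault diagnosis accuracy of up to 98%, offers high accuracy and strong robustness, and is suitable for intelligent operation and maintenance of coal mine equipment.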
Keywords: bearing; fault diagnosis; FFT; CNN; Transformer; Cross Attention
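The first step in the abstract above is extracting frequency features from a vibration signal. The sketch below computes a one-sided magnitude spectrum with a direct discrete Fourier transform, which gives the same values as an FFT but in O(n²) time; it stands in for the FFT only to stay dependency-free. The 64 Hz sampling rate and the 8 Hz test tone are made-up values, not bearing data from the paper.

```python
import cmath
import math


def dft_magnitudes(signal):
    """One-sided magnitude spectrum (bins 0..n/2) via a direct DFT."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc) / n)  # normalize by signal length
    return magnitudes


# Hypothetical vibration signal: a pure 8 Hz tone sampled at 64 Hz for 1 s.
sample_rate = 64
signal = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(sample_rate)]
mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / len(signal)
```

With one second of data each bin is 1 Hz wide, so the peak bin index equals the tone frequency; a spectrum like this would then be passed to the CNN as input features.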
An attention module integrated hybrid model for recognizing microseismic signals induced by high-pressure grouting in deep rock layers
8
Authors: Yongshu Zhang, Lianchong Li, Wenqiang Mu, Jian Chen, Peng Chen. International Journal of Mining Science and Technology, 2026, No. 3, pp. 595-613 (19 pages)
Microseismic (MS) monitoring is an effective technique to detect mining-induced rock fractures. However, recognizing grouting-induced signals is challenging due to complex geological conditions in deep rock plates. Therefore, a hybrid model (WM-ResNet50) integrating data enhancement, a deep convolutional neural network (CNN), and convolutional block attention modules (CBAM) was proposed. Firstly, an MS system was established at the Xieqiao coal mine in Anhui Province, China. MS waveforms and injection parameters were acquired during grouting. Secondly, signals were categorized based on time-frequency characteristics to build a dataset, which was divided into training, validation, and test sets at a ratio of 4:1:1. Subsequently, the performance of WM-ResNet50 was evaluated based on indices such as individual precision, total accuracy, recall, and loss function. The results indicated that WM-ResNet50 achieved an average recognition accuracy of 94.38%, surpassing that of a simple CNN (90.04%), ResNet18 (91.72%), and ResNet50 (92.48%). Finally, WM-ResNet50 was applied to monitor the whole process in laboratory tests and field cases. Both results affirmed the feasibility and effectiveness of MS inversion in predicting actual slurry diffusion ranges within deep rock layers. By comparison, it was revealed that the MS sources classified by WM-ResNet50 matched grouting records well. A solution to address insufficient diffusion under long-borehole grouting has been proposed. WM-ResNet50's accuracy was validated through in-situ coring and XRD analysis for cement-based hydration products. This study provides a beneficial reference for similar rock signal processing and in-field grouting practices.
Keywords: attention module; convolutional neural network; microseismic; rock; grouting-induced signals; slurry diffusion
Superpixel-Aware Transformer with Attention-Guided Boundary Refinement for Salient Object Detection
9
Authors: Burhan Baraklı, Can Yüzkollar, Tugrul Taşçı, Ibrahim Yıldırım. Computer Modeling in Engineering & Sciences, 2026, No. 1, pp. 1092-1129 (38 pages)
Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a "split-and-enhance" principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches a MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
Keywords: Salient object detection; superpixel segmentation; Transformers; attention mechanism; multi-level fusion; edge-preserving refinement; model-driven
Adaptive Windowing with Label-Aware Attention for Robust Multi-Tab Website Fingerprinting
10
Authors: Chunqian Guo, Gang Chen. Computers, Materials & Continua, 2026, No. 5, pp. 731-751 (21 pages)
Despite the ability of the anonymous communication system The Onion Router (Tor) to obscure the content of communications, prior studies have shown that passive adversaries can still infer the websites visited by users through website fingerprinting (WF) attacks. Conventional WF methods achieve optimal performance primarily in scenarios involving single-tab browsing. However, in real-world network environments, users often engage in multi-tab browsing, which generates overlapping traffic patterns from different websites. This overlap has been shown to significantly degrade the performance of classifiers that rely on the single-tab assumption. To address this challenge, this paper proposes a Transformer-based multi-tab website fingerprinting (MT-WF) attack framework. The model employs an adaptive sliding window mechanism to capture fine-grained features of traffic direction. Additionally, it incorporates a label-aware attention mechanism designed to dynamically separate and refine entangled traffic representations, enhancing the model's ability to distinguish between overlapping traffic patterns. Furthermore, the model leverages global traffic patterns through multi-segment feature fusion and incorporates an incremental learning (IL) strategy to adapt to the continuously evolving website categories in open-world environments. Experimental results demonstrate that the proposed method achieves a top-2 precision of 0.78 in the closed-world setting. In the open-world scenario, the model attains an F1 score of 0.904, outperforming most existing baselines. The proposed method maintains superior performance even under challenging conditions, including WF defenses and concept drift.
Keywords: Tor; website fingerprinting (WF); multi-tab browsing; transformer-based model; label-aware attention; traffic analysis; privacy; cybersecurity
SparseMoE-MFN: A Sparse Attention and Mixture-of-Experts Framework for Multimodal Fake News Detection on Social Media
11
Authors: Yuechuan Zhang, Mingshu Zhang, Bin Wei, Hongyu Jin, Yaxuan Wang. Computers, Materials & Continua, 2026, No. 5, pp. 1646-1669 (24 pages)
Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise, inter-modal imbalance, computational bottlenecks, and semantic ambiguity. To address these issues, we propose SparseMoE-MFN, a novel unified framework that integrates sparse attention with a sparse-activated Mixture-of-Experts (MoE) architecture. This framework aims to enhance the efficiency, inferential depth, and interpretability of multimodal fake news detection. SparseMoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing. The sparse attention module adaptively filters irrelevant tokens and focuses on key regions, reducing computational costs and noise. The sparse MoE module dynamically routes inputs to specialized experts (visual, language, cross-modal alignment) based on content heterogeneity. This expert specialization design boosts computational efficiency and semantic adaptability, enabling precise processing of complex content and improving performance on ambiguous categories. Evaluated on the large-scale, multilingual MR2 dataset, SparseMoE-MFN achieves state-of-the-art performance. It obtains an accuracy of 86.7% and a macro-averaged F1 score of 0.859, outperforming strong baselines like MiniGPT-4 by 3.4% and 3.2%, respectively. Notably, it shows significant advantages in the "unverified" category. Furthermore, SparseMoE-MFN demonstrates superior computational efficiency, with an average inference latency of 89.1 ms and 95.4 GFLOPs, substantially lower than existing models. Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy, generalization, and efficiency.
Keywords: Fake news detection; multimodal; sparse attention; mixture-of-experts; interpretability; computational efficiency
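The sparse MoE routing mentioned in the abstract above (inputs dispatched to a few specialized experts) is commonly implemented as top-k gating. The sketch below is a generic illustration of that technique, not the paper's module: the gate scores are given by hand rather than learned, the experts are toy scalar functions, and the names `route_top_k` and `sparse_moe` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=1):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are the softmax probabilities renormalized over the chosen k.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def sparse_moe(x, experts, gate_scores, k=1):
    """Evaluate only the selected experts and mix their outputs."""
    routing = route_top_k(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())


# Toy experts standing in for visual, language, and cross-modal branches.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
routing = route_top_k([0.1, 2.0, 1.0], k=2)
mixed = sparse_moe(3.0, experts, [0.1, 2.0, 1.0], k=2)
```

Only the selected experts are called, which is where the compute saving of sparse activation comes from.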
Keyword Spotting Based on Dual-Branch Broadcast Residual and Time-Frequency Coordinate Attention
12
Authors: Zeyu Wang, Jian-Hong Wang, Kuo-Chun Hsu. Computers, Materials & Continua, 2026, No. 4, pp. 333-352 (20 pages)
In daily life, keyword spotting plays an important role in human-computer interaction. However, noise often interferes with the extraction of time-frequency information, and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge. To address this, we propose a novel time-frequency dual-branch parallel residual network, which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module. The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features, effectively avoiding the potential information loss caused by serial stacking, while enhancing information flow and multi-scale feature fusion. In terms of training strategy, a curriculum learning approach is introduced to progressively improve model robustness from easy to difficult tasks. Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio (SNR) conditions, achieving superior far-field recognition performance on the Google Speech Commands V2 dataset. Notably, the model maintains stable performance even in low-SNR environments such as -10 dB, and generalizes well to SNR conditions unseen during training, validating its robustness to novel noise scenarios. Furthermore, the proposed model exhibits significantly fewer parameters, making it highly suitable for deployment on resource-limited devices. Overall, the model achieves a favorable balance between performance and parameter efficiency, demonstrating strong potential for practical applications.
Keywords: Keyword spotting; convolutional neural network; residual learning; attention; small footprint; noisy far-field
Enhanced BEV Scene Segmentation: De-Noise Channel Attention for Resource-Constrained Environments
13
Authors: Argho Dey, Yunfei Yin, Zheng Yuan, Zhiwen Zeng, Xianjian Bao, Md Minhazul Islam. Computers, Materials & Continua, 2026, No. 4, pp. 2161-2180 (20 pages)
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations. Traditional Bird's Eye View (BEV) methods for semantic scene segmentation, which leverage multimodal sensor fusion, often struggle with noisy data and demand high-performance GPUs, leading to sensor misalignment and performance degradation. This paper introduces an Enhanced Channel Attention BEV (ECABEV), a novel approach designed to address the challenges under insufficient GPU memory conditions. ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism, which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features. Furthermore, an improved fusion approach is proposed to efficiently merge categorical data across modalities. To reduce computational overhead, a bilinear interpolation layer normalization method is devised to ensure spatial feature fidelity. Moreover, a scalable cross-entropy loss function is designed to handle imbalanced classes with less sacrifice of computational efficiency. Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961, using a lightweight ViT-B/14 backbone and lower resolution (224×224). Our approach highlights its cost-effectiveness and practical applicability, even on low-end devices. The code is publicly available at: https://github.com/YYF-CQU/ECABEV.git.
Keywords: Autonomous vehicle; BEV; attention mechanism; sensor fusion; scene segmentation
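The abstract above describes channel attention built on global average and max pooling. The sketch below shows the pooling-and-reweighting idea on a tiny nested-list feature map. It is a deliberate simplification and an assumption: published channel attention modules usually pass the pooled descriptors through a small learned network before the sigmoid, and ECABEV's actual de-noise design is not specified in the abstract (the linked repository would be the authoritative source).

```python
import math


def sigmoid(z):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(feature_map):
    """Reweight each channel by sigmoid(global_avg_pool + global_max_pool).

    feature_map is a list of channels, each a 2D list of activations.
    The learned layers of a real module are omitted (simplifying assumption).
    """
    weights = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        avg_pool = sum(values) / len(values)
        max_pool = max(values)
        weights.append(sigmoid(avg_pool + max_pool))
    reweighted = [[[w * v for v in row] for row in channel]
                  for w, channel in zip(weights, feature_map)]
    return weights, reweighted


# Hypothetical 2-channel, 2x2 map: a weak channel and a strongly active one.
feature_map = [[[0.1, 0.2], [0.0, 0.1]],
               [[2.0, 3.0], [1.0, 2.0]]]
weights, reweighted = channel_attention(feature_map)
```

The weakly activated channel receives the smaller weight, which is the mechanism by which low-information channels get suppressed.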
An Attention-Based 6D Pose Estimation Network for Weakly Textured Industrial Parts
14
Authors: Song Xu, Liang Xuan, Yifeng Li, Qiang Zhang. Computers, Materials & Continua, 2026, No. 2, pp. 2148-2166 (19 pages)
The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts. In industrial robot production scenarios, the 6D pose estimation of industrial parts mainly faces two challenges: one is the loss of information and interference caused by occlusion and stacking in the sorting scenario, the other is the difficulty of feature extraction due to the weak texture of industrial parts. To address these problems, this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts, namely CB-PVNet. On the one hand, the voting scheme can predict the keypoints of affected pixels, which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion. On the other hand, the attention mechanism can extract interesting features of the object while suppressing useless features of the surroundings. Extensive comparative experiments were conducted on both public datasets (including the LINEMOD, Occlusion LINEMOD and T-LESS datasets) and self-made datasets. The experimental results indicate that the proposed network CB-PVNet can achieve ADD(-s) accuracy comparable to the state of the art using only RGB images while ensuring real-time performance. Additionally, we conducted robot grasping experiments in the real world. The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.
Keywords: Industrial robots; pose estimation; industrial parts; attention mechanism; weak texture
FD-YOLO: An Attention-Augmented Lightweight Network for Real-Time Industrial Fabric Defect Detection
15
Authors: Shaobo Kang, Mingzhi Yang. Computers, Materials & Continua, 2026, No. 2, pp. 1087-1109 (23 pages)
Fabric defect detection plays a vital role in ensuring textile quality. However, traditional manual inspection methods are often inefficient and inaccurate. To overcome these limitations, we propose FD-YOLO, an enhanced lightweight detection model based on the YOLOv11n framework. The proposed model introduces the Bi-level Routing Attention (BRAttention) mechanism to enhance defect feature extraction, enabling more detailed feature representation. It proposes a Deep Progressive Cross-Scale Fusion Neck (DPCSFNeck) to better capture small-scale defects and incorporates a Multi-Scale Dilated Residual (MSDR) module to strengthen multi-scale feature representation. Furthermore, a Shared Detail-Enhanced Lightweight Head (SDELHead) is employed to reduce the risk of gradient explosion during training. Experimental results demonstrate that FD-YOLO achieves superior detection accuracy and lightweight performance compared to the baseline YOLOv11n.
Keywords: Deep learning; YOLO; fabric defect inspection; multi-scale attention; lightweight head
YOLO-SPDNet: Multi-Scale Sequence and Attention-Based Tomato Leaf Disease Detection Model
16
Authors: Meng Wang, Jinghan Cai, Wenzheng Liu, Xue Yang, Jingjing Zhang, Qiangmin Zhou, Fanzhen Wang, Hang Zhang, Tonghai Liu. Phyton-International Journal of Experimental Botany, 2026, No. 1, pp. 290-308 (19 pages)
Tomato is a major economic crop worldwide, and diseases on tomato leaves can significantly reduce both yield and quality. Traditional manual inspection is inefficient and highly subjective, making it difficult to meet the requirements of early disease identification in complex natural environments. To address this issue, this study proposes an improved YOLO11-based model, YOLO-SPDNet (Scale Sequence Fusion, Position-Channel Attention, and Dual Enhancement Network). The model integrates the SEAM (Self-Ensembling Attention Mechanism) semantic enhancement module, the MLCA (Mixed Local Channel Attention) lightweight attention mechanism, and the SPA (Scale-Position-Detail Awareness) module composed of SSFF (Scale Sequence Feature Fusion), TFE (Triple Feature Encoding), and CPAM (Channel and Position Attention Mechanism). These enhancements strengthen fine-grained lesion detection while keeping the model lightweight. Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%, a recall of 86.5%, and an mAP@0.5 of 90.6% on the test set, with a computational complexity of 12.5 GFLOPs. Furthermore, the model reaches a real-time inference speed of 987 FPS, making it suitable for deployment on mobile agricultural terminals and online monitoring systems. Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
Keywords: Tomato disease detection; YOLO; multi-scale feature fusion; attention mechanism; lightweight model
Toward Efficient Traffic-Sign Detection via SlimNeck and Coordinate-Attention Fusion in YOLO-SMM
17
Authors: Hui Chen, Mohammed A.H. Ali, Bushroa Abd Razak, Zhenya Wang, Yusoff Nukman, Shikai Zhang, Zhiwei Huang, Ligang Yao, Mohammad Alkhedher. Computers, Materials & Continua, 2026, No. 2, pp. 1823-1848 (26 pages)
Accurate and real-time traffic-sign detection is a cornerstone of Advanced Driver-Assistance Systems (ADAS) and autonomous vehicles. However, existing one-stage detectors miss distant signs, and two-stage pipelines are impractical for embedded deployment. To address this issue, we present YOLO-SMM, a lightweight two-stage framework. This framework is designed to augment the YOLOv8 baseline with three targeted modules. (1) SlimNeck replaces PAN/FPN with a CSP-OSA/GSConv fusion block, reducing parameters and FLOPs without compromising multi-scale detail. (2) The MCA model introduces row- and column-aware weights to selectively amplify small sign regions in cluttered scenes. (3) MPDIoU augments CIoU loss with a corner-distance term, supplying stable gradients for sub-20-pixel boxes and tightening localization. An evaluation of YOLO-SMM on the German Traffic Sign Recognition Benchmark (GTSRB) revealed that it attained 96.3% mAP50 and 93.1% mAP50-90 at a rate of 90.6 frames per second (FPS). This represents an improvement of +1.0% over previous performance benchmarks. The mAP at 64×64 resolution was found to be 50% of the maximum attainable value, with an FPS of +8.3 when compared to YOLOv8. This result indicates superior performance in terms of accuracy and speed compared to YOLOv7, YOLOv5, RetinaNet, EfficientDet, and Faster R-CNN, all of which were operated under equivalent conditions.
Keywords: Traffic sign detection; YOLO v8; YOLO v5; YOLO v7; SlimNeck; modified coordinate attention; MPDIoU
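The corner-distance term described in this entry's abstract can be illustrated with a minimal sketch. It assumes the commonly published MPDIoU formulation: IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared image diagonal. The abstract does not state the exact loss used in YOLO-SMM, so the function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the image size are illustrative assumptions.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """Sketch of an MPDIoU-style loss for two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples in pixels (assumed layout).
    Returns 1 - MPDIoU, where MPDIoU = IoU - d1/norm - d2/norm.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1 = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2 = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    return 1.0 - (iou - d1 / norm - d2 / norm)


# Non-overlapping boxes both have IoU = 0, but the farther one is penalized more.
near = mpdiou_loss((0, 0, 10, 10), (20, 20, 30, 30), 640, 640)
far = mpdiou_loss((0, 0, 10, 10), (40, 40, 50, 50), 640, 640)
print(near < far)
```

Because the corner distances stay informative when the boxes do not overlap at all, a loss of this shape still distinguishes a near miss from a far miss, which is the property the abstract credits for stable gradients on very small boxes.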
Semantic-Guided Stereo Matching Network Based on Parallax Attention Mechanism and SegFormer
18
Authors: Zeyuan Chen, Yafei Xie, Jinkun Li, Song Wang, Yingqiang Ding. Source: Computers, Materials & Continua, 2026, Issue 4, pp. 1322-1340 (19 pages)
Stereo matching is a pivotal task in computer vision, enabling precise depth estimation from stereo image pairs, yet it encounters challenges in regions with reflections, repetitive textures, or fine structures. In this paper, we propose a Semantic-Guided Parallax Attention Stereo Matching Network (SGPASMnet) that can be trained in an unsupervised manner, building upon the Parallax Attention Stereo Matching Network (PASMnet). Our approach leverages unsupervised learning to address the scarcity of ground-truth disparity in stereo matching datasets, facilitating robust training across diverse scene-specific datasets and enhancing generalization. SGPASMnet incorporates two novel components: a Cross-Scale Feature Interaction (CSFI) block and semantic feature augmentation using a pre-trained semantic segmentation model, SegFormer, seamlessly embedded into the parallax attention mechanism. The CSFI block enables effective fusion of multi-scale features, integrating coarse and fine details to enhance disparity estimation accuracy. Semantic features, extracted by SegFormer, enrich the parallax attention mechanism by providing high-level scene context, significantly improving performance in ambiguous regions. Our model unifies these enhancements within a cohesive architecture, comprising semantic feature extraction, an hourglass network, a semantic-guided cascaded parallax attention module, an output module, and a disparity refinement network. Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate compared to the original PASMnet, highlighting the effectiveness of our enhancements in handling complex scenes. By harnessing unsupervised learning without requiring ground-truth disparity, SGPASMnet offers a scalable and robust solution for accurate stereo matching, with superior generalization across varied real-world applications.
Keywords: Stereo matching; parallax attention; unsupervised learning; convolutional neural network; stereo correspondence
Dual-Attention Multi-Path Deep Learning Framework for Automated Wind Turbine Blade Fault Detection Using UAV Imagery
19
Authors: Mubarak Alanazi, Junaid Rashid. Source: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 499-523 (25 pages)
Wind turbine blade defect detection faces persistent challenges in separating small, low-contrast surface faults from complex backgrounds while maintaining reliability under variable illumination and viewpoints. Conventional image-processing pipelines struggle with scalability and robustness, and recent deep learning methods remain sensitive to class imbalance and acquisition variability. This paper introduces TurbineBladeDetNet, a convolutional architecture combining dual-attention mechanisms with multi-path feature extraction for detecting five distinct blade fault types. Our approach employs both channel-wise and spatial attention modules alongside an Albumentations-driven augmentation strategy to handle dataset imbalance and capture condition variability. The model achieves 97.14% accuracy, 98.65% precision, and 98.68% recall, yielding a 98.66% F1-score with 0.0110 s inference time. Class-specific analysis shows uniformly high sensitivity and specificity; lightning damage reaches 99.80% for sensitivity, precision, and F1-score, and crack achieves perfect precision and specificity with a 98.94% F1-score. Comparative evaluation against recent wind-turbine inspection approaches indicates higher performance in both accuracy and F1-score. The resulting balance of sensitivity and specificity limits both missed defects and false alarms, supporting reliable deployment in routine unmanned aerial vehicle (UAV) inspection.
Keywords: Wind energy; aerial imagery; surface condition monitoring; wind turbine blades; surface defect detection; attention mechanism; computer vision; deep learning; artificial intelligence
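As a quick consistency check on the figures reported in this entry's abstract, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the stated 98.65% precision and 98.68% recall; the helper name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
f1 = f1_score(98.65, 98.68)
print(round(f1, 2))  # 98.66
```

The recomputed value rounds to 98.66, which agrees with the F1-score quoted in the abstract.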
A dual attention-based deep learning model for lithology identification while drilling
20
Authors: Jie Chen, Zhen Gui, Yichao Rui, Xusheng Zhao, Xiaokang Pan, Qingfeng Wang, Yuanyuan Pu, Zheng Li, Maoyi Liu. Source: Journal of Rock Mechanics and Geotechnical Engineering, 2026, Issue 2, pp. 1177-1192 (16 pages)
Lithology identification while drilling technology can obtain rock information in real time. However, traditional lithology identification models often face limitations in feature extraction and adaptability to complex geological conditions, limiting their accuracy in challenging environments. To address these challenges, a deep learning model for lithology identification while drilling is proposed. The proposed model introduces a dual attention mechanism in the long short-term memory (LSTM) network, effectively enhancing the ability to capture spatial and channel dimension information. Subsequently, the crayfish optimization algorithm (COA) is applied to optimize the model network structure, thereby enhancing its lithology identification capability. Laboratory test results demonstrate that the proposed model achieves 97.15% accuracy on the testing set, significantly outperforming the traditional support vector machine (SVM) method (81.77%). Field tests under actual drilling conditions demonstrate an average accuracy of 91.96% for the proposed model, representing a 14.31% improvement over the LSTM model alone. The proposed model demonstrates robust adaptability and generalization ability across diverse operational scenarios. This research offers reliable technical support for lithology identification while drilling.
Keywords: Lithology identification while drilling; Deep learning; Dual attention mechanism; Metaheuristic algorithm; Field applications