To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM...To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.展开更多
Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the b...Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.展开更多
Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contempo...Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contemporary enterprises typically operate 200+interconnected systems,with research indicating that 52% of organizations manage three or more enterprise content management systems,creating information silos that reduce operational efficiency by up to 35%.While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision,their systematic application to business information systems remains largely unexplored.This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System(HABIS)framework that applies multi-level attention mechanisms to enterprise environments.We provide a comprehensive mathematical formulation of the framework,analyze its computational complexity,and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in crosssystem query latency compared to legacy ERP modules and 70% improvement in prediction accuracy over baseline methods.The theoretical framework introduces four hierarchical attention levels:system-level attention for dynamic weighting of business systems,process-level attention for business process prioritization,data-level attention for critical information selection,and temporal attention for time-sensitive pattern recognition.Our complexity analysis demonstrates that the framework achieves O(n log n)computational complexity for attention computation,making it scalable to large enterprise environments including retail supply chains with 200+system-scale deployments.The proof-of-concept implementation validates the theoretical framework’s feasibility withMSE loss of 0.439 and response times of 0.000120 s per query,demonstrating its potential for addressing key challenges in business information systems.This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.展开更多
Reliable traffic flow prediction is crucial for mitigating urban congestion.This paper proposes Attentionbased spatiotemporal Interactive Dynamic Graph Convolutional Network(AIDGCN),a novel architecture integrating In...Reliable traffic flow prediction is crucial for mitigating urban congestion.This paper proposes Attentionbased spatiotemporal Interactive Dynamic Graph Convolutional Network(AIDGCN),a novel architecture integrating Interactive Dynamic Graph Convolution Network(IDGCN)with Temporal Multi-Head Trend-Aware Attention.Its core innovation lies in IDGCN,which uniquely splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs,and a novel attention mechanism incorporating convolutional operations to capture essential local traffic trends—addressing a critical gap in standard attention for continuous data.For 15-and 60-min forecasting on METR-LA,AIDGCN achieves MAEs of 0.75%and 0.39%,and RMSEs of 1.32%and 0.14%,respectively.In the 60-min long-term forecasting of the PEMS-BAY dataset,the AIDGCN out-performs the MRA-BGCN method by 6.28%,4.93%,and 7.17%in terms of MAE,RMSE,and MAPE,respectively.Experimental results demonstrate the superiority of our pro-posed model over state-of-the-art methods.展开更多
随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法...随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。展开更多
Microseismic(MS)monitoring is an effective technique to detect mining-induced rock fractures.However,recognizing grouting-induced signals is challenging due to complex geological conditions in deep rock plates.Therefo...Microseismic(MS)monitoring is an effective technique to detect mining-induced rock fractures.However,recognizing grouting-induced signals is challenging due to complex geological conditions in deep rock plates.Therefore,a hybrid model(WM-ResNet50)integrating data enhancement,a deep convolutional neural network(CNN),and convolutional block attention modules(CBAM)was proposed.Firstly,an MS system was established at the Xieqiao coal mine in Anhui Province,China.MS waveforms and injection parameters were acquired during grouting.Secondly,signals were categorized based on time-frequency characteristics to build a dataset,which was divided into training,validation,and test sets at a ratio of 4:1:1.Subsequently,the performance of WM-ResNet50 was evaluated based on indices such as individual precision,total accuracy,recall,and loss function.The results indicated that WMResNet50 achieved an average recognition accuracy of 94.38%,surpassing that of a simple CNN(90.04%),ResNet18(91.72%),and ResNet50(92.48%).Finally,WM-ResNet50 was applied to monitor the whole process at laboratory tests and field cases.Both results affirmed the feasibility and effectiveness of MS inversion in predicting actual slurry diffusion ranges within deep rock layers.By comparison,it was revealed that the MS sources classified by WM-ResNet50 matched grouting records well.A solution to address insufficient diffusion under long-borehole grouting has been proposed.WM-ResNet50′s accuracy was validated through in-situ coring and XRD analysis for cement-based hydration products.This study provides a beneficial reference for similar rock signal processing and in-field grouting practices.展开更多
Salient object detection(SOD)models struggle to simultaneously preserve global structure,maintain sharp object boundaries,and sustain computational efficiency in complex scenes.In this study,we propose SPSALNet,a task...Salient object detection(SOD)models struggle to simultaneously preserve global structure,maintain sharp object boundaries,and sustain computational efficiency in complex scenes.In this study,we propose SPSALNet,a task-driven two-stage(macro–micro)architecture that restructures the SOD process around superpixel representations.In the proposed approach,a“split-and-enhance”principle,introduced to our knowledge for the first time in the SOD literature,hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions.At the macro stage,the image is partitioned into content-adaptive superpixel regions,and each superpixel is represented by a high-dimensional region-level feature vector.These representations define a regional decomposition problem in which superpixels are assigned to three classes:background,object interior,and transition regions.Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context.At the micro stage,the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions.The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway.Subsequently,channel-aware fusion blocks adaptively combine information from these two sources,producing sharper and more stable object boundaries.Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods.On the PASCAL-S and DUT-OMRON datasets,SPSALNet exhibits a clear performance advantage across all key metrics,and it ranks first on accuracy-oriented measures on HKU-IS.On the challenging DUT-OMRON benchmark,SPSALNet reaches a MAE of 0.034.Across all datasets,it preserves object boundaries and regional structure in a stable and competitive manner.展开更多
Despite the ability of the anonymous communication system The Onion Router(Tor)to obscure the content of communications,prior studies have shown that passive adversaries can still infer the websites visited by users t...Despite the ability of the anonymous communication system The Onion Router(Tor)to obscure the content of communications,prior studies have shown that passive adversaries can still infer the websites visited by users throughwebsite fingerprinting(WF)attacks.ConventionalWFmethodologies demonstrate optimal performance in scenarios involving single-tab browsing.Conventional WF methods achieve optimal performance primarily in scenarios involving single-tab browsing.However,in real-world network environments,users often engage in multitab browsing,which generates overlapping traffic patterns from different websites.This overlap has been shown to significantly degrade the performance of classifiers that rely on the single-tab assumption.To address this challenge,this paper proposes a Transformer-basedmulti-tab website fingerprinting(MT-WF)attack framework.Themodel employs an adaptive sliding windowmechanism to capture fine-grained features of traffic direction.Additionally,it incorporates a label-aware attention mechanism designed to dynamically separate and refine entangled traffic representations,enhancing the model’s ability to distinguish between overlapping traffic patterns.Furthermore,the model leverages global traffic patterns through multi-segment feature fusion and incorporates an incremental learning(IL)strategy to adapt to the continuously evolving website categories in open-world environments.Experimental results demonstrate that the proposedmethod achieves a top-2 precision of 0.78 in the closed-world setting.In the open-world scenario,the model attains an F1 score of 0.904,outperforming most existing baselines.The proposed method maintains superior performance even under challenging conditions,including WF defenses and concept drift.展开更多
Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propo...Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propose SparseMoE-MFN,a novel unified framework that integrates sparse attention with a sparse-activated Mixture of-Experts(MoE)architecture.This framework aims to enhance the efficiency,inferential depth,and interpretability of multimodal fake news detection.Sparse MoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing.The sparse attention module adaptively filters irrelevant tokens and focuses on key regions,reducing computational costs and noise.The sparse MoE module dynamically routes inputs to specialized experts(visual,language,cross-modal alignment)based on content heterogeneity.This expert specialization design boosts computational efficiency and semantic adaptability,enabling precise processing of complex content and improving performance on ambiguous categories.Evaluated on the large-scale,multilingualMR2 dataset,SparseMoEMFN achieves state-of-the-art performance.It obtains an accuracy of 86.7%and a macro-averaged F1 score of 0.859,outperforming strong baselines like MiniGPT-4 by 3.4%and 3.2%,respectively.Notably,it shows significant advantages in the“unverified”category.Furthermore,SparseMoE-MFN demonstrates superior computational efficiency,with an average inference latency of 89.1 ms and 95.4 GFLOPs,substantially lower than existing models.Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy,generalization,and efficiency.展开更多
In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and rec...In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge.To address this,we propose a novel time-frequency dual-branch parallel residual network,which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module.The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features,effectively avoiding the potential information loss caused by serial stacking,while enhancing information flow and multi-scale feature fusion.In terms of training strategy,a curriculum learning approach is introduced to progressively improve model robustness fromeasy to difficult tasks.Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio(SNR)conditions,achieving superior far-field recognition performance on the Google Speech Commands V2 dataset.Notably,the model maintains stable performance even in low-SNR environments such as–10 dB,and generalizes well to unseen SNR conditions during training,validating its robustness to novel noise scenarios.Furthermore,the proposed model exhibits significantly fewer parameters,making it highly suitable for deployment on resource-limited devices.Overall,the model achieves a favorable balance between performance and parameter efficiency,demonstrating strong potential for practical applications.展开更多
Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimo...Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimodal sensor fusion,often struggle with noisy data and demand high-performance GPUs,leading to sensor misalignment and performance degradation.This paper introduces an Enhanced Channel Attention BEV(ECABEV),a novel approach designed to address the challenges under insufficient GPU memory conditions.ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism,which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features.Furthermore,an improved fusion approach is proposed to efficiently merge categorical data across modalities.To reduce computational overhead,a bilinear interpolation layer normalizationmethod is devised to ensure spatial feature fidelity.Moreover,a scalable crossentropy loss function is further designed to handle the imbalanced classes with less computational efficiency sacrifice.Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961,using a lightweight ViT-B/14 backbone and lower resolution(224×224).Our approach highlights its cost-effectiveness and practical applicability,even on low-end devices.The code is publicly available at:https://github.com/YYF-CQU/ECABEV.git.展开更多
The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly fa...The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.展开更多
Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced li...Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced lightweight detection model based on the YOLOv11n framework.The proposed model introduces the Bi-level Routing Attention(BRAttention)mechanism to enhance defect feature extraction,enabling more detailed feature representation.It proposes Deep Progressive Cross-Scale Fusion Neck(DPCSFNeck)to better capture smallscale defects and incorporates a Multi-Scale Dilated Residual(MSDR)module to strengthen multi-scale feature representation.Furthermore,a Shared Detail-Enhanced Lightweight Head(SDELHead)is employed to reduce the risk of gradient explosion during training.Experimental results demonstrate that FD-YOLO achieves superior detection accuracy and Lightweight performance compared to the baseline YOLOv11n.展开更多
Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet th...Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.展开更多
Accurate and real-time traffic-sign detection is a cornerstone of Advanced Driver-Assistance Systems(ADAS)and autonomous vehicles.However,existing one-stage detectors miss distant signs,and two-stage pipelines are imp...Accurate and real-time traffic-sign detection is a cornerstone of Advanced Driver-Assistance Systems(ADAS)and autonomous vehicles.However,existing one-stage detectors miss distant signs,and two-stage pipelines are impractical for embedded deployment.To address this issue,we present YOLO-SMM,a lightweight two-stage framework.This framework is designed to augment the YOLOv8 baseline with three targeted modules.(1)SlimNeck replaces PAN/FPN with a CSP-OSA/GSConv fusion block,reducing parameters and FLOPs without compromising multi-scale detail.(2)The MCA model introduces row-and column-aware weights to selectively amplify small sign regions in cluttered scenes.(3)MPDIoU augments CIoU loss with a corner-distance term,supplying stable gradients for sub-20-pixel boxes and tightening localization.An evaluation of YOLO-SMMon the German Traffic Sign Recognition Benchmark(GTSRB)revealed that it attained 96.3% mAP50 and 93.1% mAP50-90 at a rate of 90.6 frames per second(FPS).This represents an improvement of+1.0% over previous performance benchmarks.Them APat 64×64 resolution was found to be 50% of the maximum attainable value,with an FPS of+8.3 when compared to YOLOv8.This result indicates superior performance in terms of accuracy and speed compared to YOLOv7,YOLOv5,RetinaNet,EfficientDet,and Faster R-CNN,all of which were operated under equivalent conditions.展开更多
Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this pa...Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this paper,we propose a Semantic-Guided Parallax Attention Stereo Matching Network(SGPASMnet)that can be trained in unsupervised manner,building upon the Parallax Attention Stereo Matching Network(PASMnet).Our approach leverages unsupervised learning to address the scarcity of ground truth disparity in stereo matching datasets,facilitating robust training across diverse scene-specific datasets and enhancing generalization.SGPASMnet incorporates two novel components:a Cross-Scale Feature Interaction(CSFI)block and semantic feature augmentation using a pre-trained semantic segmentation model,SegFormer,seamlessly embedded into the parallax attention mechanism.The CSFI block enables effective fusion ofmulti-scale features,integrating coarse and fine details to enhance disparity estimation accuracy.Semantic features,extracted by SegFormer,enrich the parallax attention mechanism by providing high-level scene context,significantly improving performance in ambiguous regions.Our model unifies these enhancements within a cohesive architecture,comprising semantic feature extraction,an hourglass network,a semantic-guided cascaded parallax attentionmodule,outputmodule,and a disparity refinement network.Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate compared to the original PASMnet,highlighting the effectiveness of our enhancements in handling complex scenes.By harnessing unsupervised learning without ground truth disparity needed,SGPASMnet offers a scalable and robust solution for accurate stereo matching,with superior generalization across varied real-world applications.展开更多
Wind turbine blade defect detection faces persistent challenges in separating small,low-contrast surface faults from complex backgrounds while maintaining reliability under variable illumination and viewpoints.Conven-...Wind turbine blade defect detection faces persistent challenges in separating small,low-contrast surface faults from complex backgrounds while maintaining reliability under variable illumination and viewpoints.Conven-tional image-processing pipelines struggle with scalability and robustness,and recent deep learning methods remain sensitive to class imbalance and acquisition variability.This paper introduces TurbineBladeDetNet,a convolutional architecture combining dual-attention mechanisms with multi-path feature extraction for detecting five distinct blade fault types.Our approach employs both channel-wise and spatial attention modules alongside an Albumentations-driven augmentation strategy to handle dataset imbalance and capture condition variability.The model achieves 97.14%accuracy,98.65%precision,and 98.68%recall,yielding a 98.66%F1-score with 0.0110 s inference time.Class-specific analysis shows uniformly high sensitivity and specificity;lightning damage reaches 99.80%for sensitivity,precision,and F1-score,and crack achieves perfect precision and specificity with a 98.94%F1-score.Comparative evaluation against recent wind-turbine inspection approaches indicates higher performance in both accuracy and F1-score.The resulting balance of sensitivity and specificity limits both missed defects and false alarms,supporting reliable deployment in routine unmanned aerial vehicle(UAV)inspection.展开更多
Lithology identificationwhile drilling technology can obtain rock information in real-time.However,traditional lithology identificationmodels often face limitations in feature extraction and adaptability to complex ge...Lithology identificationwhile drilling technology can obtain rock information in real-time.However,traditional lithology identificationmodels often face limitations in feature extraction and adaptability to complex geological conditions,limiting their accuracy in challenging environments.To address these challenges,a deep learning model for lithology identificationwhile drilling is proposed.The proposed model introduces a dual attention mechanism in the long short-term memory(LSTM)network,effectively enhancing the ability to capture spatial and channel dimension information.Subsequently,the crayfishoptimization algorithm(COA)is applied to optimize the model network structure,thereby enhancing its lithology identificationcapability.Laboratory test results demonstrate that the proposed model achieves 97.15%accuracy on the testing set,significantlyoutperforming the traditional support vector machine(SVM)method(81.77%).Field tests under actual drilling conditions demonstrate an average accuracy of 91.96%for the proposed model,representing a 14.31%improvement over the LSTM model alone.The proposed model demonstrates robust adaptability and generalization ability across diverse operational scenarios.This research offers reliable technical support for lithology identification while drilling.展开更多
基金supported by the National Natural Science Foundation of China under Grant No.12204062the Natural Science Foundation of Shandong Province under Grant No.ZR2022MF330。
文摘To enhance speech emotion recognition capability,this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup(AAM)and improved coordinate and shuffle attention(ICASA)methods.The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients,thus enabling information fusion of speech data with different emotions at the acoustic level.The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention(ICA)and shuffle attention(SA)techniques.The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and captures long-range dependencies of multi-scale time-frequency features using the attention weights.The SA technique promotes feature interaction through channel shuffling,which helps the model learn richer and more discriminative emotional features.Experimental results demonstrate that,compared to the baseline model,the proposed model improves the weighted accuracy by 5.42%and 4.54%,and the unweighted accuracy by 3.37%and 3.85%on the IEMOCAP and RAVDESS datasets,respectively.These improvements were confirmed to be statistically significant by independent samples t-tests,further supporting the practical reliability and applicability of the proposed model in real-world emotion-aware speech systems.
基金supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)under the Metaverse Support Program to Nurture the Best Talents(IITP-2024-RS-2023-00254529)grant funded by the Korea government(MSIT).
文摘Brain tumors require precise segmentation for diagnosis and treatment plans due to their complex morphology and heterogeneous characteristics.While MRI-based automatic brain tumor segmentation technology reduces the burden on medical staff and provides quantitative information,existing methodologies and recent models still struggle to accurately capture and classify the fine boundaries and diverse morphologies of tumors.In order to address these challenges and maximize the performance of brain tumor segmentation,this research introduces a novel SwinUNETR-based model by integrating a new decoder block,the Hierarchical Channel-wise Attention Decoder(HCAD),into a powerful SwinUNETR encoder.The HCAD decoder block utilizes hierarchical features and channelspecific attention mechanisms to further fuse information at different scales transmitted from the encoder and preserve spatial details throughout the reconstruction phase.Rigorous evaluations on the recent BraTS GLI datasets demonstrate that the proposed SwinHCAD model achieved superior and improved segmentation accuracy on both the Dice score and HD95 metrics across all tumor subregions(WT,TC,and ET)compared to baseline models.In particular,the rationale and contribution of the model design were clarified through ablation studies to verify the effectiveness of the proposed HCAD decoder block.The results of this study are expected to greatly contribute to enhancing the efficiency of clinical diagnosis and treatment planning by increasing the precision of automated brain tumor segmentation.
文摘Modern business information systems face significant challenges in managing heterogeneous data sources,integrating disparate systems,and providing real-time decision support in complex enterprise environments.Contemporary enterprises typically operate 200+interconnected systems,with research indicating that 52% of organizations manage three or more enterprise content management systems,creating information silos that reduce operational efficiency by up to 35%.While attention mechanisms have demonstrated remarkable success in natural language processing and computer vision,their systematic application to business information systems remains largely unexplored.This paper presents the theoretical foundation for a Hierarchical Attention-Based Business Information System(HABIS)framework that applies multi-level attention mechanisms to enterprise environments.We provide a comprehensive mathematical formulation of the framework,analyze its computational complexity,and present a proof-of-concept implementation with simulation-based validation that demonstrates a 42% reduction in crosssystem query latency compared to legacy ERP modules and 70% improvement in prediction accuracy over baseline methods.The theoretical framework introduces four hierarchical attention levels:system-level attention for dynamic weighting of business systems,process-level attention for business process prioritization,data-level attention for critical information selection,and temporal attention for time-sensitive pattern recognition.Our complexity analysis demonstrates that the framework achieves O(n log n)computational complexity for attention computation,making it scalable to large enterprise environments including retail supply chains with 200+system-scale deployments.The proof-of-concept implementation validates the theoretical framework’s feasibility withMSE loss of 0.439 and response times of 0.000120 s per query,demonstrating its potential for addressing key challenges in business information systems.This work establishes a foundation for future empirical research and practical implementation of attention-driven enterprise systems.
文摘Reliable traffic flow prediction is crucial for mitigating urban congestion.This paper proposes Attentionbased spatiotemporal Interactive Dynamic Graph Convolutional Network(AIDGCN),a novel architecture integrating Interactive Dynamic Graph Convolution Network(IDGCN)with Temporal Multi-Head Trend-Aware Attention.Its core innovation lies in IDGCN,which uniquely splits sequences into symmetric intervals for interactive feature sharing via dynamic graphs,and a novel attention mechanism incorporating convolutional operations to capture essential local traffic trends—addressing a critical gap in standard attention for continuous data.For 15-and 60-min forecasting on METR-LA,AIDGCN achieves MAEs of 0.75%and 0.39%,and RMSEs of 1.32%and 0.14%,respectively.In the 60-min long-term forecasting of the PEMS-BAY dataset,the AIDGCN out-performs the MRA-BGCN method by 6.28%,4.93%,and 7.17%in terms of MAE,RMSE,and MAPE,respectively.Experimental results demonstrate the superiority of our pro-posed model over state-of-the-art methods.
文摘随着光伏发电在全球能源体系中占比不断提升,超短期光伏发电量预测对电力系统调度与安全运行至关重要。然而,光伏发电量受多因素影响,具有显著随机性与波动性。为此,提出了一种基于TCN-BiLSTM-Attention模型的超短期光伏发电量预测方法。首先通过皮尔逊相关分析筛选关键特征,并利用孤立森林算法检测异常值,结合线性插值法和标准化完成数据预处理。随后,通过时间卷积网络(Temporal Convolutional Network,TCN)提取时序特征,再利用双向长短期记忆网络(Bidirectional Long Short-Term Memory,BiLSTM)网络捕获前后向时间依赖关系,并在输出端引入注意力机制聚焦关键时间步特征。最后,在Desert Knowledge Australia Solar Centre(DKASC)数据集上的对比实验表明,与传统LSTM、BiLSTM模型相比,提出的TCN-BiLSTM-Attention模型在预测精度、稳定性等方面均表现出一定优势。
基金financial support from the National Natural Science Foundation of China(Nos.52204089,52374082)the Young Elite Scientists Sponsorship Program(No.2023QNRC001)by China Association for Science and Technology(CAST).
文摘Microseismic(MS)monitoring is an effective technique to detect mining-induced rock fractures.However,recognizing grouting-induced signals is challenging due to complex geological conditions in deep rock plates.Therefore,a hybrid model(WM-ResNet50)integrating data enhancement,a deep convolutional neural network(CNN),and convolutional block attention modules(CBAM)was proposed.Firstly,an MS system was established at the Xieqiao coal mine in Anhui Province,China.MS waveforms and injection parameters were acquired during grouting.Secondly,signals were categorized based on time-frequency characteristics to build a dataset,which was divided into training,validation,and test sets at a ratio of 4:1:1.Subsequently,the performance of WM-ResNet50 was evaluated based on indices such as individual precision,total accuracy,recall,and loss function.The results indicated that WMResNet50 achieved an average recognition accuracy of 94.38%,surpassing that of a simple CNN(90.04%),ResNet18(91.72%),and ResNet50(92.48%).Finally,WM-ResNet50 was applied to monitor the whole process at laboratory tests and field cases.Both results affirmed the feasibility and effectiveness of MS inversion in predicting actual slurry diffusion ranges within deep rock layers.By comparison,it was revealed that the MS sources classified by WM-ResNet50 matched grouting records well.A solution to address insufficient diffusion under long-borehole grouting has been proposed.WM-ResNet50′s accuracy was validated through in-situ coring and XRD analysis for cement-based hydration products.This study provides a beneficial reference for similar rock signal processing and in-field grouting practices.
文摘Salient object detection(SOD)models struggle to simultaneously preserve global structure,maintain sharp object boundaries,and sustain computational efficiency in complex scenes.In this study,we propose SPSALNet,a task-driven two-stage(macro–micro)architecture that restructures the SOD process around superpixel representations.In the proposed approach,a“split-and-enhance”principle,introduced to our knowledge for the first time in the SOD literature,hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions.At the macro stage,the image is partitioned into content-adaptive superpixel regions,and each superpixel is represented by a high-dimensional region-level feature vector.These representations define a regional decomposition problem in which superpixels are assigned to three classes:background,object interior,and transition regions.Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context.At the micro stage,the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions.The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway.Subsequently,channel-aware fusion blocks adaptively combine information from these two sources,producing sharper and more stable object boundaries.Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods.On the PASCAL-S and DUT-OMRON datasets,SPSALNet exhibits a clear performance advantage across all key metrics,and it ranks first on accuracy-oriented measures on HKU-IS.On the challenging DUT-OMRON benchmark,SPSALNet reaches a MAE of 0.034.Across all datasets,it preserves object boundaries and regional structure in a stable and competitive manner.
文摘Despite the ability of the anonymous communication system The Onion Router(Tor)to obscure the content of communications,prior studies have shown that passive adversaries can still infer the websites visited by users throughwebsite fingerprinting(WF)attacks.ConventionalWFmethodologies demonstrate optimal performance in scenarios involving single-tab browsing.Conventional WF methods achieve optimal performance primarily in scenarios involving single-tab browsing.However,in real-world network environments,users often engage in multitab browsing,which generates overlapping traffic patterns from different websites.This overlap has been shown to significantly degrade the performance of classifiers that rely on the single-tab assumption.To address this challenge,this paper proposes a Transformer-basedmulti-tab website fingerprinting(MT-WF)attack framework.Themodel employs an adaptive sliding windowmechanism to capture fine-grained features of traffic direction.Additionally,it incorporates a label-aware attention mechanism designed to dynamically separate and refine entangled traffic representations,enhancing the model’s ability to distinguish between overlapping traffic patterns.Furthermore,the model leverages global traffic patterns through multi-segment feature fusion and incorporates an incremental learning(IL)strategy to adapt to the continuously evolving website categories in open-world environments.Experimental results demonstrate that the proposedmethod achieves a top-2 precision of 0.78 in the closed-world setting.In the open-world scenario,the model attains an F1 score of 0.904,outperforming most existing baselines.The proposed method maintains superior performance even under challenging conditions,including WF defenses and concept drift.
基金supported by the National Social Science Fund of China(20BXW101).
文摘Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propose SparseMoE-MFN,a novel unified framework that integrates sparse attention with a sparse-activated Mixture of-Experts(MoE)architecture.This framework aims to enhance the efficiency,inferential depth,and interpretability of multimodal fake news detection.Sparse MoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing.The sparse attention module adaptively filters irrelevant tokens and focuses on key regions,reducing computational costs and noise.The sparse MoE module dynamically routes inputs to specialized experts(visual,language,cross-modal alignment)based on content heterogeneity.This expert specialization design boosts computational efficiency and semantic adaptability,enabling precise processing of complex content and improving performance on ambiguous categories.Evaluated on the large-scale,multilingualMR2 dataset,SparseMoEMFN achieves state-of-the-art performance.It obtains an accuracy of 86.7%and a macro-averaged F1 score of 0.859,outperforming strong baselines like MiniGPT-4 by 3.4%and 3.2%,respectively.Notably,it shows significant advantages in the“unverified”category.Furthermore,SparseMoE-MFN demonstrates superior computational efficiency,with an average inference latency of 89.1 ms and 95.4 GFLOPs,substantially lower than existing models.Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy,generalization,and efficiency.
文摘In daily life,keyword spotting plays an important role in human-computer interaction.However,noise often interferes with the extraction of time-frequency information,and achieving both computational efficiency and recognition accuracy on resource-constrained devices such as mobile terminals remains a major challenge.To address this,we propose a novel time-frequency dual-branch parallel residual network,which integrates a Dual-Branch Broadcast Residual module and a Time-Frequency Coordinate Attention module.The time-domain and frequency-domain branches are designed in parallel to independently extract temporal and spectral features,effectively avoiding the potential information loss caused by serial stacking,while enhancing information flow and multi-scale feature fusion.In terms of training strategy,a curriculum learning approach is introduced to progressively improve model robustness fromeasy to difficult tasks.Experimental results demonstrate that the proposed method consistently outperforms existing lightweight models under various signal-to-noise ratio(SNR)conditions,achieving superior far-field recognition performance on the Google Speech Commands V2 dataset.Notably,the model maintains stable performance even in low-SNR environments such as–10 dB,and generalizes well to unseen SNR conditions during training,validating its robustness to novel noise scenarios.Furthermore,the proposed model exhibits significantly fewer parameters,making it highly suitable for deployment on resource-limited devices.Overall,the model achieves a favorable balance between performance and parameter efficiency,demonstrating strong potential for practical applications.
基金funded by the National Natural Science Foundation of China,grant number 62262045the Fundamental Research Funds for the Central Universities,grant number 2023CDJYGRH-YB11the Open Funding of SUGON Industrial Control and Security Center,grant number CUIT-SICSC-2025-03.
文摘Autonomous vehicles rely heavily on accurate and efficient scene segmentation for safe navigation and efficient operations.Traditional Bird’s Eye View(BEV)methods on semantic scene segmentation,which leverage multimodal sensor fusion,often struggle with noisy data and demand high-performance GPUs,leading to sensor misalignment and performance degradation.This paper introduces an Enhanced Channel Attention BEV(ECABEV),a novel approach designed to address the challenges under insufficient GPU memory conditions.ECABEV integrates camera and radar data through a de-noise enhanced channel attention mechanism,which utilizes global average and max pooling to effectively filter out noise while preserving discriminative features.Furthermore,an improved fusion approach is proposed to efficiently merge categorical data across modalities.To reduce computational overhead,a bilinear interpolation layer normalizationmethod is devised to ensure spatial feature fidelity.Moreover,a scalable crossentropy loss function is further designed to handle the imbalanced classes with less computational efficiency sacrifice.Extensive experiments on the nuScenes dataset demonstrate that ECABEV achieves state-of-the-art performance with an IoU of 39.961,using a lightweight ViT-B/14 backbone and lower resolution(224×224).Our approach highlights its cost-effectiveness and practical applicability,even on low-end devices.The code is publicly available at:https://github.com/YYF-CQU/ECABEV.git.
基金supported by the Knowledge Innovation Program of Wuhan-Shuguang Project(Grant No.2023010201020443)the School-Level Scientific Research Project Funding Program of Jianghan University(Grant No.2022XKZX33)the Natural Science Foundation of Hubei Province(Grant No.2024AFB466).
文摘The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts.In the industrial robot production scenarios,the 6D pose estimation of industrial parts mainly faces two challenges:one is the loss of information and interference caused by occlusion and stacking in the sorting scenario,the other is the difficulty of feature extraction due to the weak texture of industrial parts.To address the above problems,this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts,namely CB-PVNet.On the one hand,the voting scheme can predict the keypoints of affected pixels,which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion.On the other hand,the attention mechanism can extract interesting features of the object while suppressing useless features of surroundings.Extensive comparative experiments were conducted on both public datasets(including LINEMOD,Occlusion LINEMOD and T-LESS datasets)and self-made datasets.The experimental results indicate that the proposed network CB-PVNet can achieve accuracy of ADD(-s)comparable to state-of-the-art using only RGB images while ensuring real-time performance.Additionally,we also conducted robot grasping experiments in the real world.The balance between accuracy and computational efficiency makes the method well-suited for applications in industrial automation.
基金financially supported by the Fujian Provincial Department of Science and Technology,the Collaborative Innovation Platform Project for Key Technologies of Smart Warehousing and Logistics Systems in the Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone(No.2025E3024).
文摘Fabric defect detection plays a vital role in ensuring textile quality.However,traditional manual inspection methods are often inefficient and inaccurate.To overcome these limitations,we propose FD-YOLO,an enhanced lightweight detection model based on the YOLOv11n framework.The proposed model introduces the Bi-level Routing Attention(BRAttention)mechanism to enhance defect feature extraction,enabling more detailed feature representation.It proposes Deep Progressive Cross-Scale Fusion Neck(DPCSFNeck)to better capture smallscale defects and incorporates a Multi-Scale Dilated Residual(MSDR)module to strengthen multi-scale feature representation.Furthermore,a Shared Detail-Enhanced Lightweight Head(SDELHead)is employed to reduce the risk of gradient explosion during training.Experimental results demonstrate that FD-YOLO achieves superior detection accuracy and Lightweight performance compared to the baseline YOLOv11n.
基金Tianmin Tianyuan Boutique Vegetable Industry Technology Service Station(Grant No.2024120011003081)Development of Environmental Monitoring and Traceability System for Wuqing Agricultural Production Areas(Grant No.2024120011001866)。
文摘Tomato is a major economic crop worldwide,and diseases on tomato leaves can significantly reduce both yield and quality.Traditional manual inspection is inefficient and highly subjective,making it difficult to meet the requirements of early disease identification in complex natural environments.To address this issue,this study proposes an improved YOLO11-based model,YOLO-SPDNet(Scale Sequence Fusion,Position-Channel Attention,and Dual Enhancement Network).The model integrates the SEAM(Self-Ensembling Attention Mechanism)semantic enhancement module,the MLCA(Mixed Local Channel Attention)lightweight attention mechanism,and the SPA(Scale-Position-Detail Awareness)module composed of SSFF(Scale Sequence Feature Fusion),TFE(Triple Feature Encoding),and CPAM(Channel and Position Attention Mechanism).These enhancements strengthen fine-grained lesion detection while maintaining model lightweightness.Experimental results show that YOLO-SPDNet achieves an accuracy of 91.8%,a recall of 86.5%,and an mAP@0.5 of 90.6%on the test set,with a computational complexity of 12.5 GFLOPs.Furthermore,the model reaches a real-time inference speed of 987 FPS,making it suitable for deployment on mobile agricultural terminals and online monitoring systems.Comparative analysis and ablation studies further validate the reliability and practical applicability of the proposed model in complex natural scenes.
基金supported by University of Malaya and Ministry of High Education-Malaysia via Fundamental Research Grant Scheme No.FRGS/1/2023/TK10/UM/02/3.
文摘Accurate and real-time traffic-sign detection is a cornerstone of Advanced Driver-Assistance Systems(ADAS)and autonomous vehicles.However,existing one-stage detectors miss distant signs,and two-stage pipelines are impractical for embedded deployment.To address this issue,we present YOLO-SMM,a lightweight two-stage framework.This framework is designed to augment the YOLOv8 baseline with three targeted modules.(1)SlimNeck replaces PAN/FPN with a CSP-OSA/GSConv fusion block,reducing parameters and FLOPs without compromising multi-scale detail.(2)The MCA model introduces row-and column-aware weights to selectively amplify small sign regions in cluttered scenes.(3)MPDIoU augments CIoU loss with a corner-distance term,supplying stable gradients for sub-20-pixel boxes and tightening localization.An evaluation of YOLO-SMMon the German Traffic Sign Recognition Benchmark(GTSRB)revealed that it attained 96.3% mAP50 and 93.1% mAP50-90 at a rate of 90.6 frames per second(FPS).This represents an improvement of+1.0% over previous performance benchmarks.Them APat 64×64 resolution was found to be 50% of the maximum attainable value,with an FPS of+8.3 when compared to YOLOv8.This result indicates superior performance in terms of accuracy and speed compared to YOLOv7,YOLOv5,RetinaNet,EfficientDet,and Faster R-CNN,all of which were operated under equivalent conditions.
基金supported by the National Natural Science Foundation of China,No.62301497the Science and Technology Research Program of Henan,No.252102211024the Key Research and Development Program of Henan,No.231111212000.
文摘Stereo matching is a pivotal task in computer vision,enabling precise depth estimation from stereo image pairs,yet it encounters challenges in regions with reflections,repetitive textures,or fine structures.In this paper,we propose a Semantic-Guided Parallax Attention Stereo Matching Network(SGPASMnet)that can be trained in unsupervised manner,building upon the Parallax Attention Stereo Matching Network(PASMnet).Our approach leverages unsupervised learning to address the scarcity of ground truth disparity in stereo matching datasets,facilitating robust training across diverse scene-specific datasets and enhancing generalization.SGPASMnet incorporates two novel components:a Cross-Scale Feature Interaction(CSFI)block and semantic feature augmentation using a pre-trained semantic segmentation model,SegFormer,seamlessly embedded into the parallax attention mechanism.The CSFI block enables effective fusion ofmulti-scale features,integrating coarse and fine details to enhance disparity estimation accuracy.Semantic features,extracted by SegFormer,enrich the parallax attention mechanism by providing high-level scene context,significantly improving performance in ambiguous regions.Our model unifies these enhancements within a cohesive architecture,comprising semantic feature extraction,an hourglass network,a semantic-guided cascaded parallax attentionmodule,outputmodule,and a disparity refinement network.Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate compared to the original PASMnet,highlighting the effectiveness of our enhancements in handling complex scenes.By harnessing unsupervised learning without ground truth disparity needed,SGPASMnet offers a scalable and robust solution for accurate stereo matching,with superior generalization across varied real-world applications.
文摘Wind turbine blade defect detection faces persistent challenges in separating small,low-contrast surface faults from complex backgrounds while maintaining reliability under variable illumination and viewpoints.Conven-tional image-processing pipelines struggle with scalability and robustness,and recent deep learning methods remain sensitive to class imbalance and acquisition variability.This paper introduces TurbineBladeDetNet,a convolutional architecture combining dual-attention mechanisms with multi-path feature extraction for detecting five distinct blade fault types.Our approach employs both channel-wise and spatial attention modules alongside an Albumentations-driven augmentation strategy to handle dataset imbalance and capture condition variability.The model achieves 97.14%accuracy,98.65%precision,and 98.68%recall,yielding a 98.66%F1-score with 0.0110 s inference time.Class-specific analysis shows uniformly high sensitivity and specificity;lightning damage reaches 99.80%for sensitivity,precision,and F1-score,and crack achieves perfect precision and specificity with a 98.94%F1-score.Comparative evaluation against recent wind-turbine inspection approaches indicates higher performance in both accuracy and F1-score.The resulting balance of sensitivity and specificity limits both missed defects and false alarms,supporting reliable deployment in routine unmanned aerial vehicle(UAV)inspection.
基金supported by the National Key Research and Development Program for Young Scientists,Chin(Grant No.2021YFC2900400)the Sichuan-Chongqing Science and Technology Innovation Cooperation Program Project,China(Grant No.2024TIAD-CYKJCXX0269)the National Natural Science Foundation of China,China(Grant No.52304123).
文摘Lithology identificationwhile drilling technology can obtain rock information in real-time.However,traditional lithology identificationmodels often face limitations in feature extraction and adaptability to complex geological conditions,limiting their accuracy in challenging environments.To address these challenges,a deep learning model for lithology identificationwhile drilling is proposed.The proposed model introduces a dual attention mechanism in the long short-term memory(LSTM)network,effectively enhancing the ability to capture spatial and channel dimension information.Subsequently,the crayfishoptimization algorithm(COA)is applied to optimize the model network structure,thereby enhancing its lithology identificationcapability.Laboratory test results demonstrate that the proposed model achieves 97.15%accuracy on the testing set,significantlyoutperforming the traditional support vector machine(SVM)method(81.77%).Field tests under actual drilling conditions demonstrate an average accuracy of 91.96%for the proposed model,representing a 14.31%improvement over the LSTM model alone.The proposed model demonstrates robust adaptability and generalization ability across diverse operational scenarios.This research offers reliable technical support for lithology identification while drilling.