期刊文献+
共找到207篇文章
< 1 2 11 >
每页显示 20 50 100
基于改进Vision Transformer的水稻叶片病害图像识别
1
作者 朱周华 周怡纳 +1 位作者 侯智杰 田成源 《电子测量技术》 北大核心 2025年第10期153-160,共8页
水稻叶片病害智能识别在现代农业生产中具有重要意义。针对传统Vision Transformer网络缺乏归纳偏置,难以有效捕捉图像局部细节特征的问题,提出了一种改进的Vision Transformer模型。该模型通过引入内在归纳偏置,增强了对多尺度上下文... 水稻叶片病害智能识别在现代农业生产中具有重要意义。针对传统Vision Transformer网络缺乏归纳偏置,难以有效捕捉图像局部细节特征的问题,提出了一种改进的Vision Transformer模型。该模型通过引入内在归纳偏置,增强了对多尺度上下文以及局部与全局依赖关系的建模能力,同时降低了对大规模数据集的需求。此外,Vision Transformer中的多层感知器模块被Kolmogorov-Arnold网络结构取代,从而提升了模型对复杂特征的提取能力和可解释性。实验结果表明,所提模型在水稻叶片病害识别任务中取得了优异的性能,识别准确率达到了98.62%,较原始ViT模型提升了6.2%,显著提高了对水稻叶片病害的识别性能。 展开更多
关键词 水稻叶片病害 图像识别 vision transformer网络 归纳偏置 局部特征
原文传递
Vision Transformer模型在中医舌诊图像分类中的应用研究
2
作者 周坚和 王彩雄 +3 位作者 李炜 周晓玲 张丹璇 吴玉峰 《广西科技大学学报》 2025年第5期89-98,共10页
舌诊作为中医望诊中的一项重要且常规的检查手段,在中医临床诊断中发挥着不可或缺的作用。为突破传统舌诊依赖主观经验及卷积神经网络(convolutional neural network,CNN)模型分类性能不足的局限,本文基于高质量舌象分类数据集,提出基于... 舌诊作为中医望诊中的一项重要且常规的检查手段,在中医临床诊断中发挥着不可或缺的作用。为突破传统舌诊依赖主观经验及卷积神经网络(convolutional neural network,CNN)模型分类性能不足的局限,本文基于高质量舌象分类数据集,提出基于Vision Transformer(ViT)深度学习模型,通过预训练与微调策略优化特征提取能力,并结合数据增强技术解决类别分布不平衡问题。实验结果表明,该模型在6项关键舌象特征分类任务中,5项指标的准确率(苔色85.6%、瘀斑98.0%、质地99.6%、舌色96.6%、裂纹87.8%)显著优于现有CNN方法(如ResNet50对应准确率分别为78.0%、91.0%、92.0%、68.0%、80.1%),验证了该模型在突破传统性能瓶颈、提升中医临床智能诊断可靠性方面的有效性和应用潜力。 展开更多
关键词 舌诊 vision transformer(ViT) 深度学习 医学图像分类
在线阅读 下载PDF
A Hybrid Approach for Pavement Crack Detection Using Mask R-CNN and Vision Transformer Model 被引量:2
3
作者 Shorouq Alshawabkeh Li Wu +2 位作者 Daojun Dong Yao Cheng Liping Li 《Computers, Materials & Continua》 SCIE EI 2025年第1期561-577,共17页
Detecting pavement cracks is critical for road safety and infrastructure management.Traditional methods,relying on manual inspection and basic image processing,are time-consuming and prone to errors.Recent deep-learni... Detecting pavement cracks is critical for road safety and infrastructure management.Traditional methods,relying on manual inspection and basic image processing,are time-consuming and prone to errors.Recent deep-learning(DL)methods automate crack detection,but many still struggle with variable crack patterns and environmental conditions.This study aims to address these limitations by introducing the Masker Transformer,a novel hybrid deep learning model that integrates the precise localization capabilities of Mask Region-based Convolutional Neural Network(Mask R-CNN)with the global contextual awareness of Vision Transformer(ViT).The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions.We evaluated the performance of theMaskerTransformer against other state-of-theartmodels such asU-Net,TransformerU-Net(TransUNet),U-NetTransformer(UNETr),SwinU-NetTransformer(Swin-UNETr),You Only Look Once version 8(YoloV8),and Mask R-CNN using two benchmark datasets:Crack500 and DeepCrack.The findings reveal that the MaskerTransformer significantly outperforms the existing models,achieving the highest Dice SimilarityCoefficient(DSC),precision,recall,and F1-Score across both datasets.Specifically,the model attained a DSC of 80.04%on Crack500 and 91.37%on DeepCrack,demonstrating superior segmentation accuracy and reliability.The high precision and recall rates further substantiate its effectiveness in real-world applications,suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection,potentially replacing more traditional methods. 展开更多
关键词 Pavement crack segmentation TRANSPORTATION deep learning vision transformer Mask R-CNN image segmentation
在线阅读 下载PDF
Vision Transformer深度学习模型在前列腺癌识别中的价值
4
作者 李梦娟 金龙 +2 位作者 尹胜男 计一丁 丁宁 《中国医学计算机成像杂志》 北大核心 2025年第3期396-401,共6页
目的:旨在探讨Vision Transformer(ViT)深度学习模型在前列腺癌(PCa)识别中的应用价值.方法:回顾性分析了480例接受磁共振成像(MRI)检查的患者影像资料.采用TotalSegmentator模型自动分割前列腺区域,通过ViT深度学习方法分别构建基于T2... 目的:旨在探讨Vision Transformer(ViT)深度学习模型在前列腺癌(PCa)识别中的应用价值.方法:回顾性分析了480例接受磁共振成像(MRI)检查的患者影像资料.采用TotalSegmentator模型自动分割前列腺区域,通过ViT深度学习方法分别构建基于T2加权像(T2WI)、基于表观弥散系数(ADC)图和基于两者结合的三个ViT模型.结果:在PCa的识别能力上,结合模型在训练组和测试组上的受试者工作特征(ROC)曲线下面积(AUC)分别为0.961和0.980,优于仅基于单一成像序列构建的ViT模型.在基于单一序列构建的ViT模型中,基于ADC图的模型相较于基于T2WI的模型表现更佳.此外,决策曲线分析显示结合模型提供了更大的临床效益.结论:ViT深度学习模型在前列腺癌识别中具有较高的诊断准确性和潜在价值. 展开更多
关键词 vision transformer 深度学习 前列腺癌 自动分割 磁共振成像
暂未订购
基于Vision Transformer的混合型晶圆图缺陷模式识别
5
作者 李攀 娄莉 《现代信息科技》 2025年第19期26-30,共5页
晶圆测试作为芯片生产过程中重要的一环,晶圆图缺陷模式的识别和分类对改进前端制造工艺具有关键作用。在实际生产过程中,各类缺陷可能同时出现,形成混合缺陷类型。传统深度学习方法对混合型晶圆图缺陷信息的识别率较低,为此,文章提出... 晶圆测试作为芯片生产过程中重要的一环,晶圆图缺陷模式的识别和分类对改进前端制造工艺具有关键作用。在实际生产过程中,各类缺陷可能同时出现,形成混合缺陷类型。传统深度学习方法对混合型晶圆图缺陷信息的识别率较低,为此,文章提出一种基于Vision Transformer的缺陷识别方法。该方法采用多头自注意力机制对晶圆图的全局特征进行编码,实现了对混合型晶圆缺陷图的高效识别。在混合型缺陷数据集上的实验结果表明,该方法性能优于现有深度学习模型,平均正确率达96.2%。 展开更多
关键词 计算机视觉 晶圆图 缺陷识别 vision transformer
在线阅读 下载PDF
基于改进Vision Transformer的遥感图像分类研究
6
作者 李宗轩 冷欣 +1 位作者 章磊 陈佳凯 《林业机械与木工设备》 2025年第6期31-35,共5页
通过遥感图像分类能够快速有效获取森林区域分布,为林业资源管理监测提供支持。Vision Transformer(ViT)凭借优秀的全局信息捕捉能力在遥感图像分类任务中广泛应用。但Vision Transformer在浅层特征提取时会冗余捕捉其他局部特征而无法... 通过遥感图像分类能够快速有效获取森林区域分布,为林业资源管理监测提供支持。Vision Transformer(ViT)凭借优秀的全局信息捕捉能力在遥感图像分类任务中广泛应用。但Vision Transformer在浅层特征提取时会冗余捕捉其他局部特征而无法有效捕获关键特征,并且Vision Transformer在将图像分割为patch过程中可能会导致边缘等细节信息的丢失,从而影响分类准确性。针对上述问题提出一种改进Vision Transformer,引入了STA(Super Token Attention)注意力机制来增强Vision Transformer对关键特征信息的提取并减少计算冗余度,还通过加入哈尔小波下采样(Haar Wavelet Downsampling)在减少细节信息丢失的同时增强对图像不同尺度局部和全局信息的捕获能力。通过实验在AID数据集上达到了92.98%的总体准确率,证明了提出方法的有效性。 展开更多
关键词 遥感图像分类 vision transformer 哈尔小波下采样 STA注意力机制
在线阅读 下载PDF
Steel Surface Defect Detection Using Learnable Memory Vision Transformer
7
作者 Syed Tasnimul Karim Ayon Farhan Md.Siraj Jia Uddin 《Computers, Materials & Continua》 SCIE EI 2025年第1期499-520,共22页
This study investigates the application of Learnable Memory Vision Transformers(LMViT)for detecting metal surface flaws,comparing their performance with traditional CNNs,specifically ResNet18 and ResNet50,as well as o... This study investigates the application of Learnable Memory Vision Transformers(LMViT)for detecting metal surface flaws,comparing their performance with traditional CNNs,specifically ResNet18 and ResNet50,as well as other transformer-based models including Token to Token ViT,ViT withoutmemory,and Parallel ViT.Leveraging awidely-used steel surface defect dataset,the research applies data augmentation and t-distributed stochastic neighbor embedding(t-SNE)to enhance feature extraction and understanding.These techniques mitigated overfitting,stabilized training,and improved generalization capabilities.The LMViT model achieved a test accuracy of 97.22%,significantly outperforming ResNet18(88.89%)and ResNet50(88.90%),aswell as the Token to TokenViT(88.46%),ViT without memory(87.18),and Parallel ViT(91.03%).Furthermore,LMViT exhibited superior training and validation performance,attaining a validation accuracy of 98.2%compared to 91.0%for ResNet 18,96.0%for ResNet50,and 89.12%,87.51%,and 91.21%for Token to Token ViT,ViT without memory,and Parallel ViT,respectively.The findings highlight the LMViT’s ability to capture long-range dependencies in images,an areawhere CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction.The additional transformer-based models also demonstrate improved performance in capturing complex features over CNNs,with LMViT excelling particularly at detecting subtle and complex defects,which is critical for maintaining product quality and operational efficiency in industrial applications.For instance,the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify.This study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures like Token to Token ViT,ViT without memory,and Parallel ViT in industrial scenarios where complex spatial relationships are key.Future research may focus on enhancing LMViT’s computational efficiency for deployment in real-time quality control systems. 展开更多
关键词 Learnable Memory vision transformer(LMViT) Convolutional Neural Networks(CNN) metal surface defect detection deep learning computer vision image classification learnable memory gradient clipping label smoothing t-SNE visualization
在线阅读 下载PDF
ViT-Count:面向冠层遮挡的Vision Transformer树木计数定位方法
8
作者 张乔一 张瑞 霍光煜 《北京林业大学学报》 北大核心 2025年第10期128-138,共11页
【目的】针对复杂场景中树木检测的挑战,如遮挡、背景干扰及密集分布等,本研究提出一种基于Vision Transformer(ViT)的树木检测方法(ViT-Count),提升模型对复杂场景中树木的检测精度与鲁棒性。【方法】采用ViT作为基础模型,其在捕捉图... 【目的】针对复杂场景中树木检测的挑战,如遮挡、背景干扰及密集分布等,本研究提出一种基于Vision Transformer(ViT)的树木检测方法(ViT-Count),提升模型对复杂场景中树木的检测精度与鲁棒性。【方法】采用ViT作为基础模型,其在捕捉图像中全局上下文信息方面具有天然优势,尤其适用于形态多变的复杂环境。设计针对树木的视觉提示调优VPT机制,其通过在特征中注入可学习提示(prompts),优化模型在林地高密度树冠、光照变化及不同树种结构下的特征提取能力,提高对不同林分类型的适应性。设计卷积模块的注意力机制模块,利用其在局部感知基础上的长距离依赖建模能力,有效强化模型对树木遮挡、重叠及形态相似目标的辨别能力,提高整体检测的鲁棒性与准确性。设计一个树木检测解码器,通过多层卷积、归一化、GELU激活与上采样操作逐步还原空间分辨率,以生成的目标密度图实现树木计数与定位。【结果】该方法在提升森林、城市场景下的树木检测鲁棒性的同时,增强了模型在多尺度树木目标上的泛化能力。在Larch Casebearer数据集和Urban Tree数据集上进行的实验显示,与其他主流模型相比,该方法的MAE和RMSE最多分别降低了2.53、3.99,表明其泛化能力更强,具有最优的树木检测性能。可视化实验结果表明,在密集森林场景和复杂城市场景中,所提模型均具有较高的树木检测准确率。消融实验的结果证明了模型主要模块的有效性。【结论】基于Vision Transformer的面向复杂场景的树木计数与定位方法能够充分发挥ViT的全局建模能力及视觉提示调优机制任务适应性,结合卷积模块的注意力机制,有效提升复杂场景树木计数与定位的精度与鲁棒性。 展开更多
关键词 目标识别 树木计数 树木定位 复杂场景 vision transformer(ViT) 视觉提示调优(VPT) 注意力机制
在线阅读 下载PDF
基于改进Vision Transformer的森林火灾视频识别研究
9
作者 张敏 辛颖 黄天棋 《南京林业大学学报(自然科学版)》 北大核心 2025年第4期186-194,共9页
【目的】针对现有森林火灾图像识别算法存在的效率不足、时序特征利用率低等问题,构建基于视频数据的森林火灾识别模型,以提升林火监测的实时性与识别准确率。【方法】提出融合三维卷积神经网络(3DCNN)与视觉Vision Transformer(ViT)的C... 【目的】针对现有森林火灾图像识别算法存在的效率不足、时序特征利用率低等问题,构建基于视频数据的森林火灾识别模型,以提升林火监测的实时性与识别准确率。【方法】提出融合三维卷积神经网络(3DCNN)与视觉Vision Transformer(ViT)的C3D-ViT算法。该模型通过3DCNN提取视频序列的时空特征,构建时空特征向量;利用ViT编码器的自注意力机制融合局部与全局特征;最终经MLP Head层输出分类结果。通过消融实验验证C3D-ViT模型的有效性,并与原模型3DCNN和ViT,以及ResNet50、LSTM、YOLOv5等深度学习模型进行对比。【结果】C3D-ViT在自建林火数据集上准确率达到96.10%,较ResNet50(89.07%)、LSTM(93.26%)和YOLOv5(91.46%)具有明显优势。模型改进有效,准确率超越3DCNN(93.91%)与ViT(90.43%)。在遮挡、远距离、低浓度烟雾等复杂场景下保持较高的平均置信度,满足实时监测需求。【结论】C3D-ViT通过时空特征联合建模,显著提升林火识别的鲁棒性与时效性,为森林防火系统提供可靠的技术支持。 展开更多
关键词 森林火灾 深度学习 目标检测 三维卷积神经网络 vision transformer
原文传递
xCViT:Improved Vision Transformer Network with Fusion of CNN and Xception for Skin Disease Recognition with Explainable AI
10
作者 Armughan Ali Hooria Shahbaz Robertas Damaševicius 《Computers, Materials & Continua》 2025年第4期1367-1398,共32页
Skin cancer is the most prevalent cancer globally,primarily due to extensive exposure to Ultraviolet(UV)radiation.Early identification of skin cancer enhances the likelihood of effective treatment,as delays may lead t... Skin cancer is the most prevalent cancer globally,primarily due to extensive exposure to Ultraviolet(UV)radiation.Early identification of skin cancer enhances the likelihood of effective treatment,as delays may lead to severe tumor advancement.This study proposes a novel hybrid deep learning strategy to address the complex issue of skin cancer diagnosis,with an architecture that integrates a Vision Transformer,a bespoke convolutional neural network(CNN),and an Xception module.They were evaluated using two benchmark datasets,HAM10000 and Skin Cancer ISIC.On the HAM10000,the model achieves a precision of 95.46%,an accuracy of 96.74%,a recall of 96.27%,specificity of 96.00%and an F1-Score of 95.86%.It obtains an accuracy of 93.19%,a precision of 93.25%,a recall of 92.80%,a specificity of 92.89%and an F1-Score of 93.19%on the Skin Cancer ISIC dataset.The findings demonstrate that the model that was proposed is robust and trustworthy when it comes to the classification of skin lesions.In addition,the utilization of Explainable AI techniques,such as Grad-CAM visualizations,assists in highlighting the most significant lesion areas that have an impact on the decisions that are made by the model. 展开更多
关键词 Skin lesions vision transformer CNN Xception deep learning network fusion explainable AI Grad-CAM skin cancer detection
在线阅读 下载PDF
Enhanced Plant Species Identification through Metadata Fusion and Vision Transformer Integration
11
作者 Hassan Javed Labiba Gillani Fahad +2 位作者 Syed Fahad Tahir Mehdi Hassan Hani Alquhayz 《Computers, Materials & Continua》 2025年第11期3981-3996,共16页
Accurate plant species classification is essential for many applications,such as biodiversity conservation,ecological research,and sustainable agricultural practices.Traditional morphological classification methods ar... Accurate plant species classification is essential for many applications,such as biodiversity conservation,ecological research,and sustainable agricultural practices.Traditional morphological classification methods are inherently slow,labour-intensive,and prone to inaccuracies,especiallywhen distinguishing between species exhibiting visual similarities or high intra-species variability.To address these limitations and to overcome the constraints of imageonly approaches,we introduce a novel Artificial Intelligence-driven framework.This approach integrates robust Vision Transformer(ViT)models for advanced visual analysis with a multi-modal data fusion strategy,incorporating contextual metadata such as precise environmental conditions,geographic location,and phenological traits.This combination of visual and ecological cues significantly enhances classification accuracy and robustness,proving especially vital in complex,heterogeneous real-world environments.The proposedmodel achieves an impressive 97.27%of test accuracy,andMean Reciprocal Rank(MRR)of 0.9842 that demonstrates strong generalization capabilities.Furthermore,efficient utilization of high-performance GPU resources(RTX 3090,18 GB memory)ensures scalable processing of highdimensional data.Comparative analysis consistently confirms that ourmetadata fusion approach substantially improves classification performance,particularly formorphologically similar species,and through principled self-supervised and transfer learning from ImageNet,the model adapts efficiently to new species,ensuring enhanced generalization.This comprehensive approach holds profound practical implications for precise conservation initiatives,rigorous ecological monitoring,and advanced agricultural management. 展开更多
关键词 vision transformers(ViTs) transformerS machine learning deep learning plant species classification MULTI-ORGAN
在线阅读 下载PDF
Multi-Scale Vision Transformer with Dynamic Multi-Loss Function for Medical Image Retrieval and Classification
12
作者 Omar Alqahtani Mohamed Ghouse +2 位作者 Asfia Sabahath Omer Bin Hussain Arshiya Begum 《Computers, Materials & Continua》 2025年第5期2221-2244,共24页
This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer(ViT)architectures and a dynamic multi-loss function.The multi... This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer(ViT)architectures and a dynamic multi-loss function.The multi-scale encoding significantly enhances the model’s ability to capture both fine-grained and global features,while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance.Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets,yielding notable improvements.Specifically,on the ISIC-2018 dataset,our method achieves an F1-Score improvement of+4.84% compared to the standard ViT,with a precision increase of+5.46% for melanoma(MEL).On the ChestX-ray14 dataset,the method delivers an F1-Score improvement of 5.3%over the conventional ViT,with precision gains of+5.0% for pneumonia(PNEU)and+5.4%for fibrosis(FIB).Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants,particularly in retrieving relevant medical cases and enhancing diagnostic accuracy.These findings highlight the potential of the proposedmethod for large-scalemedical image analysis,offering improved tools for clinical decision-making through superior classification and case comparison. 展开更多
关键词 Medical image retrieval vision transformer multi-scale encoding multi-loss function ISIC-2018 ChestX-ray14
在线阅读 下载PDF
Mango Disease Detection Using Fused Vision Transformer with ConvNeXt Architecture
13
作者 Faten S.Alamri Tariq Sadad +2 位作者 Ahmed S.Almasoud Raja Atif Aurangzeb Amjad Khan 《Computers, Materials & Continua》 2025年第4期1023-1039,共17页
Mango farming significantly contributes to the economy,particularly in developing countries.However,mango trees are susceptible to various diseases caused by fungi,viruses,and bacteria,and diagnosing these diseases at... Mango farming significantly contributes to the economy,particularly in developing countries.However,mango trees are susceptible to various diseases caused by fungi,viruses,and bacteria,and diagnosing these diseases at an early stage is crucial to prevent their spread,which can lead to substantial losses.The development of deep learning models for detecting crop diseases is an active area of research in smart agriculture.This study focuses on mango plant diseases and employs the ConvNeXt and Vision Transformer(ViT)architectures.Two datasets were used.The first,MangoLeafBD,contains data for mango leaf diseases such as anthracnose,bacterial canker,gall midge,and powdery mildew.The second,SenMangoFruitDDS,includes data for mango fruit diseases such as Alternaria,Anthracnose,Black Mould Rot,Healthy,and Stem and Rot.Both datasets were obtained from publicly available sources.The proposed model achieved an accuracy of 99.87%on the MangoLeafBD dataset and 98.40%on the MangoFruitDDS dataset.The results demonstrate that ConvNeXt and ViT models can effectively diagnose mango diseases,enabling farmers to identify these conditions more efficiently.The system contributes to increased mango production and minimizes economic losses by reducing the time and effort needed for manual diagnostics.Additionally,the proposed system is integrated into a mobile application that utilizes the model as a backend to detect mango diseases instantly. 展开更多
关键词 ConvNeXt model fusion mango disease smart agriculture vision transformer
在线阅读 下载PDF
DA-ViT:Deformable Attention Vision Transformer for Alzheimer’s Disease Classification from MRI Scans
14
作者 Abdullah G.M.Almansour Faisal Alshomrani +4 位作者 Abdulaziz T.M.Almutairi Easa Alalwany Mohammed S.Alshuhri Hussein Alshaari Abdullah Alfahaid 《Computer Modeling in Engineering & Sciences》 2025年第8期2395-2418,共24页
The early and precise identification of Alzheimer’s Disease(AD)continues to pose considerable clinical difficulty due to subtle structural alterations and overlapping symptoms across the disease phases.This study pre... The early and precise identification of Alzheimer’s Disease(AD)continues to pose considerable clinical difficulty due to subtle structural alterations and overlapping symptoms across the disease phases.This study presents a novel Deformable Attention Vision Transformer(DA-ViT)architecture that integrates deformable Multi-Head Self-Attention(MHSA)with a Multi-Layer Perceptron(MLP)block for efficient classification of Alzheimer’s disease(AD)using Magnetic resonance imaging(MRI)scans.In contrast to traditional vision transformers,our deformable MHSA module preferentially concentrates on spatially pertinent patches through learned offset predictions,markedly diminishing processing demands while improving localized feature representation.DA-ViT contains only 0.93 million parameters,making it exceptionally suitable for implementation in resource-limited settings.We evaluate the model using a class-imbalanced Alzheimer’s MRI dataset comprising 6400 images across four categories,achieving a test accuracy of 80.31%,a macro F1-score of 0.80,and an area under the receiver operating characteristic curve(AUC)of 1.00 for the Mild Demented category.Thorough ablation studies validate the ideal configuration of transformer depth,headcount,and embedding dimensions.Moreover,comparison research indicates that DA-ViT surpasses state-of-theart pre-trained Convolutional Neural Network(CNN)models in terms of accuracy and parameter efficiency. 展开更多
关键词 Alzheimer disease classification vision transformer deformable attention MRI analysis bayesian optimization
在线阅读 下载PDF
Vision Transformer for Extracting Tropical Cyclone Intensity from Satellite Images
15
作者 Ye TIAN Wen ZHOU +1 位作者 Paxson KYCHEUNG Zhenchen LIU 《Advances in Atmospheric Sciences》 2025年第1期79-93,共15页
Tropical cyclone(TC)intensity estimation is a fundamental aspect of TC monitoring and forecasting.Deep learning models have recently been employed to estimate TC intensity from satellite images and yield precise resul... Tropical cyclone(TC)intensity estimation is a fundamental aspect of TC monitoring and forecasting.Deep learning models have recently been employed to estimate TC intensity from satellite images and yield precise results.This work proposes the ViT-TC model based on the Vision Transformer(ViT)architecture.Satellite images of TCs,including infrared(IR),water vapor(WV),and passive microwave(PMW),are used as inputs for intensity estimation.Experiments indicate that combining IR,WV,and PMW as inputs yields more accurate estimations than other channel combinations.The ensemble mean technique is applied to enhance the model's estimations,reducing the root-mean-square error to 9.32 kt(knots,1 kt≈0.51 m s^(-1))and the mean absolute error to 6.49 kt,which outperforms traditional methods and is comparable to existing deep learning models.The model assigns high attention weights to areas with high PMW,indicating that PMW magnitude is essential information for the model's estimation.The model also allocates significance to the cloud-cover region,suggesting that the model utilizes the whole TC cloud structure and TC eye to determine TC intensity. 展开更多
关键词 vision transformer tropical cyclones intensity estimation deep learning
在线阅读 下载PDF
基于改进Vision Transformer网络的农作物病害识别方法研究
16
作者 罗兴 魏维 《黑龙江科学》 2025年第16期50-53,共4页
农作物病害对粮食生产和质量具有显著的负面影响。针对现有基于深度学习的农作物病害识别模型存在的分类精度不足和模型参数量大的问题提出一种基于Vision Transformer的新型架构,该模型采用多尺度卷积模块捕获不同尺度的特征,以扩展模... 农作物病害对粮食生产和质量具有显著的负面影响。针对现有基于深度学习的农作物病害识别模型存在的分类精度不足和模型参数量大的问题提出一种基于Vision Transformer的新型架构,该模型采用多尺度卷积模块捕获不同尺度的特征,以扩展模型的感受野,融合不同尺度特征进行卷积调制,将卷积调制与Vision Transformer相结合,构建成一个混合网络,该网络能够实现局部和全局特征的深度融合,从而显著增强特征分类能力。在Plant Village数据集上的测试结果表明,所提出的MCMT模型达到了99.5%的识别准确率,相较于传统的Vision Transformer计算量更低,识别准确率更高。 展开更多
关键词 农作物病害识别 卷积调制 特征融合 vision transformer
在线阅读 下载PDF
基于增强式Vision Transformer的大坝表面裂缝分割研究
17
作者 马海明 《水利科技与经济》 2025年第4期104-107,共4页
大坝出现裂缝会导致坝渗漏,影响水库发电等效益。通过对大坝表面裂缝进行精确分割,研究开发一种基于增强式Vision Transformer的方法,结合Vision Transformer的全局信息处理能力和多层卷积块以及卷积注意力模块(CBAM),旨在提升对裂缝特... 大坝出现裂缝会导致坝渗漏,影响水库发电等效益。通过对大坝表面裂缝进行精确分割,研究开发一种基于增强式Vision Transformer的方法,结合Vision Transformer的全局信息处理能力和多层卷积块以及卷积注意力模块(CBAM),旨在提升对裂缝特征的敏感性和辨识度。结果表明,该模型展示出在分割大坝表面裂缝方面的优越性能,验证了其在大坝安全监控系统中的应用潜力。 展开更多
关键词 大坝表面裂缝分割 深度学习 vision transformer 卷积注意力模块
在线阅读 下载PDF
High-precision copper-grade identification via a vision transformer with PGNAA
18
作者 Jie Cao Chong-Gui Zhong +6 位作者 Han-Ting You Yan Zhang Ren-Bo Wang Shu-Min Zhou Jin-Hui Qu Rui Chen Shi-Liang Liu 《Nuclear Science and Techniques》 2025年第7期89-99,共11页
The identification of ore grades is a critical step in mineral resource exploration and mining.Prompt gamma neutron activation analysis(PGNAA)technology employs gamma rays generated by the nuclear reactions between ne... The identification of ore grades is a critical step in mineral resource exploration and mining.Prompt gamma neutron activation analysis(PGNAA)technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components.In this study,we present a novel method for identifying copper grade by combining the vision transformer(ViT)model with the PGNAA technique.First,a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator,thermal neutron absorption material,and dimensions of the device.Subsequently,based on the parameters obtained through optimization,a PGNAA copper ore measurement model is established.The gamma spectrum of the copper ore is analyzed using the ViT model.The ViT model is optimized for hyperparameters using a grid search.To ensure the reliability of the identification results,the test results are obtained through five repeated tenfold cross-validations.Long short-term memory and convolutional neural network models are compared with the ViT method.These results indicate that the ViT method is efficient in identifying copper ore grades with average accuracy,precision,recall,F_(1)score,and F_(1)(-)score values of 0.9795,0.9637,0.9614,0.9625,and 0.9942,respectively.When identifying associated minerals,the ViT model can identify Pb,Zn,Fe,and Co minerals with identification accuracies of 0.9215,0.9396,0.9966,and 0.8311,respectively. 展开更多
关键词 Copper-grade identification vision transformer model Prompt gamma neutron activation analysis Monte Carlo N-particle
在线阅读 下载PDF
基于Vision Transformer的虹膜——人脸多特征融合识别研究
19
作者 马滔 陈睿 张博 《中国新技术新产品》 2024年第18期8-10,共3页
为了提高生物特征识别系统的准确性和鲁棒性,本文研究基于计算机视觉的虹膜—人脸多特征融合识别方法。本文对面部图像中虹膜区域进行提取以及预处理,采用对比度增强和归一化操作,加强了特征提取的一致性,提升了图像质量。为了获取丰富... 为了提高生物特征识别系统的准确性和鲁棒性,本文研究基于计算机视觉的虹膜—人脸多特征融合识别方法。本文对面部图像中虹膜区域进行提取以及预处理,采用对比度增强和归一化操作,加强了特征提取的一致性,提升了图像质量。为了获取丰富的深度特征,本文使用Vision Transformer模型对预处理后的虹膜和面部图像进行特征提取。利用多头注意力机制将虹膜和面部的多模态特征信息进行融合,再利用全连接层进行分类识别。试验结果表明,该方法识别性能优秀,识别准确性显著提升。 展开更多
关键词 计算机视觉 vision transformer 多特征融合 虹膜识别 人脸识别
在线阅读 下载PDF
基于Vision Transformer的小麦病害图像识别算法 被引量:2
20
作者 白玉鹏 冯毅琨 +3 位作者 李国厚 赵明富 周浩宇 侯志松 《中国农机化学报》 北大核心 2024年第2期267-274,共8页
小麦白粉病、赤霉病和锈病是危害小麦产量的三大病害。为提高小麦病害图像的识别准确率,构建一种基于Vision Transformer的小麦病害图像识别算法。首先,通过田间拍摄的方式收集包含小麦白粉病、赤霉病和锈病3种病害在内的小麦病害图像,... 小麦白粉病、赤霉病和锈病是危害小麦产量的三大病害。为提高小麦病害图像的识别准确率,构建一种基于Vision Transformer的小麦病害图像识别算法。首先,通过田间拍摄的方式收集包含小麦白粉病、赤霉病和锈病3种病害在内的小麦病害图像,并对原始图像进行预处理,建立小麦病害图像识别数据集;然后,基于改进的Vision Transformer构建小麦病害图像识别算法,分析不同迁移学习方式和数据增强对模型识别效果的影响。试验可知,全参数迁移学习和数据增强能明显提高Vision Transformer模型的收敛速度和识别精度。最后,在相同时间条件下,对比Vision Transformer、AlexNet和VGG16算法在相同数据集上的表现。试验结果表明,Vision Transformer模型对3种小麦病害图像的平均识别准确率为96.81%,相较于AlexNet和VGG16模型识别准确率分别提高6.68%和4.94%。 展开更多
关键词 小麦病害 vision transformer 迁移学习 图像识别 数据增强
在线阅读 下载PDF
上一页 1 2 11 下一页 到第
使用帮助 返回顶部