Journal Articles
166 articles found
1. KPA-ViT: Key Part-Level Attention Vision Transformer for Foreign Body Classification on Coal Conveyor Belt
Authors: Haoxuanye Ji, Zhiliang Chen, Pengfei Jiang, Ziyue Wang, Ting Yu, Wei Zhang. Computers, Materials & Continua, 2026, Issue 3, pp. 656-671 (16 pages)
Foreign body classification on coal conveyor belts is a critical component of intelligent coal mining systems. Previous approaches have primarily utilized convolutional neural networks (CNNs) to effectively integrate spatial and semantic information. However, the performance of CNN-based methods remains limited in classification accuracy, primarily due to insufficient exploration of local image characteristics. Unlike CNNs, Vision Transformer (ViT) captures discriminative features by modeling relationships between local image patches. However, such methods typically require a large number of training samples to perform effectively. In the context of foreign body classification on coal conveyor belts, the limited availability of training samples hinders the full exploitation of ViT's capabilities. To address this issue, we propose an efficient approach, termed Key Part-level Attention Vision Transformer (KPA-ViT), which incorporates key local information into the transformer architecture to enrich the training information. It comprises three main components: a key-point detection module, a key local mining module, and an attention module. To extract key local regions, a key-point detection strategy is first employed to identify the positions of key points. Subsequently, the key local mining module extracts the relevant local features based on these detected points. Finally, an attention module composed of self-attention and cross-attention blocks is introduced to integrate global and key part-level information, thereby enhancing the model's ability to learn discriminative features. Compared to recent transformer-based frameworks such as ViT, Swin-Transformer, and EfficientViT, the proposed KPA-ViT achieves performance improvements of 9.3%, 6.6%, and 2.8%, respectively, on the CUMT-BelT dataset, demonstrating its effectiveness.
Keywords: foreign body classification; global and part-level key information; coal conveyor belt; Vision Transformer (ViT); self- and cross-attention
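The attention module described in this abstract stacks self-attention over global tokens with cross-attention to key part-level tokens. As a rough illustration of the underlying operation, here is a generic scaled dot-product attention sketch in pure Python; this is not the authors' implementation, and all token values are invented:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Self-attention: global tokens attend to themselves ...
global_tokens = [[1.0, 0.0], [0.0, 1.0]]
self_out = attention(global_tokens, global_tokens, global_tokens)

# ... then cross-attention: the refined global tokens query part-level tokens.
part_tokens = [[0.5, 0.5], [1.0, 1.0]]
cross_out = attention(self_out, part_tokens, part_tokens)
print(len(cross_out), len(cross_out[0]))
```

In KPA-ViT terms, the queries would come from the global branch and the keys/values from the mined key parts; real models operate on learned high-dimensional embeddings and multiple heads rather than 2-D toy vectors.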
2. A Hybrid Deep Learning Approach Using Vision Transformer and U-Net for Flood Segmentation
Authors: Cyreneo Dofitas Jr, Yong-Woon Kim, Yung-Cheol Byun. Computers, Materials & Continua, 2026, Issue 2, pp. 1209-1227 (19 pages)
Recent advances in deep learning have significantly improved flood detection and segmentation from aerial and satellite imagery. However, conventional convolutional neural networks (CNNs) often struggle in complex flood scenarios involving reflections, occlusions, or indistinct boundaries due to limited contextual modeling. To address these challenges, we propose a hybrid flood segmentation framework that integrates a Vision Transformer (ViT) encoder with a U-Net decoder, enhanced by a novel Flood-Aware Refinement Block (FARB). The FARB module improves boundary delineation and suppresses noise by combining residual smoothing with spatial-channel attention mechanisms. We evaluate our model on a UAV-acquired flood imagery dataset, demonstrating that the proposed ViTUNet+FARB architecture outperforms existing CNN and Transformer-based models in terms of accuracy and mean Intersection over Union (mIoU). Detailed ablation studies further validate the contribution of each component, confirming that the FARB design significantly enhances segmentation quality. Due to its better performance and computational efficiency, the proposed framework is well-suited for flood monitoring and disaster response applications, particularly in resource-constrained environments.
Keywords: flood detection; Vision Transformer (ViT); U-Net; segmentation; image processing; deep learning; artificial intelligence
3. ViT-Count: A Vision Transformer Method for Tree Counting and Localization under Canopy Occlusion
Authors: Zhang Qiaoyi, Zhang Rui, Huo Guangyu. Journal of Beijing Forestry University (PKU Core), 2025, Issue 10, pp. 128-138 (11 pages)
[Objective] To address the challenges of tree detection in complex scenes, such as occlusion, background interference, and dense distribution, this study proposes a Vision Transformer (ViT)-based tree detection method (ViT-Count) that improves detection accuracy and robustness for trees in complex scenes. [Methods] ViT is adopted as the base model, as it has a natural advantage in capturing global contextual information in images and is particularly suited to complex environments with highly variable tree morphology. A tree-oriented Visual Prompt Tuning (VPT) mechanism is designed that injects learnable prompts into the features, optimizing feature extraction under high-density forest canopies, varying illumination, and different tree-species structures, and improving adaptability to different stand types. A convolutional attention module is designed that adds long-range dependency modeling on top of local perception, effectively strengthening the model's ability to discriminate occluded, overlapping, and morphologically similar targets and improving overall detection robustness and accuracy. A tree-detection decoder is designed that progressively restores spatial resolution through stacked convolution, normalization, GELU activation, and upsampling operations, and uses the generated target density map to perform tree counting and localization. [Results] The method improves the robustness of tree detection in forest and urban scenes while enhancing generalization across multi-scale tree targets. Experiments on the Larch Casebearer and Urban Tree datasets show that, compared with other mainstream models, the method reduces MAE and RMSE by up to 2.53 and 3.99, respectively, indicating stronger generalization and the best tree-detection performance. Visualization results show high tree-detection accuracy in both dense forest and complex urban scenes, and ablation experiments confirm the effectiveness of the main modules. [Conclusions] The ViT-based tree counting and localization method for complex scenes fully exploits ViT's global modeling capability and the task adaptability of visual prompt tuning; combined with the convolutional attention module, it effectively improves the accuracy and robustness of tree counting and localization in complex scenes.
Keywords: object recognition; tree counting; tree localization; complex scenes; Vision Transformer (ViT); Visual Prompt Tuning (VPT); attention mechanism
4. Application of the Vision Transformer Model to Tongue Image Classification in Traditional Chinese Medicine
Authors: Zhou Jianhe, Wang Caixiong, Li Wei, Zhou Xiaoling, Zhang Danxuan, Wu Yufeng. Journal of Guangxi University of Science and Technology, 2025, Issue 5, pp. 89-98 (10 pages)
Tongue diagnosis, an important and routine examination in TCM inspection, plays an indispensable role in clinical diagnosis in traditional Chinese medicine. To overcome the reliance of traditional tongue diagnosis on subjective experience and the limited classification performance of convolutional neural network (CNN) models, this paper proposes a Vision Transformer (ViT)-based deep learning model built on a high-quality tongue-image classification dataset, optimizing feature extraction through pre-training and fine-tuning and applying data augmentation to address class imbalance. Experimental results show that, on five of six key tongue-feature classification tasks, the model's accuracy (coating color 85.6%, ecchymosis 98.0%, texture 99.6%, tongue color 96.6%, cracks 87.8%) significantly outperforms existing CNN methods (e.g., ResNet50 achieves 78.0%, 91.0%, 92.0%, 68.0%, and 80.1%, respectively), validating the model's effectiveness and application potential for breaking through traditional performance bottlenecks and improving the reliability of intelligent TCM clinical diagnosis.
Keywords: tongue diagnosis; Vision Transformer (ViT); deep learning; medical image classification
5. Remaining Useful Life Prediction of Gears Based on a Residual-Attention TCN and Vision Transformer
Authors: Hu Aijun, Li Chenyang, Xing Lei, Zhou Zhuohao, Xiang Ling. Journal of Aerospace Power (PKU Core), 2025, Issue 12, pp. 14-24 (11 pages)
The operating condition of a gear system is affected by multiple factors that exhibit long-term temporal dependencies and differ between local and global features. To effectively capture the temporal dependencies in the data and adaptively adjust attention to features, a temporal convolutional network with a residual convolutional block attention mechanism (RCMTCN) is proposed. By introducing residual connections into the convolutional block attention module, the model can attend to both the original input and the attention-weighted information, improving its perception of local information. On this basis, the Vision Transformer (ViT) model is combined with RCMTCN to predict the remaining useful life (RUL) of gears; the ViT model effectively captures global information in the data. The fusion of the two fully exploits their complementary strengths in local feature extraction and global information modeling for time-series data, improving the perception of multi-dimensional features. Finally, the model is validated on gear performance-degradation datasets under two operating conditions: pitting-fault data are used for training, and both pitting and tooth-breakage faults are used for testing. Experimental results show that, compared with other methods, the proposed method extracts key feature information more fully, achieving scoring-function values of 0.8898 on pitting faults and 0.8587 on tooth-breakage faults, demonstrating good adaptability across operating conditions and fault types.
Keywords: gears; remaining useful life; temporal network; attention mechanism; Vision Transformer model
6. A Survey on the Development of Vision Transformer (ViT) (Cited: 15)
Authors: Li Yujie, Ma Zihang, Wang Yifu, Wang Xinghe, Tan Benying. Computer Science (PKU Core), 2025, Issue 1, pp. 194-209 (16 pages)
Vision Transformer (ViT) is an improved Transformer model based on the encoder-decoder structure that has been successfully applied to computer vision. In recent years, ViT-based research has flourished with notable results, and work based on this model has become an important research direction in computer vision; this paper therefore surveys recent developments of ViT. First, the basic principles of ViT and its transfer to vision are briefly reviewed, and the structural characteristics and advantages of the ViT model are analyzed. Then, according to the improvement characteristics of the various ViT variants, the main directions of ViT-based backbone-network improvement and their representative models are summarized, organized, and compared, covering locality improvements, structural improvements, self-supervision, lightweight design, and efficiency improvements. Finally, the remaining shortcomings of ViT and its improved variants are discussed, and future research directions are envisioned. This survey can serve as a reference for researchers weighing deep learning methods when working on ViT-based backbone networks.
Keywords: computer vision; pattern recognition; Vision Transformer (ViT); deep learning; self-attention
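The patch-based tokenization that the surveyed ViT variants all share can be sketched minimally; this is illustrative only, and real ViTs add a linear projection, a class token, and position embeddings on top of this step:

```python
def patchify(image, patch):
    """Split an H x W single-channel image (list of rows) into
    non-overlapping patch x patch blocks, each flattened to a token vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = [image[i + di][j + dj]
                     for di in range(patch) for dj in range(patch)]
            tokens.append(block)
    return tokens

# A 4x4 "image" with patch size 2 yields 4 tokens of length 4,
# mirroring ViT's (H/P)*(W/P) token count with P*P*C-dimensional patches.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(img, 2)
print(len(tokens), len(tokens[0]))
```

The transformer encoder then applies self-attention over this token sequence, which is what gives ViT the global receptive field the survey contrasts with CNN locality.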
7. Local Illumination-Consistency Estimation Based on an Improved Vision Transformer (Cited: 2)
Authors: Wang Yang, Song Shijia, Wang Heqin, Yuan Zhenyu, Zhao Lijun, Wu Qilin. Computer Engineering (PKU Core), 2025, Issue 2, pp. 312-321 (10 pages)
Illumination consistency is one of the key factors for seamless virtual-real fusion in augmented reality (AR) systems. Owing to limited camera viewpoints and complex scene lighting, developers estimating panoramic illumination often neglect local illumination consistency, which degrades the final rendering. To solve this problem, a local illumination-consistency estimation framework based on an improved Vision Transformer (ViT), named ViTLight, is proposed. First, a ViT encoder extracts feature vectors and regresses spherical harmonic (SH) coefficients, from which the illumination is recovered. Second, the ViT encoder structure is improved by introducing a multi-head self-attention interaction mechanism that uses convolution to guide interactions among attention heads; on this basis, a local perception module is added that scans each image patch and computes weighted sums over local pixels to capture region-specific features, helping balance global contextual features against local illumination information and improving estimation accuracy. Experiments on public datasets against mainstream feature-extraction networks and four classical illumination-estimation frameworks show that ViTLight achieves higher rendering accuracy than existing frameworks, with a root mean square error (RMSE) of 0.1296 and structural dissimilarity (DSSIM) of 0.0426, validating its effectiveness and correctness.
Keywords: augmented reality; illumination estimation; spherical harmonic coefficients; Vision Transformer; multi-head self-attention
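ViTLight regresses spherical harmonic (SH) coefficients and recovers illumination from them. A minimal sketch of how low-order SH coefficients turn into shading for a surface normal, using the standard real SH constants up to degree 1; the coefficient values below are invented, and this is not the paper's code:

```python
def sh_basis(d):
    """Real spherical-harmonic basis up to degree 1 (4 coefficients)
    for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [
        0.282095,      # Y_0^0  (constant band)
        0.488603 * y,  # Y_1^-1
        0.488603 * z,  # Y_1^0
        0.488603 * x,  # Y_1^1
    ]

def shade(sh_coeffs, normal):
    """Reconstruct radiance along `normal` as the dot product of the
    regressed SH coefficients with the basis evaluated at `normal`."""
    return sum(c * b for c, b in zip(sh_coeffs, sh_basis(normal)))

# A light mostly from +z: an ambient term plus a z-aligned band-1 term.
coeffs = [1.0, 0.0, 0.5, 0.0]
up = shade(coeffs, (0.0, 0.0, 1.0))     # surface facing the light
down = shade(coeffs, (0.0, 0.0, -1.0))  # surface facing away
print(round(up, 4), round(down, 4))
```

This low-dimensional representation is why SH regression is a popular target for lighting networks: a handful of coefficients suffices for smooth diffuse illumination.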
8. A Post-Training Quantization Model for ViT with Adaptive Dynamic Scale Selection
Authors: Pei Songwen, Peng Yu'ang, Liu Fangxin, Chen Mingsong, Zhang Bo. Journal of Chinese Computer Systems (PKU Core), 2026, Issue 1, pp. 142-149 (8 pages)
Post-training quantization requires no retraining of the neural network and depends little on the dataset, making it a lightweight and practical model-compression technique. However, existing quantization schemes fail to fit the distribution of post-Softmax activations effectively, and accuracy inevitably drops after re-parameterizing post-LayerNorm activations. This paper therefore proposes DAQ-ViT, a post-training quantization framework for transformers that adaptively and dynamically selects quantization scales. DAQ-ViT first introduces a skewness-based scaling-factor distribution selector, addressing the accuracy loss caused by significant inter-channel variation in post-LayerNorm activations. Second, targeting the distribution characteristics of post-Softmax and post-GELU activations, a Sigmoid quantizer matched to those distributions is proposed. In addition, a distribution-aware detector adaptively senses the activation distribution and dynamically selects between Sigmoid quantization and log2 quantization. Experimental results show that, without output reconstruction, 4-bit DAQ-ViT improves accuracy over PTQ4ViT by 20% on DeiT-Tiny and 35% on DeiT-Small.
Keywords: model compression; model quantization; post-training quantization; image classification; Vision Transformer
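For the log2 branch mentioned in the abstract, a generic log2 post-training quantizer can be sketched as follows. This illustrates the standard technique, not DAQ-ViT's actual quantizer, and the 4-bit width is an assumption:

```python
import math

def log2_quantize(x, bits=4):
    """Log2 quantization of a non-negative activation: store
    round(-log2(x)) as an unsigned `bits`-bit exponent code, then
    dequantize back to a power of two."""
    if x <= 0:
        return 0.0
    exp = round(-math.log2(x))             # nearest power-of-two exponent
    exp = max(0, min(exp, 2 ** bits - 1))  # clip to the representable range
    return 2.0 ** (-exp)                   # dequantized value

# Post-Softmax activations cluster near 0, exactly where a log2 grid
# is dense, which is why log2 suits that distribution.
vals = [0.9, 0.5, 0.12, 0.03, 0.004]
print([log2_quantize(v) for v in vals])
```

A uniform quantizer with the same bit budget would collapse most of these small values to a single level; the log2 grid keeps them distinguishable.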
9. A Multi-Temporal Remote Sensing Crop Classification Method Based on Vision Transformer-LSTM (ViTL) (Cited: 1)
Authors: Zhang Qingyun, Yang Hui, Li Xingwu, Wu Yongchuang. Journal of Anhui Agricultural University (CAS, CSCD), 2024, Issue 5, pp. 888-898 (11 pages)
To address the insufficient sampling of spectral-temporal and spatial features by deep learning models in remote sensing crop classification, where crop extraction still suffers from blurred boundaries, omissions, and false extractions, a deep learning model named Vision Transformer-long short-term memory (ViTL) is proposed. The ViTL model integrates three key modules: dual-path Vision Transformer feature extraction, spatiotemporal feature fusion, and LSTM temporal classification. The dual-path Vision Transformer module captures spatiotemporal feature correlations, with one path extracting spatial classification features and the other extracting temporal change features; the spatiotemporal fusion module cross-fuses multi-temporal feature information; and the LSTM module captures multi-temporal dependencies and outputs the classification. Drawing on remote sensing theory and methods for multi-temporal satellite imagery, crop information was extracted for Nehe City, Qiqihar, Heilongjiang Province. The results show that the ViTL model performs well, with overall accuracy (OA), mean intersection over union (MIoU), and F1 score reaching 0.8676, 0.6987, and 0.8175, respectively; compared with other widely used deep learning methods, including the three-dimensional convolutional neural network (3-D CNN), two-dimensional convolutional neural network (2-D CNN), and LSTM, its F1 score is 9%-12% higher, showing clear superiority. The ViTL model overcomes the insufficient sampling of temporal and spatial features in multi-temporal remote sensing crop classification and offers a new approach for accurate and efficient crop classification.
Keywords: crop classification; Vision Transformer (ViT); LSTM; deep learning; remote sensing monitoring
10. Model Agnostic Meta-Learning (MAML)-Based Ensemble Model for Accurate Detection of Wheat Diseases Using Vision Transformer and Graph Neural Networks (Cited: 1)
Authors: Yasir Maqsood, Syed Muhammad Usman, Musaed Alhussein, Khursheed Aurangzeb, Shehzad Khalid, Muhammad Zubair. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 2795-2811 (17 pages)
Wheat is a critical crop, extensively consumed worldwide, and its production enhancement is essential to meet escalating demand. The presence of diseases like stem rust, leaf rust, yellow rust, and tan spot significantly diminishes wheat yield, making the early and precise identification of these diseases vital for effective disease management. With advancements in deep learning algorithms, researchers have proposed many methods for the automated detection of disease pathogens; however, accurately detecting multiple disease pathogens simultaneously remains a challenge. This challenge arises from the scarcity of RGB images for multiple diseases, class imbalance in existing public datasets, and the difficulty of extracting features that discriminate between multiple classes of disease pathogens. In this research, a novel method based on Transfer Generative Adversarial Networks is proposed for augmenting existing data, thereby overcoming the problems of class imbalance and data scarcity. This study proposes a customized architecture of Vision Transformers (ViT), where the feature vector is obtained by concatenating features extracted from the custom ViT and Graph Neural Networks. This paper also proposes a Model Agnostic Meta-Learning (MAML)-based ensemble classifier for accurate classification. The proposed model, validated on public datasets for wheat disease pathogen classification, achieved a test accuracy of 99.20% and an F1-score of 97.95%. Compared with existing state-of-the-art methods, this model performs better in terms of accuracy, F1-score, and the number of disease pathogens detected. In the future, more diseases can be included for detection, along with other modalities such as pests and weeds.
Keywords: wheat disease detection; deep learning; Vision Transformer; graph neural network; model-agnostic meta-learning
11. Forest Fire Video Recognition Based on an Improved Vision Transformer
Authors: Zhang Min, Xin Ying, Huang Tianqi. Journal of Nanjing Forestry University (Natural Sciences Edition) (PKU Core), 2025, Issue 4, pp. 186-194 (9 pages)
[Objective] To address the limited efficiency and low utilization of temporal features in existing forest-fire image recognition algorithms, a video-based forest-fire recognition model is built to improve the timeliness and accuracy of fire monitoring. [Methods] A C3D-ViT algorithm is proposed that fuses a three-dimensional convolutional neural network (3DCNN) with a Vision Transformer (ViT). The model extracts spatiotemporal features from video sequences with the 3DCNN to build spatiotemporal feature vectors, fuses local and global features through the self-attention mechanism of the ViT encoder, and finally outputs classification results through an MLP Head layer. Ablation experiments verify the effectiveness of C3D-ViT, which is also compared with the base 3DCNN and ViT models as well as deep learning models including ResNet50, LSTM, and YOLOv5. [Results] C3D-ViT reaches 96.10% accuracy on a self-built forest-fire dataset, a clear advantage over ResNet50 (89.07%), LSTM (93.26%), and YOLOv5 (91.46%). The improvements are effective, with accuracy exceeding 3DCNN (93.91%) and ViT (90.43%). The model maintains high average confidence in complex scenarios such as occlusion, long distance, and low-density smoke, meeting real-time monitoring requirements. [Conclusions] Through joint spatiotemporal modeling, C3D-ViT significantly improves the robustness and timeliness of forest-fire recognition, providing reliable technical support for forest fire-prevention systems.
Keywords: forest fire; deep learning; object detection; 3D convolutional neural network; Vision Transformer
12. Enhanced Plant Species Identification through Metadata Fusion and Vision Transformer Integration
Authors: Hassan Javed, Labiba Gillani Fahad, Syed Fahad Tahir, Mehdi Hassan, Hani Alquhayz. Computers, Materials & Continua, 2025, Issue 11, pp. 3981-3996 (16 pages)
Accurate plant species classification is essential for many applications, such as biodiversity conservation, ecological research, and sustainable agricultural practices. Traditional morphological classification methods are inherently slow, labour-intensive, and prone to inaccuracies, especially when distinguishing between species exhibiting visual similarities or high intra-species variability. To address these limitations and to overcome the constraints of image-only approaches, we introduce a novel Artificial Intelligence-driven framework. This approach integrates robust Vision Transformer (ViT) models for advanced visual analysis with a multi-modal data fusion strategy, incorporating contextual metadata such as precise environmental conditions, geographic location, and phenological traits. This combination of visual and ecological cues significantly enhances classification accuracy and robustness, proving especially vital in complex, heterogeneous real-world environments. The proposed model achieves an impressive 97.27% test accuracy and a Mean Reciprocal Rank (MRR) of 0.9842, demonstrating strong generalization capabilities. Furthermore, efficient utilization of high-performance GPU resources (RTX 3090, 18 GB memory) ensures scalable processing of high-dimensional data. Comparative analysis consistently confirms that our metadata fusion approach substantially improves classification performance, particularly for morphologically similar species, and through principled self-supervised and transfer learning from ImageNet, the model adapts efficiently to new species, ensuring enhanced generalization. This comprehensive approach holds profound practical implications for precise conservation initiatives, rigorous ecological monitoring, and advanced agricultural management.
Keywords: Vision Transformers (ViTs); Transformers; machine learning; deep learning; plant species classification; multi-organ
13. A general framework for airfoil flow field reconstruction based on transformer-guided diffusion models
Authors: Jinhua LOU, Rongqian CHEN, Zelun LIN, Jiaqi LIU, Yue BAO, Hao WU, Yancheng YOU. Chinese Journal of Aeronautics, 2025, Issue 12, pp. 214-244 (31 pages)
High-Resolution (HR) data on flow fields are critical for accurately evaluating the aerodynamic performance of aircraft. However, acquiring such data through large-scale numerical simulations or wind tunnel experiments is highly resource intensive. This paper proposes a FlowViT-Diff framework that integrates a Vision Transformer (ViT) with an enhanced denoising diffusion probabilistic model for the Super-Resolution (SR) reconstruction of HR flow fields based on low-resolution inputs. It provides a quick initial prediction of the HR flow field by optimizing the ViT architecture, and incorporates this preliminary output as guidance within an enhanced diffusion model. The latter captures the Gaussian noise distribution during forward diffusion and progressively removes it during backward diffusion to generate the flow field. Experiments on various supercritical airfoils under different flow conditions show that FlowViT-Diff can robustly reconstruct the flow field across multiple levels of downsampling. It obtains more consistent global and local features than traditional SR methods, and yields a 3.6-fold increase in training speed via transfer learning. Its flow-field reconstruction accuracy is 99.7% under ultra-low downsampling. The results demonstrate that FlowViT-Diff not only exhibits effective flow field reconstruction capabilities, but also provides two reconstruction strategies, both of which show effective transferability.
Keywords: flow fields; Vision Transformer (ViT); denoising diffusion probabilistic model; supercritical airfoil; transfer learning
14. Mango Disease Detection Using Fused Vision Transformer with ConvNeXt Architecture
Authors: Faten S. Alamri, Tariq Sadad, Ahmed S. Almasoud, Raja Atif Aurangzeb, Amjad Khan. Computers, Materials & Continua, 2025, Issue 4, pp. 1023-1039 (17 pages)
Mango farming significantly contributes to the economy, particularly in developing countries. However, mango trees are susceptible to various diseases caused by fungi, viruses, and bacteria, and diagnosing these diseases at an early stage is crucial to prevent their spread, which can lead to substantial losses. The development of deep learning models for detecting crop diseases is an active area of research in smart agriculture. This study focuses on mango plant diseases and employs the ConvNeXt and Vision Transformer (ViT) architectures. Two datasets were used. The first, MangoLeafBD, contains data for mango leaf diseases such as anthracnose, bacterial canker, gall midge, and powdery mildew. The second, SenMangoFruitDDS, includes data for mango fruit disease classes such as Alternaria, Anthracnose, Black Mould Rot, Healthy, and Stem and Rot. Both datasets were obtained from publicly available sources. The proposed model achieved an accuracy of 99.87% on the MangoLeafBD dataset and 98.40% on the MangoFruitDDS dataset. The results demonstrate that ConvNeXt and ViT models can effectively diagnose mango diseases, enabling farmers to identify these conditions more efficiently. The system contributes to increased mango production and minimizes economic losses by reducing the time and effort needed for manual diagnostics. Additionally, the proposed system is integrated into a mobile application that uses the model as a backend to detect mango diseases instantly.
Keywords: ConvNeXt model; fusion; mango disease; smart agriculture; Vision Transformer
15. High-precision copper-grade identification via a vision transformer with PGNAA
Authors: Jie Cao, Chong-Gui Zhong, Han-Ting You, Yan Zhang, Ren-Bo Wang, Shu-Min Zhou, Jin-Hui Qu, Rui Chen, Shi-Liang Liu. Nuclear Science and Techniques, 2025, Issue 7, pp. 89-99 (11 pages)
The identification of ore grades is a critical step in mineral resource exploration and mining. Prompt gamma neutron activation analysis (PGNAA) technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components. In this study, we present a novel method for identifying copper grade by combining the vision transformer (ViT) model with the PGNAA technique. First, a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator, the thermal neutron absorption material, and the dimensions of the device. Subsequently, based on the parameters obtained through optimization, a PGNAA copper ore measurement model is established. The gamma spectrum of the copper ore is analyzed using the ViT model, whose hyperparameters are optimized via grid search. To ensure the reliability of the identification results, the test results are obtained through five repeated tenfold cross-validations. Long short-term memory and convolutional neural network models are compared with the ViT method. These results indicate that the ViT method is efficient in identifying copper ore grades, with average accuracy, precision, recall, F_(1) score, and F_(1)(-) score values of 0.9795, 0.9637, 0.9614, 0.9625, and 0.9942, respectively. When identifying associated minerals, the ViT model can identify Pb, Zn, Fe, and Co minerals with identification accuracies of 0.9215, 0.9396, 0.9966, and 0.8311, respectively.
Keywords: copper-grade identification; vision transformer model; prompt gamma neutron activation analysis; Monte Carlo N-Particle
16. Facial Expression Recognition with a Convolutional-Token Vision Transformer Model
Authors: Wang Jing, Shang Yu. Journal of North China Institute of Aerospace Engineering (CAS), 2023, Issue 5, pp. 8-10 (3 pages)
Facial expression recognition has wide applications. This paper uses a hybrid Vision Transformer model based on convolutional tokens to perform expression recognition. The hybrid model better captures the local features of facial expressions and the correlations among those local features. Experiments on the RafDB and Fer2013Plus datasets compare the accuracy and classification confusion matrices of ResNet, DenseNet, Swin Transformer, and CVT models on facial expression recognition.
Keywords: convolutional token; Vision Transformer; hybrid model; expression recognition; confusion matrix
17. Optimizing ViT for Melanoma Classification: Combining Feature Selection with InfoNCE Loss
Authors: Huang Jinjie, Ma Yuanxue. Optics and Precision Engineering (PKU Core), 2025, Issue 16, pp. 2649-2660
To address the feature redundancy and insufficient generalization of the Vision Transformer (ViT) in melanoma image classification, an improved model combining dynamic feature selection with contrastive learning is proposed to raise classification accuracy and clinical diagnostic efficiency. First, a dynamic feature-selection module is designed that uses a learnable weight matrix to adaptively strengthen key features and suppress redundant information. Second, an InfoNCE contrastive loss is introduced and combined with cross-entropy loss in a multi-objective optimization framework to enhance inter-class feature separability. Finally, a key-feature guidance mechanism is embedded in the multi-head self-attention to jointly model local details and global semantics. Experiments on the ISIC2018 and ISIC2019 datasets show classification accuracies of 83.27% and 80.17%, improvements of 1.83% and 0.49% over the baseline ViT model; ablation studies show that the dynamic selection module cuts redundant computation by 18.7% and that contrastive learning raises intra-class feature similarity by 23.6%. The method markedly improves ViT's ability to recognize melanoma, with classification accuracy and robustness superior to mainstream models, offering a high-precision, low-redundancy automated solution for early skin-cancer diagnosis with clinical practical value.
Keywords: image classification; feature selection; InfoNCE loss function; ViT model
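The InfoNCE contrastive loss combined with cross-entropy above has a simple closed form: the negative log-softmax of the positive pair's similarity against the negatives. A minimal pure-Python sketch of the generic loss, not the paper's implementation; the temperature 0.07 is a common default, not taken from the paper:

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE loss for one anchor: -log of the softmax probability of the
    positive similarity among one positive and N negative similarities."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# The loss shrinks as the positive similarity rises above the negatives,
# which is what pulls same-class features together during training.
loose = info_nce(0.5, [0.4, 0.3])
tight = info_nce(0.9, [0.4, 0.3])
print(loose > tight)
```

In the multi-objective setting the abstract describes, this term would be added to the cross-entropy classification loss with some weighting; the weighting scheme is not stated in the abstract.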
18. Chinese Lip Reading Based on Vision Transformer (Cited: 3)
Authors: Xue Feng, Hong Zikun, Li Shujie, Li Yu, Xie Yincen. Pattern Recognition and Artificial Intelligence (EI, CSCD, PKU Core), 2022, Issue 12, pp. 1111-1121 (11 pages)
Lip reading, a multimodal task that converts lip-motion videos into text, aims to understand what a speaker expresses without sound. Current lip-reading methods mainly use convolutional neural networks to extract visual lip features, capturing short-range pixel relationships and therefore struggling to distinguish the lip shapes of similarly pronounced characters. To capture long-range relationships among pixels in the lip region of video frames, an end-to-end Chinese sentence-level lip-reading model based on Vision Transformer (ViT) is proposed, fusing ViT with Gated Recurrent Units (GRU) to strengthen the extraction of visual spatiotemporal features from lip videos. Specifically, the self-attention module of ViT first extracts the global spatial features of lip images, a GRU then models the temporal order of the frame sequence, and finally an attention-based cascaded sequence-to-sequence model predicts pinyin and Chinese-character sentences. Experiments on the Chinese lip-reading dataset CMLR show that the proposed model attains a low character error rate.
Keywords: lip reading; Vision Transformer (ViT); deep neural network; encoder-decoder; attention mechanism; feature extraction
19. Computer-aided diagnosis of retinopathy based on vision transformer (Cited: 3)
Authors: Zhencun Jiang, Lingyang Wang, Qixin Wu, Yilei Shao, Meixiao Shen, Wenping Jiang, Cuixia Dai. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2022, Issue 2, pp. 49-57 (9 pages)
Age-related Macular Degeneration (AMD) and Diabetic Macular Edema (DME) are two common retinal diseases in elderly people that may ultimately cause irreversible blindness. Timely and accurate diagnosis is essential for the treatment of these diseases. In recent years, computer-aided diagnosis (CAD) has been deeply investigated and effectively used for rapid and early diagnosis. In this paper, we proposed a CAD method using vision transformer to analyze optical coherence tomography (OCT) images and to automatically discriminate AMD, DME, and normal eyes. A classification accuracy of 99.69% was achieved. After model pruning, the recognition time reached 0.010 s and the classification accuracy did not drop. Compared with Convolutional Neural Network (CNN) image classification models (VGG16, ResNet50, DenseNet121, and EfficientNet), the pruned vision transformer exhibited better recognition ability. Results show that vision transformer is an improved alternative for diagnosing retinal diseases more accurately.
Keywords: vision transformer; OCT; image classification; retinopathy; computer-aided diagnosis; model pruning
20. A Comprehensive Survey of Recent Transformers in Image, Video and Diffusion Models (Cited: 1)
Authors: Dinh Phu Cuong Le, Dong Wang, Viet-Tuan Le. Computers, Materials & Continua (SCIE, EI), 2024, Issue 7, pp. 37-60 (24 pages)
Transformer models have emerged as dominant networks for various tasks in computer vision compared to Convolutional Neural Networks (CNNs). Transformers demonstrate the ability to model long-range dependencies by utilizing a self-attention mechanism. This study aims to provide a comprehensive survey of recent transformer-based approaches in image and video applications, as well as diffusion models. We begin by discussing existing surveys of vision transformers and comparing them to this work. Then, we review the main components of a vanilla transformer network, including the self-attention mechanism, feed-forward network, position encoding, etc. In the main part of this survey, we review recent transformer-based models in three categories: Transformer for downstream tasks, Vision Transformer for Generation, and Vision Transformer for Segmentation. We also provide a comprehensive overview of recent transformer models for video tasks and diffusion models. We compare the performance of various hierarchical transformer networks for multiple tasks on popular benchmark datasets. Finally, we explore some future research directions to further improve the field.
Keywords: transformer; vision transformer; self-attention; hierarchical transformer; diffusion models