39,036 articles found
Diagnostic Role of a Vision Transformer-Based Colonoscopy Image Recognition Model in Colonic Diseases
1
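Authors: 张婷 徐伟超 许亚培 王子康 夏悦桐 刘秋华 杜姚 才艳茹 杨倩 《时珍国医国药》 PKU Core, 2026, No. 5, pp. 987-992 (6 pages)
Objective: To investigate the diagnostic performance of Vision Transformer (ViT), an artificial-intelligence diagnostic system, for colonic diseases through analysis of clinical endoscopic imaging data. Methods: A total of 3000 standard white-light colonoscopy images were retrospectively collected from 1082 patients with histologically confirmed colonic disease (colonic polyps, colitis, or colon cancer). The processed dataset for the three diseases was split 7:2:1: within each disease class, 70% of the images were randomly selected as the training set (Train), 20% as the test set (Test), and 10% as the validation set (Predict). A ViT model was then used to recognize and classify the images. Results: On the test set, the model's classification accuracy for colonoscopy images was 99.61% for colonic polyps, 99.67% for colitis, and 100.00% for colon cancer. Conclusion: ViT achieves high diagnostic accuracy in detecting colonic diseases. The model can help primary-care hospitals improve the accuracy of colonic disease diagnosis and help junior endoscopists improve their ability to recognize colonic diseases, and it has fairly reliable clinical application value.
Keywords: colonic diseases; Vision Transformer; classification and recognition; clinical application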
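The per-class 7:2:1 split described in the abstract above can be sketched as follows. This is only an illustration, not the authors' code: the function name `stratified_split`, the file names, and the equal class sizes (1000 images per class) are assumptions, since the abstract reports only the 3000-image total.

```python
import random

def stratified_split(items_by_class, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split each class independently into train / test / validation lists."""
    rng = random.Random(seed)
    splits = {"train": [], "test": [], "val": []}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * ratios[0])
        n_test = round(len(shuffled) * ratios[1])
        splits["train"] += [(x, label) for x in shuffled[:n_train]]
        splits["test"] += [(x, label) for x in shuffled[n_train:n_train + n_test]]
        # Whatever remains goes to validation, so no image is dropped.
        splits["val"] += [(x, label) for x in shuffled[n_train + n_test:]]
    return splits

# Hypothetical file names and class sizes (assumed, not from the abstract).
data = {c: [f"{c}_{i}.png" for i in range(1000)]
        for c in ("polyp", "colitis", "cancer")}
parts = stratified_split(data)
```

Splitting inside each class keeps the three diseases in the same proportion in every subset, which a single shuffle over all images would not guarantee.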
Performance Comparison of Species Classification for Fisheries Monitoring Based on the Vision Mamba Model
2
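Authors: 张泽海 黄小双 孔祥洪 刘必林 陈新军 《上海海洋大学学报》 PKU Core, 2026, No. 2, pp. 508-519 (12 pages)
Fisheries electronic monitoring is an important means of implementing intelligent fisheries supervision, and image recognition is one of its key supporting technologies. Deploying high-performance, lightweight models in edge-computing scenarios remains a challenge. This study introduces the Vision Mamba (ViM) model from deep learning, which uses a selective state space model (SSM) mechanism to build a bidirectional encoder, achieving global modeling of long-range dependencies in images while maintaining linear computational complexity. Using The Nature Conservancy fisheries monitoring dataset, a systematic performance comparison was carried out against mainstream models including ResNet, EfficientNet, and DeiT. The results show that ViM performs excellently in both efficiency and accuracy. Among lightweight models, ViM-Tiny, with 44.28% fewer parameters than the ResNet-18 baseline, improved accuracy by 1.12% and F1 score by 2.19%. Among mid-sized models, ViM-Small, with 44.65% fewer parameters than the ResNet-101 baseline, still achieved nearly equal accuracy (0.9603) and F1 score (0.9645). The study shows that ViM can substantially reduce model complexity while retaining strong fish-species classification ability, striking a good balance between lightweight design and high accuracy, and it offers a new technical path for building efficient, intelligent fisheries supervision systems.
Keywords: fisheries electronic monitoring; image classification; Vision Mamba model; deep learning; fisheries monitoring dataset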
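The linear-time, bidirectional scanning idea mentioned in the abstract above can be illustrated with a toy recurrence. This is a deliberately simplified sketch and not the ViM architecture: the state is a single scalar, the coefficients `a` and `b` are fixed constants (a selective SSM would make them depend on the input), and summing the two directions is an assumption made here for illustration.

```python
def scan(xs, a, b):
    """Linear recurrence h_t = a * h_{t-1} + b * x_t, one pass over the sequence."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out

def bidirectional_scan(xs, a=0.5, b=1.0):
    """Run the recurrence left-to-right and right-to-left, then add the results."""
    forward = scan(xs, a, b)
    backward = scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]

# A single impulse at the first position of a 3-token sequence.
mixed = bidirectional_scan([1.0, 0.0, 0.0])
```

Each direction touches every token once, so the cost grows linearly with sequence length, whereas pairwise attention grows quadratically.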
Geometric parameter identification of bridge precast box girder sections based on deep learning and computer vision (Cited by: 4)
3
Authors: JIA Jingwei NI Youhao MAO Jianxiao XU Yinfei WANG Hao 《Journal of Southeast University (English Edition)》 2025, No. 3, pp. 278-285 (8 pages)
To overcome the limitations of low efficiency and reliance on manual processes in the measurement of geometric parameters for bridge prefabricated components, a method based on deep learning and computer vision is developed to identify the geometric parameters. The study utilizes a common precast element for highway bridges as the research subject. First, edge feature points of the bridge component section are extracted from images of the precast component cross-sections by combining the Canny operator with mathematical morphology. Subsequently, a deep learning model is developed to identify the geometric parameters of the precast components using the extracted edge coordinates from the images as input and the predefined control parameters of the bridge section as output. A dataset is generated by varying the control parameters and noise levels for model training. Finally, field measurements are conducted to validate the accuracy of the developed method. The results indicate that the developed method effectively identifies the geometric parameters of bridge precast components, with an error rate maintained within 5%.
Keywords: bridge precast components; section geometry parameters; size identification; computer vision; deep learning
Research Progress of Convolutional Neural Networks and Vision Transformer in Glioma
4
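Authors: 杨浩辉 徐涛 王伟 安良良 敖用芳 朱家宝 《磁共振成像》 PKU Core, 2026, No. 1, pp. 168-174 (7 pages)
Because glioma is highly heterogeneous, strongly invasive, and carries a poor prognosis, conventional diagnosis and treatment face great challenges. Deep learning offers a new route to precise diagnosis and treatment, with convolutional neural networks (CNNs) and Vision Transformer (ViT) as the core tools. Through hierarchical convolution, CNNs have a natural advantage in extracting local features (such as tumor margins and texture details), whereas ViT, built on self-attention, excels at modeling global context (such as cross-regional tumor heterogeneity and multimodal associations). Fusion strategies that integrate fine local features with global relational information show clear advantages on clinical problems such as blurred glioma boundaries and heterogeneity across modalities. This article reviews progress with both architectures in key clinical tasks including glioma detection and segmentation, pathological grading, molecular subtyping, and prognosis assessment, covering their principles, standalone applications, and fusion strategies. It also discusses current challenges, such as heavy dependence on data annotation and insufficient model interpretability, and looks ahead to future directions, such as lightweight architectures, self-supervised learning, and multi-omics fusion, with the aim of providing a systematic reference for intelligent glioma diagnosis.
Keywords: glioma; deep learning; convolutional neural network; Vision Transformer; magnetic resonance imaging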
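A Method for Recognizing Standard Planes in Fetal Cranial Ultrasound Based on Conditional Generative Adversarial Networks and Vision Transformer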
5
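Authors: 李惠莲 林艺榕 刘中华 柳培忠 《临床超声医学杂志》 2026, No. 2, pp. 164-169 (6 pages)
Fetal cranial ultrasound is a crucial part of routine prenatal screening, and accurately recognizing standard planes is important for assessing fetal brain development. However, differences in ultrasound image quality and the complexity of plane acquisition make accurate recognition challenging. This paper proposes a method for recognizing fetal cranial ultrasound standard planes based on a conditional generative adversarial network (CGAN) and Vision Transformer. CGAN is used to augment the original data by generating additional standard-plane and non-standard-plane images, addressing data scarcity. A YOLOv9 model automatically crops the skull region in each ultrasound image to remove irrelevant information and keep the model focused on the key region. The classification model uses Vision Transformer; all input images are normalized and resized, and data augmentation such as random horizontal or vertical flipping, contrast adjustment, center cropping, and saturation adjustment is applied. The results show that, compared with the existing best-performing CSwin Transformer approach, the proposed method performs well on fetal cranial ultrasound standard-plane recognition, with precision, recall, F1 score, and accuracy of 92.5%, 92.3%, 92.4%, and 93.3%, respectively. The method has a clear advantage in recognition accuracy and provides effective technical support for clinical ultrasound examination.
Keywords: conditional generative adversarial network; Vision Transformer; cranial ultrasound; fetus; standard plane recognition method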
An Intelligent Monitoring Model for Blast Furnace Tuyeres Based on Vision Transformer and Its Application
6
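Authors: 王浩男 韩明博 但家云 李强 《钢铁研究学报》 PKU Core, 2026, No. 1, pp. 25-37 (13 pages)
The peephole of a tuyere in the lower part of a blast furnace allows real-time monitoring of key smelting-state information such as combustion characteristics in the raceway and the state of pulverized-coal injection, from which important parameters such as gas-flow distribution and hearth activity can be judged. To address the subjectivity and time lag in tuyere monitoring, this work builds TI-ViT, an intelligent blast furnace tuyere monitoring model, on unstructured tuyere image big data and the Vision Transformer architecture. First, the collected tuyere images are preprocessed, and a dataset of typical furnace conditions is formed through feature discrimination and label calibration. Next, the TI-ViT tuyere image recognition model is constructed on the Vision Transformer architecture. Finally, TI-ViT is evaluated, focusing on how model depth affects accuracy, parameter count, training time, and running time, and it is compared with conventional convolutional neural network models. Validation shows that TI-ViT reaches 97.7% accuracy, 9.1% higher than the CNN-based model, with a single-image inference time of only 15.75 ms. The "Smart Eye" system developed from this model was applied in on-site practice, where its recognition accuracy reached 95.2%. This indicates that the system achieves real-time monitoring, recognition, and early warning for blast furnace tuyeres, helps steel enterprises reduce the cost of monitoring and diagnosing abnormal tuyere states, and offers a new direction for intelligent blast furnace ironmaking.
Keywords: blast furnace tuyere; computer vision; Vision Transformer; image recognition; blast furnace ironmaking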
A Vehicle Re-Identification Method Based on Adaptive Local Partition Vision Transformer
7
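Authors: 徐萌兮 陈海鑫 徐焕宇 《汽车技术》 PKU Core, 2026, No. 4, pp. 17-25 (9 pages)
To address insufficient feature representation and limited robustness caused by camera viewpoint changes, occlusion, and similar vehicle appearance in complex scenes, a vehicle re-identification method based on an adaptive local partition Vision Transformer is proposed. An adaptive local partition (ALP) module is designed by combining convolution and attention. A hierarchical attention fusion (HAF) module integrates low-level visual detail with high-level global semantics to provide feature guidance for adaptive region partitioning. A multiple feature embedding (MFE) module is introduced, whose dynamic weighting mechanism based on camera and viewpoint improves feature discrimination in multi-view, multi-camera environments. Experiments show that on the vehicle re-identification task the proposed method reaches mAP and Rank-1 of 81.0% and 97.1% on the VeRi-776 dataset and Rank-1 of 80.2% on the VehicleID dataset, markedly improving recognition accuracy and robustness.
Keywords: vehicle re-identification; Vision Transformer; local partition; attention fusion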
A Rolling Bearing Fault Diagnosis Method Using the Effective Diagnosis Vision Transformer Network
8
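Authors: 罗志勇 李明周 董鑫 《重庆邮电大学学报(自然科学版)》 PKU Core, 2026, No. 1, pp. 146-155 (10 pages)
To address incomplete feature extraction and low diagnostic efficiency in rolling bearing fault diagnosis, an effective diagnosis Vision Transformer (EDViT) network is proposed. A kurtosis-based weighted fusion strategy merges the sensor information. The short-time Fourier transform converts the fused signal into time-frequency images. EDViT's dual-attention convolution module and dual-branch patch vision transformer module are then applied in turn to extract local and global features, and a classifier performs fault classification. Experimental validation was carried out on the Case Western Reserve University bearing dataset. The results show that EDViT has excellent feature extraction ability, fast convergence, and high diagnostic accuracy. Comparison with other methods shows that EDViT has strong generalization ability and robustness.
Keywords: effective diagnosis Vision Transformer network; rolling bearing; fault diagnosis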
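A kurtosis-based weighted fusion of sensor signals, as named in the abstract above, might look like the sketch below. The abstract does not give the weighting formula, so normalizing each sensor's kurtosis into a weight is an assumption, and the function names and the two toy signals are hypothetical.

```python
def kurtosis(signal):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / (m2 ** 2)

def kurtosis_weighted_fusion(signals):
    """Fuse equal-length signals with weights proportional to their kurtosis."""
    ks = [kurtosis(s) for s in signals]
    total = sum(ks)
    weights = [k / total for k in ks]
    fused = [sum(w * s[i] for w, s in zip(weights, signals))
             for i in range(len(signals[0]))]
    return fused, weights

# Toy data (assumed): a smooth oscillation and an impulsive, fault-like signal.
smooth = [1, -1, 1, -1, 1, -1, 1, -1]
impulsive = [0, 0, 0, 4, 0, 0, 0, -4]
fused, weights = kurtosis_weighted_fusion([smooth, impulsive])
```

Impulsive signals have higher kurtosis, so under this assumed scheme the sensor that shows sharper fault impacts contributes more to the fused signal.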
Gait-ViT: A Cross-View Gait Recognition Method Based on Vision Transformer
9
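Authors: 沈澍 王森 黄苏岩 张秉睿 《小型微型计算机系统》 PKU Core, 2026, No. 3, pp. 646-652 (7 pages)
As a remote biometric identification technology, gait recognition shows broad application prospects in medical rehabilitation, criminal investigation, and public security. In recent years, with the rapid development of deep learning, gait recognition methods have gradually shifted from conventional convolutional neural networks (CNNs) to the more advanced Transformer architecture. Although CNNs perform well in image processing tasks, their ability to attend to key image regions is limited, whereas attention mechanisms can learn more discriminative features by focusing on local image regions. This paper therefore proposes Gait-ViT, a Vision Transformer model incorporating attention, for gait recognition. The method first divides the gait silhouette into multiple small patches and converts them into a patch sequence. It then uses position embeddings and a class embedding to rearrange and encode the positional information in the sequence. Finally, the vector sequence is fed to the Vision Transformer for prediction. Gait-ViT achieved recognition accuracies of 98.1% and 91.2% on the public CASIA-B and OU-MVLP gait datasets, respectively, validating the effectiveness of the proposed model.
Keywords: gait recognition; Vision Transformer; convolutional neural network; feature extraction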
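The first step in the abstract above, cutting a silhouette into patches and flattening them into a sequence, can be sketched in a few lines. The function name `to_patch_sequence` and the 4×4 toy image are assumptions for illustration. The learned linear projection, position embeddings, and class embedding that a ViT adds afterwards are omitted.

```python
def to_patch_sequence(image, patch):
    """Split a 2-D image into non-overlapping patch x patch blocks, row by row,
    and flatten each block into one token vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    sequence = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            sequence.append([image[top + r][left + c]
                             for r in range(patch) for c in range(patch)])
    return sequence

# Toy 4x4 "silhouette" whose pixel values are simply 0..15 (assumed data).
silhouette = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = to_patch_sequence(silhouette, 2)
```

A 4×4 image with 2×2 patches yields four tokens of length four; the order of the tokens is what the position embeddings later encode.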
Approximate-Guided Representation Learning in Vision Transformer
10
Authors: Kaili Wang Xinwei Sun Huijie He Fenhua Bai Tao Shen 《CAAI Transactions on Intelligence Technology》 2025, No. 5, pp. 1459-1477 (19 pages)
In recent years, the transformer model has demonstrated excellent performance in computer vision (CV) applications. The key lies in its guided representation attention mechanism, which uses the dot product to depict complex feature relationships and comprehensively understands the context semantics to obtain feature weights. Feature enhancement is then implemented by guiding the target matrix through the feature weights. However, uncertainty and inconsistency of features are widespread and tend to confuse the description of relationships within dot-product attention mechanisms. To solve this problem, this paper proposes a novel approximate-guided representation learning methodology for vision transformers. The kernelised matroids fuzzy rough set is defined, wherein the closed sets inside kernelised fuzzy information granules of matroid structures can constitute the subspace of the lower approximation in rough sets. Thus, the kernel relation is employed to characterise image feature granules, which are reconstructed according to the independent set in matroid theory. Then, according to the characteristics of the closed set within matroids, the feature attention weight is formed by using the lower approximation to realise the approximate guidance of features. The approximate-guided representation mechanism can be flexibly deployed as a plug-and-play component in a wide range of CV tasks. Extensive empirical results demonstrate that the proposed method outperforms the majority of advanced prevalent models, especially in terms of robustness.
Keywords: computer vision; deep learning; image representation; kernel methods; rough sets
The Role of Artificial Intelligence in Enhancing Financial Reporting Quality:Evidence from Saudi Arabia’s Vision 2030 Transformation
11
Authors: Amal Yamani 《Journal of Modern Accounting and Auditing》 2025, No. 4, pp. 237-251 (15 pages)
As it leads to a significant transformation under Saudi Arabia’s Vision 2030 initiative, artificial intelligence (AI) is changing the course of corporate systems, including financial reporting. This research examines the role of AI in advancing financial reporting quality (FRQ) in the Kingdom’s evolving movement toward improved economy and governance. Using qualitative methodology informed by semi-structured interviews with senior finance leaders, auditors, and regulatory professionals in key sectors, the study reveals rich details about how AI technologies can, and will, be realized today, and how they can effectively improve reporting accuracy, timeliness, transparency, and regulatory compliance. The study helpfully outlines several dimensions where, as sworn, AI is advancing FRQ by automating a range of complicated data-intensive tasks, examining and identifying irregularities, and contributing to real-time decision making. Participants explained that AI would reinforce FRQ by ensuring ethical and transparent governance and enabling investment in co-human collaborative decision-making. The findings relate to agency and stakeholder theories. The research supports the notion that AI reduces information asymmetry and builds trust with investors and regulators. This study adds to a small number of qualitative studies on AI and financial governance in emerging economies and has important implications for policymakers, corporate actors, and standard setters. Moreover, it demonstrates the requirement for a collaborative national AI governance approach to ensure optimized value under the full potential of digital transformation and financial reporting standards. Future studies may explore longitudinal or cross-country comparative studies to further develop these insights and understanding.
Keywords: artificial intelligence; financial reporting quality; Vision 2030; AI governance; Saudi Arabia
Enhanced Plant Species Identification through Metadata Fusion and Vision Transformer Integration
12
Authors: Hassan Javed Labiba Gillani Fahad Syed Fahad Tahir Mehdi Hassan Hani Alquhayz 《Computers, Materials & Continua》 2025, No. 11, pp. 3981-3996 (16 pages)
Accurate plant species classification is essential for many applications, such as biodiversity conservation, ecological research, and sustainable agricultural practices. Traditional morphological classification methods are inherently slow, labour-intensive, and prone to inaccuracies, especially when distinguishing between species exhibiting visual similarities or high intra-species variability. To address these limitations and to overcome the constraints of image-only approaches, we introduce a novel Artificial Intelligence-driven framework. This approach integrates robust Vision Transformer (ViT) models for advanced visual analysis with a multi-modal data fusion strategy, incorporating contextual metadata such as precise environmental conditions, geographic location, and phenological traits. This combination of visual and ecological cues significantly enhances classification accuracy and robustness, proving especially vital in complex, heterogeneous real-world environments. The proposed model achieves a test accuracy of 97.27% and a Mean Reciprocal Rank (MRR) of 0.9842, demonstrating strong generalization capabilities. Furthermore, efficient utilization of high-performance GPU resources (RTX 3090, 18 GB memory) ensures scalable processing of high-dimensional data. Comparative analysis consistently confirms that our metadata fusion approach substantially improves classification performance, particularly for morphologically similar species, and through principled self-supervised and transfer learning from ImageNet, the model adapts efficiently to new species, ensuring enhanced generalization. This comprehensive approach holds profound practical implications for precise conservation initiatives, rigorous ecological monitoring, and advanced agricultural management.
Keywords: vision transformers (ViTs); transformers; machine learning; deep learning; plant species classification; multi-organ
High-precision copper-grade identification via a vision transformer with PGNAA
13
Authors: Jie Cao Chong-Gui Zhong Han-Ting You Yan Zhang Ren-Bo Wang Shu-Min Zhou Jin-Hui Qu Rui Chen Shi-Liang Liu 《Nuclear Science and Techniques》 2025, No. 7, pp. 89-99 (11 pages)
The identification of ore grades is a critical step in mineral resource exploration and mining. Prompt gamma neutron activation analysis (PGNAA) technology employs gamma rays generated by the nuclear reactions between neutrons and samples to achieve the qualitative and quantitative detection of sample components. In this study, we present a novel method for identifying copper grade by combining the vision transformer (ViT) model with the PGNAA technique. First, a Monte Carlo simulation is employed to determine the optimal sizes of the neutron moderator, thermal neutron absorption material, and dimensions of the device. Subsequently, based on the parameters obtained through optimization, a PGNAA copper ore measurement model is established. The gamma spectrum of the copper ore is analyzed using the ViT model. The ViT model is optimized for hyperparameters using a grid search. To ensure the reliability of the identification results, the test results are obtained through five repeated tenfold cross-validations. Long short-term memory and convolutional neural network models are compared with the ViT method. These results indicate that the ViT method is efficient in identifying copper ore grades, with average accuracy, precision, recall, F_(1) score, and F_(1)(-) score values of 0.9795, 0.9637, 0.9614, 0.9625, and 0.9942, respectively. When identifying associated minerals, the ViT model can identify Pb, Zn, Fe, and Co minerals with identification accuracies of 0.9215, 0.9396, 0.9966, and 0.8311, respectively.
Keywords: copper-grade identification; vision transformer model; prompt gamma neutron activation analysis; Monte Carlo N-particle
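The "five repeated tenfold cross-validations" protocol mentioned in the abstract above can be sketched as an index generator. This is a generic illustration rather than the authors' pipeline: the function name `repeated_kfold`, the sample count of 100, and the seed are assumptions.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            test_idx = order[fold::n_splits]  # every n_splits-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx

# Hypothetical dataset of 100 spectra.
folds = list(repeated_kfold(100))
```

Five repeats of ten folds give 50 train/test rounds; averaging a metric over all of them reduces the dependence on any single random partition.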
Total score of the computer vision syndrome questionnaire predicts refractive errors and binocular vision anomalies
14
Authors: Mosaad Alhassan Tasneem Samman Hatoun Badukhen Muhamad Alrashed Balsam Alabdulkader Essam Almutleb Tahani Alqahtani Ali Almustanyir 《International Journal of Ophthalmology (English edition)》 2026, No. 1, pp. 90-96 (7 pages)
AIM: To evaluate the efficacy of the total computer vision syndrome questionnaire (CVS-Q) score as a predictive tool for identifying individuals with symptomatic binocular vision anomalies and refractive errors. METHODS: A total of 141 healthy computer users underwent comprehensive clinical visual function assessments, including evaluations of refractive errors, accommodation (amplitude of accommodation, positive relative accommodation, negative relative accommodation, accommodative accuracy, and accommodative facility), and vergence (phoria, positive and negative fusional vergence, near point of convergence, and vergence facility). Total CVS-Q scores were recorded to explore potential associations between symptom scores and the aforementioned clinical visual function parameters. RESULTS: The cohort included 54 males (38.3%) with a mean age of 23.9±0.58 y and 87 age-matched females (61.7%) with a mean age of 23.9±0.53 y. The multiple regression model was statistically significant [R²=0.60, F=13.28, degrees of freedom (DF)=17, 122, P<0.001]. This indicates that 60% of the variance in total CVS-Q scores (reflecting reported symptoms) could be explained by four clinical measurements: amplitude of accommodation, positive relative accommodation, exophoria at distance and near, and positive fusional vergence at near. CONCLUSION: The total CVS-Q score is a valid and reliable tool for predicting the presence of various nonstrabismic binocular vision anomalies and refractive errors in symptomatic computer users.
Keywords: computer vision syndrome; refractive errors; accommodation; vergence; binocular vision; symptoms
Video action recognition meets vision-language models exploring human factors in scene interaction: a review
15
Authors: GUO Yuping GAO Hongwei YU Jiahui GE Jinchao HAN Meng JU Zhaojie 《Optoelectronics Letters》 2025, No. 10, pp. 626-640 (15 pages)
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of “Factor” to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Keywords: human factors; video action recognition; vision language models; analyze dynamic behaviors; spatiotemporal granularity; multimodal alignment; scene interaction
A Siamese Multi-Level Vision Transformer Method for Change Detection in High-Resolution Remote Sensing Imagery
16
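Authors: 黄英杰 《测绘与空间地理信息》 2026, No. 2, pp. 123-126, 130 (5 pages)
Existing remote sensing change detection models capture features incompletely and make insufficient use of deep and shallow features, which lowers segmentation accuracy. To address this, a remote sensing change detection model combining Vision Transformer with a Siamese architecture is proposed. In the encoder, a Siamese multi-level Vision Transformer performs spatial feature extraction and global context modeling, and a Haar wavelet downsampling layer compresses the feature-map size while reducing the loss of detail features. During feature decoding, a full-scale feature connection mechanism is introduced to make full use of deep and shallow features from different sources. Experiments show that the proposed model outperforms current mainstream models in segmentation accuracy and accurately captures the boundaries and details of changed targets.
Keywords: remote sensing change detection; Siamese architecture; Vision Transformer; Haar wavelet downsampling; full-scale feature connection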
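Why Haar wavelet downsampling loses less detail than plain pooling can be seen from a one-level 2-D Haar transform, sketched below. This is a generic textbook transform, not the layer from the paper: the function name `haar_downsample`, the band names, and the 2×2 test image are assumptions, and the learned convolution that such a layer would normally apply to the stacked bands is omitted.

```python
def haar_downsample(image):
    """One level of 2-D Haar analysis on an image with even height and width.

    Returns four half-resolution sub-bands: the local average plus three
    detail bands (column difference, row difference, diagonal difference).
    """
    bands = {"avg": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows["avg"].append((a + b + d + e) / 2)
            rows["col_diff"].append((a - b + d - e) / 2)
            rows["row_diff"].append((a + b - d - e) / 2)
            rows["diag"].append((a - b - d + e) / 2)
        for name in bands:
            bands[name].append(rows[name])
    return bands

bands = haar_downsample([[1, 2], [3, 4]])
```

The four bands together hold exactly as many numbers as the input and the transform is invertible, so halving the spatial size this way moves detail into extra channels instead of discarding it.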
A Deep Learning- and AI-Enhanced Telecentric Vision Framework for Automated Imaging-to-CAD Reconstruction
17
Authors: Toa Saito Kantawatchr Chaiprabha Kosuke Takano Gridsada Phanomchoeng Ratchatin Chancharoen 《Computer Modeling in Engineering & Sciences》 2026, No. 3, pp. 909-933 (25 pages)
This paper presents an automated imaging-to-CAD reconstruction system that combines telecentric vision and deep learning for high-accuracy digital reconstruction of printed circuit boards (PCBs). The framework integrates a telecentric camera with a Cartesian scanning platform to capture distortion-free, high-resolution PCB images, which are stitched into a single orthographic composite. A YOLO-based detection model, trained on a dataset of 270 PCB images across 23 component classes with data augmentation, identifies and localizes electronic components with a mean average precision of 0.932. Detected components are automatically matched to corresponding 3D CAD models from a part library and assembled within a Fusion 360 environment, producing a 3D digital replica. Experimental results show a similarity score of 0.894 and dimensional deviations below 2%, outperforming both SensoPart image measurement and manual vernier methods. The proposed approach bridges optical metrology and CAD automation, providing a scalable solution for AI-assisted reverse engineering, digital archiving, and intelligent manufacturing.
Keywords: metrology; telecentric vision; YOLO; imaging-to-CAD reconstruction
A Hybrid Vision Transformer with Attention Architecture for Efficient Lung Cancer Diagnosis
18
Authors: Abdu Salam Fahd M. Aldosari Donia Y. Badawood Farhan Amin Isabel de la Torre Gerardo Mendez Mezquita Henry Fabian Gongora 《Computers, Materials & Continua》 2026, No. 4, pp. 1129-1147 (19 pages)
Lung cancer remains a major global health challenge, with early diagnosis crucial for improved patient survival. Traditional diagnostic techniques, including manual histopathology and radiological assessments, are prone to errors and variability. Deep learning methods, particularly Vision Transformers (ViT), have shown promise for improving diagnostic accuracy by effectively extracting global features. However, ViT-based approaches face challenges related to computational complexity and limited generalizability. This research proposes the DualSet ViT-PSO-SVM framework, integrating a ViT with dual attention mechanisms, Particle Swarm Optimization (PSO), and Support Vector Machines (SVM), aiming for efficient and robust lung cancer classification across multiple medical image datasets. The study utilized three publicly available datasets: LIDC-IDRI, LUNA16, and TCIA, encompassing computed tomography (CT) scans and histopathological images. Data preprocessing included normalization, augmentation, and segmentation. Dual attention mechanisms enhanced ViT’s feature extraction capabilities. PSO optimized feature selection, and SVM performed classification. Model performance was evaluated on individual and combined datasets, benchmarked against CNN-based and standard ViT approaches. The DualSet ViT-PSO-SVM significantly outperformed existing methods, achieving superior accuracy rates of 97.85% (LIDC-IDRI), 98.32% (LUNA16), and 96.75% (TCIA). Cross-dataset evaluations demonstrated strong generalization capabilities and stability across similar imaging modalities. The proposed framework effectively bridges advanced deep learning techniques with clinical applicability, offering a robust diagnostic tool for lung cancer detection, reducing complexity, and improving diagnostic reliability and interpretability.
Keywords: deep learning; artificial intelligence; healthcare; medical imaging; vision transformer
From microstructure to performance optimization:Innovative applications of computer vision in materials science
19
Authors: Chunyu Guo Xiangyu Tang Yu’e Chen Changyou Gao Qinglin Shan Heyi Wei Xusheng Liu Chuncheng Lu Meixia Fu Enhui Wang Xinhong Liu Xinmei Hou Yanglong Hou 《International Journal of Minerals, Metallurgy and Materials》 2026, No. 1, pp. 94-115 (22 pages)
The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of traditional experimental methods used in material performance prediction. Moreover, recent progress made in generating microstructure images and detecting microstructural defects using CV has led to increased efficiency and reliability in material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby allowing for the discovery and innovation of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in the field of materials science and future prospects.
Keywords: microstructure; deep learning; computer vision; performance prediction; image generation
Brief application notes for vision transformer (ViT) and convolutional neural network (CNN) in medical imaging
20
Authors: Wei Kitt Wong Melinda Melinda 《Medical Data Mining》 2026, No. 2, pp. 34-42 (9 pages)
In contemporary computer vision, convolutional neural networks (CNNs) and vision transformers (ViTs) represent the two primary architectural paradigms for image recognition. While both approaches have been widely adopted in medical imaging applications, they operate based on fundamentally different computational principles. This report attempts to provide brief application notes on ViTs and CNNs, particularly focusing on scenarios that guide the selection of one architecture over the other in practical medical implementations. Generally, CNNs rely on convolutional kernels, localized receptive fields, and weight sharing, enabling efficient hierarchical feature extraction. These properties contribute to strong performance in detecting spatially constrained patterns such as textures, edges, and anatomical boundaries, while maintaining relatively low computational requirements. ViTs, on the other hand, decompose images into smaller segments referred to as tokens and employ self-attention mechanisms to model relationships across the entire image. This global modeling capability allows ViTs to capture long-range dependencies that may be difficult for convolution-based architectures to learn. However, ViTs typically achieve optimal performance when trained on extremely large datasets or when supported by extensive pretraining, as their reduced inductive bias requires greater data exposure to learn robust representations. This report briefly examines the architectural structure, underlying mathematical foundations, and relative performance characteristics of CNNs and ViTs, drawing upon recent findings from contemporary research. Emphasis is placed on understanding how differences in data availability, computational resources, and task requirements influence model effectiveness across medical imaging domains. Most importantly, the report serves as a concise application guide for practitioners seeking informed implementation decisions between these two influential deep learning frameworks.
Keywords: convolutional neural network; vision transformer; comparative study; medical imaging
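The self-attention step that the abstract above credits with modeling relationships across the entire image can be sketched as scaled dot-product attention over a handful of tokens. This is a bare-bones illustration under stated assumptions: the learned query, key, and value projections and the multiple heads of a real ViT are omitted, and the three 2-D token vectors are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: every token attends to every token."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Three made-up patch tokens; queries, keys, and values are the tokens themselves.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed_tokens, attn = self_attention(tokens, tokens, tokens)
```

Each output token is a weighted average of all tokens, so information from distant patches is available after one layer, whereas a convolution sees only its local receptive field.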