2,310 articles found
Visioneer Strobe XP 100
1
《公共支出与采购》, 2003, No. 5, p. 22 (1 page)
Keywords: Visioneer Strobe XP 100; scanner; product features; automatic deskew
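Scanners: the HP ScanJet 4S and Visioneer PaperPort Vx, Twins with Very Different Personalities
2
Authors: Alfred Poor, 黄国胜. 《个人电脑》, 1996, No. 4, pp. 33-34 (2 pages)
In the film Twins, Danny DeVito and Arnold Schwarzenegger play a pair of twins with very different personalities.
Keywords: scanner; HP ScanJet 4S; sensor; Visioneer PaperPort Vx; gray levels; twins; personality; individual psychological traits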
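A Leather Defect Classification Method Based on Millimeter-Wave Sensing
3
Authors: 张健, 关灏文. 《小型微型计算机系统》 (PKU Core journal), 2026, No. 2, pp. 257-264 (8 pages)
Leather defect classification is a key step in ensuring the quality of leather products. Traditional manual inspection and image-processing methods are limited by environmental factors such as lighting and struggle to meet the demand for efficient inspection. In recent years, deep learning, and convolutional neural networks (CNNs) in particular, has improved the accuracy and efficiency of defect detection, but it remains affected by the environment. Millimeter-wave radar, an emerging non-destructive testing technique, has drawn growing attention for its strong penetration and its insensitivity to lighting and similar factors. This paper proposes a leather defect classification method that combines millimeter-wave radar with an improved Vision Transformer model: time-frequency features of leather defects are extracted from the millimeter-wave radar signals and classified with a deep learning model. The method reaches 95.62% accuracy on a self-built dataset, a significant advantage over classical classification models.
Keywords: millimeter-wave radar; leather defect classification; Vision Transformer model; transfer learning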
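Research Progress of Convolutional Neural Networks and Vision Transformer in Glioma
4
Authors: 杨浩辉, 徐涛, 王伟, 安良良, 敖用芳, 朱家宝. 《磁共振成像》 (PKU Core journal), 2026, No. 1, pp. 168-174 (7 pages)
Glioma is highly heterogeneous, strongly invasive, and carries a poor prognosis, so conventional diagnosis and treatment face major challenges. Deep learning offers a new route to precise diagnosis and treatment, with convolutional neural networks (CNNs) and the Vision Transformer (ViT) as its core tools. Through hierarchical convolution, CNNs have a natural advantage in extracting local features such as tumor margins and texture details, whereas ViT, built on self-attention, excels at modeling global context such as cross-regional tumor heterogeneity and multimodal associations. Fusion strategies that integrate fine local features with global relational information show clear advantages on clinical difficulties such as blurred glioma boundaries and heterogeneity across imaging modalities. This article reviews research progress for both architectures in key clinical tasks, including glioma detection and segmentation, pathological grading, molecular subtyping, and prognosis assessment, and covers their principles, standalone applications, and fusion strategies. It also discusses current challenges, such as heavy dependence on annotated data and insufficient model interpretability, and outlines future directions, such as lightweight architectures, self-supervised learning, and multi-omics fusion, with the aim of providing a systematic reference for intelligent glioma diagnosis.
Keywords: glioma; deep learning; convolutional neural network; Vision Transformer; magnetic resonance imaging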
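A Method for Recognizing Standard Planes in Fetal Cranial Ultrasound Based on Conditional Generative Adversarial Networks and Vision Transformer
5
Authors: 李惠莲, 林艺榕, 刘中华, 柳培忠. 《临床超声医学杂志》, 2026, No. 2, pp. 164-169 (6 pages)
Fetal cranial ultrasound is a crucial part of routine prenatal screening, and accurate recognition of standard planes matters for assessing fetal brain development. Variation in ultrasound image quality and the complexity of plane acquisition make that recognition challenging. This paper proposes a method for recognizing standard planes in fetal cranial ultrasound based on a conditional generative adversarial network (CGAN) and Vision Transformer. The CGAN augments the original data by generating additional standard-plane and non-standard-plane images, which addresses the shortage of data. A YOLOv9 model automatically crops the skull region in each ultrasound image, removing irrelevant information so that the model focuses on the key region. The classification model is a Vision Transformer; all input images are normalized and resized, and data augmentation such as random horizontal or vertical flips, contrast adjustment, center cropping, and saturation adjustment is applied. Compared with the existing best-performing model, CSwin Transformer, the proposed method performs well on the task, with precision, recall, F1 score, and accuracy of 92.5%, 92.3%, 92.4%, and 93.3%, respectively. The method offers a significant advantage in recognition accuracy and provides effective technical support for clinical ultrasound examination.
Keywords: conditional generative adversarial network; Vision Transformer; cranial ultrasound; fetus; standard plane recognition method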
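An Intelligent Monitoring Model for Blast Furnace Tuyeres Based on Vision Transformer and Its Application
6
Authors: 王浩男, 韩明博, 但家云, 李强. 《钢铁研究学报》 (PKU Core journal), 2026, No. 1, pp. 25-37 (13 pages)
The peephole of a tuyere in the lower part of a blast furnace allows real-time monitoring of key smelting-state information, such as combustion characteristics in the raceway and the state of pulverized coal injection, from which important parameters such as gas flow distribution and hearth activity can be judged. To address the subjectivity and time lag of tuyere monitoring, this work builds TI-ViT, an intelligent blast furnace tuyere monitoring model, on unstructured tuyere image big data and the Vision Transformer architecture. First, the collected tuyere images are preprocessed, and feature discrimination and label calibration produce a dataset of typical furnace conditions. Next, the TI-ViT tuyere image recognition model is constructed on the Vision Transformer architecture. Finally, the model's performance is evaluated, with a focus on how model depth affects accuracy, parameter count, training time, and running time, and the model is compared with conventional convolutional neural network models. In validation, TI-ViT reaches 97.7% accuracy, 9.1% higher than the convolutional-neural-network-based model, with an inference time of only 15.75 ms per image. The "Smart Eye" (智慧眼) system developed from the model reaches 95.2% recognition accuracy in on-site use, showing that it achieves real-time monitoring, recognition, and early warning for blast furnace tuyeres. This helps steel companies reduce the cost of monitoring and diagnosing abnormal tuyere states and offers a new direction for intelligent blast furnace ironmaking.
Keywords: blast furnace tuyere; computer vision; Vision Transformer; image recognition; blast furnace ironmaking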
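Gait-ViT: A Cross-View Gait Recognition Method Based on Vision Transformer
7
Authors: 沈澍, 王森, 黄苏岩, 张秉睿. 《小型微型计算机系统》 (PKU Core journal), 2026, No. 3, pp. 646-652 (7 pages)
Gait recognition, a biometric technique that works at a distance, shows broad application prospects in medical rehabilitation, criminal investigation, and public security. With the rapid development of deep learning in recent years, gait recognition methods have gradually shifted from conventional convolutional neural networks (CNNs) to the more advanced Transformer architecture. Although CNNs perform well on image-processing tasks, their ability to attend to key image regions is limited, whereas attention mechanisms can learn more discriminative features by focusing on local image regions. This paper therefore proposes Gait-ViT, a Vision Transformer model that incorporates an attention mechanism, for gait recognition. The method first divides the gait silhouette into small patches and converts them into a patch sequence; it then uses position embeddings and a class embedding to rearrange and encode positional information in the sequence; finally, it feeds the vector sequence to the Vision Transformer for prediction. Gait-ViT reaches recognition accuracies of 98.1% and 91.2% on the public gait datasets CASIA-B and OU-MVLP, respectively, which supports the effectiveness of the proposed model.
Keywords: gait recognition; Vision Transformer; convolutional neural network; feature extraction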
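The patch-sequence step described in the Gait-ViT abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tiny 4 x 4 input, and the hand-written embeddings are all assumptions for illustration, and the learned linear projection that a real Vision Transformer applies to each patch is left out.

```python
from typing import List

Image = List[List[float]]


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Cut an H x W image into non-overlapping patch x patch blocks.

    Each block is flattened row by row; blocks are listed left to right,
    top to bottom. (Hypothetical helper, not taken from the paper.)
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [image[top + r][left + c]
                     for r in range(patch) for c in range(patch)]
            patches.append(block)
    return patches


def build_token_sequence(patches: List[List[float]],
                         class_token: List[float],
                         position_embeddings: List[List[float]]) -> List[List[float]]:
    """Prepend a class token, then add one position embedding per token.

    In a trained model the class token and position embeddings are learned
    parameters; here they are plain lists supplied by the caller.
    """
    tokens = [class_token] + patches
    if len(position_embeddings) != len(tokens):
        raise ValueError("need exactly one position embedding per token")
    return [[value + offset for value, offset in zip(token, position)]
            for token, position in zip(tokens, position_embeddings)]


if __name__ == "__main__":
    # A made-up 4 x 4 "silhouette" with pixel values 0..15.
    silhouette = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    patches = split_into_patches(silhouette, patch=2)
    positions = [[float(i)] * 4 for i in range(len(patches) + 1)]
    sequence = build_token_sequence(patches, [0.0] * 4, positions)
    print(len(sequence), "tokens of length", len(sequence[0]))
```

The resulting sequence has one more token than there are patches; the extra leading token is the class token whose final representation a Vision Transformer typically uses for classification.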
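A Rolling Bearing Fault Diagnosis Method Using an Effective Diagnosis Vision Transformer Network
8
Authors: 罗志勇, 李明周, 董鑫. 《重庆邮电大学学报(自然科学版)》 (PKU Core journal), 2026, No. 1, pp. 146-155 (10 pages)
To address incomplete feature extraction and low diagnostic efficiency in rolling bearing fault diagnosis, this paper proposes the Effective Diagnosis Vision Transformer (EDViT) network. A kurtosis-based weighted fusion strategy merges the sensor information; the short-time Fourier transform converts the fused signal into time-frequency images; EDViT's dual-attention convolution module and dual-branch patch vision transformer module are then applied in turn to extract local and global features, and a classifier performs fault classification. Experiments are carried out on the Case Western Reserve University bearing dataset. The results show that EDViT has excellent feature extraction capability, fast convergence, and high diagnostic accuracy. Comparison with other methods shows that EDViT has strong generalization ability and robustness.
Keywords: Effective Diagnosis Vision Transformer network; rolling bearing; fault diagnosis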
Total score of the computer vision syndrome questionnaire predicts refractive errors and binocular vision anomalies
9
Authors: Mosaad Alhassan, Tasneem Samman, Hatoun Badukhen, Muhamad Alrashed, Balsam Alabdulkader, Essam Almutleb, Tahani Alqahtani, Ali Almustanyir. 《International Journal of Ophthalmology (English edition)》, 2026, No. 1, pp. 90-96 (7 pages)
AIM: To evaluate the efficacy of the total computer vision syndrome questionnaire (CVS-Q) score as a predictive tool for identifying individuals with symptomatic binocular vision anomalies and refractive errors. METHODS: A total of 141 healthy computer users underwent comprehensive clinical visual function assessments, including evaluations of refractive errors, accommodation (amplitude of accommodation, positive relative accommodation, negative relative accommodation, accommodative accuracy, and accommodative facility), and vergence (phoria, positive and negative fusional vergence, near point of convergence, and vergence facility). Total CVS-Q scores were recorded to explore potential associations between symptom scores and the aforementioned clinical visual function parameters. RESULTS: The cohort included 54 males (38.3%) with a mean age of 23.9±0.58 y and 87 age-matched females (61.7%) with a mean age of 23.9±0.53 y. The multiple regression model was statistically significant [R²=0.60, F=13.28, degrees of freedom (DF)=17, 122, P<0.001]. This indicates that 60% of the variance in total CVS-Q scores (reflecting reported symptoms) could be explained by four clinical measurements: amplitude of accommodation, positive relative accommodation, exophoria at distance and near, and positive fusional vergence at near. CONCLUSION: The total CVS-Q score is a valid and reliable tool for predicting the presence of various nonstrabismic binocular vision anomalies and refractive errors in symptomatic computer users.
Keywords: computer vision syndrome; refractive errors; accommodation; vergence; binocular vision; symptoms
CAFE-GAN: CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination
10
Authors: Xuanhong Wang, Hongyu Guo, Jiazhen Li, Mingchen Wang, Xian Wang, Yijun Zhang. 《Computers, Materials & Continua》, 2026, No. 1, pp. 1742-1760 (19 pages)
Over the past decade, large-scale pre-trained autoregressive and diffusion models rejuvenated the field of text-guided image generation. However, these models require enormous datasets and parameters, and their multi-step generation processes are often inefficient and difficult to control. To address these challenges, we propose CAFE-GAN, a CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination, which incorporates a pretrained CLIP model along with several key architectural innovations. First, we embed a coordinate attention mechanism into the generator to capture long-range dependencies and enhance feature representation. Second, we introduce a trainable linear projection layer after the CLIP text encoder, which aligns textual embeddings with the generator's semantic space. Third, we design a multi-scale discriminator that leverages pre-trained visual features and integrates a feature regularization strategy, thereby improving training stability and discrimination performance. Experiments on the CUB and COCO datasets demonstrate that CAFE-GAN outperforms existing text-to-image generation methods, achieving lower Fréchet Inception Distance (FID) scores and generating images with superior visual quality and semantic fidelity, with FID scores of 9.84 and 5.62 on the CUB and COCO datasets, respectively, surpassing current state-of-the-art text-to-image models by varying degrees. These findings offer valuable insights for future research on efficient, controllable text-to-image synthesis.
Keywords: large vision language models; deep learning; computer vision; text-to-image generation
Prevalence of heterophoria, tropia, and near point of convergence abnormality in a high school student population in Erbil city center
11
Author: Morad Amir Ahmad. 《International Journal of Ophthalmology (English edition)》, 2026, No. 3, pp. 556-563 (8 pages)
AIM: To determine the prevalence of tropia, phoria, and abnormality of near point of convergence (NPC), along with associated ocular symptoms, in high school students. METHODS: This cross-sectional study was conducted in Erbil, Iraq. The target population consisted of high school students selected through a multi-stage cluster sampling method. Comprehensive visual examinations were performed for all students, including measurement of uncorrected and corrected visual acuity, objective and subjective refraction, and distance and near cover tests. NPC was evaluated using a single 6/12 visual target mounted on a centrally positioned Gulden fixation stick. Ocular symptoms were investigated through interviews. RESULTS: Of the 996 selected students, 921 participated in the study. Of them, 543 (58.96%) were female, and their ages ranged from 13 to 22 y. The prevalence of tropia was 3.58% [95% confidence interval (CI): 2.38%-4.78%], observed in 3.44% of males and 3.68% of females. Exotropia (1.95%, 95% CI: 1.06%-2.85%) was more common than esotropia (1.52%, 95% CI: 0.73%-2.31%). Phoria was present in 15.42% (95% CI: 13.09%-17.75%) of students. Exophoria (13.79%, 95% CI: 11.56%-16.02%) was significantly more prevalent than esophoria (1.63%, 95% CI: 0.81%-2.45%). The prevalence of NPC abnormality in the total study population was 24.97% (95% CI: 22.18%-27.77%). It was 26.72% (95% CI: 22.26%-31.18%) in males and 23.76% (95% CI: 20.18%-27.34%) in females (P=0.307). The most common symptom in phoria was headache (86.62%, 95% CI: 81.02%-92.22%), followed by tired or sore eyes (61.97%, 95% CI: 53.99%-69.96%). The most common symptoms in tropia were blurry vision (93.94%, 95% CI: 79.77%-99.26%) and difficulty concentrating (87.88%, 95% CI: 76.74%-99.01%). CONCLUSION: Among Erbil's high school students, the prevalence of strabismus, particularly the exodeviation type, is relatively high, and a significant percentage of students have NPC abnormalities. Because these binocular vision problems come with visual symptoms, addressing and correcting them can improve students' quality of life and academic performance.
Keywords: tropia; phoria; binocular vision; cross-sectional study; student
A Hybrid Vision Transformer with Attention Architecture for Efficient Lung Cancer Diagnosis
12
Authors: Abdu Salam, Fahd M. Aldosari, Donia Y. Badawood, Farhan Amin, Isabel de la Torre, Gerardo Mendez Mezquita, Henry Fabian Gongora. 《Computers, Materials & Continua》, 2026, No. 4, pp. 1129-1147 (19 pages)
Lung cancer remains a major global health challenge, with early diagnosis crucial for improved patient survival. Traditional diagnostic techniques, including manual histopathology and radiological assessments, are prone to errors and variability. Deep learning methods, particularly Vision Transformers (ViT), have shown promise for improving diagnostic accuracy by effectively extracting global features. However, ViT-based approaches face challenges related to computational complexity and limited generalizability. This research proposes the DualSet ViT-PSO-SVM framework, integrating a ViT with dual attention mechanisms, Particle Swarm Optimization (PSO), and Support Vector Machines (SVM), aiming for efficient and robust lung cancer classification across multiple medical image datasets. The study utilized three publicly available datasets: LIDC-IDRI, LUNA16, and TCIA, encompassing computed tomography (CT) scans and histopathological images. Data preprocessing included normalization, augmentation, and segmentation. Dual attention mechanisms enhanced ViT's feature extraction capabilities. PSO optimized feature selection, and SVM performed classification. Model performance was evaluated on individual and combined datasets, benchmarked against CNN-based and standard ViT approaches. The DualSet ViT-PSO-SVM significantly outperformed existing methods, achieving superior accuracy rates of 97.85% (LIDC-IDRI), 98.32% (LUNA16), and 96.75% (TCIA). Cross-dataset evaluations demonstrated strong generalization capabilities and stability across similar imaging modalities. The proposed framework effectively bridges advanced deep learning techniques with clinical applicability, offering a robust diagnostic tool for lung cancer detection, reducing complexity, and improving diagnostic reliability and interpretability.
Keywords: deep learning; artificial intelligence; healthcare; medical imaging; vision transformer
From microstructure to performance optimization: Innovative applications of computer vision in materials science
13
Authors: Chunyu Guo, Xiangyu Tang, Yu'e Chen, Changyou Gao, Qinglin Shan, Heyi Wei, Xusheng Liu, Chuncheng Lu, Meixia Fu, Enhui Wang, Xinhong Liu, Xinmei Hou, Yanglong Hou. 《International Journal of Minerals, Metallurgy and Materials》, 2026, No. 1, pp. 94-115 (22 pages)
The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of traditional experimental methods used in material performance prediction. Moreover, recent progress made in generating microstructure images and detecting microstructural defects using CV has led to increased efficiency and reliability in material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby allowing for the discovery and innovation of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in the field of materials science and future prospects.
Keywords: microstructure; deep learning; computer vision; performance prediction; image generation
Human Activity Recognition Using Weighted Average Ensemble by Selected Deep Learning Models
14
Authors: Waseem Akhtar, Mahwish Ilyas, Romana Aziz, Ghadah Aldehim, Tassawar Iqbal, Muhammad Ramzan. 《Computer Modeling in Engineering & Sciences》, 2026, No. 2, pp. 971-989 (19 pages)
Human Activity Recognition (HAR) is a novel area of computer vision. Because it can detect human behavior automatically, it has a great impact on healthcare, smart environments, and surveillance, and it plays a vital role in many applications, such as smart homes, healthcare, human-computer interaction, sports analysis, and especially intelligent surveillance. In this paper, we propose a robust and efficient HAR system by leveraging deep learning paradigms, including pre-trained models, CNN architectures, and their average-weighted fusion. However, due to the diversity of human actions and various environmental influences, as well as a lack of data and resources, achieving high recognition accuracy remains elusive. In this work, a weighted average ensemble technique is employed to fuse three deep learning models: EfficientNet, ResNet50, and a custom CNN. The results of this study indicate that a weighted average ensemble strategy may be a promising way to develop more effective HAR models for detecting and classifying human activities. Experiments on the benchmark dataset showed that the proposed weighted ensemble approach outperformed existing approaches in terms of accuracy and other key performance measures. The combined average-weighted ensemble of pre-trained and CNN models obtained an accuracy of 98%, compared to 97%, 96%, and 95% for the customized CNN, EfficientNet, and ResNet50 models, respectively.
Keywords: artificial intelligence; computer vision; deep learning; recognition; human activity classification; image processing
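A weighted average ensemble, as named in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: the three probability vectors and the weights are invented, the abstract does not say how the authors chose their weights, and the function names are hypothetical.

```python
from typing import List, Sequence


def weighted_average_ensemble(prob_sets: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Fuse per-model class probabilities into one vector.

    Each entry of prob_sets holds one model's probabilities over the same
    classes. Weights are normalised by their sum, so they need not add to 1.
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(prob_sets[0])
    for probs, weight in zip(prob_sets, weights):
        for index, prob in enumerate(probs):
            fused[index] += weight * prob / total
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda index: fused[index])


if __name__ == "__main__":
    # Invented outputs of three models for one sample over two classes.
    model_outputs = [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
    # Invented weights; in practice they might follow validation accuracy.
    model_weights = [1.0, 1.0, 2.0]
    fused = weighted_average_ensemble(model_outputs, model_weights)
    print(fused, "-> class", predict(fused))
```

Normalising by the weight sum keeps the fused vector a valid probability distribution whenever each model's output is one.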
A comprehensive analysis of artificial intelligence, machine learning, deep learning and computer vision in food science
15
Authors: Premkumar Borugadda, Hemantha Kumar Kalluri. 《Journal of Future Foods》, 2026, No. 6, pp. 975-991 (17 pages)
Providing safe, high-quality food is crucial for every household and highly significant for the growth of any society. It is a complex procedure that spans food processing from seed to harvest, storage, preparation, and consumption. This paper seeks to clarify the importance of artificial intelligence, machine learning (ML), deep learning (DL), and computer vision (CV) in ensuring food safety and quality, and it stresses the potential of these technologies for such problems. CV is highly valuable today because it improves food processing quality and benefits both firms and researchers. Image processing and computer vision are now incorporated into all facets of food production. In this field, DL and ML are used to identify the type of food as well as its quality. The paper finds similarities among the various approaches with respect to data and results. The findings will therefore help scholars looking for a suitable approach to identifying food quality. The paper also indicates which food products other scholars have studied and points readers to papers suited to further research. In addition, DL is applied to identifying the quality and safety of foods on the market. This paper describes the current practices and concerns of ML and DL and probable trends for their future development.
Keywords: artificial intelligence; computer vision; deep learning; food quality; food recognition; machine learning
Hybrid Quantum Gate Enabled CNN Framework with Optimized Features for Human-Object Detection and Recognition
16
Authors: Nouf Abdullah Almujally, Tanvir Fatima Naik Bukht, Shuaa S. Alharbi, Asaad Algarni, Ahmad Jalal, Jeongmin Park. 《Computers, Materials & Continua》, 2026, No. 4, pp. 2254-2271 (18 pages)
Recognising human-object interactions (HOI) is a challenging task for traditional machine learning models, including convolutional neural networks (CNNs). Existing models show limited transferability across complex datasets such as D3D-HOI and SYSU 3D HOI. The conventional architecture of CNNs restricts their ability to handle HOI scenarios with high complexity. HOI recognition requires improved feature extraction methods to overcome the current limitations in accuracy and scalability. This work proposes a novel quantum gate-enabled hybrid CNN (QEH-CNN) for effective HOI recognition. The model enhances CNN performance by integrating quantum computing components. The framework begins with bilateral image filtering, followed by multi-object tracking (MOT) and Felzenszwalb superpixel segmentation. A watershed algorithm refines object boundaries by cleaning merged superpixels. Feature extraction combines a histogram of oriented gradients (HOG), Global Image Statistics for Texture (GIST) descriptors, and a novel 23-joint keypoint extraction method using relative joint angles and joint proximity measures. A fuzzy optimization process refines the extracted features before feeding them into the QEH-CNN model. The proposed model achieves 95.06% accuracy on the 3D-D3D-HOI dataset and 97.29% on the SYSU 3D HOI dataset. The integration of quantum computing enhances feature optimization, leading to improved accuracy and overall model efficiency.
Keywords: pattern recognition; image segmentation; computer vision; object detection
A generation-based defect detection system for rail transit infrastructure
17
Authors: Xinyu Zheng, Lingfeng Zhang, Yuhao Luo, Tiange Wang. 《High-Speed Railway》, 2026, No. 1, pp. 1-9 (9 pages)
The use of Unmanned Aerial Vehicles (UAVs) for defect detection on railway slopes is becoming increasingly widespread due to their ability to capture high-resolution images over large, inaccessible, and topographically complex areas. However, current UAV-based detection methods face several critical limitations, including constrained deployment frequency, limited availability of annotated defect data, and the lack of mature risk assessment frameworks. To address these challenges, this study introduces a novel approach that integrates diffusion models with Large Language Models (LLMs) to generate high-quality synthetic defect images tailored to railway slope scenarios. Furthermore, an improved transformer-based architecture is proposed, incorporating attention mechanisms and LLM-guided diffusion-generated imagery to enhance defect recognition performance under complex environmental conditions. Experimental evaluations conducted on a dataset of 300 field-collected images from high-risk railway slopes demonstrate that the proposed method significantly outperforms existing baselines in terms of precision, recall, and robustness, indicating strong applicability for real-world railway infrastructure monitoring and disaster prevention.
Keywords: railway; large language models; computer vision; object detection
Ultrathin Gallium Nitride Quantum-Disk-in-Nanowire-Enabled Reconfigurable Bioinspired Sensor for High-Accuracy Human Action Recognition
18
Authors: Zhixiang Gao, Xin Ju, Huabin Yu, Wei Chen, Xin Liu, Yuanmin Luo, Yang Kang, Dongyang Luo, JiKai Yao, Wengang Gu, Muhammad Hunain Memon, Yong Yan, Haiding Sun. 《Nano-Micro Letters》, 2026, No. 2, pp. 439-453 (15 pages)
Human action recognition (HAR) is crucial for the development of efficient computer vision, where bioinspired neuromorphic perception visual systems have emerged as a vital solution to address transmission bottlenecks across sensor-processor interfaces. However, the absence of interactions among versatile biomimicking functionalities within a single device, which was developed for specific vision tasks, restricts the computational capacity, practicality, and scalability of in-sensor vision computing. Here, we propose a bioinspired vision sensor composed of a GaN/AlN-based ultrathin quantum-disks-in-nanowires (QD-NWs) array to mimic not only Parvo cells for high-contrast vision and Magno cells for dynamic vision in the human retina but also the synergistic activity between the two cells for in-sensor vision computing. By simply tuning the applied bias voltage on each QD-NW-array-based pixel, we achieve two biosimilar photoresponse characteristics with slow and fast reactions to light stimuli that enhance the in-sensor image quality and HAR efficiency, respectively. Strikingly, the interplay and synergistic interaction of the two photoresponse modes within a single device markedly increased the HAR recognition accuracy from 51.4% to 81.4% owing to the integrated artificial vision system. The demonstration of an intelligent vision sensor offers a promising device platform for the development of highly efficient HAR systems and future smart optoelectronics.
Keywords: GaN nanowire; quantum-confined Stark effect; voltage-tunable photoresponse; bioinspired sensor; artificial vision system
Deep Learning for Brain Tumor Segmentation and Classification: A Systematic Review of Methods and Trends
19
Authors: Ameer Hamza, Robertas Damaševičius. 《Computers, Materials & Continua》, 2026, No. 1, pp. 132-172 (41 pages)
This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
Keywords: brain tumor segmentation; brain tumor classification; deep learning; vision transformers; hybrid models