Journal Articles
39,382 articles found
1. Clustering-Based Denoising of Aligned Data in the Distant Supervision Method (Cited by 2)
Authors: 朱兆龙, 欧阳丹彤, 叶育鑫. 《吉林大学学报(理学版)》 (CAS, CSCD, PKU Core), 2014(2): 280-284.
Building on clustering and the search for sentence patterns that express semantic relations, this paper proposes a clustering-based denoising variant of distant supervision. It addresses the problem that the traditional distant supervision method introduces noisy data when aligning the knowledge base with the text collection, yielding an improved distant supervision method. Experimental results show that the method effectively improves the precision of relation extraction.
Keywords: relation extraction, clustering, distant supervision, denoising
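The denoising idea described above can be sketched in a few lines. This is a minimal illustration (stdlib only, with made-up toy feature vectors, not the paper's actual features): aligned sentence instances are clustered, and instances far from their cluster centroid are discarded as alignment noise.

```python
# Hypothetical sketch of clustering-based denoising: drop aligned instances
# that lie far from the centroid of their cluster.
import math

def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def denoise(points, threshold):
    """Keep only instances within `threshold` of the cluster centroid."""
    c = centroid(points)
    return [p for p in points if distance(p, c) <= threshold]

# Toy aligned instances: three near the origin, one outlier (alignment noise).
instances = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.1], [5.0, 5.0]]
clean = denoise(instances, threshold=2.0)
print(len(clean))  # 3 instances survive; the outlier is filtered out
```

A real implementation would first embed sentences as feature vectors and run a proper clustering algorithm (e.g. k-means); the threshold here is an illustrative assumption.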
2. A Topic-Model-Based Method for Identifying Noisy Labels in Chinese Distant Supervision (Cited by 1)
Authors: 刘绍毓, 李弼程, 周杰, 席耀一, 唐浩浩. 《信息工程大学学报》, 2016(3): 303-308.
To address the large amount of noise in the training corpus of distant-supervision relation extraction, a topic-model-based method for identifying noisy labels is proposed. The method first analyzes the structurally complex relation-sentence instances that Chinese distant-supervision entity relation extraction must handle, then uses self-defined patterns together with pattern clustering to represent and aggregate patterns, and finally applies a topic model to identify noisy labels. Experimental results show that the method identifies noisy labels effectively, and that training the entity relation extraction model on data with the noisy labels filtered out significantly improves extraction performance.
Keywords: distant supervision, relation extraction, noisy label identification, topic model, relation patterns
3. Using Distant Supervision and Paragraph Vector for Large Scale Relation Extraction
Authors: Yuming Liu, Weiran Xu. 《国际计算机前沿大会会议论文集》, 2015(B12): 45-47.
Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been brought into distant supervision to combat noisy data and improve relation extraction performance. However, multi-instance multi-label learning uses only hidden variables when inferring the relation between entities, and so cannot make full use of the training data. Moreover, traditional lexical and syntactic features poorly reflect domain knowledge and the global information of a sentence, which limits system performance. This paper presents a novel approach to multi-instance multi-label learning that borrows the idea of fuzzy classification: cluster centers are used as training data, so sentence-level features can be exploited fully. Meanwhile, the feature set is extended with paragraph vectors, which carry the semantic information of sentences. An extensive empirical study verifies these contributions, and the results show the method is superior to the state-of-the-art distantly supervised baseline.
Keywords: relation extraction, distant supervision, paragraph vector
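The "cluster center as training data" idea above can be sketched as follows. This is a toy illustration (stdlib only; the relation labels and sentence-level feature vectors are invented for the example): instead of feeding every noisy sentence vector to the classifier, each relation's instances are averaged into one representative center.

```python
# Hypothetical sketch: use each relation's cluster center as its training
# instance, in the spirit of the fuzzy-classification approach described above.
def cluster_center(vectors):
    """Element-wise mean of a group of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Made-up sentence-level feature vectors grouped by relation label.
by_relation = {
    "born_in":   [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]],
    "works_for": [[0.0, 1.0, 0.9], [0.2, 0.8, 1.1]],
}
train_data = {rel: cluster_center(vecs) for rel, vecs in by_relation.items()}
print(train_data["born_in"])
```

In the paper's setting the vectors would additionally be concatenated with paragraph-vector features before averaging; that step is omitted here.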
4. From Microstructure to Performance Optimization: Innovative Applications of Computer Vision in Materials Science
Authors: Chunyu Guo, Xiangyu Tang, Yu'e Chen, Changyou Gao, Qinglin Shan, Heyi Wei, Xusheng Liu, Chuncheng Lu, Meixia Fu, Enhui Wang, Xinhong Liu, Xinmei Hou, Yanglong Hou. 《International Journal of Minerals, Metallurgy and Materials》, 2026(1): 94-115.
The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects with CV has increased the efficiency and reliability of material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby enabling the discovery and innovation of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in materials science and future prospects.
Keywords: microstructure, deep learning, computer vision, performance prediction, image generation
5. Privacy-Preserving Gender-Based Customer Behavior Analytics in Retail Spaces Using Computer Vision
Authors: Ginanjar Suwasono Adi, Samsul Huda, Griffani Megiyanto Rahmatullah, Dodit Suprianto, Dinda Qurrota Aini Al-Sefy, Ivon Sandya Sari Putri, Lalu Tri Wijaya Nata Kusuma. 《Computers, Materials & Continua》, 2026(1): 1839-1861.
In the competitive retail industry of the digital era, data-driven insights into gender-specific customer behavior are essential. They support the optimization of store performance, layout design, product placement, and targeted marketing. However, existing computer vision solutions often rely on facial recognition to gather such insights, raising significant privacy and ethical concerns. To address these issues, this paper presents a privacy-preserving customer analytics system built on two key strategies. First, we deploy a deep learning framework using YOLOv9s, trained on the RCA-TVGender dataset. Cameras are positioned perpendicular to observation areas to reduce facial visibility while maintaining accurate gender classification. Second, we apply AES-128 encryption to customer position data, ensuring secure access and regulatory compliance. Our system achieved strong overall performance, with 81.5% mAP@50, 77.7% precision, and 75.7% recall. Moreover, a 90-min observational study confirmed the system's ability to generate privacy-protected heatmaps revealing distinct behavioral patterns between male and female customers. For instance, women spent more time in certain areas and showed interest in different products. These results confirm the system's effectiveness in enabling personalized layout and marketing strategies without compromising privacy.
Keywords: business intelligence, customer behavior, privacy-preserving analytics, computer vision, deep learning, smart retail, gender recognition, heatmap, privacy, RCA-TVGender dataset
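The dwell heatmaps mentioned above reduce to a simple accumulation step. A minimal sketch (stdlib only; the coordinates, area size, and cell size are invented for illustration): tracked customer (x, y) positions are binned into a grid, and each cell counts how often a customer was observed there.

```python
# Hypothetical sketch of heatmap construction from tracked positions.
def build_heatmap(positions, width, height, cell):
    """Bin (x, y) positions into a (height//cell) x (width//cell) count grid."""
    grid = [[0] * (width // cell) for _ in range(height // cell)]
    for x, y in positions:
        grid[y // cell][x // cell] += 1
    return grid

# Toy positions in a 4 m x 4 m retail area, with 2 m grid cells.
positions = [(0, 0), (1, 1), (3, 0), (1, 3), (3, 3), (3, 3)]
heat = build_heatmap(positions, width=4, height=4, cell=2)
print(heat)  # [[2, 1], [1, 2]]
```

In the described system these counts would be kept per detected gender and the raw positions AES-encrypted at rest; both steps are orthogonal to the binning shown here.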
6. Rice Leaf Disease Image Recognition Based on an Improved Vision Transformer
Authors: 朱周华, 周怡纳, 侯智杰, 田成源. 《电子测量技术》 (PKU Core), 2025(10): 153-160.
Intelligent recognition of rice leaf diseases is of great significance in modern agricultural production. To address the problem that the conventional Vision Transformer lacks inductive bias and struggles to capture local detail features in images, an improved Vision Transformer model is proposed. By introducing intrinsic inductive biases, the model strengthens its ability to model multi-scale context and both local and global dependencies, while reducing the need for large-scale datasets. In addition, the multilayer-perceptron module of the Vision Transformer is replaced with a Kolmogorov-Arnold network, improving the model's ability to extract complex features and its interpretability. Experimental results show that the proposed model performs excellently on rice leaf disease recognition, reaching an accuracy of 98.62%, a 6.2% improvement over the original ViT model.
Keywords: rice leaf disease, image recognition, Vision Transformer, inductive bias, local features
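The "local detail" limitation the abstract refers to stems from ViT's patch tokenization. A minimal sketch (toy 4x4 single-channel "image", stdlib only) of that first step: the image is cut into non-overlapping patches and each patch is flattened into a token vector, after which all spatial structure inside a patch is left for attention to rediscover.

```python
# Illustrative sketch of ViT patch tokenization (no projection/embedding step).
def to_patches(image, p):
    """Split an HxW image (list of rows) into flattened p x p patch tokens."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patch = [image[r + i][c + j] for i in range(p) for j in range(p)]
            patches.append(patch)
    return patches

image = [[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12],
         [13, 14, 15, 16]]
tokens = to_patches(image, p=2)
print(tokens)  # [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
```

A real ViT would then linearly project each token and add positional embeddings; the improved model above instead injects convolutional inductive biases so that fine within-patch detail is not lost.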
7. Application of the Vision Transformer Model to Tongue Image Classification in Traditional Chinese Medicine
Authors: 周坚和, 王彩雄, 李炜, 周晓玲, 张丹璇, 吴玉峰. 《广西科技大学学报》, 2025(5): 89-98.
Tongue diagnosis, an important and routine examination in traditional Chinese medicine (TCM) inspection, plays an indispensable role in TCM clinical diagnosis. To overcome the reliance of traditional tongue diagnosis on subjective experience and the limited classification performance of convolutional neural network (CNN) models, this paper proposes a Vision Transformer (ViT) deep learning model built on a high-quality tongue-image classification dataset. Feature extraction is optimized through a pre-training and fine-tuning strategy, and data augmentation is applied to address class imbalance. Experimental results show that across six key tongue-feature classification tasks, the model's accuracy on five of them (coating color 85.6%, ecchymosis 98.0%, texture 99.6%, tongue color 96.6%, fissures 87.8%) is significantly better than that of existing CNN methods (e.g., ResNet50 achieves 78.0%, 91.0%, 92.0%, 68.0%, and 80.1%, respectively), verifying the model's effectiveness and application potential in overcoming traditional performance bottlenecks and improving the reliability of intelligent TCM clinical diagnosis.
Keywords: tongue diagnosis, Vision Transformer (ViT), deep learning, medical image classification
8. Steel Surface Defect Detection Using Learnable Memory Vision Transformer
Authors: Syed Tasnimul Karim Ayon, Farhan Md. Siraj, Jia Uddin. 《Computers, Materials & Continua》 (SCIE, EI), 2025(1): 499-520.
This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token-to-Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely-used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization capabilities. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token-to-Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token-to-Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT's ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The additional transformer-based models also demonstrate improved performance over CNNs in capturing complex features, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT's potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token-to-Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT's computational efficiency for deployment in real-time quality control systems.
Keywords: Learnable Memory Vision Transformer (LMViT), convolutional neural networks (CNN), metal surface defect detection, deep learning, computer vision, image classification, learnable memory, gradient clipping, label smoothing, t-SNE visualization
9. Causes and factors associated with vision impairment in the elderly population in Mangxin Town, Kashgar region, Xinjiang, China
Authors: Lingling Chen, Ruilian Liao, Yuanyuan Liu, Ling Jin, Jun Fu, Xun Wang, Hongwen Jiang, Lin Ding, Qianyun Chen. 《Eye Science》, 2025(1): 12-24.
Objective: This study aimed to investigate the prevalence, causes, and influencing factors of vision impairment in the elderly population aged 60 years and above in Mangxin Town, Kashgar region, Xinjiang, China. Located in a region characterized by intense ultraviolet radiation and arid climatic conditions, Mangxin Town presents unique environmental challenges that may exacerbate ocular health issues. Despite the global emphasis on addressing vision impairment among aging populations, there remains a paucity of updated and region-specific data in Xinjiang, necessitating this comprehensive assessment to inform targeted interventions. Methods: A cross-sectional study was conducted from May to June 2024, involving 1,311 elderly participants (76.76% participation rate) out of a total eligible population of 1,708 individuals aged ≥60 years. Participants underwent detailed ocular examinations, including assessments of uncorrected visual acuity (UVA) and best-corrected visual acuity (BCVA) using standard logarithmic charts, slit-lamp biomicroscopy, optical coherence tomography (OCT, Topcon DRI OCT Triton), fundus photography, and intraocular pressure measurement (Canon TX-20 Tonometer). A multidisciplinary team of 10 ophthalmologists and 2 local village doctors, rigorously trained in standardized protocols, ensured consistent data collection. Demographic, lifestyle, and medical history data were collected via questionnaires. Statistical analyses, performed using STATA 16, included multivariate logistic regression to identify risk factors, with significance defined as P<0.05. Results: The overall prevalence of vision impairment was 13.21% (95%CI: 11.37%-15.04%), with low vision at 11.76% (95%CI: 10.01%-13.50%) and blindness at 1.45% (95%CI: 0.80%-2.10%). Cataract was the leading cause, responsible for 68.20% of cases, followed by glaucoma (5.80%), optic atrophy (5.20%), and age-related macular degeneration (2.90%). Vision impairment prevalence escalated significantly with age: 7.74% in the 60-69 age group, 17.79% in 70-79, and 33.72% in those ≥80. Males exhibited higher prevalence than females (15.84% vs. 10.45%, P=0.004). Multivariate analysis identified age ≥80 years (OR=6.43, 95%CI: 3.79-10.90), female sex (OR=0.53, 95%CI: 0.34-0.83), and daily exercise (OR=0.44, 95%CI: 0.20-0.95) as significant factors. History of eye disease showed a non-significant trend toward increased risk (OR=1.49, P=0.107). Education level, income, and smoking status showed no significant associations. Conclusions: This study underscores cataract as the predominant cause of vision impairment in Mangxin Town's elderly population, with age and sex as critical determinants. The findings align with global patterns but highlight region-specific challenges, such as environmental factors contributing to cataract prevalence. Public health strategies should prioritize improving access to cataract surgery, enhancing grassroots ophthalmic infrastructure, and integrating portable screening technologies for early detection of fundus diseases. Additionally, promoting health education on UV protection and lifestyle modifications, such as regular exercise, may mitigate risks. Future research should expand to broader regions in Xinjiang, employ advanced diagnostic tools for complex conditions like glaucoma, and explore longitudinal trends to refine intervention strategies. These efforts are vital to reducing preventable blindness and improving quality of life for aging populations in underserved areas.
Keywords: low vision, blindness, vision impairment, elderly, Xinjiang, cataract
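The odds ratios reported above come from logistic regression, but the unadjusted version of such an estimate is simple arithmetic on a 2x2 table. The sketch below (stdlib only) shows that calculation; the counts are entirely made up for illustration and are not the study's data.

```python
# Unadjusted odds ratio from a 2x2 exposure/outcome table:
# OR = (a * d) / (b * c), with a,b = exposed cases/controls
# and c,d = unexposed cases/controls.
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Hypothetical counts: vision impairment among those aged >=80 vs. 60-69.
or_value = odds_ratio(29, 57, 49, 584)
print(round(or_value, 2))  # 6.06
```

The study's published OR=6.43 is additionally adjusted for the other covariates in the multivariate model, which is why an unadjusted table would not reproduce it exactly.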
10. Automated Concrete Bridge Damage Detection Using an Efficient Vision Transformer-Enhanced Anchor-Free YOLO
Authors: Xiaofei Yang, Enrique del Rey Castillo, Yang Zou, Liam Wotherspoon, Jianxi Yang, Hao Li. 《Engineering》, 2025(8): 311-326.
Deep learning techniques have recently been the most popular method for automatically detecting bridge damage captured by unmanned aerial vehicles (UAVs). However, their wider application to real-world scenarios is hindered by three challenges: ① defect scale variance, motion blur, and strong illumination significantly affect the accuracy and reliability of damage detectors; ② existing commonly used anchor-based damage detectors struggle to generalize effectively to harsh real-world scenarios; and ③ convolutional neural networks (CNNs) lack the capability to model long-range dependencies across the entire image. This paper presents an efficient Vision Transformer-enhanced anchor-free YOLO (you only look once) method to address these challenges. First, a concrete bridge damage dataset was established, augmented by motion blur and varying brightness. Four key enhancements were then applied to an anchor-based YOLO method: ① four detection heads were introduced to alleviate the multi-scale damage detection issue; ② decoupled heads were employed to address the conflict between classification and bounding-box regression tasks inherent in the original coupled head design; ③ an anchor-free mechanism was incorporated to reduce computational complexity and improve generalization to real-world scenarios; and ④ a novel Vision Transformer block, C3MaxViT, was added to enable CNNs to model long-range dependencies. These enhancements were integrated into an advanced anchor-based YOLOv5l algorithm, and the proposed Vision Transformer-enhanced anchor-free YOLO method was then compared against cutting-edge damage detection methods. The experimental results demonstrated the effectiveness of the proposed method, with an increase of 8.1% in mean average precision at an intersection-over-union threshold of 0.5 (mAP50) and an improvement of 8.4% in mAP@[0.5:0.05:0.95], respectively. Furthermore, extensive ablation studies revealed that the four detection heads, the decoupled head design, the anchor-free mechanism, and C3MaxViT contributed improvements of 2.4%, 1.2%, 2.6%, and 1.9% in mAP50, respectively.
Keywords: computer vision, deep learning techniques, Vision Transformer, object detection, bridge visual inspection
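The mAP50 metric used in the abstract rests on the intersection-over-union (IoU) test: a predicted box counts as correct at mAP50 when its IoU with a ground-truth box is at least 0.5. A minimal sketch (stdlib only, boxes as (x1, y1, x2, y2) corner tuples):

```python
# IoU of two axis-aligned boxes, as used when matching detections to ground
# truth for mAP computation.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (0, 0, 10, 10), (5, 0, 15, 10)
print(iou(pred, gt))  # 50 / 150 = 0.333..., below the mAP50 threshold
```

mAP@[0.5:0.05:0.95] simply repeats the precision-recall averaging at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averages the results.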
11. AARPose: Real-time and accurate drogue pose measurement based on monocular vision for autonomous aerial refueling
Authors: Shuyuan WEN, Yang GAO, Bingrui HU, Zhongyu LUO, Zhenzhong WEI, Guangjun ZHANG. 《Chinese Journal of Aeronautics》, 2025(6): 552-572.
Real-time and accurate drogue pose measurement during docking is basic and critical for Autonomous Aerial Refueling (AAR). Vision measurement is the best practicable technique, but its measurement accuracy and robustness are easily affected by the limited computing power of airborne equipment, complex aerial scenes, and partial occlusion. To address these challenges, we propose a novel drogue keypoint detection and pose measurement algorithm based on monocular vision, and achieve real-time processing on airborne embedded devices. First, a lightweight network is designed with structural re-parameterization to reduce computational cost and improve inference speed, and a sub-pixel keypoint prediction head with matching loss functions is adopted to improve keypoint detection accuracy. Second, a closed-form solution of the drogue pose is computed from double spatial circles, followed by nonlinear refinement based on Levenberg-Marquardt optimization. Both virtual and physical simulation experiments were used to test the proposed method. In the virtual simulation, the mean pixel error of the proposed method is 0.787 pixels, significantly better than that of other methods. In the physical simulation, the mean relative measurement error is 0.788%, and the mean processing time is 13.65 ms on embedded devices.
Keywords: autonomous aerial refueling, vision measurement, deep learning, real-time, lightweight, accurate, monocular vision, drogue pose measurement
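The closed-form circle geometry behind the drogue pose step has a simple 2D analogue worth sketching. The code below (stdlib only; a simplified illustration, not the paper's 3D double-circle method) recovers a circle's center and radius from three points by intersecting two perpendicular bisectors:

```python
# Closed-form circle through three non-collinear 2D points.
import math

def circle_from_3_points(p1, p2, p3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Perpendicular-bisector equations: a*cx + b*cy = e and c*cx + d*cy = f.
    a, b = x2 - x1, y2 - y1
    c, d = x3 - x1, y3 - y1
    e = a * (x1 + x2) / 2 + b * (y1 + y2) / 2
    f = c * (x1 + x3) / 2 + d * (y1 + y3) / 2
    det = a * d - b * c  # zero iff the points are collinear
    cx = (d * e - b * f) / det
    cy = (a * f - c * e) / det
    return (cx, cy), math.hypot(x1 - cx, y1 - cy)

center, r = circle_from_3_points((5, 0), (0, 5), (-5, 0))
print(center, r)  # (0.0, 0.0) 5.0
```

In the paper the analogous closed-form estimate from the two spatial circles serves only as an initialization, which Levenberg-Marquardt then refines against all detected keypoints.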
12. Long-Term Vision in a Rapidly Changing World
Author: JOHN QUELCH. 《China Today》, 2025(8): 43-45.
China's five-year plans crystallize a governance model that merges long-term strategic vision with adaptive execution. As China prepares to unveil its 15th Five-Year Plan in 2026, policymakers, investors, and scholars around the world are watching closely. For over 70 years, these plans have guided the country's economic and social development.
Keywords: economic and social development, long-term vision, China, economic development, five-year plans, adaptive execution, strategic vision, governance model
13. Dynamic Vision-Enabled Intelligent Micro-Vibration Estimation Method with Spatiotemporal Pattern Consistency
Authors: Shupeng Yu, Xiang Li, Yaguo Lei, Bin Yang, Naipeng Li. 《IEEE/CAA Journal of Automatica Sinica》, 2025(11): 2359-2361.
Dear Editor, this letter proposes a novel dynamic vision-enabled intelligent micro-vibration estimation method with spatiotemporal pattern consistency. Inspired by biological vision, dynamic vision data are collected by an event camera, which can capture the micro-vibration information of mechanical equipment thanks to the significant advantage of an extremely high temporal sampling frequency.
Keywords: dynamic vision, micro-vibration, mechanical equipment, spatiotemporal pattern consistency, biological vision, dynamic vision data, intelligent estimation, event camera
14. Vision care and the sustainable development goals: a brief review and suggested research agenda
Authors: Nathan Congdon, Brad Wong, Xinxing Guo, Graeme MacKenzie. 《Eye Science》, 2025(2): 103-110.
Blindness affected 45 million people globally in 2021, and moderate to severe vision loss a further 295 million.[1] The most common causes, cataract and uncorrected refractive error, are generally the easiest to treat, and their treatments are among the most cost-effective procedures in all of medicine and international development.[1-2] Thus, vision impairment is both extremely common and, in principle, readily manageable.
Keywords: vision care, cataract, cost-effective procedures, uncorrected refractive error, blindness, moderate to severe vision loss, sustainable development goals
15. Value of a Vision Transformer Deep Learning Model in Prostate Cancer Identification
Authors: 李梦娟, 金龙, 尹胜男, 计一丁, 丁宁. 《中国医学计算机成像杂志》 (PKU Core), 2025(3): 396-401.
Objective: To explore the value of a Vision Transformer (ViT) deep learning model in identifying prostate cancer (PCa). Methods: Imaging data of 480 patients who underwent magnetic resonance imaging (MRI) were retrospectively analyzed. The TotalSegmentator model was used to segment the prostate region automatically, and three ViT models were built: one based on T2-weighted imaging (T2WI), one based on apparent diffusion coefficient (ADC) maps, and one combining both. Results: For PCa identification, the combined model achieved areas under the receiver operating characteristic (ROC) curve (AUC) of 0.961 in the training set and 0.980 in the test set, outperforming the ViT models built on a single imaging sequence. Among the single-sequence models, the ADC-based model performed better than the T2WI-based model. In addition, decision curve analysis showed that the combined model provided greater clinical benefit. Conclusion: The ViT deep learning model offers high diagnostic accuracy and potential value in prostate cancer identification.
Keywords: Vision Transformer, deep learning, prostate cancer, automatic segmentation, magnetic resonance imaging
16. Remote Sensing Image Classification Based on an Improved Vision Transformer (Cited by 1)
Authors: 李宗轩, 冷欣, 章磊, 陈佳凯. 《林业机械与木工设备》, 2025(6): 31-35.
Remote sensing image classification can quickly and effectively map forest distribution, supporting forestry resource management and monitoring. Vision Transformer (ViT), with its excellent ability to capture global information, is widely used in remote sensing image classification. However, during shallow feature extraction ViT redundantly captures other local features while failing to capture the key ones, and splitting the image into patches can lose edge and other detail information, degrading classification accuracy. To address these problems, an improved Vision Transformer is proposed: a Super Token Attention (STA) mechanism is introduced to strengthen the extraction of key feature information and reduce computational redundancy, and Haar wavelet downsampling is added to reduce the loss of detail information while enhancing the capture of local and global information at different image scales. Experiments on the AID dataset achieve an overall accuracy of 92.98%, demonstrating the effectiveness of the proposed method.
Keywords: remote sensing image classification, Vision Transformer, Haar wavelet downsampling, STA attention mechanism
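One level of the Haar wavelet downsampling mentioned above can be sketched directly. This minimal illustration (stdlib only, toy 4x4 image) shows why it preserves detail: each 2x2 block yields an average (the LL band, i.e. the downsampled image) plus horizontal, vertical, and diagonal difference bands that retain the edge information a plain stride-2 subsampling would discard.

```python
# One level of 2D Haar wavelet decomposition on an even-sized grayscale image.
def haar_downsample(img):
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            cc, d = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + cc + d) / 4)  # low-pass: downsampled pixel
            row_lh.append((a - b + cc - d) / 4)  # horizontal detail
            row_hl.append((a + b - cc - d) / 4)  # vertical detail
            row_hh.append((a - b - cc + d) / 4)  # diagonal detail
        ll.append(row_ll); lh.append(row_lh); hl.append(row_hl); hh.append(row_hh)
    return ll, lh, hl, hh

img = [[4, 0, 2, 2],
       [0, 0, 2, 2],
       [1, 1, 3, 3],
       [1, 1, 3, 3]]
ll, lh, hl, hh = haar_downsample(img)
print(ll)  # [[1.0, 2.0], [1.0, 3.0]] -- the half-resolution image
```

In the improved network the detail bands are kept alongside LL (typically concatenated along channels), so downsampling no longer throws edge information away.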
17. Construction and Application of a Convolution-Enhanced Vision Mamba Model (Cited by 1)
Authors: 俞焕友, 范静, 黄凡. 《计算机技术与发展》, 2025(8): 45-52.
To address the limitations of the Vision Mamba (Vim) model, this paper proposes an improved model, Convolutional Vision Mamba (CvM). The model discards Vim's patch-splitting and positional-encoding mechanisms in favor of convolution operations, processing global visual information more efficiently. It also optimizes Vim's position-embedding module to reduce its inherently high computation and memory consumption. The CvM model is then applied to medical image classification, with experiments on blood cell images, brain tumor images, chest CT scans, pathological myopia fundus images, and pneumonia X-ray datasets. The results show that, compared with Vim and five other neural network models, CvM achieves higher accuracy and clear advantages in memory footprint and parameter count. Ablation experiments show that depthwise separable convolution uses fewer parameters and less GPU memory than standard convolution, while also significantly improving accuracy on medical image classification tasks such as blood cell and brain tumor images. These results demonstrate the advantages and feasibility of the CvM model.
Keywords: deep learning, Vision Mamba, convolutional neural network, depthwise separable convolution, medical image classification
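The parameter saving from depthwise separable convolution claimed in the ablation is simple arithmetic. A sketch (bias terms omitted; the layer sizes below are illustrative assumptions, not the paper's configuration): one k x k standard convolution over C_in input and C_out output channels is replaced by a k x k depthwise step plus a 1 x 1 pointwise step.

```python
# Parameter counts for standard vs. depthwise separable convolution (no bias).
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # k x k depthwise kernel per input channel, then 1x1 pointwise mixing.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8192 = 8768
print(std, sep, round(std / sep, 1))          # roughly 8.4x fewer parameters
```

For a 3x3 kernel the saving approaches a factor of k^2 = 9 as C_out grows, which is consistent with the memory and parameter advantages reported above.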
18. A Hybrid Approach for Pavement Crack Detection Using Mask R-CNN and Vision Transformer Model (Cited by 2)
Authors: Shorouq Alshawabkeh, Li Wu, Daojun Dong, Yao Cheng, Liping Li. 《Computers, Materials & Continua》 (SCIE, EI), 2025(1): 561-577.
Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods, relying on manual inspection and basic image processing, are time-consuming and prone to errors. Recent deep-learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study addresses these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the performance of the Masker Transformer against other state-of-the-art models such as U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YOLOv8), and Mask R-CNN using two benchmark datasets: Crack500 and DeepCrack. The findings reveal that the Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-Score across both datasets. Specifically, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. The high precision and recall rates further substantiate its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection, potentially replacing more traditional methods.
Keywords: pavement crack segmentation, transportation, deep learning, Vision Transformer, Mask R-CNN, image segmentation
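The Dice Similarity Coefficient used to rank the models above is straightforward to compute on binary segmentation masks. A minimal sketch (stdlib only, flat toy masks): DSC = 2|A ∩ B| / (|A| + |B|), i.e. twice the overlap divided by the total foreground in prediction and ground truth.

```python
# Dice similarity coefficient between two flat binary masks.
def dice(pred, gt):
    inter = sum(p * g for p, g in zip(pred, gt))
    return 2 * inter / (sum(pred) + sum(gt))

pred = [1, 1, 0, 0, 1]  # predicted crack pixels
gt   = [1, 0, 0, 1, 1]  # ground-truth crack pixels
print(dice(pred, gt))   # 2*2 / (3+3) = 0.666...
```

Because cracks occupy only a thin sliver of each image, DSC is a far more informative score here than plain pixel accuracy, which a model could inflate by predicting "no crack" everywhere.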
19. Mixed-Type Wafer Map Defect Pattern Recognition Based on Vision Transformer
Authors: 李攀, 娄莉. 《现代信息科技》, 2025(19): 26-30.
Wafer testing is an important step in chip production, and recognizing and classifying wafer map defect patterns plays a key role in improving front-end manufacturing processes. In actual production, multiple defect types may occur simultaneously, forming mixed defect types. Traditional deep learning methods have low recognition rates for mixed-type wafer map defects, so this paper proposes a Vision Transformer-based defect recognition method. The method uses multi-head self-attention to encode the global features of the wafer map, achieving efficient recognition of mixed-type wafer defect maps. Experimental results on a mixed-type defect dataset show that the method outperforms existing deep learning models, with an average accuracy of 96.2%.
Keywords: computer vision, wafer map, defect recognition, Vision Transformer
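The multi-head self-attention the abstract relies on reduces, per head, to scaled dot-product attention. A minimal single-head sketch (stdlib only, toy 2-dim tokens): each token's output is a softmax(QK^T / sqrt(d))-weighted mix of all value vectors, which is what lets every wafer-map patch attend to every other patch regardless of distance.

```python
# Single-head scaled dot-product attention on lists of small vectors.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Two toy tokens; Q = K, so each token attends most strongly to itself.
Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, K, V)
print(out)  # each output row mixes both values, weighted toward its own token
```

A multi-head layer runs several such attentions in parallel on learned projections of the input and concatenates the results; the projections are omitted here for brevity.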
20. Geometric parameter identification of bridge precast box girder sections based on deep learning and computer vision (Cited by 2)
Authors: JIA Jingwei, NI Youhao, MAO Jianxiao, XU Yinfei, WANG Hao. 《Journal of Southeast University (English Edition)》, 2025(3): 278-285.
To overcome the limitations of low efficiency and reliance on manual processes in measuring the geometric parameters of bridge prefabricated components, a method based on deep learning and computer vision is developed to identify these parameters. The study uses a common precast element for highway bridges as the research subject. First, edge feature points of the bridge component section are extracted from images of the precast component cross-sections by combining the Canny operator with mathematical morphology. Subsequently, a deep learning model is developed to identify the geometric parameters of the precast components, using the extracted edge coordinates from the images as input and the predefined control parameters of the bridge section as output. A dataset is generated by varying the control parameters and noise levels for model training. Finally, field measurements are conducted to validate the accuracy of the developed method. The results indicate that the developed method effectively identifies the geometric parameters of bridge precast components, with the error rate maintained within 5%.
Keywords: bridge precast components, section geometry parameters, size identification, computer vision, deep learning
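The "mathematical morphology" step paired with the Canny operator above is typically a dilation or closing that repairs small gaps in the detected edges. A minimal sketch of binary dilation with a 3x3 structuring element (stdlib only, toy edge mask):

```python
# Binary dilation with a 3x3 structuring element: a pixel turns on if any
# pixel in its 3x3 neighbourhood is on -- the basic morphology operation used
# to thicken and close gaps in Canny edge maps.
def dilate(mask):
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and mask[rr][cc]:
                        out[r][c] = 1
    return out

edge = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(dilate(edge))  # the single edge pixel grows into a full 3x3 block
```

A morphological closing (dilation followed by erosion) would restore the original edge thickness after the gaps are bridged; only the dilation half is shown here.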