252 articles found
Zero-Shot Based Spatial AI Algorithm for Up-to-Date 3D Vision Map Generations in Highly Complex Indoor Environments
1
Authors: Sehun Lee Taehoon Kim Junho Ahn 《Computers, Materials & Continua》 2025, No. 8, pp. 3623-3648 (26 pages)
This paper proposes a zero-shot spatial recognition AI algorithm that fuses and extends multidimensional vision identification technology for large indoor and underground spaces. With the expansion of large shopping malls and underground urban spaces (UUS), there is an increasing need for technologies that can quickly identify complex indoor structures and changes such as relocation, remodeling, and construction, so that up-to-date indoor 3D site maps can support the safety and management of citizens. The proposed algorithm uses data collected by an unmanned robot to create an up-to-date 3D site map of the indoor site and recognizes complex indoor spaces based on zero-shot learning. The research addresses two major challenges: the difficulty of detecting walls and floors due to complex patterns, and the difficulty of spatial perception due to unknown obstacles. The proposed algorithm addresses the limitations of the existing foundation model, detects floors and obstacles without expensive sensors, and improves the accuracy of spatial recognition by combining floor detection, vanishing point detection, and fusion obstacle detection algorithms. The experimental results show that the algorithm effectively detects the floor and obstacles in various indoor environments, with F1 scores of 0.96 and 0.93 in the floor detection and obstacle detection experiments, respectively.
Keywords: Spatial AI; vision foundation model; zero-shot learning; image segmentation
Denoising graph neural network based on zero-shot learning for Gibbs phenomenon in high-order DG applications
2
Authors: Wei AN Jiawen LIU Wenxuan OUYANG Haoyu RU Xuejun LIU Hongqiang LYU 《Chinese Journal of Aeronautics》 2025, No. 3, pp. 234-248 (15 pages)
With the availability of high-performance computing technology and the development of advanced numerical simulation methods, Computational Fluid Dynamics (CFD) is becoming more and more practical and efficient in engineering. As one of the representative high-precision algorithms, the high-order Discontinuous Galerkin Method (DGM) has attracted widespread attention from scholars in the CFD research community and has developed strongly. However, when DGM is extended to high-speed aerodynamic flow field calculations, non-physical numerical Gibbs oscillations near shock waves often significantly affect the numerical accuracy and can even cause the calculation to fail. Data-driven approaches based on machine learning techniques can be used to learn the characteristics of Gibbs noise, which motivates their use in high-speed DG applications. To achieve this goal, labeled data need to be generated in order to train the machine learning models. This paper proposes a new method for denoising modeling of the Gibbs phenomenon that uses a machine learning technique, the zero-shot learning strategy, to avoid the need to acquire large amounts of CFD data. The model adopts a graph convolutional network combined with a graph attention mechanism to learn the denoising paradigm from synthetic Gibbs noise data and generalize to DGM numerical simulation data. Numerical simulation results show that the proposed Gibbs denoising model can suppress the numerical oscillation near shock waves in the high-order DGM. The work automates the extension of DGM to high-speed aerodynamic flow field calculations with higher generalization and lower cost.
Keywords: computational fluid dynamics; high-order discontinuous Galerkin method; Gibbs phenomenon; graph neural networks; zero-shot learning
A Dual Discriminator Method for Generalized Zero-Shot Learning
3
Authors: Tianshu Wei Jinjie Huang 《Computers, Materials & Continua》 SCIE EI 2024, No. 4, pp. 1599-1612 (14 pages)
Zero-shot learning enables the recognition of new class samples by migrating models learned from semantic features and existing sample features to things that have never been seen before. The consistency of different types of features and domain shift are two of the critical issues in zero-shot learning. To address both issues, this paper proposes a new modeling structure. The traditional approach maps semantic features and visual features into the same feature space; building on this, the proposed model uses a dual discriminator approach, which further enhances the consistency between semantic and visual features. At the same time, this approach aligns unseen class semantic features with training set samples, providing a portion of the information about the unseen classes. In addition, a new feature fusion method is proposed in the model. This method is equivalent to adding perturbation to the seen class features, which reduces the degree to which the model's classification results are biased towards the seen classes. The feature fusion method also provides part of the information about the unseen classes, improving classification accuracy in generalized zero-shot learning and reducing domain bias. The proposed method is validated and compared with other methods on four datasets, and the experimental results show that it achieves promising results.
Keywords: generalized zero-shot learning; modality consistent; discriminator; domain shift problem; feature fusion
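Research on a Zero-Shot Learning Object Classification Algorithm Based on Reverse Projection (Cited by: 1)
4
Authors: 冯鹏 庹红娅 乔凌峰 王洁欣 敬忠良 《计算机应用研究》 (Application Research of Computers) CSCD PKU Core 2017, No. 11, pp. 3291-3294 (4 pages)
Zero-shot learning (ZSL) is the problem of classifying categories for which no training samples exist. The core of traditional regression methods is to project visual features into the semantic space; this does not make full use of the sample information contained in the visual features themselves, and training is computationally expensive. This paper proposes a ZSL object classification method based on reverse projection, which projects class prototypes into the visual space and uses the semantic nature of the visual features to learn the mapping function; the parameters can be obtained purely from a closed-form solution. Experimental results on two benchmark datasets show that the proposed reverse projection method greatly improves classification results over traditional regression methods and other existing methods and greatly reduces training time, so it generalizes better to classifying unseen categories.
Keywords: zero-shot learning; object classification; reverse projection; closed-form solution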
Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics (Cited by: 8)
5
Authors: Ao-Xue Li Ke-Xin Zhang Li-Wei Wang 《International Journal of Automation and Computing》 EI CSCD 2019, No. 5, pp. 563-574 (12 pages)
Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task for two main reasons: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first phase, feature learning, we fine-tune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is introduced into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second phase, label inference, a semantic directed graph is constructed over attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms the state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when they are used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
Keywords: fine-grained image classification; zero-shot learning; deep feature learning; domain adaptation; semantic graph
A Novel Siamese Network for Few/Zero-Shot Handwritten Character Recognition Tasks
6
Authors: Nagwa Elaraby Sherif Barakat Amira Rezk 《Computers, Materials & Continua》 SCIE EI 2023, No. 1, pp. 1837-1854 (18 pages)
Deep metric learning is one of the recommended methods for supporting few/zero-shot learning with deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) to learn a distance function that maps input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture copes with having only a few training samples by deciding whether two samples share the same class identity. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights and training them with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error, time-consuming process. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: 1) Utilizing transfer learning and using the pre-trained AlexNet as a feature extractor in the Siamese architecture. Fine-tuning a pre-trained network is typically faster and easier than building from scratch. 2) Training the Siamese architecture with contrastive loss instead of binary cross-entropy. Contrastive loss helps the network learn a nonlinear mapping function that maps the extracted features into the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K datasets in two experiments: one tests the proposed network in few-shot learning, the other in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, the performance of the proposed Siamese network is compared with that of the traditional Siamese CNNs. The comparison shows that the proposed network achieves higher recognition results in less time. The proposed network reduces the training time from days to hours in both experiments.
Keywords: handwritten character recognition (HCR); few-shot learning; zero-shot learning; deep metric learning; transfer learning; contrastive loss; Chars74K datasets
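Editorial note on the contrastive loss mentioned in the abstract above. The abstract does not give the formula, so the following is only a minimal sketch of one common pairwise form (squared distance for same-class pairs, squared hinge on a margin for different-class pairs). The function names, the `margin` default, and the absence of a 1/2 scaling factor are assumptions for illustration, not the authors' implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(feat_a, feat_b, same_class, margin=1.0):
    """Contrastive loss for a single pair of embeddings (assumed form).

    Same-class pairs are penalized by their squared distance, which pulls
    them together. Different-class pairs are penalized only while they are
    closer than `margin`, which pushes them apart up to that margin.
    """
    d = euclidean_distance(feat_a, feat_b)
    if same_class:
        return d ** 2
    return max(margin - d, 0.0) ** 2
```

In a Siamese setup, `feat_a` and `feat_b` would be the outputs of the two weight-sharing branches for the two inputs of a pair, and the loss would be averaged over a batch of pairs.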
Explanatory Multi-Scale Adversarial Semantic Embedding Space Learning for Zero-Shot Recognition
7
Authors: Huiting Li 《Open Journal of Applied Sciences》 2022, No. 3, pp. 317-335 (19 pages)
The goal of zero-shot recognition is to classify classes that have never been seen before, which requires building a bridge between seen and unseen classes through a semantic embedding space. Semantic embedding space learning therefore plays an important role in zero-shot recognition. In existing work, the semantic embedding space is mainly given by user-defined attribute vectors. However, the discriminative information included in a user-defined attribute vector is limited. In this paper, we propose to learn an extra latent attribute space automatically to produce a more generalized and discriminative semantic embedding space. To prevent the bias problem, both the user-defined attribute vector and the latent attribute space are optimized by adversarial learning with auto-encoders. We also propose to reconstruct semantic patterns produced by explanatory graphs, which makes the semantic embedding space more sensitive to useful semantic information and less sensitive to useless information. The proposed method is evaluated on the AwA2 and CUB datasets. The results show that the proposed method achieves superior performance.
Keywords: zero-shot recognition; semantic embedding space; adversarial learning; explanatory graph
A Survey of Zero-Shot Object Detection
8
Authors: Weipeng Cao Xuyang Yao Zhiwu Xu Ye Liu Yinghui Pan Zhong Ming 《Big Data Mining and Analytics》 2025, No. 3, pp. 726-750 (25 pages)
Zero-shot object detection (ZSD), one of the most challenging problems in the field of object detection, aims to accurately identify new categories that are not encountered during training. Recent advancements in deep learning and increased computational power have led to significant improvements in object detection systems, achieving high recognition accuracy on benchmark datasets. However, these systems remain limited in real-world applications due to the scarcity of labeled training samples, making it difficult to detect unseen classes. To address this, researchers have explored various approaches, yielding promising progress. This article provides a comprehensive review of the current state of ZSD, distinguishing four related methods (zero-shot, open-vocabulary, open-set, and open-world approaches) based on task objectives and data usage. We highlight representative methods, discuss the technical challenges within each framework, and summarize the commonly used evaluation metrics, benchmark datasets, and experimental results. Our review aims to offer readers a clear overview of the latest developments and performance trends in ZSD.
Keywords: zero-shot object detection (ZSD); open-vocabulary object detection; open-set object detection; open-world object detection
Select-and-Answer Prompting: Facilitating LLMs for Improving Zero-Shot Reasoning
9
Authors: WANG Yufang TANG Xuesong HAO Kuangrong 《Journal of Donghua University(English Edition)》 2025, No. 5, pp. 513-522 (10 pages)
Large language models (LLMs) have demonstrated remarkable generalization abilities across multiple tasks in natural language processing (NLP). For multi-step reasoning tasks, chain-of-thought (CoT) prompting facilitates step-by-step thinking, leading to improved performance. However, despite significant advancements in LLMs, current CoT prompting performs suboptimally on smaller-scale models that have fewer parameters. Additionally, the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations, with performance contingent on the quality of these annotations and varying with task-specific requirements. To address these limitations, we propose a select-and-answer prompting method (SAP) to enhance language model performance on reasoning tasks without the need for manual demonstrations. This method comprises two primary steps: first, guiding the model to conduct a preliminary analysis and generate several candidate answers based on the prompt; second, allowing the model to provide a final answer derived from these candidate answers. The proposed prompting strategy is evaluated across two language models of varying sizes and six datasets. On ChatGLM-6B, SAP consistently outperforms few-shot CoT across all datasets. For GPT-3.5, SAP achieves comparable performance to few-shot CoT and outperforms zero-shot CoT in most cases. These experimental results indicate that SAP can significantly improve the accuracy of language models in reasoning tasks.
Keywords: zero-shot learning; large language model (LLM); reasoning problem; chain-of-thought (CoT) prompting
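Editorial note on the two-step procedure described in the abstract above. The sketch below shows only the control flow (one call that asks for analysis plus candidate answers, one call that selects a final answer from them). The prompt wording, the `llm` callable, and the `n_candidates` parameter are hypothetical placeholders; the paper's actual prompts are not given in the abstract.

```python
from typing import Callable


def select_and_answer(question: str,
                      llm: Callable[[str], str],
                      n_candidates: int = 3) -> str:
    """Two-step prompting sketch: generate candidates, then select one.

    `llm` is any function that takes a prompt string and returns the
    model's text reply (a hypothetical stand-in for a real model client).
    """
    # Step 1: ask for a preliminary analysis and several candidate answers.
    candidate_prompt = (
        f"Question: {question}\n"
        f"Analyze the problem briefly, then list {n_candidates} "
        f"candidate answers."
    )
    candidates = llm(candidate_prompt)

    # Step 2: feed the candidates back and ask for one final answer.
    final_prompt = (
        f"Question: {question}\n"
        f"Candidate answers:\n{candidates}\n"
        f"Select the most plausible candidate and state the final answer."
    )
    return llm(final_prompt)
```

No manual demonstrations appear in either prompt, which is the property the abstract contrasts with few-shot CoT.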
TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation
10
Authors: Zekun Jiang Dongjie Cheng Ziyuan Qin Jun Gao Qicheng Lao Abdullaev Bakhrom Ismoilovich Urazboev Gayrat Yuldashov Elyorbek Bekchanov Habibullo Defu Tang Linjing Wei Kang Li Le Zhang 《Big Data Mining and Analytics》 CSCD 2024, No. 4, pp. 1199-1211 (13 pages)
This study presents a novel zero-shot segmentation algorithm for multimodal medical images, the text-visual-prompt segment anything model (TV-SAM), which requires no manual annotations. TV-SAM integrates the large language model GPT-4, the vision language model GLIP, and SAM to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, thereby enhancing SAM's capability for zero-shot segmentation. Comprehensive evaluations on seven public datasets encompassing eight imaging modalities demonstrate that TV-SAM can effectively segment unseen targets across various modalities without additional training. TV-SAM significantly outperforms SAM AUTO (p<0.01) and GSAM (p<0.05), closely matches the performance of SAM BBOX with gold standard bounding box prompts (p=0.07), and surpasses the state-of-the-art methods on specific datasets such as ISIC (0.853 versus 0.802) and WBC (0.968 versus 0.883). The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm, highlighting the significant contribution of GPT-4 to zero-shot segmentation. By integrating foundational models such as GPT-4, GLIP, and SAM, the ability to address complex problems in specialized domains can be enhanced.
Keywords: large language model; vision language model; segment anything model; medical image segmentation; zero-shot segmentation; GPT-4
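Zero-Shot Referring Image Segmentation Based on Fine-Tuning the Image-Text Large Model CLIP (Cited by: 3)
11
Authors: 刘杰 乔文昇 朱佩佩 雷印杰 王紫轩 《计算机应用研究》 (Application Research of Computers) PKU Core 2025, No. 4, pp. 1248-1254 (7 pages)
In recent years, large vision-language models represented by CLIP have shown excellent zero-shot inference ability in many downstream scenarios. However, transferring CLIP to referring image segmentation, which requires pixel-level image-text understanding, is very difficult; the root cause is that CLIP focuses on the overall alignment between image and text but discards the spatial position information of pixels in the image. In view of this, taking CLIP as the base model, the paper proposes PixelCLIP, a single-stage, fine-grained, multi-level zero-shot referring image segmentation model. Specifically, multi-scale image feature fusion is adopted, which both aggregates the pixel-level image features extracted by different visual encoders in CLIP and takes into account the holistic image semantic features inherent in CLIP. For text representation, the model not only relies on CLIP-BERT to retain object category information but also introduces the LLaVA large language model to inject further contextual background knowledge. Finally, PixelCLIP achieves pixel-level referring image segmentation through fine-grained cross-modal association matching. Extensive numerical analysis results verify the effectiveness of the method.
Keywords: zero-shot; CLIP; pixel-level; single-stage; referring image segmentation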
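Audio-Visual Generalized Zero-Shot Learning Method Based on a Multimodal Fusion Transformer (Cited by: 1)
12
Authors: 杨静 李小勇 阮小利 李少波 唐向红 徐计 《电子与信息学报》 (Journal of Electronics & Information Technology) PKU Core 2025, No. 7, pp. 2375-2384 (10 pages)
Audio-visual zero-shot learning requires understanding the relationship between audio and visual information in order to reason about unseen classes. Although the field has made many efforts and significant progress, existing work often focuses on learning strong representations and thereby overlooks both the dependencies between audio and video and the inconsistency between the output distribution and the target distribution. This paper therefore proposes a Transformer-based audio-visual generalized zero-shot learning method. Specifically, an attention mechanism is used to learn the internal information of the data and to enhance information interaction between modalities, so as to capture the semantic consistency between audio and visual data; to measure the difference between probability distributions and the consistency between classes, Kullback-Leibler (KL) divergence and cosine similarity losses are introduced. To evaluate the proposed method, tests are carried out on three benchmark datasets: VGGSound-GZSL^(cls), UCF-GZSL^(cls), and ActivityNet-GZSL^(cls). Extensive experimental results show that the proposed method achieves state-of-the-art performance on all three datasets.
Keywords: audio-visual zero-shot learning; video classification; attention mechanism; KL divergence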
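Editorial note on the two loss terms named in the abstract above. The sketch below gives the textbook definitions of KL divergence between two discrete distributions and of a cosine-similarity loss between two vectors. How the paper weights or combines these terms is not stated in the abstract, so no combined objective is shown; the function names and the `eps` smoothing constant are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; `eps` (an assumed smoothing
    constant) guards against log(0) when q_i is 0.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity_loss(u, v):
    """1 - cos(u, v): 0 when aligned, 1 when orthogonal, 2 when opposite.

    Assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Here `p` would typically be a target distribution and `q` the model's output distribution, matching the abstract's stated aim of reducing the gap between the two.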
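CGR-BERT-ZESHEL: A Zero-Shot Entity Linking Model Based on Chinese Features (Cited by: 1)
13
Authors: 潘建 吴志伟 李燕君 《计算机科学》 (Computer Science) PKU Core 2025, No. 4, pp. 262-270 (9 pages)
At present, research on entity linking pays little attention to Chinese entity linking or to linking emerging and obscure entities. In addition, the conventional BERT model ignores two key aspects of Chinese, glyphs and radicals, which provide important syntactic and semantic information for language understanding. To address these problems, the paper proposes CGR-BERT-ZESHEL, a zero-shot entity linking model based on Chinese features. The model first feeds glyph features and radical features into the model by introducing visual image embeddings and traditional character embeddings, respectively, which enhances the word-vector features and mitigates the effect of out-of-vocabulary words on model performance; it then obtains the entity linking result using a two-stage approach of candidate entity generation followed by candidate entity ranking. Experiments on the Hansel and CLEEK datasets show that, compared with the baseline model, CGR-BERT-ZESHEL improves Recall@100 in the candidate entity generation stage by 17.49% and 7.34%, and improves Accuracy in the candidate entity ranking stage by 3.02% and 3.11%; it also outperforms the other compared models on both Recall@100 and Accuracy.
Keywords: entity linking; Chinese zero-shot; BERT; candidate entity generation; candidate entity ranking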
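Multimodal Zero-Shot Defect Detection in Inspection Images Based on Dual Experts (Cited by: 1)
14
Authors: 吴华 贾栋豪 张婷婷 白晓静 孙笠 蒲梦杨 《中国图象图形学报》 (Journal of Image and Graphics) PKU Core 2025, No. 3, pp. 672-682 (11 pages)
Objective: Defect detection in power equipment inspection images plays an important role in improving the safety of power transmission and the reliability of grid operation. However, because constructing the corresponding training datasets is costly, traditional supervised learning methods are poorly suited to this task. In addition, power equipment inspection images usually contain complex and varied backgrounds, which seriously interfere with defect detection. Method: Based on a vision-language model combined with text prompts, a zero-shot defect detection model for power equipment inspection images is proposed. The model contains multiple dual-expert modules; after the text features and visual features are obtained from the vision-language model, they are processed and fused by the dual-expert modules to produce pixel-level defect detection results. A power equipment inspection image dataset with pixel-level mask annotations is also constructed to evaluate the model comprehensively. Results: On the test set of the constructed dataset, the model is compared with SAA+ (segment any anomaly+), AnomalyGPT, WinCLIP (window-based CLIP), PaDiM (patch distribution modeling), and PatchCore. For pixel-level defect segmentation, AUROC (area under the receiver operating characteristic curve) improves by 18.1% on average and F1-max (F1 score at optimal threshold) by 26.1% on average; for image-level defect classification, AUROC improves by 20.2% on average and AP (average precision) by 10.0% on average. For each type of power equipment in the dataset, the model achieves the best pixel-level defect segmentation results. Ablation experiments demonstrate that the dual-expert module significantly improves defect detection accuracy. Conclusion: By working in a zero-shot manner, the model avoids the high cost of constructing power equipment inspection image datasets, and the proposed dual-expert module reduces interference from complex background regions in inspection images.
Keywords: zero-shot defect detection; dual experts; vision-language model; multimodal; power equipment inspection images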
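Analysis of Attribute-Agnostic Prompt Learning for Improving the Generalization of Zero-Shot Industrial Anomaly Detection Methods (Cited by: 1)
15
Authors: 刘桂雄 闫奕樸 陈贵龙 邢星奥 《激光杂志》 (Laser Journal) PKU Core 2025, No. 5, pp. 64-70 (7 pages)
Industrial anomaly detection is a core part of quality control in manufacturing, and attribute-agnostic prompt learning is an effective way to improve the generalization of zero-shot industrial anomaly detection. Aimed at industrial production applications, this paper examines attribute-agnostic prompt learning for zero-shot industrial anomaly detection from two perspectives, learnable text prompts and object-decoupled text prompts, covering their basic principles, frameworks, workflows, and application performance, and systematically analyzes and compares the application characteristics of each method. It identifies joint optimization of image and text prompts, and more refined descriptions of anomaly features, as directions worth attention in this field. The paper offers guidance and reference for researchers working on industrial anomaly detection.
Keywords: industrial anomaly detection; attribute-agnostic prompt learning; large models; zero-shot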
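An Interactive Class Attribute Construction Method for Zero-Shot Image Classification
16
Authors: 刘真 徐景胜 颜菁 徐润森 吴向阳 《计算机辅助设计与图形学学报》 (Journal of Computer-Aided Design & Computer Graphics) PKU Core 2025, No. 2, pp. 243-253 (11 pages)
Zero-shot image classification addresses the setting in which training and test classes are disjoint, and human-annotated attributes are a commonly used form of auxiliary knowledge for it. To help experts design the class-attribute matrix, an interactive construction method is proposed that simplifies a tedious process that otherwise lacks guidance. First, a concept-based deep learning interpretability method extracts understandable attribute information from the training images. Then, multi-view coordinated interaction is used to explore and analyze the importance of the extracted attributes. The system provides both global and local modes to assist users in designing attribute values for the test classes. Finally, a case study on the Animals with Attributes2 dataset and a user evaluation using a Likert scale verify the effectiveness and practicality of the method, which can help expert users complete class attribute construction efficiently and conveniently.
Keywords: zero-shot learning; zero-shot image classification; visual analytics; explainable artificial intelligence; human-machine collaboration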
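A Multimodal Scene Editing Algorithm Fusing CLIP and 3D Gaussians
17
Authors: 曹仰杰 王伟平 李振强 谢俊 吕润峰 《郑州大学学报(工学版)》 (Journal of Zhengzhou University (Engineering Science)) PKU Core 2025, No. 5, pp. 35-42 (8 pages)
To address the heavy reliance on annotated data and the high computational complexity of 3D scene editing algorithms, a multimodal scene editing algorithm that fuses CLIP with 3D Gaussians (CLIP2Gaussian) is proposed. First, SAM is used to extract target masks from multi-view images, and a bidirectional propagation strategy is introduced to make the masks consistent across views. Second, the extracted masks are assigned semantic labels through CLIP and mapped to 3D Gaussian points, embedding semantics into the 3D scene. Finally, a differentiable rendering mechanism is used to optimize the 3D Gaussian parameters, and a spatial consistency regularization strategy is introduced that uses clustering to strengthen the consistency and stability of semantic labels in 3D space. Experimental results show that CLIP2Gaussian reaches an IoU of 61.23% on the LERF dataset and a response time of 0.57 s per text query in the semantic segmentation task, outperforming LERF in both accuracy and efficiency. Ablation experiments further verify that the proposed algorithm edits target regions precisely while minimally disturbing the original scene.
Keywords: 3D reconstruction; zero-shot learning; scene understanding; scene editing; 3D Gaussians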
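Action Recognition Algorithm Based on Attention Mechanism and Energy Function
18
Authors: 王丽芳 吴荆双 尹鹏亮 胡立华 《计算机应用》 (Journal of Computer Applications) PKU Core 2025, No. 1, pp. 234-239 (6 pages)
To address the lack of structural guidance in the frameworks of zero-shot action recognition (ZSAR) algorithms, an energy-based model (EBM) is used to guide framework design, and an action recognition algorithm based on the attention mechanism and an energy function (ARAAE) is proposed. First, to obtain the input of the EBM, a combination of optical flow and a 3D convolution (C3D) architecture is designed to extract visual features, which removes spatial redundancy. Second, a Vision Transformer (ViT) is used for visual feature extraction to reduce temporal redundancy, and the ViT works together with the optical-flow-plus-C3D combination to reduce spatial redundancy, yielding a non-redundant visual space. Finally, to measure the correlation between the visual space and the semantic space, an energy score evaluation mechanism is implemented and a joint loss function is designed for optimization. Experiments on the HMDB51 and UCF101 datasets against six classic ZSAR algorithms and algorithms from recent literature show the following, compared with algorithms such as CAGE (Coupling Adversarial Graph Embedding), Bi-dir GAN (Bi-directional Generative Adversarial Network), and ETSAN (Energy-based Temporal Summarized Attentive Network): on the evenly split HMDB51 dataset, ARAAE raises the average recognition accuracy to (22.1±1.8)%, clearly better than all compared algorithms; on the evenly split UCF101 dataset, it raises the average recognition accuracy to (22.4±1.6)%, slightly better than the compared algorithms; and on UCF101 with an 81/20 split, it raises the average recognition accuracy to (40.2±2.6)%, higher than all compared algorithms. ARAAE thus effectively improves recognition performance in ZSAR.
Keywords: zero-shot action recognition; energy function; attention mechanism; optical flow method; visual features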
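Generative Zero-Shot Image Recognition Combined with Dual Contrastive Embedding Learning
19
Authors: 张桂梅 闫文尚 黄军阳 《中国图象图形学报》 (Journal of Image and Graphics) PKU Core 2025, No. 5, pp. 1389-1403 (15 pages)
Objective: Zero-shot learning (ZSL) is an effective approach to object recognition when sample data are missing. Traditional zero-shot recognition trains on labeled seen-class data in order to recognize new, unlabeled unseen-class data. Depending on the task setting, it is divided into conventional zero-shot learning (CZSL) and generalized zero-shot learning (GZSL). Generative zero-shot recognition methods can generate visual features for unseen classes and thereby convert the zero-shot learning problem into a conventional supervised learning problem. However, they suffer from insufficiently discriminative features, inconsistency between pseudo visual features and semantic information, and domain shift. To address these problems, a generative zero-shot image recognition method combined with dual contrastive embedding learning is proposed. Method: First, to address the insufficient discriminability of generated features, a contrastive embedding module is integrated into the VAE-GAN (variational autoencoder-generative adversarial network) generation framework, and multiple networks are trained jointly to improve zero-shot image recognition accuracy. Second, with a conditional VAE-GAN as the generative network, a dual contrastive learning strategy is proposed. On the one hand, in addition to the existing contrastive learning on seen classes, intra-domain instance-prototype contrastive learning on unseen-class pseudo samples is introduced, which aligns the generated pseudo visual features with semantic information and alleviates semantic confusion between seen and unseen classes. On the other hand, cross-domain center-prototype contrastive learning is proposed to reduce the model's bias toward seen classes and mitigate domain shift to some extent. Results: Zero-shot and generalized zero-shot recognition experiments are conducted on the AWA1 (Animals with Attributes1), AWA2, CUB (Caltech-UCSD Birds-200-2011), and SUN (SUN Attribute) datasets and compared with recent related methods. In the zero-shot recognition task, the proposed method achieves the best results on AWA1 and CUB, improving the T1 value by 2.2% and 2.7%, respectively, over the second-best model, and achieves the second-best results on AWA2 and SUN. In generalized zero-shot recognition, it achieves the best H values on AWA1, AWA2, and CUB, improving on the second-best values by 0.6%, 0.8%, and 2.8%, respectively, and achieves the second-best value on SUN. Ablation experiments verify the effectiveness of the proposed algorithm. Conclusion: The experimental results show that the proposed method improves the accuracy of zero-shot and generalized zero-shot image recognition and generalizes well.
Keywords: zero-shot learning (ZSL); generalized zero-shot learning (GZSL); generative adversarial network (GAN); embedding space; contrastive learning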
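Editorial note on the H value reported in the abstract above. In generalized zero-shot learning work, H commonly denotes the harmonic mean of seen-class and unseen-class accuracy; the abstract does not define it, so the sketch below assumes that convention. The function and parameter names are illustrative only.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H of seen-class and unseen-class accuracy.

    H is high only when both accuracies are high, so a model that is
    biased toward seen classes is penalized. Returns 0.0 when both
    inputs are 0 to avoid division by zero.
    """
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / total
```

For example, accuracies of 0.8 on seen classes and 0.4 on unseen classes give H of about 0.533, lower than their arithmetic mean of 0.6, which reflects the imbalance between the two.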
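Generative Zero-Shot Steel Surface Defect Detection Fused with a Diffusion Model
20
Authors: 季瑞瑞 杨思凡 华羽垚 耿屹 白晨羲 《计算机工程与应用》 (Computer Engineering and Applications) PKU Core 2025, No. 12, pp. 333-343 (11 pages)
Generative zero-shot object detection models struggle with steel surface defect detection in complex scenes and suffer from semantic confusion and low robustness. To address this, a generative zero-shot steel surface defect detection model that incorporates a diffusion model is proposed. A multimodal defect feature alignment module is designed that uses supervised contrastive learning, defect feature alignment, and semantic-consistency reconstruction to align the defect features produced by the generator fully with the original semantic information, improving the robustness of the generative model. A defect feature denoising diffusion module is introduced that generates diverse feature representations by progressively adding and removing noise and selects representative generated defect features. The resulting generated defect features are used to update the classifier of the defect detection network, achieving zero-shot steel surface defect detection. Experiments on the NEU and GC10 steel surface defect datasets show that, in the zero-shot detection setting, detection accuracy improves by 11.5 and 17.4 percentage points over the baseline model, respectively; in the generalized zero-shot detection setting, the harmonic mean improves by 3.0 and 9.8 percentage points, respectively, effectively improving the model's ability to detect steel surface defects in complex scenes. Visualization results show that the model generates unseen-class defect features that are clearly separated, alleviating semantic confusion. In addition, compared with current state-of-the-art zero-shot object detection models, the model shows higher accuracy and robustness in steel surface defect detection.
Keywords: defect detection; zero-shot learning; generative model; semantic alignment; diffusion model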