Journal Articles
6 articles found
1. VP-SFDA: Visual Prompt Source-Free Domain Adaptation for Cross-Modal Medical Image
Authors: Yixin Chen, Yan Wang, Zhaoheng Xie. Health Data Science, 2025, No. 1, pp. 290-304 (15 pages)
Background: Source-free unsupervised domain adaptation (SFUDA) methods aim to address the challenge of domain shift while preserving data privacy. Existing SFUDA approaches construct reliable and confident pseudo-labels for target-domain data through denoising methods, thereby guiding the training of the target-domain model. The effectiveness of denoising approaches is influenced by the degree of domain gap between the source and target domains: a marked shift can cause the pseudo-labels to be unreliable, even after applying denoising. Methods: We propose a novel two-stage framework for SFUDA called visual prompt source-free domain adaptation (VP-SFDA). In the first stage, the prompting process, we introduce input-specific visual prompts that bridge the target-domain data to the source-domain distribution. Our method utilizes visual prompts and a batch normalization constraint to enable the alignment model to learn domain-specific knowledge and align the target-domain data with the source-domain distribution. The second stage, the adaptation process, aims at optimizing the segmentation model from the source domain to the target domain. This is accomplished through denoising techniques, ultimately enhancing performance. Results: Our study presents a comparative analysis of several SFUDA techniques in the VP-SFDA framework across four tasks: abdominal magnetic resonance imaging (MRI) to computed tomography (CT), abdominal CT to MRI, cardiac MRI to CT, and cardiac CT to MRI. Notably, in the abdominal MRI to CT adaptation task, the VP-OS method achieved a remarkable improvement, increasing the average Dice score from 0.658 to 0.773 (P < 0.01) and reducing the average surface distance (ASD) from 3.489 to 2.961 (P < 0.01). Similarly, the VP-LD and VP-DPL methods also showed significant improvements over their base algorithms in both abdominal and cardiac MRI to CT tasks. Conclusions: This paper proposes VP-SFDA, a novel two-stage framework for SFUDA in medical imaging, which achieves superior performance through input-specific visual prompts and a batch normalization constraint for domain adaptation, coupled with denoising methods for enhanced results. Comparative experiments on four medical SFUDA tasks demonstrate that VP-SFDA surpasses existing methods, with ablation studies confirming the benefits of domain-specific patterns.
Keywords: denoising methods, batch normalization, visual prompt, cross-modal adaptation, domain shift, domain adaptation, denoising approaches, medical image segmentation
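The batch-normalization constraint described in this abstract can be pictured in a few lines of PyTorch. The fragment below is a minimal sketch, not the authors' code: a small prompt generator (a two-layer conv net here, an assumption) perturbs each target image, and the loss pulls the batch statistics seen at every BN layer of a frozen stand-in source model toward the running (source-domain) statistics those layers store.

```python
import torch
import torch.nn as nn
import torchvision.models as tv

class PromptGenerator(nn.Module):
    """Tiny conv net producing an additive, input-specific visual prompt."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh())

    def forward(self, x):
        return x + 0.1 * self.net(x)  # lightly perturb the target image

def attach_bn_hooks(model, records):
    """Record (batch mean, batch var, source mean, source var) at each BN layer."""
    handles = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            def hook(mod, inp, out):
                x = inp[0]
                records.append((x.mean(dim=(0, 2, 3)),
                                x.var(dim=(0, 2, 3), unbiased=False),
                                mod.running_mean, mod.running_var))
            handles.append(m.register_forward_hook(hook))
    return handles

source_model = tv.resnet18(weights=None)   # stand-in for the frozen source network
source_model.eval()
for p in source_model.parameters():
    p.requires_grad_(False)

prompter = PromptGenerator()
opt = torch.optim.Adam(prompter.parameters(), lr=1e-3)
target_batch = torch.rand(4, 3, 224, 224)  # placeholder target-domain images

for step in range(10):
    records = []
    handles = attach_bn_hooks(source_model, records)
    source_model(prompter(target_batch))
    # align prompted-batch statistics with the stored source statistics
    loss = sum(((m - rm) ** 2).mean() + ((v - rv) ** 2).mean()
               for m, v, rm, rv in records)
    opt.zero_grad(); loss.backward(); opt.step()
    for h in handles:
        h.remove()
```

The paper's second stage, pseudo-labeling and denoising on the prompted images, is omitted here.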
2. VPM-Net: Person Re-ID Network Based on Visual Prompt Technology and Multi-Instance Negative Pooling
Authors: Haitao Xie, Yuliang Chen, Yunjie Zeng, Lingyu Yan, Zhizhi Wang, Zhiwei Ye. Computers, Materials & Continua, 2025, No. 5, pp. 3389-3410 (22 pages)
With the rapid development of intelligent video surveillance technology, pedestrian re-identification has become increasingly important in multi-camera surveillance systems, where it plays a critical role in enhancing public safety. However, traditional methods typically process images and text separately, applying upstream models directly to downstream tasks. This approach significantly increases the complexity of model training and computational costs. Furthermore, the common class imbalance in existing training datasets limits model performance improvement. To address these challenges, we propose an innovative framework named Person Re-ID Network Based on Visual Prompt Technology and Multi-Instance Negative Pooling (VPM-Net). First, we incorporate the Contrastive Language-Image Pre-training (CLIP) pre-trained model to accurately map visual and textual features into a unified embedding space, effectively mitigating inconsistencies in data distribution and the training process. To enhance model adaptability and generalization, we introduce an efficient and task-specific Visual Prompt Tuning (VPT) technique, which improves the model's relevance to specific tasks. Additionally, we design two key modules: the Knowledge-Aware Network (KAN) and the Multi-Instance Negative Pooling (MINP) module. The KAN module significantly enhances the model's understanding of complex scenarios through deep contextual semantic modeling. The MINP module pools multiple negative instances, effectively improving the model's ability to distinguish fine-grained features. The experimental outcomes across diverse datasets underscore the remarkable performance of VPM-Net. These results demonstrate the unique advantages and robust reliability of VPM-Net in fine-grained retrieval tasks.
Keywords: person re-identification, multi-instance negative pooling, visual prompt tuning
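The abstract does not spell out the pooling rule, so the snippet below is one plausible, hedged reading of "multi-instance negative pooling": rather than contrasting each anchor against its single hardest negative, it pools the k most similar different-identity samples and penalizes their mean similarity. Function and parameter names are illustrative, not VPM-Net's implementation.

```python
import torch
import torch.nn.functional as F

def minp_loss(emb: torch.Tensor, labels: torch.Tensor,
              k: int = 4, margin: float = 0.3) -> torch.Tensor:
    """emb: (N, D) embeddings; labels: (N,) person IDs.
    Assumes each anchor has at least one positive and k negatives in the batch."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t()                                  # cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    # hardest positive: least similar sample sharing the anchor's identity
    pos = sim.masked_fill(~same | eye, 2.0).min(dim=1).values
    # pooled negatives: mean similarity over the k hardest negatives
    neg = sim.masked_fill(same, -2.0).topk(k, dim=1).values.mean(dim=1)
    return F.relu(neg - pos + margin).mean()

# toy usage: a batch of 8 embeddings covering 4 identities
emb = torch.randn(8, 128, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
minp_loss(emb, labels).backward()
```

Averaging over several hard negatives gives smoother gradients than a single hardest-negative term, which is one way pooling could sharpen fine-grained discrimination.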
3. Dual modality prompt learning for visual question-grounded answering in robotic surgery (cited by 2)
Authors: Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei. Visual Computing for Industry, Biomedicine, and Art, 2024, No. 1, pp. 316-328 (13 pages)
With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Keywords: prompt learning, visual prompt, textual prompt, grounding-answering, visual question answering
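As a rough illustration of the dual-modality idea (not the paper's architecture), the sketch below prepends learnable visual and textual prompt tokens to each modality's token sequence before a shared transformer encoder fuses them. All module names, dimensions, and the fusion-by-concatenation choice are assumptions.

```python
import torch
import torch.nn as nn

class DualModalityPrompter(nn.Module):
    """Learnable prompt tokens for each modality, fused by a shared encoder."""
    def __init__(self, dim: int = 256, n_prompts: int = 8, n_heads: int = 4):
        super().__init__()
        self.visual_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.text_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, visual_tokens, text_tokens):
        b = visual_tokens.size(0)
        vp = self.visual_prompts.unsqueeze(0).expand(b, -1, -1)
        tp = self.text_prompts.unsqueeze(0).expand(b, -1, -1)
        # each stream carries its own prompt knowledge into the joint encoding
        fused = torch.cat([vp, visual_tokens, tp, text_tokens], dim=1)
        return self.encoder(fused)

# toy usage: 49 image-patch tokens and 12 question tokens per sample
out = DualModalityPrompter()(torch.randn(2, 49, 256), torch.randn(2, 12, 256))
print(out.shape)  # torch.Size([2, 77, 256])
```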
4. VAGen: waterbody segmentation with prompting for visual in-context learning
Authors: Jiapei Zhao, Nobuyoshi Yabuki, Tomohiro Fukuda. AI in Civil Engineering, 2024, No. 1, pp. 1-20 (20 pages)
Effective water management and flood prevention are critical challenges encountered by both urban and rural areas, necessitating precise and prompt monitoring of waterbodies. As a fundamental step in the monitoring process, waterbody segmentation involves precisely delineating waterbody boundaries from imagery. Previous research using satellite images often lacks the resolution and contextual detail needed for local-scale analysis. This study addresses these challenges by leveraging common natural images, which are more easily accessible and provide higher resolution and more contextual information than satellite images. However, the segmentation of waterbodies from ordinary images faces several obstacles, including variations in lighting, occlusions from objects like trees and buildings, and reflections on the water surface, all of which can mislead algorithms. Additionally, the diverse shapes and textures of waterbodies, alongside complex backgrounds, further complicate this task. While large-scale vision models pre-trained on large datasets have typically been leveraged for their generalizability across various downstream tasks, their application to waterbody segmentation from ground-level images remains underexplored. Hence, this research proposes the Visual Aquatic Generalist (VAGen) as a countermeasure. A lightweight model for waterbody segmentation inspired by visual In-Context Learning (ICL) and Visual Prompting (VP), VAGen refines large visual models by innovatively adding learnable perturbations to enhance the quality of prompts in ICL. Experimental results show a significant increase in the mean Intersection over Union (mIoU) metric: a 22.38% improvement over the baseline model without learnable prompts, and a 6.20% gain over the current state-of-the-art (SOTA) task-specific models designed for waterbody segmentation. The performance evaluation and analysis of VAGen indicate its capacity to substantially reduce the number of trainable parameters and computational overhead, and prove its feasibility for deployment on cost-limited devices, including unmanned aerial vehicles (UAVs) and mobile computing platforms. This study thereby makes a valuable contribution to the field of computer vision, offering practical solutions for engineering applications related to urban flood monitoring, agricultural water resource management, and environmental conservation efforts.
Keywords: visual in-context learning, visual prompting, vision foundation model, parameter-efficient fine-tuning, waterbody segmentation, deep learning
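The abstract's core mechanism, learnable perturbations on the in-context prompt, can be pictured with the toy canvas below. It assumes a Painter/SegGPT-style setup in which an example image and its mask are stitched with the query into one grid and a frozen model inpaints the missing quadrant; only the perturbation tensor would be trained. Everything here is a schematic stand-in, not VAGen's implementation.

```python
import torch
import torch.nn as nn

class LearnableCanvasPrompt(nn.Module):
    """Adds a trainable perturbation to the in-context example only."""
    def __init__(self, h: int = 224, w: int = 224):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1, 3, h, w))  # the learnable prompt

    def forward(self, example_img, example_mask, query_img):
        prompted = (example_img + self.delta).clamp(0, 1)
        top = torch.cat([prompted, example_mask], dim=3)    # example | its mask
        blank = torch.zeros_like(query_img)                 # model fills this in
        bottom = torch.cat([query_img, blank], dim=3)       # query   | (predicted)
        return torch.cat([top, bottom], dim=2)              # 2x2 canvas

prompt = LearnableCanvasPrompt()
canvas = prompt(torch.rand(1, 3, 224, 224),   # example waterbody image
                torch.rand(1, 3, 224, 224),   # its mask rendered as 3 channels
                torch.rand(1, 3, 224, 224))   # query image
opt = torch.optim.Adam(prompt.parameters(), lr=1e-2)  # only delta is trained
```

Training only the perturbation is what keeps the trainable-parameter count and compute low enough for UAV-class hardware, as the abstract emphasizes.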
5. IEPT: input-enhanced prompt tuning for visual-language models
Authors: Chunru Dong, Junyuan Liu, Qiang Hua, Jiahong Tang, Feng Zhang. CCF Transactions on High Performance Computing, 2025, No. 6, pp. 494-508 (15 pages)
Prompt learning has become crucial for adapting Visual Language Models (VLMs) to downstream tasks. Although existing prompt learning models have made significant strides, they still face two major challenges: (1) too much attention is paid to learning base classes, making it harder to generalize to novel classes; and (2) most methods rely only on the context information provided by the prompt template, resulting in limited text features. In this study, we propose a new fine-tuning method for visual-language models called Input-Enhanced Prompt Tuning (IEPT). IEPT improves the generalization of VLMs for downstream tasks by introducing two components: the Data Augmentation Framework (DAF) and the Category Generalization Optimizer (CGO). Specifically, the DAF employs large language models to resolve issues of word ambiguity by obtaining more class-label context, and uses simple image augmentation to address the issue of limited features by providing more image samples. The CGO prevents overfitting by adding new class names during training. Experiments show that the performance of IEPT in various evaluation suites is better than or comparable to that of existing methods, covering base-to-novel generalization, domain generalization, and cross-dataset evaluation. Compared to the state-of-the-art method PromptSRC, IEPT achieves an absolute improvement of 0.40% for base classes, 1.56% for novel classes, and 1.04% on the harmonic mean, averaged over 11 datasets. In addition, we present detailed ablation studies that validate the individual contributions of DAF and CGO to the overall performance of IEPT. Our code is available at https://github.com/ayuan0626/IEPT.
Keywords: prompt learning, visual language models, fine-tuning, data augmentation, generalization
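A hedged sketch of the DAF idea follows: each class name is paired with several disambiguating descriptions, hand-written stand-ins here for the LLM-generated context the paper uses, whose CLIP text embeddings are averaged, while light image augmentation supplies extra visual samples. It uses the Hugging Face CLIP interface; the description strings and the example class are illustrative assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from torchvision import transforms

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# stand-ins for LLM-expanded class context; DAF would query an LLM instead
descriptions = {
    "crane": ["a photo of a crane, a tall long-legged wading bird",
              "a photo of a crane, a machine with a long arm for lifting loads"],
}

@torch.no_grad()
def class_text_embedding(name: str) -> torch.Tensor:
    """Average the embeddings of all disambiguating descriptions of a class."""
    inputs = processor(text=descriptions[name], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

# simple augmentation that multiplies the effective image samples per class
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
])

text_emb = class_text_embedding("crane")  # richer than "a photo of a crane" alone
```

Averaging over multiple descriptions is one straightforward way the extra context could resolve the word-ambiguity problem the abstract mentions.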
6. Prompt learning in computer vision: a survey (cited by 4)
Authors: Yiming LEI, Jingqi LI, Zilong LI, Yuan CAO, Hongming SHAN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2024, No. 1, pp. 42-63 (22 pages)
Prompt learning has attracted broad attention in computer vision since large pre-trained vision-language models (VLMs) exploded. Based on the close relationship between vision and language information built by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
Keywords: prompt learning, visual prompt tuning (VPT), image generation, image classification, artificial intelligence generated content (AIGC)
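Since the survey's keywords highlight visual prompt tuning (VPT), a minimal sketch of its "shallow" variant may help: learnable tokens are prepended to the patch-token sequence of a frozen transformer, and only those tokens plus a linear head are trained. The backbone below is a generic nn.TransformerEncoder standing in for a pre-trained ViT; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class VPTShallow(nn.Module):
    """Shallow visual prompt tuning over a frozen transformer backbone."""
    def __init__(self, dim: int = 768, n_prompts: int = 10,
                 n_classes: int = 10, depth: int = 4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():   # only prompts and head train
            p.requires_grad_(False)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patch_tokens):           # (B, N, dim) ViT patch tokens
        b = patch_tokens.size(0)
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = self.backbone(torch.cat([p, patch_tokens], dim=1))
        return self.head(x[:, :self.prompts.size(0)].mean(dim=1))

logits = VPTShallow()(torch.randn(2, 196, 768))  # 14x14 patch grid
print(logits.shape)  # torch.Size([2, 10])
```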