InferEdit: An instruction-based system with a multimodal LLM for complex multi-target image editing
Authors: Zhiyong Huang, Yali She, Mengli Xiang, Tuojun Ding. 《Visual Informatics》, 2025, Issue 3, pp. 122-125 (4 pages).
To address the limitations of existing instruction-based image editing methods in handling complex multi-target instructions and maintaining semantic consistency, we present InferEdit, a training-free image editing system driven by a Multimodal Large Language Model (MLLM). The system parses complex multi-target instructions into sequential subtasks and performs editing iteratively through target localization and semantic reasoning. Furthermore, to adaptively select the most suitable editing models, we construct the evaluation dataset InferDataset to evaluate various editing models on three types of tasks: object removal, object replacement, and local editing. Based on a comprehensive scoring mechanism, we build Binary Search Trees (BSTs) for different editing types to facilitate model scheduling. Experiments demonstrate that InferEdit outperforms existing methods in handling complex instructions while maintaining semantic consistency and visual quality.
Keywords: MLLM; image editing; complex instructions
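The scheduling idea the abstract describes — parse a multi-target instruction into sequential subtasks, then query a score-keyed BST per editing type for the best model — can be sketched as follows. This is a minimal illustration assuming hypothetical model names and scores; the actual scoring mechanism and model pool are defined in the paper's InferDataset evaluation, not here.

```python
# Illustrative sketch of BST-based model scheduling: one BST per editing
# type, keyed on a composite quality score; the best model for a type is
# the tree's maximum (rightmost) node. All names/scores are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    score: float
    model: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], score: float, model: str) -> Node:
    """Standard BST insert keyed on score."""
    if root is None:
        return Node(score, model)
    if score < root.score:
        root.left = insert(root.left, score, model)
    else:
        root.right = insert(root.right, score, model)
    return root


def best_model(root: Node) -> str:
    """Highest-scoring model is the rightmost node."""
    while root.right is not None:
        root = root.right
    return root.model


# Build one BST per editing type from (type, score, model) evaluations.
bsts: dict = {}
for task_type, score, model in [
    ("object_removal", 0.82, "inpaint-model-A"),
    ("object_removal", 0.91, "inpaint-model-B"),
    ("object_replacement", 0.77, "edit-model-C"),
    ("local_editing", 0.88, "edit-model-D"),
]:
    bsts[task_type] = insert(bsts.get(task_type), score, model)

# A complex instruction already parsed into sequential subtasks
# (the MLLM parsing step itself is omitted here).
subtasks = [
    ("object_removal", "remove the lamp"),
    ("object_replacement", "replace the cat with a dog"),
]
plan = [(desc, best_model(bsts[t])) for t, desc in subtasks]
print(plan)
```

With this toy data, each subtask is dispatched to the highest-scoring model registered for its editing type, which is the role the per-type BSTs play in the described pipeline.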