Uncertainty-aware coarse-to-fine alignment for text-image person retrieval
Authors: Yifei Deng, Zhengyu Chen, +1 more author, Chenglong Li, Jin Tang. 《Visual Intelligence》, 2025, No. 1, pp. 72-85 (14 pages).
Text-to-image person retrieval, a fine-grained cross-modal retrieval problem, aims to search an image library for person images that match a given textual caption. Existing text-to-image person retrieval methods usually use fixed-point embeddings to express the semantics of the two modalities and perform multi-granularity alignment between modalities in the embedding space. However, owing to the inherent one-to-many correspondence between images and texts in both directions, fixed-point embedding methods often struggle to capture this relationship adequately, which leads to erroneous retrieval results. To address this problem, we propose a novel uncertainty-aware coarse-to-fine alignment method, which first maps fixed-point embeddings to probability distributions and then aligns the two modalities in terms of distributions and sampling points at coarse-to-fine granularity, for accurate text-to-image person retrieval. Specifically, we first introduce two contrastive learning tasks, distribution contrast learning and point contrast learning, to achieve uncertainty-aware coarse-grained inter-modal alignment. The distribution contrast learning task uses distribution-based contrastive learning to make the distributions of the same identity as similar as possible across modalities. The point contrast learning task performs contrastive learning over inter-modal and intra-modal sampling points, which not only models rich and diverse cross-modal associations but also optimizes the learning of the distributions. To meet the fine-grained association requirements of text-to-image person retrieval, we design an uncertainty-aware attribute-masked language reconstruction task, which achieves fine-grained alignment by randomly masking attribute words in the text and reconstructing them via inter-modal sampling-point interactions. Extensive experiments on two public datasets demonstrate the superior performance of our method.
Keywords: Cross-modal retrieval; Uncertainty-aware; Coarse-to-fine alignment; Probability distribution; Contrastive learning
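The abstract above describes mapping each embedding to a probability distribution and then contrasting distributions of the same identity across modalities. It does not state the distribution family, the distance, or the loss form. The following is a minimal sketch assuming diagonal Gaussians given as (mu, sigma) lists, similarity defined as the negative squared 2-Wasserstein distance, and a symmetric InfoNCE loss. All function names and numbers are hypothetical illustrations, not the authors' implementation.

```python
import math


def w2_squared(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma1, sigma2))
    return mean_term + std_term


def info_nce(sim, temperature=0.1):
    """InfoNCE over a square similarity matrix; row i's positive is column i."""
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


def distribution_contrast_loss(img_dists, txt_dists, temperature=0.1):
    """Symmetric distribution-level contrastive loss (assumed form).

    img_dists[i] and txt_dists[i] are (mu, sigma) pairs for the same identity;
    similarity is the negative squared 2-Wasserstein distance.
    """
    sim = [[-w2_squared(mi, si, mt, st) for (mt, st) in txt_dists]
           for (mi, si) in img_dists]
    sim_t2i = [list(col) for col in zip(*sim)]  # transpose: text-to-image
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_t2i, temperature))


# Usage with two hypothetical identities given as (mu, sigma):
img = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [0.5, 0.5])]
matched_loss = distribution_contrast_loss(img, img)        # correct pairing
swapped_loss = distribution_contrast_loss(img, img[::-1])  # wrong pairing
```

For diagonal Gaussians the 2-Wasserstein distance has the closed form used above (mean gap plus standard-deviation gap), so it is cheap to compute. It also compares uncertainties, not just centers. Correct pairings yield a near-zero loss, while swapped pairings are heavily penalized.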
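The point contrast learning task in the abstract contrasts sampling points drawn from the distributions, both across and within modalities. The following is a minimal sketch assuming reparameterized Gaussian sampling (z = mu + sigma * eps) and a supervised-contrastive-style multi-positive InfoNCE. In this sketch, any other point with the same identity counts as a positive. The sampling count, temperature, cosine similarity, and all names are hypothetical choices, not details confirmed by the source.

```python
import math
import random


def sample_points(mu, sigma, k, rng):
    """Draw k reparameterized samples z = mu + sigma * eps, eps ~ N(0, I)."""
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(k)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def point_contrast_loss(points, labels, temperature=0.1):
    """Multi-positive (supervised-contrastive style) InfoNCE over pooled points.

    Every other point with the same identity label is a positive, whether it
    comes from the same modality (intra-modal) or the other one (inter-modal).
    """
    total, anchors = 0.0, 0
    for i, anchor in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        logits = [cosine(anchor, points[j]) / temperature for j in others]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        pos = [x for x, j in zip(logits, others) if labels[j] == labels[i]]
        if pos:
            total += sum(log_z - x for x in pos) / len(pos)
            anchors += 1
    return total / max(anchors, 1)


# Usage: two identities, each with an image and a text distribution
# (hypothetical (mu, sigma) values), 4 samples per distribution.
rng = random.Random(0)
image_dists = [([1.0, 0.0], [0.05, 0.05]), ([0.0, 1.0], [0.05, 0.05])]
text_dists = [([1.0, 0.0], [0.05, 0.05]), ([0.0, 1.0], [0.05, 0.05])]
points, labels, wrong_labels = [], [], []
for is_text, dists in enumerate((image_dists, text_dists)):
    for identity, (mu, sigma) in enumerate(dists):
        for z in sample_points(mu, sigma, 4, rng):
            points.append(z)
            labels.append(identity)
            # Mismatched pairing: text identities swapped for comparison.
            wrong_labels.append(1 - identity if is_text else identity)

good_loss = point_contrast_loss(points, labels)
bad_loss = point_contrast_loss(points, wrong_labels)
```

Because samples are written as mu + sigma * eps, gradients of a point-level loss would reach both mu and sigma in an autodiff framework. This is one plausible reading of how point contrast "also optimizes the learning of distributions."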
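The attribute-masked language reconstruction task in the abstract starts by randomly masking attribute words in a caption. The following sketch covers only that masking step. It assumes a fixed, hypothetical attribute vocabulary, since the source does not say how attribute words are identified. It also assumes a `[MASK]` placeholder token. The reconstruction model, which would predict the masked words from inter-modal sampling-point interactions, is not shown.

```python
import random

# Hypothetical attribute vocabulary: the source does not say how attribute
# words are identified, so a fixed word list stands in for that step here.
ATTRIBUTE_WORDS = {"red", "blue", "black", "white", "long", "short",
                   "backpack", "jacket", "jeans", "shirt", "skirt", "hat"}
MASK = "[MASK]"  # assumed placeholder token


def mask_attributes(tokens, mask_prob, rng):
    """Mask each attribute token independently with probability mask_prob.

    Returns the masked token list and a dict {position: original word} that a
    reconstruction head would be trained to predict.
    """
    masked, targets = [], {}
    for pos, tok in enumerate(tokens):
        if tok.lower() in ATTRIBUTE_WORDS and rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = tok
        else:
            masked.append(tok)
    return masked, targets


# Usage: mask_prob=1.0 masks every attribute word deterministically.
rng = random.Random(0)
caption = "a woman in a red jacket and blue jeans carries a black backpack".split()
masked, targets = mask_attributes(caption, 1.0, rng)
unmasked, no_targets = mask_attributes(caption, 0.0, rng)
```

Restricting masking to attribute words (colors, garments, accessories) focuses the reconstruction signal on the details that distinguish one person from another. Generic words such as "woman" or "carries" are left unmasked in this sketch.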