Cross-modal retrieval tries to achieve mutual retrieval between modalities by establishing consistent alignment for different modal data.Currently,many cross-modal retrieval methods have been proposed and have achieve...Cross-modal retrieval tries to achieve mutual retrieval between modalities by establishing consistent alignment for different modal data.Currently,many cross-modal retrieval methods have been proposed and have achieved excellent results;however,these are trained with clean cross-modal pairs,which are semantically matched but costly,compared with easily available data with noise alignment(i.e.,paired but mismatched in semantics).When training these methods with noise-aligned data,the performance degrades dramatically.Therefore,we propose a robust cross-modal retrieval with alignment refurbishment(RCAR),which significantly reduces the impact of noise on the model.Specifically,RCAR first conducts multi-task learning to slow down the overfitting to the noise to make data separable.Then,RCAR uses a two-component beta-mixture model to divide them into clean and noise alignments and refurbishes the label according to the posterior probability of the noise-alignment component.In addition,we define partial and complete noises in the noise-alignment paradigm.Experimental results show that,compared with the popular cross-modal retrieval methods,RCAR achieves more robust performance with both types of noise.展开更多
基金supported by the National Natural Science Foundation of China(No.12172186)。
文摘Cross-modal retrieval tries to achieve mutual retrieval between modalities by establishing consistent alignment for different modal data.Currently,many cross-modal retrieval methods have been proposed and have achieved excellent results;however,these are trained with clean cross-modal pairs,which are semantically matched but costly,compared with easily available data with noise alignment(i.e.,paired but mismatched in semantics).When training these methods with noise-aligned data,the performance degrades dramatically.Therefore,we propose a robust cross-modal retrieval with alignment refurbishment(RCAR),which significantly reduces the impact of noise on the model.Specifically,RCAR first conducts multi-task learning to slow down the overfitting to the noise to make data separable.Then,RCAR uses a two-component beta-mixture model to divide them into clean and noise alignments and refurbishes the label according to the posterior probability of the noise-alignment component.In addition,we define partial and complete noises in the noise-alignment paradigm.Experimental results show that,compared with the popular cross-modal retrieval methods,RCAR achieves more robust performance with both types of noise.