Abstract
With the continuous development of the Internet, people create large volumes of complex image data every day, and today's mainstream social media is saturated with image data; retrieving images quickly and accurately has therefore become a meaningful and pressing problem. Convolutional neural network (CNN) models are the current mainstream hashing-based image retrieval models. However, the convolution operation of a CNN captures only local features and cannot model global information, and its fixed receptive-field size cannot adapt to input images of different scales. To address this, effective image retrieval is implemented on the basis of the Swin-Transformer, a variant of the Transformer model: the Transformer's self-attention mechanism and positional encoding effectively overcome these limitations of CNNs. However, the window attention module of existing Swin-Transformer hashing retrieval models assigns the same weight to every channel of the image when extracting features, ignoring the differences and dependencies among channel-wise feature information; this reduces the usability of the extracted features and wastes computing resources. To solve these problems, a hashing image retrieval model based on mixed attention and polarization asymmetric loss (HRMPA) is proposed. The design starts from a Swin-Transformer-based hash feature extraction module (HFST); a channel attention block (CAB) is added to the (S)W-MSA module of HFST, yielding a hash feature extraction module based on mixed attention (HFMA). The model can thus assign different weights to the features of different channels of the input image, which increases the diversity of the extracted features and makes full use of computing resources. Meanwhile, to minimize the intra-class Hamming distance, maximize the inter-class Hamming distance, fully exploit the supervision information of the data, and improve retrieval precision, a polarization asymmetric loss (PA) is proposed, which combines the polarization loss and the asymmetric loss with a certain weight allocation ratio and thereby effectively improves image retrieval precision. Experiments show that with a hash code length of 16 bits, the proposed model achieves a highest mean average precision of 98.73% on the single-label CIFAR-10 dataset, 1.51% higher than the VTS16-CSQ model, and 90.65% on the multi-label NUS-WIDE dataset, 18.02% higher than TransHash and 5.92% higher than VTS16-CSQ.
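The abstract describes adding a channel attention block (CAB) to the window attention module so that different channels receive different weights. The paper's CAB details are not given here; the sketch below assumes a common squeeze-and-excitation-style design (global average pooling, a bottleneck of two fully connected layers, sigmoid gating), with random stand-ins for trained weights:

```python
import numpy as np

def channel_attention(x, reduction=4):
    """SE-style channel attention sketch: reweight each channel of a
    feature map x with shape (C, H, W) by a gate derived from its
    global average. FC weights are random stand-ins for trained ones."""
    rng = np.random.default_rng(0)
    c, h, w = x.shape
    squeezed = x.mean(axis=(1, 2))                 # (C,) global average pool
    w1 = rng.standard_normal((c, c // reduction))  # squeeze FC
    w2 = rng.standard_normal((c // reduction, c))  # excite FC
    hidden = np.maximum(squeezed @ w1, 0.0)        # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # per-channel weight in (0, 1)
    return x * gate[:, None, None]                 # rescale each channel

feat = np.random.default_rng(1).standard_normal((8, 7, 7))
out = channel_attention(feat)
```

Because the gate lies in (0, 1), each channel is attenuated in proportion to its learned importance while the spatial layout is untouched; in the model this output would feed back into the (S)W-MSA path.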
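The PA loss is described only as a weighted combination of a polarization loss and an asymmetric loss. The exact forms are not given in the abstract; the sketch below uses illustrative stand-ins, a hinge-style polarization term that pushes each relaxed hash output past a margin toward its target bit, and a simplified asymmetric term comparing real-valued query outputs against fixed binary database codes, mixed by a hypothetical allocation ratio `alpha`:

```python
import numpy as np

def polarization_loss(h, t, margin=1.0):
    """Hinge-style polarization sketch: push each relaxed hash output h
    toward its target bit t in {-1, +1} with at least `margin` magnitude."""
    return np.maximum(margin - h * t, 0.0).mean()

def asymmetric_pair_loss(h_query, b_db, similar):
    """Simplified asymmetric term: real-valued query outputs vs. fixed
    binary database codes; pull similar pairs together, push
    dissimilar pairs apart."""
    d = np.mean((h_query - b_db) ** 2, axis=1)          # per-pair distance
    return np.where(similar, d, np.maximum(1.0 - d, 0.0)).mean()

def pa_loss(h_query, b_db, t, similar, alpha=0.7):
    """Weighted combination; alpha stands in for the paper's
    weight allocation ratio (value here is illustrative)."""
    return alpha * polarization_loss(h_query, t) + \
        (1.0 - alpha) * asymmetric_pair_loss(h_query, b_db, similar)

rng = np.random.default_rng(0)
h = np.tanh(rng.standard_normal((4, 16)))   # relaxed 16-bit query codes
b = np.sign(rng.standard_normal((4, 16)))   # fixed binary database codes
t = np.sign(rng.standard_normal((4, 16)))   # target hash bits
similar = np.array([True, False, True, False])
loss = pa_loss(h, b, t, similar)
```

The asymmetry (optimizing real-valued outputs against fixed binary codes) avoids relaxing both sides of each pair, while the polarization term drives outputs toward ±1 so quantization loses little information.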
Authors
LIU Huayong (刘华咏); XU Minghui (徐明慧) (Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Wuhan 430079, China; School of Computer Science, Central China Normal University, Wuhan 430079, China)
Source
Computer Science (《计算机科学》), a Peking University Core journal, 2025, Issue 8, pp. 204-213 (10 pages)
Funding
Humanities and Social Sciences Research Project of the Ministry of Education of China (21YJA870005).