Funding: This work was funded by the Deanship of Research and Graduate Studies at King Khalid University through small group research under grant number RGP1/278/45.
Abstract: This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on the ISIC-2018 dataset, our method achieves an F1-Score improvement of +4.84% compared to the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On the ChestX-ray14 dataset, the method delivers an F1-Score improvement of 5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
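The abstract does not state how the dynamic multi-loss reweights its terms during training. As a hypothetical illustration only (the linear schedule, the function name, and the split into a classification term and a retrieval term are all assumptions, not the paper's actual formulation), one simple adaptive scheme looks like:

```python
def dynamic_multi_loss(cls_loss, ret_loss, epoch, total_epochs):
    """Hypothetical dynamic weighting: a linear schedule that shifts
    emphasis from the classification loss toward the retrieval loss
    as training progresses. The paper's actual rule may differ."""
    alpha = 1.0 - epoch / total_epochs  # classification weight decays
    beta = 1.0 - alpha                  # retrieval weight grows
    return alpha * cls_loss + beta * ret_loss

# Early in training the classification term dominates the total;
# by the final epoch only the retrieval term remains.
early = dynamic_multi_loss(2.0, 1.0, epoch=0, total_epochs=10)   # -> 2.0
late = dynamic_multi_loss(2.0, 1.0, epoch=10, total_epochs=10)   # -> 1.0
```

Any schedule that moves weight between the two objectives (linear, cosine, or loss-magnitude-driven) fits the same interface.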
Abstract: In fine-grained image retrieval, existing work has focused on deep networks for discriminative feature extraction and precise localization, neglecting the information carried by shallow features and failing to suppress complex background noise, which limits retrieval performance. To address this, a Fine-grained Deep Hashing image retrieval method based on Multi-level Feature Extraction (FDH-MFE) is proposed. The method focuses on the correlations between features at different levels and strengthens local feature extraction. First, a feature extraction module is proposed that extracts fine-grained features from different stages of the network and uses a graph neural network to uncover their latent long-range dependencies, providing more comprehensive and refined feature representations for subsequent stages. Second, a proxy loss algorithm is designed to make the hash-code distribution more uniform, improving the discriminability of fine-grained features. Finally, a background-suppression algorithm combined with a triplet loss strengthens the model's ability to fit the global distribution, allowing the proposed method to excel on fine-grained image retrieval tasks. Experimental results show that, compared with the second-best method, the average retrieval precision on four public datasets improves by 15.03%, 10.94%, 9.98%, and 9.78%, respectively.
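The triplet-loss component named above follows the standard margin form; a minimal pure-Python sketch is below. The margin value, the toy real-valued codes, and the sign-based quantization helper are illustrative assumptions; the paper's proxy-loss and background-suppression components are not reproduced here.

```python
def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard margin-based triplet loss on real-valued codes:
    pull the anchor toward the positive sample and push it away
    from the negative sample, using squared Euclidean distances."""
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_ap - d_an + margin)

def binarize(code):
    """Quantize a real-valued code to a {+1, -1} hash code by sign
    (a common final step in deep hashing, assumed here)."""
    return [1 if c >= 0 else -1 for c in code]

# A hard negative (close to the anchor) incurs a positive loss;
# an easy negative (far away) incurs none.
hard = triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0])  # -> 0.47
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0])  # -> 0.0
```

In practice the codes come from the network's final layer and the loss is averaged over mined triplets in each batch.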
Abstract: Existing named entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks struggle to fully understand the overall semantics of a text and to capture complex entity relations. Therefore, a NER model based on global information fusion and multi-dimensional relation awareness is proposed. First, BERT (Bidirectional Encoder Representations from Transformers) produces vector representations of the input sequence, and a BiLSTM further learns its contextual information. Second, a global information fusion mechanism composed of a gradient stabilization layer and a feature fusion module is proposed: the former keeps gradient propagation stable while updating and refining the sequence representation, and the latter fuses the forward and backward BiLSTM representations into a more comprehensive feature representation. Next, a multi-dimensional relation-aware structure learns the associations among words in different subspaces to capture complex entity relations within documents. In addition, an adaptive focal loss dynamically adjusts the weights of different entity classes, improving recognition of minority-class entities. Finally, the proposed model was compared with 11 baseline models on 7 public datasets; the experimental results show that its F1 score exceeds that of every baseline, demonstrating strong overall performance.
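The adaptive focal loss mentioned above builds on the standard focal loss, which down-weights well-classified tokens so that rare entity classes contribute more to training. A minimal single-token sketch of the base form follows; the gamma and alpha defaults, and the omission of the paper's per-class adaptation rule, are assumptions.

```python
import math

def focal_loss(p_correct, gamma=2.0, alpha=1.0):
    """Focal loss for one token: -alpha * (1 - p)^gamma * log(p).
    The (1 - p)^gamma factor shrinks the loss of confident
    predictions (p near 1), so hard or rare classes dominate.
    With gamma = 0 this reduces to plain cross-entropy."""
    return -alpha * (1.0 - p_correct) ** gamma * math.log(p_correct)

# A poorly classified token (p = 0.1) contributes far more loss
# than a well-classified one (p = 0.9).
hard_token = focal_loss(0.1)
easy_token = focal_loss(0.9)
```

The "adaptive" variant described in the abstract would adjust alpha (or gamma) per entity class during training rather than fixing them.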