
Cross-Modal Light-3Dformer Model for Lung Tumor Classification
Abstract: Deep-learning-based recognition of lung tumors in 3D multimodal positron emission tomography/computed tomography (PET/CT) images is an important research direction. Lung tumor lesions have irregular spatial shapes and blurred boundaries with the surrounding tissue, which makes it difficult for a model to fully extract tumor features, and 3D tasks impose a high computational complexity on the model. To address these problems, this paper proposes a cross-modal Light-3Dformer model for 3D lung tumor recognition. The main contributions are as follows. First, a main-auxiliary network structure is adopted: the backbone network extracts PET/CT image features, the auxiliary network extracts PET and CT image features, and lightweight cross-modal collaborative attention realizes multimodal feature enhancement and interactive learning. Second, a Light-3Dformer module is designed. In this module, the two matrix multiplications of the Transformer are replaced with the linear element-wise multiplications of the global attention mechanism Lightformer; a cascaded Lightformer structure is designed whose output feature map is fused with the original input feature map, so that paralleling and fusing more deep and shallow features yields a lightweight design and rich gradient information; and a parameter-free attention is designed that strengthens lung tumor feature extraction from three aspects: channel, space, and slice. Third, a lightweight cross-modal collaborative attention module (LCCAM) is designed, which fully learns the advantageous cross-modal information of 3D multimodal images and performs interactive learning on deep and shallow features. Finally, ablation and comparative experiments are conducted. On a self-built 3D multimodal lung tumor dataset, the proposed model achieves an accuracy of 90.19% and an area under the curve (AUC) of 89.81% while having the lowest computational cost and running time; compared with the 3D-SwinTransformer-S model, the number of parameters is reduced by a factor of 117 and the computational cost by a factor of 400. The experimental results show that the proposed model better extracts multimodal information of lung tumor lesions, providing a new approach to lightweight 3D deep learning models and multimodal interaction.
Authors: ZHOU Tao, NIU Yu-xia, YE Xin-yu, LIU Long, LU Hui-ling (School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021, China; Laboratory of Image & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan, Ningxia 750021, China; School of Medical Information & Engineering, Ningxia Medical University, Yinchuan, Ningxia 750004, China)
Source: Acta Electronica Sinica (电子学报), PKU Core Journal, 2025, No. 3, pp. 951-961 (11 pages)
Funding: National Natural Science Foundation of China (No. 62062003); Natural Science Foundation of Ningxia (No. 2023AAC03293).
Keywords: lung tumor; multimodal images; Transformer; Light-3Dformer; light cross-modal collaborative attention
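To make the two ideas highlighted in the abstract more concrete, the PyTorch sketch below illustrates one plausible reading of a Light-3Dformer-style block: the Transformer's two O(N²) matrix multiplications are replaced by element-wise products with a single global context vector, and a parameter-free attention re-weights the features along the channel, spatial, and slice dimensions. This is a minimal illustration under our own assumptions; the module names, gating forms, and the placement of the residual connection are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def parameter_free_attention_3d(x):
    """Parameter-free attention over channel, spatial, and slice dimensions
    (a guess at the mechanism named in the abstract; no learnable weights)."""
    chan = torch.sigmoid(x.mean(dim=(2, 3, 4), keepdim=True))  # (B, C, 1, 1, 1) channel weights
    spat = torch.sigmoid(x.mean(dim=1, keepdim=True))          # (B, 1, D, H, W) spatial weights
    slc = torch.sigmoid(x.mean(dim=(1, 3, 4), keepdim=True))   # (B, 1, D, 1, 1) per-slice weights
    return x * chan * spat * slc


class LightformerStyleAttention3D(nn.Module):
    """Linear-complexity global attention: the QK^T and attention-times-V
    matrix multiplications of standard self-attention are replaced by
    element-wise products with a single global context descriptor."""

    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv3d(channels, channels, kernel_size=1)
        self.to_k = nn.Conv3d(channels, channels, kernel_size=1)
        self.to_v = nn.Conv3d(channels, channels, kernel_size=1)
        self.proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):                                    # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        q = torch.sigmoid(self.to_q(x))                       # element-wise "query" gate
        k = F.softmax(self.to_k(x).view(b, c, -1), dim=-1)    # normalised global key weights
        v = self.to_v(x).view(b, c, -1)
        context = (k * v).sum(dim=-1).view(b, c, 1, 1, 1)     # global descriptor, O(N) cost
        out = self.proj(q * context)                          # element-wise modulation, no N x N matmul
        return parameter_free_attention_3d(out + x)           # residual plus parameter-free attention
```

The element-wise formulation lets the cost grow linearly with the number of voxels, which is in line with the abstract's claim of a large reduction in computation relative to 3D Swin Transformer variants.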
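Similarly, the main-auxiliary design and the lightweight cross-modal collaborative attention (LCCAM) can be pictured roughly as follows: the auxiliary PET and CT branches produce global descriptors that re-weight the PET/CT backbone features. This is only a hypothetical sketch; the class name, pooling choice, and gating form are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class LCCAMSketch(nn.Module):
    """Hypothetical lightweight cross-modal collaborative attention: features from
    the auxiliary PET and CT branches modulate the PET/CT backbone features."""

    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)                           # global descriptor per modality
        self.gate = nn.Conv3d(2 * channels, channels, kernel_size=1)  # single 1x1x1 conv keeps it light

    def forward(self, f_petct, f_pet, f_ct):                          # each: (B, C, D, H, W)
        desc = torch.cat([self.pool(f_pet), self.pool(f_ct)], dim=1)  # (B, 2C, 1, 1, 1)
        weights = torch.sigmoid(self.gate(desc))                      # cross-modal channel weights
        return f_petct + f_petct * weights                            # interactive, residual re-weighting


# Usage with the three branches described in the abstract (names are illustrative):
# f_petct, f_pet, f_ct = backbone(petct_vol), aux_pet(pet_vol), aux_ct(ct_vol)
# fused = LCCAMSketch(channels=64)(f_petct, f_pet, f_ct)
```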