The accurate segmentation of deep gray matter nuclei is critical for neuropathological research, disease diagnosis, and treatment. Existing methods rely on supervised learning, which requires large labeled datasets, and such datasets are challenging and time-consuming to obtain for medical image analysis. In addition, methods based on convolutional neural networks (CNNs) achieve only suboptimal performance because of the locality of convolutional operations. Vision Transformers (ViTs) efficiently model long-range dependencies and thus have the potential to outperform these methods in segmentation tasks. To address these issues, we propose a novel hybrid network based on self-supervised pre-training for deep gray matter nuclei segmentation. Specifically, we present a CNN-Transformer hybrid network (CTNet) whose encoder combines a 3D CNN and a ViT to learn spatially detailed local features and global semantic information. We propose a self-supervised learning (SSL) approach that integrates rotation prediction and masked feature reconstruction to pre-train CTNet, enabling the model to learn valuable visual representations from unlabeled data. We evaluate our method on 3T and 7T human brain MRI datasets. The results demonstrate that CTNet outperforms the comparison models and that our pre-training strategy outperforms other advanced self-supervised methods. When the training set contains only one sample, the pre-trained CTNet improves the Dice similarity coefficient (DSC) by 8.4% compared with a randomly initialized CTNet.
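The following minimal NumPy sketch illustrates the two pretext tasks and the evaluation metric. It is written under stated assumptions and is not the authors' CTNet implementation. It assumes the following:

- Rotation prediction rotates a 3D volume by a multiple of 90° in a single plane, giving four classes.
- Masked feature reconstruction zeroes a random fraction of encoder feature tokens and scores a mean-squared error on the masked positions only.
- The two losses are combined as a weighted sum.

The function names, the rotation plane, the mask ratio, and the weights `alpha` and `beta` are all hypothetical. The sketch also computes the Dice similarity coefficient, DSC = 2|P ∩ G| / (|P| + |G|), the metric used to report the 8.4% gain.

```python
import numpy as np


def dice_coefficient(pred, target):
    """DSC = 2|P & G| / (|P| + |G|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, target).sum() / total)


def make_rotation_sample(volume, rng):
    """Rotate a 3D volume by k*90 degrees in an assumed plane (axes 1, 2).

    The rotation index k (0..3) is the classification label.
    """
    k = int(rng.integers(0, 4))
    return np.rot90(volume, k=k, axes=(1, 2)), k


def mask_tokens(tokens, mask_ratio, rng):
    """Zero out a random subset of feature tokens (rows).

    Returns the masked tokens and a boolean mask of the hidden rows.
    """
    n = tokens.shape[0]
    n_mask = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_mask]] = True
    masked = tokens.copy()
    masked[mask] = 0.0
    return masked, mask


def masked_reconstruction_loss(pred_tokens, target_tokens, mask):
    """Mean squared error computed only on the masked token positions."""
    if not mask.any():
        return 0.0
    diff = pred_tokens[mask] - target_tokens[mask]
    return float(np.mean(diff ** 2))


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def combined_ssl_loss(rot_logits, rot_label, pred_tokens, target_tokens,
                      mask, alpha=1.0, beta=1.0):
    """Hypothetical weighted sum of the two pretext-task losses."""
    return (alpha * cross_entropy(rot_logits, rot_label)
            + beta * masked_reconstruction_loss(pred_tokens, target_tokens, mask))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.standard_normal((8, 16, 16))  # toy stand-in for an MRI patch
    rotated, k = make_rotation_sample(volume, rng)
    tokens = rng.standard_normal((64, 32))     # toy encoder feature tokens
    masked, mask = mask_tokens(tokens, 0.5, rng)
    print("rotation label:", k, "masked tokens:", int(mask.sum()))
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

In a real pipeline, a network would map `rotated` to four rotation logits and reconstruct the original tokens from `masked`. The two losses would then be minimized on unlabeled scans before the network is fine-tuned for segmentation.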
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62071405 and the National Natural Science Foundation of China under Grant 12175189.