Funding: Major Research Plan of the National Natural Science Foundation of China (No. 92059206).
Abstract: Medical image segmentation is a crucial preliminary step for a number of downstream diagnosis tasks. As deep convolutional neural networks have advanced computer vision, medical image segmentation can become a semi-automatic procedure: a deep convolutional neural network finds the contours of regions of interest, which radiologists then revise. However, supervised learning requires large annotated datasets, which are difficult to acquire, especially for medical images. Self-supervised learning can take advantage of unlabeled data and provide a good initialization that is then fine-tuned for downstream tasks with limited annotations. Because most self-supervised learning methods, especially contrastive learning methods, are tailored to natural image classification and require expensive GPU resources, we propose a novel and simple pretext-based self-supervised learning method that exploits positional information in volumetric medical images. Specifically, we regard spatial coordinates as pseudo labels and pretrain the model by predicting the positions of randomly sampled 2D slices in volumetric medical images. Experiments on four semantic segmentation datasets demonstrate the superiority of our method over other self-supervised learning methods in both semi-supervised learning and transfer learning settings. Code is available at https://github.com/alienzyj/PPos.
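The pretext task described above, using slice position as a pseudo label, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it labels each slice only with its normalized index along one axis, and the names `sample_slice_with_position` and `make_pretext_batch` are hypothetical. The abstract does not specify the network, the loss, or how the coordinates are encoded.

```python
import random


def sample_slice_with_position(volume, rng):
    """Pick a random 2D slice from a 3D volume (a list of slices) and return it
    together with its normalized depth position in [0, 1] as the pseudo label."""
    depth = len(volume)
    index = rng.randrange(depth)
    # Normalizing lets volumes with different slice counts share one label range.
    position = index / (depth - 1) if depth > 1 else 0.0
    return volume[index], position


def make_pretext_batch(volume, batch_size, seed=0):
    """Build (slice, position) training pairs; no manual annotation is needed."""
    rng = random.Random(seed)
    return [sample_slice_with_position(volume, rng) for _ in range(batch_size)]


# Toy volume: 5 slices of 2x2 pixels, where every pixel stores its slice index.
toy_volume = [[[z, z], [z, z]] for z in range(5)]
batch = make_pretext_batch(toy_volume, batch_size=4)
```

A model pretrained to regress `position` from the slice alone would then be fine-tuned on the segmentation task with the limited labels available.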
Abstract: BACKGROUND Drug utilization research helps healthcare administrators understand, quantify, and refine prescribing, with the principal objective of enabling the rational use of drugs. Research on the cost of treatment is scarce in developing nations compared with developed countries. Drug utilization studies from developing nations are therefore much needed, and their number has been growing. AIM To evaluate patterns of antipsychotic drug utilization and to analyze direct medical costs in patients newly diagnosed with schizophrenia. METHODS This observational study used a retrospective cohort to evaluate patterns of antipsychotic drug utilization using World Health Organization (WHO) core prescribing indicators and anatomical therapeutic chemical/defined daily dose indicators. We also calculated direct medical costs over a period of 6 months. RESULTS Atypical antipsychotics were the mainstay of treatment for schizophrenia in every age group and every subcategory of schizophrenia. Evaluation based on the WHO prescribing indicators showed a low average number of drugs per prescription and a low frequency of prescribing antipsychotics from the National List of Essential Medicines 2015 and the WHO Essential Medicines List 2019. The total mean drug cost was 1396 Indian rupees, and the total mean cost of investigations was 1017.34 Indian rupees. Overall, the total mean direct medical cost incurred by patients was 4337.28 Indian rupees. CONCLUSION The information from the present study can be used for reviewing and updating treatment policy at the institutional level.
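The three cost figures reported in the results can be related with a little arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative. It suggests that the two itemized components do not add up to the reported total, so the remainder presumably reflects cost components that the abstract does not itemize.

```python
from decimal import Decimal

# Figures as reported in the abstract (Indian rupees, mean per patient).
total_direct_cost = Decimal("4337.28")
drug_cost = Decimal("1396")
investigation_cost = Decimal("1017.34")

# Sum of the two components the abstract itemizes.
itemized = drug_cost + investigation_cost

# Part of the reported total that the abstract does not break down.
unitemized = total_direct_cost - itemized

# Share of the reported total attributable to drugs, in percent.
drug_share = (drug_cost / total_direct_cost * 100).quantize(Decimal("0.1"))
```

Here `itemized` is 2413.34 rupees and `unitemized` is 1923.94 rupees; what the latter covers cannot be determined from the abstract alone.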
Funding: China Law Society 2025 Annual Legal Research, project grant number CLS(2025)Y04.
Abstract: Background: Medical artificial intelligence (MAI) is a synthesis of medical science and artificial intelligence development, and it is a crucial field in the current advancement and application of AI. Developing medical AI may give rise not only to legal risks, such as infringement of privacy rights and health rights, but also to ethical risks stemming from violations of the principles of beneficence and non-maleficence. Methods: To effectively address damage caused by MAI in the future, it is necessary to establish a hierarchical governance system for MAI. This paper systematically collects local practices in China and inductively integrates the legal remedies available for damage caused by MAI. Results: To effectively address the ethical and legal challenges of medical artificial intelligence, a hierarchical regulatory system should be established based on the impact of intervention measures on natural rights and on differences in intervention timing. The paper arrives at a hierarchical legal governance system that corresponds to the ethical and legal risks of MAI in China. Conclusion: The Chinese government has formed a multi-agent governance system based on the impact of risks on rights and the timing of legal intervention, which provides a reference for other countries in their research on MAI risk management.
Funding: Shanghai Science and Technology Development Fund (9944 190 2 7).
Abstract: Volumetric rendering of 3D medical image data is a very effective method for communicating radiological studies to clinicians. Algorithms that produce images with artifacts and inaccuracies are not clinically useful. This paper proposes a direct voxel projection algorithm for volumetric data rendering. With this algorithm, arbitrary volume rotations and transparent and cutaway views are generated satisfactorily. Compared with existing ray-tracing methods, it greatly improves the quality of the projected image. Experimental results on real medical CT image data demonstrate the advantages and fidelity of the proposed algorithm.
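The abstract does not describe the projection algorithm itself, so the following is only a generic sketch of object-order voxel projection, assuming an orthographic view along the depth axis and front-to-back alpha compositing. The function names `project_voxels` and `cutaway` and the `opacity` transfer function are hypothetical. Each voxel is visited once and accumulated into the pixel it projects onto, rather than casting a ray per pixel.

```python
def project_voxels(volume, opacity):
    """Project volume[z][y][x] (intensities in [0, 1]) onto the x-y plane.

    Slices are visited front (z = 0) to back, and each voxel is composited
    into its pixel. `opacity` maps an intensity to an alpha value in [0, 1].
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    color = [[0.0] * width for _ in range(height)]
    # Fraction of light still able to reach the viewer through each pixel.
    transmittance = [[1.0] * width for _ in range(height)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                value = volume[z][y][x]
                alpha = opacity(value)
                color[y][x] += transmittance[y][x] * alpha * value
                transmittance[y][x] *= 1.0 - alpha
    return color


def cutaway(volume, first_slice):
    """Drop the slices in front of `first_slice` to expose the interior."""
    return volume[first_slice:]


# Two 1x1 slices: a bright voxel in front of a dimmer one.
toy = [[[1.0]], [[0.5]]]
transparent_view = project_voxels(toy, opacity=lambda v: 0.5)
opaque_view = project_voxels(toy, opacity=lambda v: 1.0)
cutaway_view = project_voxels(cutaway(toy, 1), opacity=lambda v: 1.0)
```

A semi-transparent `opacity` lets the rear voxel contribute to the pixel, while the cutaway removes the front slice entirely. Arbitrary rotation would require resampling or transforming voxel coordinates before projection, which this sketch omits.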
Abstract: Background A medical content-based image retrieval (CBIR) system is designed to retrieve images from large imaging repositories that are visually similar to a user's query image. CBIR is widely used in evidence-based diagnosis, teaching, and research. Although retrieval accuracy has largely improved, there has been limited development toward visualizing the important image features that indicate the similarity of retrieved images. Despite the prevalence of 3D volumetric data in medical imaging, such as computed tomography (CT), current CBIR systems still rely on 2D cross-sectional views to visualize retrieved images. Such 2D visualization requires users to browse through image stacks to confirm the similarity of the retrieved images, and it often involves mental reconstruction of 3D information, including the size, shape, and spatial relations of multiple structures. This process is time-consuming and depends on the user's experience. Methods In this study, we propose an importance-aware 3D volume visualization method. The rendering parameters are automatically optimized to maximize the visibility of important structures that are detected and prioritized in the retrieval process. We then integrate the proposed visualization into a CBIR system, thereby complementing the 2D cross-sectional views for relevance feedback and further analyses. Results Our preliminary results, obtained with multimodal positron emission tomography and computed tomography (PET-CT) images from a non-small cell lung cancer dataset, demonstrate that 3D visualization can provide additional information.
Funding: National Key R&D Program of China (2021YFF1200602), the National Science Fund for Excellent Overseas Scholars (0401260011), the National Defense Science and Technology Innovation Fund of Chinese Academy of Sciences (c02022088), the Tianjin Science and Technology Program (20JCZDJC00810), the National Natural Science Foundation of China (82202798), and the Shanghai Sailing Program (22YF1404200).
Abstract: Brain-computer interfaces (BCIs) are an emerging technology that facilitates direct communication between the brain and external devices. In recent years, numerous review articles have explored various aspects of BCIs, including their fundamental principles, technical advancements, and applications in specific domains. However, these reviews often focus on signal processing, hardware development, or a limited set of applications such as motor rehabilitation or communication. This paper offers a comprehensive review of recent electroencephalogram (EEG)-based BCI applications in the medical field across 8 critical areas: rehabilitation, daily communication, epilepsy, cerebral resuscitation, sleep, neurodegenerative diseases, anesthesiology, and emotion recognition. The current challenges and future trends of BCIs are also discussed, including personal privacy and ethical concerns, network security vulnerabilities, safety issues, and biocompatibility.
Funding: National Natural Science Foundation of China (Grant No. 6240072655), the Hubei Provincial Key Research and Development Program (Grant No. 2023BCB151), the Wuhan Natural Science Foundation Exploration Program (Chenguang Program, Grant No. 2024040801020202), and the Natural Science Foundation of Hubei Province of China (Grant No. 2025AFB148).
Abstract: Image segmentation is attracting increasing attention in the field of medical image analysis. Given its widespread use across various medical applications, ensuring and improving segmentation accuracy has become a crucial research topic. With advances in deep learning, researchers have developed numerous methods that combine Transformers and convolutional neural networks (CNNs) to create highly accurate models for medical image segmentation. However, efforts to further enhance accuracy by developing larger and more complex models, or by training on more extensive datasets, significantly increase computational resource consumption. To address this problem, we propose BiCLIP-nnFormer (the prefix "Bi" refers to the use of two distinct CLIP models), a virtual multimodal instrument that leverages CLIP models to enhance the segmentation performance of the medical segmentation model nnFormer. Because the two CLIP models (PMC-CLIP and CoCa-CLIP) are pre-trained on large datasets, they do not require additional training, which conserves computational resources. These models are used offline to extract image and text embeddings from medical images. The embeddings are then processed by the proposed 3D CLIP adapter, which adapts the CLIP knowledge to segmentation tasks through fine-tuning. Finally, the adapted embeddings are fused with feature maps extracted from the nnFormer encoder to generate the predicted masks. This process enriches the representation capabilities of the feature maps by integrating global multimodal information, leading to more precise segmentation predictions. We demonstrate the superiority of BiCLIP-nnFormer and the effectiveness of using CLIP models to enhance nnFormer through experiments on two public datasets, namely the Synapse multi-organ segmentation dataset (Synapse) and the Automatic Cardiac Diagnosis Challenge dataset (ACDC), as well as a self-annotated lung multi-category segmentation dataset (LMCS).
Abstract: A medical image encryption scheme is proposed based on Fisher-Yates scrambling, filter diffusion, and S-box substitution. First, a chaotic sequence associated with the plaintext is generated by a logistic-sine-cosine system and is used for the scrambling, substitution, and diffusion processes. Three-dimensional Fisher-Yates scrambling, S-box substitution, and diffusion are employed for the first round of encryption. The chaotic sequence is then used in a second round of encryption to scramble the ciphertext obtained in the first round. Next, a three-dimensional filter is applied for diffusion to further hide useful information. The key of the algorithm is generated by combining the hash value of the plaintext image with the input parameters, which improves resistance to plaintext attacks. The security analysis shows that the algorithm is effective and efficient and can resist common attacks. In addition, the good diffusion effect shows that the scheme can resist the differential attacks encountered in the transmission of medical images and has positive implications for future research.
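The scrambling stage can be illustrated with a simplified sketch. It assumes a plain logistic map in place of the paper's logistic-sine-cosine system, works on a flat pixel list rather than a three-dimensional arrangement, and omits the S-box substitution and filter diffusion entirely. The function names, the parameter `r = 3.99`, and the initial value `x0` (standing in for a key derived from the plaintext hash) are all hypothetical. It is for illustration only and offers no real security.

```python
def logistic_sequence(x0, r, count, burn_in=100):
    """Iterate the logistic map x -> r * x * (1 - x) and return `count` values."""
    x = x0
    for _ in range(burn_in):  # discard the transient iterations
        x = r * x * (1.0 - x)
    values = []
    for _ in range(count):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_permutation(length, x0, r=3.99):
    """Fisher-Yates shuffle whose swap indices come from the chaotic sequence."""
    perm = list(range(length))
    chaos = logistic_sequence(x0, r, length)
    for i in range(length - 1, 0, -1):
        j = int(chaos[i] * (i + 1)) % (i + 1)  # index in [0, i]
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def scramble(pixels, x0):
    """Reorder the pixels according to the key-dependent permutation."""
    perm = chaotic_permutation(len(pixels), x0)
    return [pixels[p] for p in perm]


def unscramble(scrambled, x0):
    """Invert `scramble` by regenerating the same permutation from the key."""
    perm = chaotic_permutation(len(scrambled), x0)
    restored = [0] * len(scrambled)
    for position, source in enumerate(perm):
        restored[source] = scrambled[position]
    return restored


pixels = list(range(16))
key = 0.3141
cipher = scramble(pixels, key)
recovered = unscramble(cipher, key)
```

Because the permutation is regenerated deterministically from the key, the receiver needs only `x0` to undo the scrambling; tying that value to the plaintext hash is what the abstract credits with improving resistance to plaintext attacks.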
Abstract: On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet+Healthcare." Biomedical data has significantly contributed to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.
Abstract: Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics, advancing precision medicine by enabling integration of, and learning from, diverse data sources. The exponential growth of high-dimensional healthcare data, encompassing genomic, transcriptomic, and other omics profiles as well as radiological imaging and histopathological slides, makes this approach increasingly important because, when examined separately, these data sources offer only a fragmented picture of intricate disease processes. Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling, more robust disease characterization, and improved treatment decision-making. This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis. We classify and examine important application domains, such as (1) radiology, where automated report generation and lesion detection are facilitated by image-text integration; (2) histopathology, where fusion models improve tumor classification and grading; and (3) multi-omics, where molecular subtypes and latent biomarkers are revealed through cross-modal learning. We provide an overview of representative research, methodological advancements, and clinical consequences for each domain. Additionally, we critically analyze the fundamental issues preventing wider adoption, including computational complexity (particularly in training scalable, multi-branch networks), data heterogeneity (resulting from modality-specific noise, resolution variations, and inconsistent annotations), and the challenge of maintaining significant cross-modal correlations during fusion. These problems impede interpretability, which is crucial for clinical trust and use, in addition to performance and generalizability. Lastly, we outline important areas for future research, including the development of standardized protocols for harmonizing data, the creation of lightweight and interpretable fusion architectures, the integration of real-time clinical decision support systems, and the promotion of cooperation for federated multimodal learning. Our goal is to provide researchers and clinicians with a concise overview of the field's present state, enduring constraints, and exciting directions for further research.
Funding: Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant Nos. 2023AH040149 and 2024AH051915), the Anhui Provincial Natural Science Foundation (Grant No. 2208085MF168), the Science and Technology Innovation Tackle Plan Project of Maanshan (Grant No. 2024RGZN001), and the Scientific Research Fund Project of Anhui Medical University (Grant No. 2023xkj122).
Abstract: Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. The transformer model, by contrast, can effectively capture long-range dependencies in the image, and combining a CNN with a transformer can effectively extract both local details and global contextual features. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module that extracts image features in parallel to reduce the information loss caused by downsampling. In the second component, we apply a residual block to the well-known convolutional block attention module to enhance the network's ability to recognize important image features and to alleviate vanishing gradients. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly with respect to preserving contours and textures.
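The multi-head attention used in the third component can be sketched in its simplest form. This is a generic illustration of scaled dot-product self-attention split across heads, not the M2ANet fusion module: it assumes identity query, key, and value projections (a real layer learns these), omits the pooling and position encoding, and uses hypothetical function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes all values by similarity."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[c] for w, v in zip(weights, values))
                 for c in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


def multi_head_self_attention(tokens, num_heads):
    """Split each token's channels into heads, attend per head, then concatenate."""
    head_dim = len(tokens[0]) // num_heads
    heads = []
    for h in range(num_heads):
        part = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        heads.append(attention(part, part, part))
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(tokens))]


# Three feature tokens with four channels each, processed by two heads.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
fused = multi_head_self_attention(tokens, num_heads=2)
```

Every output token is a weighted mixture of all input tokens, which is how attention supplies the global context that convolution alone lacks.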