Objectives: Diabetes remains a major health challenge in China. Artificial intelligence (AI) has demonstrated considerable potential in improving diabetes management. This study aimed to assess healthcare providers' perceptions regarding AI in diabetes care across China. Methods: A cross-sectional survey was conducted from November 12 to November 24, 2024. Using snowball sampling, 514 physicians and nurses were recruited from healthcare providers across 30 cities or provinces in China. The self-developed questionnaire comprised five sections with 19 questions assessing medical workers' demographic characteristics, AI-related experience and interest, awareness, attitudes, and concerns regarding AI in diabetes care. Statistical analysis was performed using t-tests, analysis of variance (ANOVA), and linear regression. Results: Among respondents, 20.0% and 48.1% had participated in AI-related research and training, respectively, while 85.4% expressed moderate to high interest in AI training for diabetes care. Most respondents reported partial awareness of AI in diabetes care, and only 12.6% exhibited a comprehensive or substantial understanding. Attitudes toward AI in diabetes care were generally positive, with a mean score of 24.50±3.38. Nurses scored significantly higher than physicians (P<0.05). Greater awareness, prior AI training experience, and higher interest in AI training in diabetes care were significantly associated with more positive attitudes (P<0.05). Key concerns regarding AI included trust issues arising from AI-clinician inconsistencies (77.2%), increased workload and clinical workflow disruptions (63.4%), and incomplete legal and regulatory frameworks (60.3%). Only 34.2% of respondents expressed concerns about job displacement, indicating general confidence in their professional roles. Conclusions: While Chinese healthcare providers show moderate awareness of AI in diabetes care, their attitudes are generally positive, and they are considerably interested in future training. Tailored, role-specific AI training is essential for equitable and effective integration into clinical practice. Additionally, transparent, reliable, and ethical AI models must be prioritized to alleviate practitioners' concerns.
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Performance was further assessed through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC=0.973, accuracy=0.937, precision=0.937, recall=0.929, and F1 score=0.933). It outperformed the variants tested in the model ablation and data ablation experiments, as well as various traditional machine learning models. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
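The reported metrics and the decision-level fusion idea can be illustrated with a minimal Python sketch. It is not the AWMDF implementation: the abstract does not say how the adaptive weights are computed, so the softmax over per-modality scores, the function names, and the example numbers are assumptions for illustration only. The first function checks that the reported F1 score follows from the reported precision and recall.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(probabilities, weights):
    """Weighted sum of per-modality class-probability vectors."""
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities))
        for c in range(n_classes)
    ]


# F1 implied by the precision and recall reported in the abstract.
reported_f1 = round(f1_score(0.937, 0.929), 3)

# Hypothetical per-modality outputs for one patient (two severity classes).
modal_probs = [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]

# Plain averaging is the special case of equal, fixed weights.
averaged = fuse(modal_probs, [1 / 3, 1 / 3, 1 / 3])

# Adaptive weighting (assumed form): weights come from per-sample scores,
# here hypothetical confidence values, one per modality.
adaptive_weights = softmax([2.0, 0.5, 1.0])
adaptive = fuse(modal_probs, adaptive_weights)
```

Averaging and fixed-weight schemes use the same weights for every sample, whereas an adaptive scheme recomputes them per sample; that difference is what the comparison in the abstract refers to.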
This article examines the complex relationship between disease perception, negative emotions, and their impact on postoperative recovery in patients with perianal diseases. These conditions not only cause physical discomfort, but also carry a significant emotional burden, often exacerbated by social stigma. Psychological factors, including stress, anxiety, and depression, activate neuroendocrine pathways, such as the hypothalamic–pituitary–adrenal axis, disrupting the gut microbiota and leading to dysbiosis. This disruption can delay wound healing, prolong hospital stay, and intensify pain. Drawing on the findings of Hou et al, our article highlights the critical role of illness perception and negative emotions in shaping recovery outcomes. It advocates for a holistic approach that integrates psychological support and gut microbiota modulation to enhance healing and improve overall patient outcomes.
With the increasing importance of multimodal data in emotional expression on social media, mainstream methods for sentiment analysis have shifted from unimodal to multimodal approaches. However, extracting high-quality emotional features and achieving effective interaction between different modalities remain two major obstacles in multimodal sentiment analysis. To address these challenges, this paper proposes a Text-Gated Interaction Network with Inter-Sample Commonality Perception (TGICP). Specifically, we use an Inter-sample Commonality Perception (ICP) module to extract common features from similar samples within the same modality, and use these common features to enhance the original features of each modality, thereby obtaining a richer and more complete multimodal sentiment representation. Subsequently, in the cross-modal interaction stage, we design a text-driven Text-Gated Interaction (TGI) module. By calculating the mutual information difference between the text modality and the nonverbal modalities, the TGI module dynamically adjusts the influence of emotional information from the text modality on the nonverbal modalities. This helps to reduce modality information asymmetry while enabling full cross-modal interaction. Experimental results show that the proposed model achieves outstanding performance on both the CMU-MOSI and CMU-MOSEI benchmark multimodal sentiment analysis datasets, validating its effectiveness in emotion recognition tasks.
Introduction: Uterine fibroids are benign tumors that develop from the connective and muscular tissues of the uterus. They are common among African-American women, and in our African regions patients suffering from them often arrive late at the hospital. This study aimed to investigate the knowledge and perception of uterine fibroids among women attending the gynecology-obstetrics department of the Regional Hospital Center (CHR) Tsévié. Methodology: This was a cross-sectional descriptive study, with data collected from May 7 to 20, 2024, using systematic sampling. The study included all women present in the Gynecology-Obstetrics Department of CHR Tsévié during the study period who gave free and informed consent to participate in the survey. Results: A total of 362 women participated in the study. Among them, 36.8% had a secondary level of education, and 72.9% were Christians. About 97.5% had heard of uterine fibroids. In 63.5% of cases, their entourage was the principal source of information. The diagnostic method most often mentioned by the women was ultrasound (94.6% of cases), while prayers and occultism were also cited in 28% and 33.3% of cases, respectively. While 91.9% of the women considered the hospital the place for treatment, some indicated that treatment would require plant-based approaches (46.8%) and prayers (26%). The cost of treatment was an obstacle for 85.4% of women, and 61.3% expressed fear of dying during surgery. There was a statistically significant relationship between treatment choice and religion. Conclusion: The majority of women had heard of uterine fibroids but had incorrect information about the treatment.
Tephritid fruit flies are considered among the world's most notorious pests of horticultural crops, including mango (Mangifera indica L.) in Sierra Leone, causing extensive direct and indirect damage. A survey was conducted among 60 mango farmers in 7 districts in Sierra Leone between June and August 2022 to assess their perceptions of fruit fly pest status and the management options currently adopted to control this pest. Semi-structured questionnaires with open- and closed-ended questions were used for the study. The majority (83%) of the farmers were already aware of the fruit fly problem in the country, with 62% perceiving it to be very severe. The majority (60%) of farmers, however, demonstrated poor knowledge of identifying fruit fly species, especially Bactrocera dorsalis, Ceratitis capitata, and Ceratitis cosyra. Farmers were more conversant with the direct damage symptoms on host fruits and the economic impact of fruit flies. A total of 32% of growers took no action to control fruit flies on their farms. Sixty-nine percent (69%) of the farmers adopted cultural control measures, such as prompt harvesting, collection and disposal of infested fruits, and weeding to maintain better sanitary conditions on their farms. Recommended fruit fly management strategies such as the use of botanicals and resistant varieties were either unknown or inaccessible to growers. A total of 52% applied chemicals that were not recommended for the control of fruit flies, without considering their environmental and health risks. It is important to train fruit growers to improve their capabilities for fruit fly management through extension agents who are suited to helping them acquire basic knowledge of fruit fly pests and their management.
The subcortical visual pathway is generally thought to be involved in processing danger-related information, such as fear processing and defensive behavior. A recent study, published in Human Brain Mapping, shows a new function of the subcortical pathway: the fast processing of non-emotional object perception. Rapid object processing is a critical function of the visual system. Topological perception theory proposes that the initial perception of objects begins with the extraction of topological properties (TP). However, the mechanism of rapid TP processing remains unclear. The researchers investigated the subcortical mechanism of TP processing with transcranial magnetic stimulation (TMS). They found that a subcortical magnocellular pathway is responsible for the early processing of TP, and that this subcortical processing of TP accelerates object recognition. Based on their findings, we propose a novel training approach called subcortical magnocellular pathway training (SMPT), aimed at improving the efficiency of the subcortical M pathway to restore visual and attentional functions in disorders associated with subcortical pathway dysfunction.
SARS-CoV-2, particularly the Omicron variant, often leads to flavor perception dysfunction in infected individuals, making a comprehensive understanding of its duration and recovery patterns a critical part of disease management. This study surveyed a cohort of 199 mildly-to-moderately affected SARS-CoV-2 Omicron-infected patients, focusing on the alterations in their olfaction, taste, and chemesthesis perception. Further, a subset of 36 participants (18 healthy and 18 infected) underwent sensory evaluations to check the variation of umami taste sensitivity. The results demonstrated that most of the infected cohort experienced chemosensory disorders, with the recovery period varying between one week and over a month. Intriguingly, the severity of flavor perception changes during infection significantly correlated with the length of the recovery period. Furthermore, this study explored the specific manifestations of flavor perception dysfunction, potential contributing factors, and potential mechanistic explanations for chemosensory disorders. These include local damage, inflammatory responses, and virus-induced neural damage. However, this study revealed no significant change (P>0.05) in umami taste sensitivity among infected patients 55 days post-infection. While this research faces limitations related to its self-reported, cross-sectional design and regional focus, it offers valuable insights into the multifaceted impact of COVID-19, particularly the Omicron variant, on chemosensory perception.
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which uses target-specific prompts to guide the MLLM to extract external knowledge derived exclusively from images to aid entity recognition, thus avoiding the redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK matches or outperforms state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities suffer from the issue of missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and modality missing. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of a patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, combining data from multiple modalities is difficult because of disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to align and integrate features across modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. First, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, were implemented to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and further enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
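As a rough illustration of one of the compression techniques named in this abstract, the following Python sketch applies symmetric 8-bit quantization to a list of weights. The abstract gives no details of the scheme actually used, so the per-tensor symmetric scheme, the function names, and the sample weights are assumptions.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a reconstruction error of at most half a quantization step; that is the trade-off behind the reduced memory and faster inference the abstract mentions.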
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
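For readers unfamiliar with the two metrics reported here, the following minimal Python sketch shows how Hits@k and mean reciprocal rank (MRR) are conventionally computed from the rank at which each correct entity is retrieved. The function names and the example ranks are illustrative and are not taken from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct counterpart for four source entities.
ranks = [1, 2, 1, 4]
hits1 = hits_at_k(ranks, 1)        # 0.5: two of four are ranked first
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 1 + 0.25) / 4 = 0.6875
```

Hits@1 only rewards exact top-1 matches, while MRR also gives partial credit for near misses, which is why alignment papers usually report both.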
Background: Sickle cell anemia (SCA), a genetic hemoglobin disorder, is thought to involve inner ear compromise and poor auditory processing. In humans, auditory processing differs physiologically between males and females, and the same may hold in SCA because of gender-specific pathophysiological changes. Objective: To investigate gender differences in psychoacoustical abilities and speech perception in noise in individuals with SCA, and to compare them with a normal healthy (NH) population. Methods: 80 SCA and 80 NH normal-hearing participants aged 15-40 years were included and further grouped by gender. Auditory discrimination for frequency, intensity, and duration at 500 Hz and 4000 Hz; temporal processing (Gap Detection Threshold and Modulation Detection Threshold); and Speech Perception In Noise (SPIN) at 0 dB SNR were evaluated and compared between males and females of the SCA and NH populations. Results: SCA participants performed poorer than NH participants on all experimental measures. In the NH population, males performed poorer than females in psychoacoustical measures, whereas within the SCA population the reverse was true. Female participants performed better in the SPIN test in both populations. Conclusions: The adverse impact of SCA on the auditory system due to circulatory changes might cause the poorer performance in SCA. Poorer performance by females with SCA is possibly due to the adverse impact of lower Hb levels in addition to sickle cell disease. Estrogen levels and gender preference in auditory processing might lead to better performance by females within the NH population. SPIN performance depends on attentional demands and sensorimotor processing strategies in noise beyond psychoacoustical processing, which may lead to better female performance in both populations.
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Intelligent perception, as a cutting-edge field of modern science and technology, is profoundly changing our understanding of and interaction with the world. With the rapid development of artificial intelligence, the Internet of Things, big data, and other technologies, intelligent perception systems have shown great potential in non-destructive testing, safety monitoring, human-computer interaction, and precision measurement. Traditional sensing technologies face many challenges in complex scenarios or under specific requirements, while intelligent perception provides a new path for innovation and breakthroughs in instrumentation and sensing technologies through multidisciplinary integration.
In the past, people met in public spaces to socialize. Today these areas still serve as social spaces, yet people living in the loneliness of the modern era experience them as places to be "alone in the crowd". Individuals from all parts of society, especially minority groups, feel accepted in public space and are able to express themselves there. Even though the perception and use of public space have changed over time, people still feel free in these areas. However, terrorism, a reality of today's world, is one of the threats that endanger public spaces. As a result, the image of these areas has shifted from "areas where individuals feel freer" to "areas where people are vulnerable to many potential attacks". This study describes how the perception of public space has changed over time and examines how its intended use has changed as a result. Terrorist activity has increased all over the world; the study discusses individuals' perception of public space in the face of this reality, drawing on Jane Jacobs's views on urban life, public space, and security, in the context of terrorism as one of the main problems of the 21st century. The study also aims to offer a set of recommendations for safe public space that take these views on security into account. For this purpose, the metropolis of Istanbul was selected as the study area, and people living in Istanbul were interviewed over the internet. The study discusses how the perception of public space has changed from the end of the 1990s until today and how individuals use public space.
As the number and complexity of sensors in autonomous vehicles continue to rise, multimodal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular detection and binocular detection according to input type. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches according to the technical pathway used to process point cloud features. Additionally, multimodal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods according to the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
Objectives: This study aimed to explore the perceptions and recommendations of multiparas and health-related professionals regarding appropriate birth intervals (BIs) and their key determinants. Methods: In-depth semi-structured interviews were conducted between April 1 and June 30, 2022. Nine multiparas and thirteen health-related professionals were purposefully sampled until data saturation was reached. A thematic analysis approach was applied to the interview transcripts, using dual independent coding and consensus validation in NVivo 12.0. Results: The data generated two overarching categories: 1) balanced decision-making on the appropriate birth interval and 2) internal and external determinants integrated with health and societal considerations. Four key themes emerged under the two categories: 1) consistency and discrepancy between the actual and recommended birth intervals of multiparas; 2) health- and development-oriented professional recommendations; 3) internal determinants related to individual-level factors; and 4) external determinants related to child-related factors, family support, and social security. Weighing women's reproductive health and career development, multiparas and health-related professionals perceived an interval of 18 to 36 months as the appropriate BI. Conclusion: Multiparas and health-related professionals arrived at a balanced recommendation of a birth interval of 18 to 36 months, which was influenced by women's individual-level factors, child-related factors, family support, and social security. Targeted social and healthcare services should be offered to women and their families during the BIs.
The primary objective of Chinese spelling correction(CSC)is to detect and correct erroneous characters in Chinese text,which can result from various factors,such as inaccuracies in pinyin representation,character rese...The primary objective of Chinese spelling correction(CSC)is to detect and correct erroneous characters in Chinese text,which can result from various factors,such as inaccuracies in pinyin representation,character resemblance,and semantic discrepancies.However,existing methods often struggle to fully address these types of errors,impacting the overall correction accuracy.This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities:pinyin,semantics,and character morphology.Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information,our approach employs a multi-head attention mechanism to focuse more on relevant modal information while dis-regarding less pertinent data.To prevent issues such as gradient explosion or vanishing,the model incorporates a residual connection of the original text vector for fine-tuning.This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process.Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the pro-posed model outperforms baseline approaches across various metrics and datasets,confirming its effectiveness and feasibility.展开更多
Funding: Supported by the Jiangsu Provincial Department of Science and Technology Social Development Project (No. BE2020787).
Abstract: Objectives: Diabetes remains a major global health challenge in China. Artificial intelligence (AI) has demonstrated considerable potential in improving diabetes management. This study aimed to assess healthcare providers' perceptions regarding AI in diabetes care across China. Methods: A cross-sectional survey was conducted from November 12 to November 24, 2024. Using snowball sampling, 514 physicians and nurses were recruited from healthcare providers across 30 cities or provinces in China. The self-developed questionnaire comprised five sections with 19 questions assessing medical workers' demographic characteristics, AI-related experience and interest, awareness, attitudes, and concerns regarding AI in diabetes care. Statistical analysis was performed using the t-test, analysis of variance (ANOVA), and linear regression. Results: Of the respondents, 20.0% and 48.1% had participated in AI-related research and training, respectively, while 85.4% expressed moderate to high interest in AI training for diabetes care. Most respondents reported partial awareness of AI in diabetes care, and only 12.6% exhibited a comprehensive or substantial understanding. Attitudes toward AI in diabetes care were generally positive, with a mean score of 24.50 ± 3.38. Nurses demonstrated significantly higher scores than physicians (P < 0.05). Greater awareness, prior AI training experience, and higher interest in AI training in diabetes care were strongly associated with more positive attitudes (P < 0.05). Key concerns regarding AI included trust issues arising from AI-clinician inconsistencies (77.2%), increased workload and clinical workflow disruptions (63.4%), and incomplete legal and regulatory frameworks (60.3%). Only 34.2% of respondents expressed concerns about job displacement, indicating general confidence in their professional roles. Conclusions: While Chinese healthcare providers show moderate awareness of AI in diabetes care, their attitudes are generally positive, and they are considerably interested in future training. Tailored, role-specific AI training is essential for equitable and effective integration into clinical practice. Additionally, transparent, reliable, ethical AI models must be prioritized to alleviate practitioners' concerns.
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
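As a quick consistency check on the metrics reported in the abstract above, the sketch below recomputes the F1 score from the stated precision and recall. It assumes the F1 score is the harmonic mean of those two figures, which holds for a single positive class or micro-averaging but not necessarily for macro-averaged scores; the abstract does not say which was used. The function name is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (assumed to refer to the same class).
f1 = f1_from_precision_recall(0.937, 0.929)
print(round(f1, 3))  # 0.933
```

Under that assumption, the recomputed value agrees with the reported F1 score of 0.933.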
Abstract: This article examines the complex relationship between disease perception, negative emotions, and their impact on postoperative recovery in patients with perianal diseases. These conditions not only cause physical discomfort but also carry a significant emotional burden, often exacerbated by social stigma. Psychological factors, including stress, anxiety, and depression, activate neuroendocrine pathways, such as the hypothalamic–pituitary–adrenal axis, disrupting the gut microbiota and leading to dysbiosis. This disruption can delay wound healing, prolong hospital stay, and intensify pain. Drawing on the findings of Hou et al, our article highlights the critical role of illness perception and negative emotions in shaping recovery outcomes. It advocates for a holistic approach that integrates psychological support and gut microbiota modulation to enhance healing and improve overall patient outcomes.
Funding: Supported by the Natural Science Foundation of Henan under Grant 242300421220; the Henan Provincial Science and Technology Research Project under Grants 252102211047 and 252102211062; the Jiangsu Provincial Scheme Double Initiative Plan JSS-CBS20230474; the XJTLU RDF-21-02-008; the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205; and the Higher Education Teaching Reform Research and Practice Project of Henan Province under Grant 2024SJGLX0126.
Abstract: With the increasing importance of multimodal data in emotional expression on social media, mainstream methods for sentiment analysis have shifted from unimodal to multimodal approaches. However, extracting high-quality emotional features and achieving effective interaction between different modalities remain two major obstacles in multimodal sentiment analysis. To address these challenges, this paper proposes a Text-Gated Interaction Network with Inter-Sample Commonality Perception (TGICP). Specifically, we utilize an Inter-Sample Commonality Perception (ICP) module to extract common features from similar samples within the same modality, and use these common features to enhance the original features of each modality, thereby obtaining a richer and more complete multimodal sentiment representation. Subsequently, in the cross-modal interaction stage, we design a text-driven Text-Gated Interaction (TGI) module. By calculating the mutual information difference between the text modality and nonverbal modalities, the TGI module dynamically adjusts the influence of emotional information from the text modality on nonverbal modalities. This helps to reduce modality information asymmetry while enabling full cross-modal interaction. Experimental results show that the proposed model achieves outstanding performance on both the CMU-MOSI and CMU-MOSEI baseline multimodal sentiment analysis datasets, validating its effectiveness in emotion recognition tasks.
Abstract: Introduction: Uterine fibroids are benign tumors that develop from the connective and muscular tissues of the uterus. They are common among African-American women, and in our African regions patients suffering from them often arrive late to the hospital. This study aimed to investigate the knowledge and perception of uterine fibroids among women who came to the gynecology-obstetrics department of the Regional Hospital Center (CHR) Tsévié. Methodology: This was a cross-sectional descriptive study, with data collection conducted from May 7th to 20th, 2024, using systematic sampling. The study included all women present in the Gynecology-Obstetrics Department of CHR Tsévié during the study period who gave free and informed consent to participate in the survey. Results: 362 women participated in the study. Among them, 36.8% had a secondary level of education, and 72.9% were Christians. About 97.5% had heard of uterine fibroids. In 63.5% of cases, their entourage was the principal source of information. The diagnostic methods mentioned by the women were ultrasound in 94.6% of cases, while prayers and occultism were also cited in 28% and 33.3% of cases, respectively. While 91.9% of the women considered the hospital the place for treatment, some indicated that treatment would require plant-based approaches (46.8%) and prayers (26%). The cost of treatment was an obstacle for 85.4% of women, and 61.3% expressed fear of dying during surgery. There was a statistically significant relationship between treatment choice and religion. Conclusion: The majority of women had heard of uterine fibroids but had incorrect information about the treatment.
Abstract: Tephritid fruit flies are considered among the world's most notorious pests of horticultural crops, including mango (Mangifera indica L.) in Sierra Leone, causing extensive direct and indirect damage. A survey was conducted among 60 mango farmers in 7 districts in Sierra Leone between June and August 2022 to assess their perceptions regarding fruit fly pest status and the current management options adopted for the control of this pest. Semi-structured questions, both open- and closed-ended, were used for the study. The majority (83%) of the farmers were already aware of the fruit fly problem in the country, with 62% perceiving it to be very severe. The majority (60%) of farmers, however, demonstrated poor knowledge of identifying fruit fly species, especially Bactrocera dorsalis, Ceratitis capitata, and Ceratitis cosyra. Farmers were more conversant with the direct damage symptoms on host fruits and the economic impact of fruit flies. A total of 32% of growers took no action to control fruit flies on their farms. Sixty-nine percent (69%) of the farmers adopted cultural control measures, such as prompt harvesting, collection and disposal of infested fruits, and weeding to maintain better sanitary conditions on their farms. Recommended fruit fly management strategies such as the use of botanicals and resistant varieties were either unknown or inaccessible to growers. A total of 52% applied chemicals that were not recommended for the control of fruit flies without considering their environmental and health risks. It is important to train fruit growers to improve their capabilities for fruit fly management through extension agents who can help them acquire basic knowledge of fruit fly pests and their management.
Abstract: The subcortical visual pathway is generally thought to be involved in processing danger-related information, such as fear processing and defensive behavior. A recent study, published in Human Brain Mapping, shows a new function of the subcortical pathway: the fast processing of non-emotional object perception. Rapid object processing is a critical function of the visual system. Topological perception theory proposes that the initial perception of objects begins with the extraction of topological property (TP). However, the mechanism of rapid TP processing remains unclear. The researchers investigated the subcortical mechanism of TP processing with transcranial magnetic stimulation (TMS). They found that a subcortical magnocellular pathway is responsible for the early processing of TP, and that this subcortical processing of TP accelerates object recognition. Based on their findings, we propose a novel training approach called subcortical magnocellular pathway training (SMPT), aimed at improving the efficiency of the subcortical M pathway to restore visual and attentional functions in disorders associated with subcortical pathway dysfunction.
Funding: Supported by the National Natural Science Foundation of China (32001824, 31901813, 32001827).
Abstract: SARS-CoV-2, particularly the Omicron variant, often leads to flavor perception dysfunction in infected individuals, making a comprehensive understanding of its duration and recovery patterns a critical part of disease management. This study surveyed a cohort of 199 mildly-to-moderately affected SARS-CoV-2 Omicron-infected patients, focusing on the alterations in their olfaction, taste, and chemesthesis perception. Further, a subset of 36 participants (18 healthy and 18 infected) underwent sensory evaluations to check the variation of umami taste sensitivity. The results demonstrated that most of the infected cohort experienced chemosensory disorders, with the recovery period varying between one week and over a month. Intriguingly, the severity of flavor perception changes during infection significantly correlated with the length of the recovery period. Furthermore, this study explored the specific manifestations of flavor perception dysfunction, potential contributing factors, and potential mechanistic explanations for chemosensory disorders. These include local damage, inflammatory responses, and virus-induced neural damage. However, this study revealed no significant change (P > 0.05) in umami taste sensitivity among infected patients 55 days post-infection. While this research faces limitations related to its self-reported, cross-sectional design and regional focus, it offers valuable insights into the multifaceted impact of COVID-19, particularly the Omicron variant, on chemosensory perception.
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK matches or outperforms the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities lack the visual modality altogether. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and missing modalities. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Funding: Supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Funding: Partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); partially supported by the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); partially supported by the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); partially supported by the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
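For readers unfamiliar with the two metrics quoted in the abstract above, the sketch below shows how Hits@k and mean reciprocal rank (MRR) are conventionally computed from the rank at which each correct entity appears. The function names and the example ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose correct entity is ranked within the top k (ranks start at 1)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; 1.0 means every correct entity was ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Made-up ranks of the true aligned entity for five queries.
example_ranks = [1, 3, 1, 2, 10]
print(hits_at_k(example_ranks, k=1))                    # 0.4
print(round(mean_reciprocal_rank(example_ranks), 3))    # 0.587
```

Hits@1 rewards only exact top-ranked matches, whereas MRR also gives partial credit when the correct entity is ranked near the top, which is why the two are usually reported together.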
Abstract: Background: Sickle cell anemia (SCA), a genetic hemoglobin disorder, suggests essential inner ear compromise and poor auditory processing. In humans, auditory processing differs physiologically between males and females, which may also be true in SCA due to gender-specific pathophysiological changes of the disease. Objective: To investigate gender differences in psychoacoustical abilities and speech perception in noise in individuals with SCA, and to compare them with a normal healthy (NH) population. Methods: 80 SCA and 80 NH normal-hearing participants aged 15-40 years were included and further grouped by gender. Auditory discrimination for frequency, intensity, and duration at 500 Hz and 4000 Hz; temporal processing (Gap Detection Threshold and Modulation Detection Threshold); and Speech Perception In Noise (SPIN) at 0 dB SNR were evaluated and compared between males and females of the SCA and NH populations. Results: SCA participants performed poorer than NH participants on all experimental measures. In the NH population, males performed poorer than females in psychoacoustical measures, whereas within the SCA population the reverse was true. Female participants performed better in the SPIN test in both populations. Conclusions: The adverse impact of SCA on the auditory system due to circulatory changes might cause the poorer performance in SCA. Poorer performance by females with SCA is possibly due to the adverse impact of lower Hb levels superimposed on sickle cell disease. Estrogen levels and gender preference in auditory processing might lead to better performance by females within the NH population. SPIN performance depends on attentional demands and sensorimotor processing strategies in noise beyond psychoacoustical processing, which may lead to better female performance in both populations.
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Abstract: Intelligent perception, as a cutting-edge field of modern science and technology, is profoundly changing our understanding of and interaction with the world. With the rapid development of artificial intelligence, the Internet of Things, big data, and other technologies, intelligent perception systems have shown great potential in non-destructive testing, safety monitoring, human-computer interaction, and precision measurement. Traditional sensing technologies face many challenges in complex scenarios or under specific requirements, while intelligent perception provides a new path for innovation and breakthroughs in instrumentation and sensing technologies through multidisciplinary integration.
Abstract: In the past, people met in public spaces to socialize. Today these areas still serve as social spaces, but people experiencing the loneliness of the modern era also perceive them as places to be 'alone in the crowd'. Individuals from all parts of society, especially minority groups, feel accepted in public space and are able to express themselves there. Although the perception and use of public space have changed over time, people still feel free in these areas. However, terrorism, a reality of today's world, is one of the threats to public spaces. As a result, the image of these areas has shifted from “areas where individuals feel freer” to “areas where people are vulnerable to many potential attacks”. This study describes how the perception of public space has changed over time and examines how its intended use has changed as a result. Terrorist activity has increased throughout the world; the perception of public space by individuals facing this reality is discussed in the context of Jane Jacobs's views on security, urban life, and the public realm, with terrorism treated as one of the main problems of the 21st century. The study also takes the author's views on security into account in order to offer a set of recommendations for safe public spaces. For this purpose, the metropolis of Istanbul was selected as the study area, and people living in Istanbul were interviewed over the internet. The study discusses how the perception of public space has changed from the end of the 1990s to the present and how individuals use public space.
Funding: Funded by the Yangtze River Delta Science and Technology Innovation Community Joint Research Project (2023CSJGG1600); the Natural Science Foundation of Anhui Province (2208085MF173); and the Wuhu “ChiZhu Light” Major Science and Technology Project (2023ZD01, 2023ZD03).
Abstract: As the number and complexity of sensors in autonomous vehicles continue to rise, multimodal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular detection and binocular detection according to input type. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches according to the technical pathway used to process point cloud features. Additionally, multimodal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods according to the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
Funding: Supported by the Key Discipline Program of the Fifth Round of the Three-Year Public Health Action Plan (2020-2022) of Shanghai, China (GWV-10.1-XK08).
Abstract: Objectives: This study aimed to explore the perceptions and recommendations of multiparas and health-related professionals regarding appropriate birth intervals (BIs) and key determinants. Methods: In-depth semi-structured interviews were conducted between April 1 and June 30, 2022. Nine multiparas and thirteen health-related professionals were purposefully sampled until data saturation was reached. A thematic analysis approach was applied to the interview transcripts, utilizing dual independent coding and consensus validation in NVivo 12.0. Results: The data generated two overarching categories: 1) balanced decision-making on the appropriate birth intervals and 2) internal and external determinants integrated with health and societal considerations. Four key themes emerged under the two categories: 1) consistency and discrepancy between the actual and recommended birth intervals of multiparas; 2) health- and development-oriented professional recommendations; 3) internal determinants related to individual-level factors; and 4) external determinants related to child-related factors, family support, and social security. Weighing women's reproductive health and career development, multiparas and health-related professionals perceived a length between 18 and 36 months as the appropriate BI. Conclusion: Multiparas and health-related professionals shaped their balanced recommendations on a relatively appropriate birth interval ranging from 18 to 36 months, which was influenced by women's individual-level factors, child-related factors, family support, and social security. Targeted social and healthcare services should be offered to women and their families during the BIs.
Funding: Supported by the National Natural Science Foundation of China (No. 61472256, 61170277) and the Hujiang Foundation (No. A14006).
Abstract: The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors, such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting the overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection of the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
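The fusion idea described in the abstract above (attention-weighted combination of modality features, plus a residual connection to the original text vector) can be illustrated with a minimal sketch. This is not the paper's model: it assumes one feature vector per modality, uses the text vector directly as the attention query, and omits the learned projection matrices that a real multi-head attention layer would have. All names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(text_vec: Sequence[float],
                    modal_vecs: Sequence[Sequence[float]],
                    num_heads: int = 2) -> List[float]:
    """Attention-weighted fusion of modality vectors with a residual text connection.

    Each head attends over the modalities using its own slice of the feature
    dimension (no learned projections: a deliberate simplification).
    """
    dim = len(text_vec)
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused: List[float] = []
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        query = text_vec[lo:hi]
        keys = [vec[lo:hi] for vec in modal_vecs]
        # Scaled dot-product score between the text query and each modality.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
                  for key in keys]
        weights = softmax(scores)
        # Weighted sum of the modality slices (keys double as values here).
        for i in range(head_dim):
            fused.append(sum(w * key[i] for w, key in zip(weights, keys)))
    # Residual connection: add the original text vector back.
    return [t + f for t, f in zip(text_vec, fused)]


# Toy 4-dimensional features for pinyin, semantics, and character shape.
text = [1.0, 0.0, 0.0, 1.0]
pinyin, semantic, glyph = [0.9, 0.1, 0.0, 0.8], [0.2, 0.7, 0.1, 0.3], [0.0, 0.0, 1.0, 0.0]
print(fuse_modalities(text, [pinyin, semantic, glyph]))
```

Unlike fixed-weight summation, the weights here depend on how well each modality matches the text query, so a modality that agrees with the text contributes more to the fused vector.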