Objective To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
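The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is only an arithmetic check on the figures quoted in the abstract; the function name `f1_from_precision_recall` is an illustrative assumption and is not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for the AWMDF model.
reported_precision = 0.937
reported_recall = 0.929

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.933, matching the reported F1 score
```

Under this definition the three reported values are mutually consistent to three decimal places.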
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of the two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, the single- and multi-horizon FTP approaches are investigated to support the high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
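For readers unfamiliar with Bayesian personalized ranking, the standard pairwise BPR objective penalizes a model when an outfit the user did not choose scores higher than one the user did choose. The sketch below shows only that generic loss term; how PSGNet computes its scores is not described in the abstract, so the function name `bpr_pair_loss` and the example scores are hypothetical.

```python
import math


def bpr_pair_loss(positive_score: float, negative_score: float) -> float:
    """Generic BPR loss for one (preferred, non-preferred) pair.

    Computes -log(sigmoid(positive_score - negative_score)) in a
    numerically stable way. The scores are assumed to come from some
    compatibility model; that model is not specified here.
    """
    diff = positive_score - negative_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical scores: the loss is small when the preferred outfit
# already ranks above the non-preferred one, and large otherwise.
well_ranked = bpr_pair_loss(2.0, 0.0)
badly_ranked = bpr_pair_loss(0.0, 2.0)
print(well_ranked < badly_ranked)
```

Minimizing this term over many sampled pairs pushes preferred items above non-preferred ones in the ranking, which is why BPR suits recommendation settings that only have implicit feedback.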
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities suffer from the issue of missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and modality missing. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Gecko-inspired robots have significant potential applications; however, deviations in the yaw direction during locomotion are inevitable for legged robots that lack external sensing. These deviations cause the robot to stray from its intended path. Therefore, a cost-effective and straightforward solution is essential for reducing this deviation. In nature, the tail is often used to maintain balance and stability. Similarly, it has been used in robots to improve manoeuvrability and stability. Our aim is to reduce this deviation using a morphological computation approach, specifically by adding a tail. To test this hypothesis, we investigated four different tails (rigid plate, rigid gecko-shaped, soft plate, and soft gecko-shaped) and assessed the deviation of the robot with these tails on different slopes. Additionally, to evaluate the influence of different tail parameters, such as material, shape, and linkage, we investigated the locomotion performance in terms of the robot's climbing speed on slopes, its ability to turn at narrow corners, and the resistance of the tails to external disturbances. A new auto-reset joint was designed to ensure that a disturbed tail could be quickly reset. Our results demonstrate that the yaw deviation of the robot can be reduced by applying a tail. Among the four tails, the soft gecko-shaped tail was the most effective for most tasks. In summary, our findings demonstrate the functional role of the tail in reducing yaw deviation and improving climbing ability and stability, and provide a reference for selecting the most suitable tail for gecko-inspired robots.
Underwater robots have emerged as key tools for marine exploration because of their unique ability to traverse and navigate underwater regions, which pose challenges or dangers to human expeditions. Miniature underwater robots are widely employed in marine science, resource surveys, seabed geological investigations, and marine life observations, owing to their compact size, minimal noise, and agile movement. In recent years, researchers have developed diverse miniature underwater robots inspired by bionics and other disciplines, leading to many landmark achievements such as centimeter-level wireless control, movement speeds up to hundreds of millimeters per second, underwater three-dimensional motion capabilities, robot swarms, and underwater operation robots. This article offers an overview of the actuation methods and locomotion patterns utilized by miniature underwater robots and assesses the advantages and disadvantages of each method. Furthermore, the challenges confronting currently available miniature underwater robots are summarized, and future development trends are explored.
This paper presents a template-based control method for achieving diverse trotting motions in quadrupedal systems, with a focus on smooth transitions between walking trot, regular trot, and flying (running) trot. First, we extend the Clock Torque Actuated Spring-Loaded Inverted Pendulum (CT-SLIP) template to three dimensions, creating a comprehensive control framework. A template-based control strategy is then developed to compute joint torques for stable locomotion, along with a detailed approach for transitioning between gaits. To enable the flight phase in the running trot, a projectile motion model is incorporated into the template. For improved turning, we implement a yaw control method that rotates the swing foot plane to enhance stability, enabling higher turning rates while maintaining steady forward motion and balance. To further enhance locomotion stability and performance, a Whole-Body Controller (WBC) is integrated. The proposed method is implemented and rigorously evaluated in the MuJoCo simulator, with experiments testing gait transitions and disturbance rejection. Additionally, comparative studies assess the impacts of both swing foot plane rotation and the WBC on overall system performance. Furthermore, the approach is validated through real hardware experiments on a Unitree GO1 quadrupedal robot, successfully demonstrating smooth gait transitions, stable locomotion, and practical applicability in real-world scenarios.
Recently, wearable gait-assist robots have been evolving towards using soft materials designed for the elderly rather than individuals with disabilities, which emphasize modularization, simplification, and weight reduction. Thus, synchronizing the robotic assistive force with that of the user’s leg movements is crucial for usability, which requires accurate recognition of the user’s gait intent. In this study, we propose a deep learning model capable of identifying not only gait mode and gait phase but also phase progression. Utilizing data from five inertial measurement units placed on the body, the proposed two-stage architecture incorporates a bidirectional long short-term memory-based model for robust classification of locomotion modes and phases. Subsequently, phase progression is estimated through 1D convolutional neural network-based regressors, each dedicated to a specific phase. The model was evaluated on a diverse dataset encompassing level walking, stair ascent and descent, and sit-to-stand activities from 10 healthy participants. The results demonstrate its ability to accurately classify locomotion phases and estimate phase progression. Accurate phase progression estimation is essential due to the age-related variability in gait phase durations, particularly evident in older adults, the primary demographic for gait-assist robots. These findings underscore the potential to enhance the assistance, comfort, and safety provided by gait-assist robots.
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model’s initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting learning more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
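Hits@1 and MRR, the two metrics quoted above, are both computed from the rank at which the correct counterpart entity appears in each query's candidate list. The sketch below uses the standard definitions; the function names and the example ranks are hypothetical and are not results from the MPSEA paper.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical 1-based ranks of the correct entity for five queries.
example_ranks = [1, 3, 2, 1, 10]

print(hits_at_k(example_ranks, 1))          # 2 of 5 queries ranked first
print(mean_reciprocal_rank(example_ranks))
```

Hits@1 rewards only exact top-ranked matches, whereas MRR also gives partial credit for near misses, which is why papers on entity alignment usually report both.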
Soft robots capable of navigating complex environments hold promise for minimally invasive medical procedures and micromanipulation tasks. Here, we present a magnetically controlled multi-legged soft robot inspired by green sea turtle locomotion. Our designed robot, featuring six magnetized feet, demonstrates stable motion within a magnetic field strength range of 1.84–6.44 mT. Locomotion displacement scales linearly with field strength, while velocity correlates with frequency, reaching approximately 25 mm/s at 10 Hz. The robot navigates dry, semi-submerged, and fully submerged conditions, climbs slopes up to 30°, and maneuvers through U-shaped bends. Additionally, we show that the robot can transition smoothly between terrestrial and aquatic environments, demonstrating its amphibious locomotion performance. This adaptability to diverse environments, coupled with precise magnetic control, opens new possibilities for soft robotics in confined and complex spaces. Our findings provide a framework for designing highly maneuverable small-scale soft robots with potential applications ranging from targeted drug delivery to environmental sensing in challenging terrains.
Zebrafish are increasingly being utilized as a laboratory animal species to study various biological processes, both normal and pathological. It is crucial to comprehend the dynamics of zebrafish locomotion and put forth realistic models, since their locomotion characteristics are employed as feedback indicators in diverse experiments. In this study, we conducted experimental research on the locomotion of zebrafish across various spatial sizes, focusing on the analysis of motion step size and motion direction. The results indicated that the motion step exhibits long-range correlations, the motion direction shows unbiased randomness, and the data characteristics are not influenced by spatial size. The dynamic mechanisms are complicated dynamical processes rather than fractional Brownian motion or Lévy processes. Based on the experimental results, we proposed a model for describing the movement of zebrafish in a circular container. Our findings shed light on the locomotion characteristics of zebrafish and have the potential to benefit both the biological outcomes of animal tests and the welfare of the subjects.
Adhesive patches offer an effective approach for wound closure, making them highly suitable for biomedical applications. However, conventional patches often face limitations such as dual-sided adhesion, lack of shape adaptability, and limited maneuverability, which restrict their applications in deeper tissues. In this paper, we develop a magnetic patch robot (PatchBot) for targeted Janus adhesion with tissues. The PatchBot features a unique triple-layer structure, with adhesive, shape-morphing, and anti-adhesive layers, each fulfilling roles to support targeted attachment, enable shape transformation, and prevent unwanted adhesion to surrounding tissues. The Janus adhesion of the PatchBot was extensively demonstrated across a variety of tissues. A localized near-infrared (NIR) laser irradiation was used to induce programmable shape transformations. Magnetic actuation of the PatchBot for targeted adhesion was successfully demonstrated in ex vivo porcine stomach tissue. NIR light-activated shape-morphing and multimodal magnetic actuation significantly enhance its maneuverability and adaptability in confined in vivo environments while ensuring the structural integrity of the adhesive surface during deployment. This proof-of-concept study demonstrates the feasibility of using PatchBot for targeted wound adhesion, showing its potential for minimally invasive, precision therapies in complex in vivo environments.
Underwater jet propulsion bio-inspired robots have typically been designed based on soft-bodied organisms, exhibiting relatively limited forms of locomotion. The scallop, a bivalve organism capable of jet propulsion, holds significant importance in the study of underwater motion mechanisms. In this study, we present theoretical fluid mechanics analysis and modeling of the three distinct motion stages of scallops, providing parameterized descriptions of scallop locomotion mechanisms. Accordingly, three-stage adaptive motion control for the scallop robot and model-based robot configuration optimization design were achieved. An experimental platform and a robot prototype were built to validate the accuracy of the motion model and the effectiveness of the control strategy. Additionally, based on the models, future optimization directions for the robot are proposed.
As the number and complexity of sensors in autonomous vehicles continue to rise, multimodal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular detection and binocular detection based on different input types. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches based on the technical pathways for processing point cloud features. Additionally, multimodal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods based on the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
Animals can adapt to their surroundings by modifying their trunk morphology, whereas legged robots currently utilize rigid trunks. This study introduces a single-degree-of-freedom (DoF), six-revolute (6R) morphing trunk mechanism designed to equip legged robots with variable-width capabilities. Subsequently, a morphology-aware locomotion learning pipeline, based on reinforcement learning, is proposed for real-time trunk-width deformation and adaptive legged locomotion. The proposed variable-width trunk is integrated into a quadrupedal robot, and the learning pipeline is employed to train the adaptive locomotion controller of this robot. This study has three key contributions: (1) An overconstrained morphing mechanism is designed to achieve single-DoF trunk-width deformation, thereby minimizing power consumption and simplifying motion control. (2) A novel morphology-adaptive learning pipeline is introduced that utilizes adversarial joint-level motion imitation to ensure coordination consistency during morphological adaptation. This method addresses dynamic disturbances and interlimb coordination disruptions caused by width modifications. (3) A historical proprioception-based asymmetric neural network architecture is utilized to attain implicit terrain perception without visual input. Collectively, these developments enable the proposed variable-width legged robot to maintain consistent locomotion across complex terrains and facilitate rapid width deformation in response to environmental changes. Extensive simulation experiments validate the proposed design and control methodology.
The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors, such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting the overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection of the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
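The contrast drawn above between fixed-weight summation and attention-based fusion can be illustrated with a deliberately simplified sketch. It uses a single attention head in plain Python rather than the multi-head mechanism the abstract mentions, and every name, dimension, and vector here is a hypothetical stand-in: the abstract does not specify the model's actual architecture.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    text_vector: Sequence[float],
    modality_vectors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Single-head scaled dot-product fusion with a residual connection.

    The text vector acts as the query; each modality vector (for example
    pinyin, semantics, glyph) acts as both key and value. Returns the
    attention weights and the fused vector.
    """
    dim = len(text_vector)
    scores = [
        sum(q * k for q, k in zip(text_vector, vec)) / math.sqrt(dim)
        for vec in modality_vectors
    ]
    weights = softmax(scores)
    attended = [
        sum(w * vec[i] for w, vec in zip(weights, modality_vectors))
        for i in range(dim)
    ]
    # Residual connection: add the original text vector back in.
    fused = [t + a for t, a in zip(text_vector, attended)]
    return weights, fused


# Hypothetical 2-dimensional features for three modalities.
text = [1.0, 0.0]
modalities = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = attention_fuse(text, modalities)
print(weights)
```

Unlike a fixed-weight sum, the weights here depend on the input: the modality most aligned with the text vector receives the largest share, while the residual term keeps the original text representation intact.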
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
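The contrast between averaging and adaptive weighting at the decision level can be illustrated with a minimal sketch. It is not the AWMDF model: the per-modality probabilities and the reliability scores below are hypothetical values, and in a trained network the scores would presumably come from a learned gating layer. The last function only checks that the reported F1 score is consistent with the reported precision and recall.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_average(probs):
    """Baseline fusion: unweighted mean of per-modality probabilities."""
    return sum(probs) / len(probs)

def fuse_adaptive(probs, reliability_scores):
    """Weighted mean whose weights are a softmax of per-sample scores.

    The scores are an assumption of this sketch; they stand in for the
    output of a learned gating component.
    """
    weights = softmax(reliability_scores)
    return sum(w * p for w, p in zip(weights, probs))

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical probabilities of severe stenosis for one patient, one per
# modality: clinical indicators, echocardiography, tongue image, facial image.
probs = [0.80, 0.70, 0.40, 0.55]
scores = [2.0, 1.0, -1.0, 0.0]  # hypothetical gating outputs

print("average fusion :", round(fuse_average(probs), 4))
print("adaptive fusion:", round(fuse_adaptive(probs, scores), 4))
print("F1 from reported precision and recall:", round(f1_score(0.937, 0.929), 3))
```

With equal scores the adaptive rule reduces to plain averaging, which is why averaging and fixed-weight schemes can be viewed as special cases of it.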
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161); the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476); the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046); the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030); and the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of the two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support the high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify the personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
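For readers unfamiliar with Bayesian personalized ranking, the pairwise objective it optimizes can be sketched as follows. This is the generic BPR loss with dot-product scores, not PSGNet's scoring function; the user and outfit vectors are hypothetical two-dimensional embeddings chosen only for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR pairwise loss: -log(sigmoid(score_pos - score_neg)).

    The loss is small when the preferred item scores higher than the
    non-preferred one, and grows as the ordering is violated.
    """
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # -log(sigmoid(diff)) == log(1 + exp(-diff)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical embeddings: a user, an outfit the user chose, and one not chosen.
user = [1.0, 0.5]
chosen_outfit = [0.9, 0.2]
other_outfit = [0.1, 0.4]

print("correct ordering :", round(bpr_loss(user, chosen_outfit, other_outfit), 4))
print("reversed ordering:", round(bpr_loss(user, other_outfit, chosen_outfit), 4))
```

In a model of this kind, style features and compatibility features would both feed into the score before the pairwise comparison; the sketch leaves that part out.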
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities suffer from the issue of missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and modality missing. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Funding: Supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of patient condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Funding: Supported by the National Key Research & Development Program of China (Grant No. 2020YFB1313504) and the State Key Laboratory of Mechanics and Control for Aerospace Structures of Nanjing University of Aeronautics and Astronautics.
Abstract: Gecko-inspired robots have significant potential applications; however, deviations in the yaw direction during locomotion are inevitable for legged robots that lack external sensing. These deviations cause the robot to stray from its intended path. Therefore, a cost-effective and straightforward solution is essential for reducing this deviation. In nature, the tail is often used to maintain balance and stability. Similarly, it has been used in robots to improve manoeuvrability and stability. Our aim is to reduce this deviation using a morphological computation approach, specifically by adding a tail. To test this hypothesis, we investigated four different tails (rigid plate, rigid gecko-shaped, soft plate, and soft gecko-shaped) and assessed the deviation of the robot with these tails on different slopes. Additionally, to evaluate the influence of different tail parameters, such as material, shape, and linkage, we investigated the locomotion performance in terms of the robot's climbing speed on slopes, its ability to turn at narrow corners, and the resistance of the tails to external disturbances. A new auto-reset joint was designed to ensure that a disturbed tail could be quickly reset. Our results demonstrate that the yaw deviation of the robot can be reduced by applying a tail. Among the four tails, the soft gecko-shaped tail was the most effective for most tasks. In summary, our findings demonstrate the functional role of the tail in reducing yaw deviation and in improving climbing ability and stability, and they provide a reference for selecting the most suitable tail for gecko-inspired robots.
Funding: Supported by the Natural Science Foundation of Jiangsu Province, China (BK20220813) and the Fundamental Research Funds for the Central Universities (2242023K40014).
Abstract: Underwater robots have emerged as key tools for marine exploration because of their unique ability to traverse and navigate underwater regions, which pose challenges or dangers to human expeditions. Miniature underwater robots are widely employed in marine science, resource surveys, seabed geological investigations, and marine life observations, owing to their compact size, minimal noise, and agile movement. In recent years, researchers have developed diverse miniature underwater robots inspired by bionics and other disciplines, leading to many landmark achievements such as centimeter-level wireless control, movement speeds up to hundreds of millimeters per second, underwater three-dimensional motion capabilities, robot swarms, and underwater operation robots. This article offers an overview of the actuation methods and locomotion patterns utilized by miniature underwater robots and assesses the advantages and disadvantages of each method. Furthermore, the challenges confronting currently available miniature underwater robots are summarized, and future development trends are explored.
Funding: Supported by The Scientific and Technological Research Council of Türkiye (TUBITAK) 1515 Frontier R&D Laboratories Support Program for Turk Telekom neXt Generation Technologies Lab (XGeNTT) under Project No. 5249902, and by the Scientific Research Projects Coordination Unit of Middle East Technical University (METU) under Project No. ADEP-301-2025-11613.
Abstract: This paper presents a template-based control method for achieving diverse trotting motions in quadrupedal systems, with a focus on smooth transitions between walking trot, regular trot, and flying (running) trot. First, we extend the Clock Torque Actuated Spring-Loaded Inverted Pendulum (CT-SLIP) template to three dimensions, creating a comprehensive control framework. A template-based control strategy is then developed to compute joint torques for stable locomotion, along with a detailed approach for transitioning between gaits. To enable the flight phase in the running trot, a projectile motion model is incorporated into the template. For improved turning, we implement a yaw control method that rotates the swing foot plane to enhance stability, enabling higher turning rates while maintaining steady forward motion and balance. To further enhance locomotion stability and performance, a Whole-Body Controller (WBC) is integrated. The proposed method is implemented and rigorously evaluated in the MuJoCo simulator, with experiments testing gait transitions and disturbance rejection. Additionally, comparative studies assess the impacts of both swing foot plane rotation and the WBC on overall system performance. Furthermore, the approach is validated through real hardware experiments on a Unitree GO1 quadrupedal robot, successfully demonstrating smooth gait transitions, stable locomotion, and practical applicability in real-world scenarios.
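The role of a projectile motion model in the flight phase can be illustrated with textbook point-mass ballistics. The sketch below is an assumption-laden illustration rather than the paper's template: it treats the body as a point mass, assumes takeoff and touchdown at the same height, and uses hypothetical takeoff velocities.

```python
G = 9.81  # gravitational acceleration in m/s^2

def flight_phase(vx, vz, g=G):
    """Ballistic flight of a point mass that lands at its takeoff height.

    vx: forward velocity at takeoff (m/s), constant during flight
    vz: upward velocity at takeoff (m/s)
    Returns (flight time in s, apex height in m, forward distance in m).
    """
    t_flight = 2.0 * vz / g      # time up equals time down
    apex = vz * vz / (2.0 * g)   # height gained above the takeoff point
    distance = vx * t_flight     # no horizontal force while airborne
    return t_flight, apex, distance

# Hypothetical takeoff state for a running trot.
t_flight, apex, distance = flight_phase(vx=1.5, vz=0.981)
print(f"flight time {t_flight:.3f} s, apex {apex:.4f} m, distance {distance:.3f} m")
```

A controller built on such a model could use the predicted flight time to schedule the next touchdown, which is one plausible reason for adding a ballistic segment to a spring-mass template.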
Funding: Supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: RS-2022-KH129263).
Abstract: Recently, wearable gait-assist robots have been evolving towards using soft materials designed for the elderly rather than individuals with disabilities, which emphasize modularization, simplification, and weight reduction. Thus, synchronizing the robotic assistive force with the user's leg movements is crucial for usability, which requires accurate recognition of the user's gait intent. In this study, we propose a deep learning model capable of identifying not only gait mode and gait phase but also phase progression. Utilizing data from five inertial measurement units placed on the body, the proposed two-stage architecture incorporates a bidirectional long short-term memory-based model for robust classification of locomotion modes and phases. Subsequently, phase progression is estimated through 1D convolutional neural network-based regressors, each dedicated to a specific phase. The model was evaluated on a diverse dataset encompassing level walking, stair ascent and descent, and sit-to-stand activities from 10 healthy participants. The results demonstrate its ability to accurately classify locomotion phases and estimate phase progression. Accurate phase progression estimation is essential due to the age-related variability in gait phase durations, particularly evident in older adults, the primary demographic for gait-assist robots. These findings underscore the potential to enhance the assistance, comfort, and safety provided by gait-assist robots.
Funding: Partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); and the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
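The two reported metrics follow standard definitions, which can be stated compactly in code. The sketch assumes each test entity has a 1-based rank for its true counterpart in the model's candidate list; the rank list is hypothetical and does not come from FBDB15K or FBYG15K.

```python
def hits_at_k(ranks, k=1):
    """Fraction of queries whose correct match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct match over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical 1-based ranks of the true aligned entity for five queries.
ranks = [1, 3, 1, 2, 10]

print("Hits@1:", hits_at_k(ranks, k=1))
print("MRR   :", round(mean_reciprocal_rank(ranks), 4))
```

Hits@1 rewards only exact top-ranked matches, while MRR also gives partial credit for near misses, so the two can move by different amounts for the same change in a model.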
Funding: Supported by the Shenzhen Science and Technology Program (Nos. JCYJ20210324132810026, GXWD20220811164014001, and KQTD20210811090146075); the National Natural Science Foundation of China (No. 52375175); the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515240015); the Jiangsu Provincial Outstanding Youth Program (No. BK20230072); the Suzhou Industrial Foresight and Key Core Technology Project (No. SYC2022044); and grants from the Jiangsu QingLan Project and Jiangsu 333 high-level talents.
Abstract: Soft robots capable of navigating complex environments hold promise for minimally invasive medical procedures and micromanipulation tasks. Here, we present a magnetically controlled multi-legged soft robot inspired by green sea turtle locomotion. Our designed robot, featuring six magnetized feet, demonstrates stable motion within a magnetic field strength range of 1.84–6.44 mT. Locomotion displacement scales linearly with field strength, while velocity correlates with frequency, reaching approximately 25 mm/s at 10 Hz. The robot navigates dry, semi-submerged, and fully submerged conditions, climbs slopes up to 30°, and maneuvers through U-shaped bends. Additionally, we show that the robot can smoothly transition between terrestrial and aquatic environments, demonstrating its amphibious locomotion performance. This adaptability to diverse environments, coupled with precise magnetic control, opens new possibilities for soft robotics in confined and complex spaces. Our findings provide a framework for designing highly maneuverable small-scale soft robots with potential applications ranging from targeted drug delivery to environmental sensing in challenging terrains.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 12205006) and the Excellent Youth Scientific Research Project of Anhui Province, China (Grant No. 2022AH030107).
Abstract: Zebrafish are increasingly being utilized as a laboratory animal species to study various biological processes, both normal and pathological. It is crucial to comprehend the dynamics of zebrafish locomotion and put forth realistic models since their locomotion characteristics are employed as feedback indicators in diverse experiments. In this study, we conducted experimental research on the locomotion of zebrafish across various spatial sizes, focusing on the analysis of motion step size and motion direction. The results indicated that the motion step exhibits long-range correlations, the motion direction shows unbiased randomness, and the data characteristics are not influenced by spatial size. The dynamic mechanisms are complicated dynamical processes rather than fractional Brownian motion or Lévy processes. Based on the experimental results, we proposed a model for describing the movement of zebrafish in a circular container. Our findings shed light on the locomotion characteristics of zebrafish, and have the potential to benefit both the biological outcomes of animal tests and the welfare of the subjects.
Funding: Supported by the National Key Technologies R&D Program of China (Grant No. 2023YFC2415900); the National Natural Science Foundation of China (Grant Nos. 62373182 and 52405619); the China Postdoctoral Science Foundation (Grant No. 2024M751300); the Shenzhen Science and Technology Program (Grant No. JCYJ20241202125417024); and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2024A1515011915).
Abstract: Adhesive patches offer an effective approach for wound closure, making them highly suitable for biomedical applications. However, conventional patches often face limitations such as dual-sided adhesion, lack of shape adaptability, and limited maneuverability, which restrict their applications in deeper tissues. In this paper, we develop a magnetic patch robot (PatchBot) for targeted Janus adhesion with tissues. The PatchBot features a unique triple-layer structure, with adhesive, shape-morphing, and anti-adhesive layers, each fulfilling roles to support targeted attachment, enable shape transformation, and prevent unwanted adhesion to surrounding tissues. The Janus adhesion of the PatchBot was extensively demonstrated across a variety of tissues. Localized near-infrared (NIR) laser irradiation was used to induce programmable shape transformations. Magnetic actuation of the PatchBot for targeted adhesion was successfully demonstrated in ex vivo porcine stomach tissue. NIR light-activated shape-morphing and multimodal magnetic actuation significantly enhance its maneuverability and adaptability in confined in vivo environments while ensuring the structural integrity of the adhesive surface during deployment. This proof-of-concept study demonstrates the feasibility of using PatchBot for targeted wound adhesion, showing its potential for minimally invasive, precision therapies in complex in vivo environments.
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 30922010719).
Abstract: Underwater jet propulsion bio-inspired robots have typically been designed based on soft-bodied organisms, exhibiting relatively limited forms of locomotion. The scallop, a bivalve organism capable of jet propulsion, holds significant importance in the study of underwater motion mechanisms. In this study, we present theoretical fluid mechanics analysis and modeling of the three distinct motion stages of scallops, providing parameterized descriptions of scallop locomotion mechanisms. Accordingly, three-stage adaptive motion control for the scallop robot and model-based robot configuration optimization design were achieved. An experimental platform and a robot prototype were built to validate the accuracy of the motion model and the effectiveness of the control strategy. Additionally, based on the models, future optimization directions for the robot are proposed.
Funding: Funded by the Yangtze River Delta Science and Technology Innovation Community Joint Research Project (2023CSJGG1600); the Natural Science Foundation of Anhui Province (2208085MF173); and the Wuhu “ChiZhu Light” Major Science and Technology Project (2023ZD01, 2023ZD03).
Abstract: As the number and complexity of sensors in autonomous vehicles continue to rise, multimodal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multimodal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular detection and binocular detection according to input type. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches according to the technical pathway used to process point cloud features. Additionally, multimodal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods based on the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multimodal fusion-based object detection algorithms and to encourage broader attention to the research and application of multimodal fusion-based object detection.
Funding: Supported by State Key Lab of Mechanical System and Vibration Project of China (Grant No. MSVZD202008).
Abstract: Animals can adapt to their surroundings by modifying their trunk morphology, whereas legged robots currently utilize rigid trunks. This study introduces a single-degree-of-freedom (DoF), six-revolute (6R) morphing trunk mechanism designed to equip legged robots with variable-width capabilities. Subsequently, a morphology-aware locomotion learning pipeline, based on reinforcement learning, is proposed for real-time trunk-width deformation and adaptive legged locomotion. The proposed variable-width trunk is integrated into a quadrupedal robot, and the learning pipeline is employed to train the adaptive locomotion controller of this robot. This study has three key contributions: (1) An overconstrained morphing mechanism is designed to achieve single-DoF trunk-width deformation, thereby minimizing power consumption and simplifying motion control. (2) A novel morphology-adaptive learning pipeline is introduced that utilizes adversarial joint-level motion imitation to ensure coordination consistency during morphological adaptation. This method addresses dynamic disturbances and interlimb coordination disruptions caused by width modifications. (3) A historical proprioception-based asymmetric neural network architecture is utilized to attain implicit terrain perception without visual input. Collectively, these developments enable the proposed variable-width legged robot to maintain consistent locomotion across complex terrains and facilitate rapid width deformation in response to environmental changes. Extensive simulation experiments validate the proposed design and control methodology.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61472256 and 61170277) and the Hujiang Foundation (No. A14006).
Abstract: The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors, such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting the overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection of the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
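The difference between fixed-weight summation and attention-based fusion with a residual connection can be sketched in a few lines. This is a deliberately reduced illustration, not the paper's encoder: it uses a single attention head with no learned projections, takes the text vector itself as the query, and all four-dimensional vectors are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(text_vec, modality_vecs):
    """Single-head scaled dot-product attention over modality vectors.

    The text vector acts as the query; each modality vector is both key
    and value. The original text vector is added back as a residual.
    Returns (fused vector, attention weights).
    """
    d = len(text_vec)
    scores = [dot(text_vec, v) / math.sqrt(d) for v in modality_vecs]
    weights = softmax(scores)
    attended = [sum(w * v[i] for w, v in zip(weights, modality_vecs))
                for i in range(d)]
    fused = [t + a for t, a in zip(text_vec, attended)]  # residual connection
    return fused, weights

# Hypothetical features for one character.
text = [1.0, 0.0, 1.0, 0.0]
pinyin = [0.9, 0.1, 0.8, 0.0]
semantic = [1.0, 0.0, 1.0, 0.1]
glyph = [0.0, 1.0, 0.0, 1.0]

fused, weights = attention_fuse(text, [pinyin, semantic, glyph])
print("weights (pinyin, semantic, glyph):", [round(w, 3) for w in weights])
print("fused vector:", [round(x, 3) for x in fused])
```

Because the weights depend on the input, a modality that disagrees with the query for a given character contributes less, whereas a fixed-weight sum would count it equally every time.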