Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. These fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
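The abstract above describes an attention-based module that learns fusion weights between the silhouette and skeleton modalities but does not give its form. As a minimal sketch only, assuming the weights come from a softmax over one scalar score per modality, the idea can be illustrated as follows. The function `adaptive_fuse` and the projection vectors `w_sil` and `w_ske` are hypothetical names introduced for illustration, not GaitMAFF's actual module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(silhouette_feat, skeleton_feat, w_sil, w_ske):
    """Fuse two equal-length feature vectors with input-dependent weights.

    w_sil and w_ske stand in for learned projection vectors (an assumption);
    each produces one scalar score for its modality.
    """
    score_sil = sum(w * x for w, x in zip(w_sil, silhouette_feat))
    score_ske = sum(w * x for w, x in zip(w_ske, skeleton_feat))
    a_sil, a_ske = softmax([score_sil, score_ske])  # weights sum to 1
    fused = [a_sil * s + a_ske * k
             for s, k in zip(silhouette_feat, skeleton_feat)]
    return fused, (a_sil, a_ske)
```

Because the weights depend on the input features, a sample whose silhouette is unreliable (for example under a clothing change) can in principle shift weight toward the skeleton branch.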
Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective, limited, and struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We surveyed studies that integrated two or more modalities and summarized the fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping by leveraging complementary information and reducing modality-specific biases. Multimodal approaches significantly enhance diagnostic accuracy and comprehensiveness, enabling early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
Triterpenoids are valuable medicinal scaffolds, characterized by excellent pharmacological properties and the presence of hydroxyl and carboxyl groups that allow for further structural modifications. Expanding the scope of oxidative modifications on these molecules is crucial for increasing their synthetic structural diversity and unlocking new potential pharmacological activities. However, progress has been limited by the scarcity of suitable tailoring enzymes. Here, we report a breakthrough in achieving targeted and remote dual-site oxidation of licorice triterpenoids using a single P450 mutant. This approach successfully enabled the selective synthesis of the rare triterpenoids liquiritic acid and 24-OH-liquiritic acid. Our findings demonstrate that microenvironmental accessibility engineering of triterpenoid substrates within the P450 enzyme is essential for continuous and regioselective oxidation. This study not only sheds light on the mechanistic aspects of P450 catalysis but also expands the enzymatic toolkit for selective oxidative modifications in triterpenoid biosynthesis.
Background: Home accessibility modifications are crucial for promoting independent living and quality of life among persons with disabilities. While developed countries have established comprehensive policy frameworks, developing nations like China face unique challenges in program design and implementation. Objective: This study conducts a systematic comparative analysis of home accessibility modification policies across China, Japan, Germany, and Sweden, identifying key policy dimensions and proposing evidence-based recommendations for strengthening China’s policy framework. Methods: We employed a multi-dimensional analytical framework examining legislative foundations, eligibility criteria, funding mechanisms, and service delivery models. Data were collected from primary legislation, governmental regulations, official statistics, and peer-reviewed literature. Results: Significant cross-national variations exist in policy approaches. Japan and Germany utilize social insurance models with standardized assessments, Sweden adopts a universal rights-based approach, while China employs a targeted assistance model focused on economically disadvantaged households. China completed 1.28 million household renovations during its 14th Five-Year Plan, demonstrating strong implementation capacity; future policy refinement could draw on international experience to strengthen assessment standardization, broaden effective coverage, and improve the sustainability of financing. Conclusions: China can benefit from international experience in developing standardized assessment protocols, diversifying funding mechanisms, and establishing professional service delivery systems, while acknowledging contextual constraints unique to developing country settings.
To address the challenge of achieving decentralized, scalable, and adaptive control for large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, 61% reduced stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Background: Endometrial receptivity (ERE) is a transient uterine state that determines the success of blastocyst implantation; however, the epigenomic regulation underlying ERE establishment in goats remains unclear. Here, we profiled transcriptional and epigenomic features of endometrial tissues from pregnant goats during the peri-implantation window and nonpregnant control goats in the regressed luteal phase to uncover the transcriptional regulatory networks responsible for ERE establishment in goats, utilizing RNA-seq, ATAC-seq, and H3K27ac CUT&Tag. Results: A total of 3,143 differentially expressed genes (DEGs) were identified, accompanied by significant alterations in chromatin accessibility and H3K27ac modifications between receptive and non-receptive endometria. The targeted genes associated with these epigenetic changes were significantly enriched in pathways related to cell adhesion, immune tolerance, and embryo attachment. Motif enrichment and transcription factor (TF) footprinting analyses identified members of the FOS/JUN, SOX, HNF1, CEBP, and BATF families as candidate regulators, implicating downstream genes involved in ERE establishment, including SPP1, FOXO1, KLF4/6, STAT1, IFI6, ITGB8, PLAC8, DUSP4, NR1D1, ISG15, RUFY4, and PIK3R3. In addition, numerous super-enhancers were identified, indicating regions of high regulatory activity and potential long-range gene-enhancer interactions in the endometrium. Integration of multi-omics datasets revealed a strong correlation (r > 0.7) among chromatin accessibility, H3K27ac activation, and the expression of 172 DEGs. Furthermore, we identified a set of hub genes (KLF6, IFI6, MCL1, SDC4, SUSD6, MAFF, and IL6R) that appear to coordinate TF binding and distal super-enhancer activity associated with ERE establishment. Conclusions: Our data provide an integrated epigenomic atlas of endometrial receptivity establishment in goats and identify candidate regulatory elements and transcription factors that may orchestrate uterine preparation for implantation. These findings offer valuable insights and testable targets for improving fertility in ruminant livestock.
The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution; 2) public transit accessibility showed a ‘core-periphery’ structure, with accessibility declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents’ living standards.
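The explanatory power reported for each travel mode (for example 0.692 for bus travel) is presumably the q-statistic of the geographical detector's factor detector, although the abstract does not say so explicitly. As a minimal sketch under that assumption, q is one minus the ratio of within-stratum variation to total variation; the function name `q_statistic` and the toy inputs are illustrative only, and the parameter-optimization step of the optimized variant is not shown.

```python
def q_statistic(values, strata):
    """Factor-detector q: 1 - (within-stratum sum of squares / total sum of squares).

    values: the dependent variable (e.g. housing prices), one per spatial unit.
    strata: the stratum label of each unit (e.g. a discretized accessibility class).
    Assumes values are not all identical, otherwise the total variation is zero.
    """
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)

    groups = {}
    for value, label in zip(values, strata):
        groups.setdefault(label, []).append(value)

    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 1.0 - within_ss / total_ss
```

A value of 1 means the strata fully explain the spatial variation of the dependent variable, and a value of 0 means they explain none of it.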
Central nervous system (CNS) axons fail to regenerate following brain or spinal cord injury (SCI), which typically leads to permanent neurological deficits. Peripheral nervous system axons, however, can regenerate following injury. Understanding the mechanisms that underlie this difference is key to developing treatments for CNS neurological diseases and injuries characterized by axonal damage. To initiate repair after peripheral nerve injury, dorsal root ganglion (DRG) neurons mobilize a pro-regenerative gene expression program, which facilitates axon outgrowth.
Under the background of ‘the Belt and Road’ and ‘China-Mongolia-Russia Economic Corridor’ initiatives, this paper studied the urban accessibility level, regional accessibility pattern, and regional spatial effects along the Primorsky No.1 and No.2 transportation corridors. First, the urban accessibility level with and without the opening of the Primorsky No.1 and No.2 high-speed rails (HSRs) was evaluated with two indicators, i.e., the weighted average travel time and the economic potential. After the evaluation, the spatial differentiation pattern of the accessibility changes with and without the opening of the Primorsky No.1 and No.2 HSRs was analyzed for each corridor using ArcGIS. On these bases, the regional spatial effects brought by the opening of the Primorsky No.1 and No.2 HSRs were studied. The results are as follows. First, the urban accessibility level will be greatly improved by the opening of the Primorsky No.1 and No.2 HSRs. All adjacent cities will be integrated into a ‘1 h HSR communication circle’ and the whole journey will be integrated into a ‘4 h HSR communication circle’ along the Primorsky No.1 and No.2 corridors, respectively. The HSR accessibility of the Primorsky No.1 corridor is stronger than that of the Primorsky No.2 corridor, but the degree of HSR accessibility improvement of the Primorsky No.1 corridor is weaker than that of the Primorsky No.2 corridor. Second, spatially, along the Primorsky No.1 and No.2 corridors, the HSR accessibility level of the cities located in China is stronger than that of the cities located in Russia, showing ‘High West, Low East’ patterns. The degree of HSR accessibility improvement of the cities located in Russia and on the Sino-Russian border is stronger than that of the cities located in China, showing ‘High East, Low West’ patterns. Third, the Primorsky No.1 and No.2 corridors could connect China’s ‘Heilongjiang Land Sea Silk Road Economic Belt’ and ‘Changchun-Jilin-Tumen Development Pilot Zone’, respectively, gradually integrating into the development of China’s Harbin-Changchun Megalopolis. Relying on Harbin (China) and Changchun (China), the Primorsky No.1 and No.2 HSRs could connect to the Northeast China-Beijing HSR, accelerating the diffusion of population, economy, and other flows from China’s Beijing-Tianjin-Hebei Urban Agglomeration to Northeast China, and then to Russia’s Far East Federal District. Relying on Suifenhe (China) and Hunchun (China), the Primorsky No.1 and No.2 HSRs could be conducive to developing the second-largest sea channel for Northeast China, creating the Northeast Asian Urban Belt and a new sea-rail intermodal pattern among China, Russia, the Democratic People’s Republic of Korea, Japan, and the Republic of Korea. Relying on Vladivostok (Russia) and Zarubino (Russia), the Primorsky No.1 and No.2 corridors could connect to the ‘Ice Silk Road’, building the ‘Sino-Russian Northern Maritime Corridor’ and ‘Sino-Russian Arctic Blue Economic Areas’.
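The two accessibility indicators named above are not defined in the abstract. A minimal sketch of their commonly used forms is given below, assuming that each destination city carries a single "mass" (for example a combination of population and GDP) and that economic potential decays with travel time raised to a friction exponent. The function names, the choice of mass, and the default exponent of 1 are assumptions, not the paper's stated definitions.

```python
def weighted_average_travel_time(travel_times, masses):
    """Mass-weighted mean travel time from one city to all destinations.

    Lower values mean better accessibility.
    """
    weighted_sum = sum(t * m for t, m in zip(travel_times, masses))
    return weighted_sum / sum(masses)


def economic_potential(travel_times, masses, friction=1.0):
    """Gravity-style potential: destination mass discounted by travel time.

    Higher values mean better accessibility. The friction exponent is
    an assumed parameter; travel times must be positive.
    """
    return sum(m / t ** friction for t, m in zip(travel_times, masses))
```

Evaluating both functions once with conventional-rail travel times and once with HSR travel times gives the "with and without HSR" comparison the abstract describes.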
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
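Two points in the abstract above lend themselves to a short illustration: the F1 score is the harmonic mean of precision and recall, and the baseline decision-level fusion strategies (averaging, voting, fixed weights) combine per-modality predictions in simple ways. The sketch below is an assumption-laden illustration with hypothetical function names and a binary-probability setting; it is not the AWMDF model, whose adaptive weighting is not specified in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def average_fusion(probabilities):
    """Unweighted mean of per-modality positive-class probabilities."""
    return sum(probabilities) / len(probabilities)


def fixed_weight_fusion(probabilities, weights):
    """Weighted mean using weights chosen in advance (normalized here)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


def majority_vote(probabilities, threshold=0.5):
    """True if more than half of the modalities predict the positive class."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes * 2 > len(probabilities)
```

With the reported precision of 0.937 and recall of 0.929, `f1_score` evaluates to roughly 0.933, which agrees with the F1 score stated in the abstract.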
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation and utilization of its resources. However, the existing methods for classifying mineral particles do not fully utilize these multi-modal features, thereby limiting the classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and challenges in spatiotemporal feature extraction. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. Initially, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from two types of polarized images to ensure feature alignment. Subsequently, a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images. Ultimately, the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision. Quantitative and qualitative experimental results indicate that, compared with the current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics. It also demonstrates better stability than the existing models.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of the two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, the single- and multi-horizon FTP approaches are investigated to support the high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify the personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
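Bayesian personalized ranking, which the abstract above uses to combine personal style and fashion compatibility, trains a model so that an outfit the user interacted with scores higher than one the user did not. A minimal sketch of the standard pairwise BPR loss follows; how PSGNet computes the two scores from style and compatibility features is not described in the abstract, so the scores here are plain hypothetical inputs.

```python
import math


def bpr_loss(positive_score, negative_score):
    """Pairwise BPR loss: -log(sigmoid(positive_score - negative_score)).

    Written as softplus(-(difference)) and split by sign so that
    math.exp never receives a large positive argument.
    """
    difference = positive_score - negative_score
    if difference >= 0:
        return math.log1p(math.exp(-difference))
    return -difference + math.log1p(math.exp(difference))
```

The loss approaches zero as the preferred outfit's score pulls ahead of the other's and grows roughly linearly when the ranking is inverted, which is what pushes the model toward the correct ordering.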
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities suffer from the issue of missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and modality missing. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of patient condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Funding: Funded by the Natural Science Foundation of Chongqing Municipality, grant number CSTB2022NSCQ-MSX0503.
Abstract: Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. These fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
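The abstract describes attention-based fusion weights only at a high level. As a minimal sketch of the general idea (not GaitMAFF's actual module), the snippet below scores each modality's feature vector with a hypothetical scoring vector `score_vec`, turns the two scores into weights with a softmax, and returns the weighted sum. All names and the scoring scheme are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(sil_feat, ske_feat, score_vec):
    """Fuse silhouette and skeleton features with input-dependent weights.

    sil_feat, ske_feat: equal-length feature vectors (lists of floats).
    score_vec: hypothetical learned vector that scores each modality
               via a dot product (an assumption, not from the paper).
    """
    feats = [sil_feat, ske_feat]
    # One scalar relevance score per modality.
    scores = [sum(f * w for f, w in zip(feat, score_vec)) for feat in feats]
    # Softmax makes the two fusion weights positive and sum to 1.
    weights = softmax(scores)
    fused = [weights[0] * a + weights[1] * b for a, b in zip(sil_feat, ske_feat)]
    return fused, weights


fused, weights = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

With an all-zero scoring vector both modalities receive equal weight; a trained scoring vector would shift the weights toward whichever modality is more informative for a given input.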
Funding: Supported by the National Key Research and Development Program of China (Research Grant Number: 2023YFC3603600).
Abstract: Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective, limited, and struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We surveyed studies that integrated two or more modalities and summarized the fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping by leveraging complementary information and reducing modality-specific biases. Multimodal approaches significantly enhance diagnostic accuracy and comprehensiveness, enabling early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
Funding: Funded by the Fanying Special Program of the National Natural Science Foundation of China (grant number 62341307), the Scientific Research Project of Jiangxi Provincial Department of Education (grant number GJJ200839), and the Doctoral Startup Fund of Jiangxi University of Technology (grant number 205200100402).
Abstract: In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
Funding: Supported by grants from the National Natural Science Foundation of China (Nos. 22108154, 22138006, 32171430).
Abstract: Triterpenoids are valuable medicinal scaffolds, characterized by excellent pharmacological properties and the presence of hydroxyl and carboxyl groups that allow for further structural modifications. Expanding the scope of oxidative modifications on these molecules is crucial for increasing their synthetic structural diversity and unlocking new potential pharmacological activities. However, progress has been limited by the scarcity of suitable tailoring enzymes. Here, we report a breakthrough in achieving targeted and remote dual-site oxidation of licorice triterpenoids using a single P450 mutant. This approach successfully enabled the selective synthesis of the rare triterpenoids liquiritic acid and 24-OH-liquiritic acid. Our findings demonstrate that microenvironmental accessibility engineering of triterpenoid substrates within the P450 enzyme is essential for continuous and regioselective oxidation. This study not only sheds light on the mechanistic aspects of P450 catalysis but also expands the enzymatic toolkit for selective oxidative modifications in triterpenoid biosynthesis.
Funding: Funded by the China Disabled Persons’ Federation under its 2024 research project (Grant No. 2024CDPFAT-47) and the Yancheng Social Science Foundation (Grant No. 25skB252).
Abstract: Background: Home accessibility modifications are crucial for promoting independent living and quality of life among persons with disabilities. While developed countries have established comprehensive policy frameworks, developing nations like China face unique challenges in program design and implementation. Objective: This study conducts a systematic comparative analysis of home accessibility modification policies across China, Japan, Germany, and Sweden, identifying key policy dimensions and proposing evidence-based recommendations for strengthening China’s policy framework. Methods: We employed a multi-dimensional analytical framework examining legislative foundations, eligibility criteria, funding mechanisms, and service delivery models. Data were collected from primary legislation, governmental regulations, official statistics, and peer-reviewed literature. Results: Significant cross-national variations exist in policy approaches. Japan and Germany utilize social insurance models with standardized assessments, Sweden adopts a universal rights-based approach, while China employs a targeted assistance model focused on economically disadvantaged households. China completed 1.28 million household renovations during its 14th Five-Year Plan, demonstrating strong implementation capacity; future policy refinement could draw on international experience to strengthen assessment standardization, broaden effective coverage, and improve the sustainability of financing. Conclusions: China can benefit from international experience in developing standardized assessment protocols, diversifying funding mechanisms, and establishing professional service delivery systems, while acknowledging contextual constraints unique to developing country settings.
Funding: Supported by the National Natural Science Foundation of China (No. 62350048).
Abstract: To address the challenge of achieving decentralized, scalable, and adaptive control for large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, 61% lower stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Funding: Financially supported by the National Natural Science Foundation of China (No. 32502862), the Collection, Utilization and Innovation of Animal Resources by Research Institutes and Enterprises of Chongqing (No. Cqnyncwkqlhtxm), the Chongqing Modern Agricultural Industry Technology System (CQMAITS202513), and the Key Project of Chongqing Technology Innovation and Application Development Special Program (CTSB2025TIAD-KPX0079).
Abstract: Background: Endometrial receptivity (ERE) is a transient uterine state that determines the success of blastocyst implantation; however, the epigenomic regulation underlying ERE establishment in goats remains unclear. Here, we profiled transcriptional and epigenomic features of endometrial tissues from pregnant goats during the peri-implantation window and nonpregnant control goats in the regressed luteal phase to uncover the transcriptional regulatory networks responsible for ERE establishment in goats, utilizing RNA-seq, ATAC-seq, and H3K27ac CUT&Tag. Results: A total of 3,143 differentially expressed genes (DEGs) were identified, accompanied by significant alterations in chromatin accessibility and H3K27ac modifications between receptive and non-receptive endometria. The target genes associated with these epigenetic changes were significantly enriched in pathways related to cell adhesion, immune tolerance, and embryo attachment. Motif enrichment and transcription factor (TF) footprinting analyses identified members of the FOS/JUN, SOX, HNF1, CEBP, and BATF families as candidate regulators, implicating downstream genes involved in ERE establishment, including SPP1, FOXO1, KLF4/6, STAT1, IFI6, ITGB8, PLAC8, DUSP4, NR1D1, ISG15, RUFY4, and PIK3R3. In addition, numerous super-enhancers were identified, indicating regions of high regulatory activity and potential long-range gene-enhancer interactions in the endometrium. Integration of multi-omics datasets revealed a strong correlation (r>0.7) among chromatin accessibility, H3K27ac activation, and the expression of 172 DEGs. Furthermore, a set of hub genes (KLF6, IFI6, MCL1, SDC4, SUSD6, MAFF, and IL6R) appears to coordinate TF binding and distal super-enhancer activity associated with ERE establishment. Conclusions: Our data provide an integrated epigenomic atlas of endometrial receptivity establishment in goats and identify candidate regulatory elements and transcription factors that may orchestrate uterine preparation for implantation. These findings offer valuable insights and testable targets for improving fertility in ruminant livestock.
Funding: Under the auspices of the National Natural Science Foundation of China (Nos. 42271224, 41901193), the Ministry of Education Humanities and Social Sciences Research Planning Fund Project of China (No. 24YJAZH190), the Anhui Province Excellent Youth Research Project in Universities (No. 2022AH030019), and the Anhui Social Sciences Innovation Development Research Project (No. 2024CXQ503).
Abstract: The accessibility of urban public transit directly influences residents’ quality of life, travel behavior, and social equity. Its correlation with housing prices has garnered significant attention across disciplines such as geography, economics, and urban planning. Although much existing research focuses on the impact of individual transportation facilities on housing prices, there is a notable gap in comprehensive analyses that assess the influence of overall urban transit accessibility on housing market dynamics. This study selected the main urban area of Hefei, China, as a case to investigate the spatial distribution of housing prices and evaluate public transit accessibility in 2022. Employing techniques such as the optimized parameter geographical detector and local spatial regression models, the study aimed to elucidate the effects and underlying mechanisms of urban transit accessibility on housing prices. The findings revealed that: 1) housing prices in Hefei exhibited a clustered spatial pattern, with high prices concentrated in the city center and lower prices in peripheral areas, forming three distinct high-price hotspots with a ‘belt-like’ distribution; 2) public transit accessibility showed a ‘core-periphery’ structure, with accessibility declining in a ‘circumferential’ pattern around the city center. Based on the ‘housing price-accessibility’ dimension, four categories were identified: high price-high accessibility (37.25%), high price-low accessibility (19.07%), low price-high accessibility (21.95%), and low price-low accessibility (21.73%); 3) the impact of transit accessibility on housing prices was spatially heterogeneous, with bus travel showing the strongest explanatory power (0.692), followed by automobile, subway, and bicycle travel. The interaction of these transportation modes generated a synergistic effect on housing price differentiation, with most influencing factors contributing more than 25%. These findings offer valuable insights for optimizing the spatial distribution of public transit infrastructure and improving both urban housing quality and residents’ living standards.
Funding: Supported by the Canada Foundation for Innovation (Project #44220); the Natural Sciences and Engineering Research Council of Canada (RGPIN-2024-03986); the Michael Smith Foundation for Health Research BC; the financial support of Health Canada, through the Canada Brain Research Fund, an innovative partnership between the Government of Canada (through Health Canada) and Brain Canada Foundation; the Azrieli Foundation; and a Canadian Institutes of Health Research (CIHR) Canada Graduate Scholarship–Master’s Award.
Abstract: Central nervous system (CNS) axons fail to regenerate following brain or spinal cord injury (SCI), which typically leads to permanent neurological deficits. Peripheral nervous system axons, however, can regenerate following injury. Understanding the mechanisms that underlie this difference is key to developing treatments for CNS neurological diseases and injuries characterized by axonal damage. To initiate repair after peripheral nerve injury, dorsal root ganglion (DRG) neurons mobilize a pro-regenerative gene expression program, which facilitates axon outgrowth.
Funding: Under the auspices of the Heilongjiang Provincial Natural Science Foundation of China (No. YQ2024D012) and the National Natural Science Foundation of China (Nos. 42071162, 42101165, 42501220).
Abstract: Against the background of the ‘Belt and Road’ and ‘China-Mongolia-Russia Economic Corridor’ initiatives, this paper studied the urban accessibility level, regional accessibility pattern, and regional spatial effects along the Primorsky No.1 and No.2 transportation corridors. First, urban accessibility with and without the opening of the Primorsky No.1 and No.2 high-speed rails (HSRs) was evaluated with two indicators, i.e., the weighted average travel time and the economic potential. Next, the spatial differentiation pattern of the accessibility changes with and without the opening of the Primorsky No.1 and No.2 HSRs was mapped using ArcGIS. On this basis, the regional spatial effects brought by the opening of the Primorsky No.1 and No.2 HSRs were studied. The results are as follows. First, the urban accessibility level will be greatly improved by the opening of the Primorsky No.1 and No.2 HSRs. All adjacent cities will be integrated into a ‘1 h HSR communication circle’ and the whole journey will be integrated into a ‘4 h HSR communication circle’ along the Primorsky No.1 and No.2 corridors, respectively. The HSR accessibility of the Primorsky No.1 corridor is stronger than that of the Primorsky No.2 corridor, but its degree of HSR accessibility improvement is weaker than that of the Primorsky No.2 corridor. Second, spatially, along the Primorsky No.1 and No.2 corridors, the HSR accessibility level of the cities located in China is stronger than that of the cities located in Russia, showing a ‘High West, Low East’ pattern. The degree of HSR accessibility improvement of the cities located in Russia and on the Sino-Russian border is stronger than that of the cities located in China, showing a ‘High East, Low West’ pattern. Third, the Primorsky No.1 and No.2 corridors could connect China’s ‘Heilongjiang Land Sea Silk Road Economic Belt’ and ‘Changchun-Jilin-Tumen Development Pilot Zone’ respectively, gradually integrating into the development of China’s Harbin-Changchun Megalopolis. Relying on Harbin (China) and Changchun (China), the Primorsky No.1 and No.2 HSRs could connect to the Northeast China-Beijing HSR, accelerating the diffusion of population, economy, and other flows from China’s Beijing-Tianjin-Hebei Urban Agglomeration to Northeast China, and then to Russia’s Far East Federal District. Relying on Suifenhe (China) and Hunchun (China), the Primorsky No.1 and No.2 HSRs could be conducive to the development of the second largest sea channels for Northeast China, creating the Northeast Asian Urban Belt and a new sea-rail intermodal pattern among China, Russia, the Democratic People’s Republic of Korea, Japan, and the Republic of Korea. Relying on Vladivostok (Russia) and Zarubino (Russia), the Primorsky No.1 and No.2 corridors could connect to the ‘Ice Silk Road’, building the ‘Sino-Russian Northern Maritime Corridor’ and ‘Sino-Russian Arctic Blue Economic Areas’.
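The abstract names its two accessibility indicators but does not give their formulas. As a minimal sketch, assuming the forms commonly used in accessibility studies (travel time weighted by destination mass, and a gravity-type potential with a distance-decay exponent), they could be computed as below. The function names, the choice of mass (for example GDP or population), and the default `alpha=1.0` are illustrative assumptions, not details taken from the paper.

```python
def weighted_average_travel_time(times, masses):
    """Mass-weighted mean travel time from one city to all destinations.

    times:  travel times T_ij from city i to each destination j (hours).
    masses: destination weights M_j (assumed here: GDP or population).
    Lower values mean better accessibility.
    """
    return sum(t * m for t, m in zip(times, masses)) / sum(masses)


def economic_potential(times, masses, alpha=1.0):
    """Gravity-type potential: sum over j of M_j / T_ij ** alpha.

    alpha is a distance-decay exponent (assumed 1.0 by default).
    Higher values mean better accessibility.
    """
    return sum(m / t ** alpha for t, m in zip(times, masses))


# Hypothetical city with two destinations of equal mass.
watt = weighted_average_travel_time([1.0, 3.0], [1.0, 1.0])
potential = economic_potential([1.0, 2.0], [4.0, 4.0])
```

Comparing each indicator computed with rail travel times before and after an HSR opening gives the kind of "with and without" contrast the abstract reports.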
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC=0.973, accuracy=0.937, precision=0.937, recall=0.929, and F1 score=0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
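The reported precision, recall, and F1 score can be cross-checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below does that arithmetic; the function name is illustrative, and the inputs are the rounded values quoted in the abstract.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (already rounded to three decimals).
f1_from_abstract = round(f1_score(0.937, 0.929), 3)
```

The recomputed value rounds to 0.933, which agrees with the F1 score stated in the abstract.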
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62071315 and 62271336).
Abstract: The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation and utilization of its resources. However, the existing methods for classifying mineral particles do not fully utilize these multi-modal features, thereby limiting the classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and challenges in spatiotemporal feature extraction. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. Initially, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from two types of polarized images to ensure feature alignment. Subsequently, a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images. Ultimately, the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision. Quantitative and qualitative experimental results indicate that, compared with the current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics. It also demonstrates better stability than the existing models.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161), the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476), the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046), the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030), and the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendations do not fully exploit user style preferences. Typically, users prefer particular styles such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify the personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
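The abstract does not spell out how Bayesian personalized ranking is applied inside PSGNet. As a minimal sketch, assuming the standard pairwise BPR objective, the loss for one user is the negative log-sigmoid of the gap between the score of a preferred outfit and the score of a non-preferred one. The function name and the scalar scores are hypothetical placeholders for whatever the model actually produces.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -log(sigmoid(pos_score - neg_score)).

    Computed as log(1 + exp(-diff)), with a branch on the sign of diff
    so that exp() is only ever called on a non-positive argument.
    """
    diff = pos_score - neg_score
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical compatibility scores for a preferred and a non-preferred outfit.
loss_good_ranking = bpr_loss(5.0, 0.0)   # preferred outfit scored higher
loss_bad_ranking = bpr_loss(0.0, 5.0)    # preferred outfit scored lower
```

The loss shrinks toward zero as the preferred outfit is scored increasingly above the other, and grows roughly linearly when the ranking is reversed.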
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within the knowledge graphs is generally of low quality, and some entities lack the visual modality altogether. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and missing modalities. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Funding: Supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of a patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.