Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. These fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
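The abstract above describes an attention-based module that learns fusion weights between the silhouette and skeleton modalities but gives no architectural detail. The following is only a minimal sketch of the general idea (softmax-normalised, input-dependent modality weights), not the GaitMAFF module itself. The function `adaptive_fuse` and the scoring vector `score_weights` are hypothetical names introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(silhouette_feat, skeleton_feat, score_weights):
    """Fuse two equal-length feature vectors with input-dependent weights.

    score_weights is a hypothetical learned vector (an assumption, not taken
    from the paper): each modality gets one scalar relevance score, the dot
    product of its features with score_weights.
    """
    if not (len(silhouette_feat) == len(skeleton_feat) == len(score_weights)):
        raise ValueError("all vectors must have the same length")
    scores = [
        sum(w * x for w, x in zip(score_weights, feat))
        for feat in (silhouette_feat, skeleton_feat)
    ]
    # Softmax turns the two scores into fusion weights that sum to 1.
    alpha_sil, alpha_ske = softmax(scores)
    fused = [alpha_sil * a + alpha_ske * b
             for a, b in zip(silhouette_feat, skeleton_feat)]
    return fused, (alpha_sil, alpha_ske)


# With a zero scoring vector both modalities score equally, so each gets 0.5.
fused, alphas = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

Because the weights depend on the input features, a sample whose silhouette is unreliable (for example, under heavy clothing change) could in principle receive a larger skeleton weight. In a trained network the scoring function would be learned end to end.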
Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective, limited, and struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We surveyed studies that integrated two or more modalities and summarized the fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping by leveraging complementary information and reducing modality-specific biases. Multimodal approaches significantly enhance diagnostic accuracy and comprehensiveness, enabling early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
Metal organic frameworks (MOFs) assembled through coordination bonds suffer from poor stability, which limits their application as stationary phases, whereas covalent organic frameworks (COFs) assembled through covalent bonds exhibit excellent structural stability. It has been shown that stationary phases prepared by combining MOF and COF can make up for the poor stability of MOF@SiO₂, and that MOF/COF composites have superior chromatographic separation performance. However, the traditional methods for preparing COF/MOF-based stationary phases generally rely on solvothermal synthesis. In this study, a green and low-cost synthesis method was proposed for the preparation of a MOF/COF@SiO₂ stationary phase. Firstly, COF@SiO₂ was prepared in a choline chloride/ethylene glycol based deep eutectic solvent (DES). Secondly, another acid-base tunable DES, prepared by mixing p-toluenesulfonic acid (PTSA) and 2-methylimidazole in different proportions, was introduced as both the reaction solvent and a reactant for the rapid synthesis of MOF/COF@SiO₂. In contrast to the toxic transition metal-based MOFs selected in most previous studies, a lightweight and non-toxic s-block metal (calcium) based MOF was employed in this study. PTSA and calcium form a calcium/oxygen-containing organic acid framework in the acidic DES, which assembles with terephthalic acid dissolved in the basic DES to form the MOF. The strong hydrogen bonding effect of the DES facilitates rapid assembly of the Ca-MOF. The obtained Ca-MOF/COF@SiO₂ can be used for multi-mode chromatography to efficiently separate multiple isomeric, hydrophilic, and hydrophobic analytes. The synthesis method of Ca-MOF/COF@SiO₂ is green and mild; in particular, the use of an acid-base tunable DES promotes the rapid synthesis of non-toxic Ca-MOF/COF@silica composites, which offers an innovative approach to the green synthesis of novel MOF/COF stationary phases and extends their applications in the field of chromatography.
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of the visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
The fasteners employed in railway tracks are susceptible to defects arising from their intricate composition, and foreign objects are frequently observed on the track bed in an open environment. These two types of defects pose potential threats to high-speed trains, thus necessitating timely and accurate track inspection. Most existing automatic inspection methods rely on visible-light data alone, and their effectiveness is degraded by complex environments. Furthermore, because the information has only a single dimension, detection accuracy is low for similar, occluded, and small object categories. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and challenging object enhanced perception. First, in light of the differences in the representation dimensions of multi-modal information, this paper proposes a dynamic weighted multi-modal feature fusion module. The fused multi-modal features are assigned weights and then multiplied with the extracted single-modal features at multiple levels, achieving adaptive adjustment of the response degree of the fusion features. Second, a novel stepwise multi-scale convolution feature aggregation module is proposed for challenging objects. The proposed method employs depthwise separable convolution and cross-scale aggregation operations with different receptive fields to enhance feature extraction and reuse, thereby reducing the progressive loss of effective information. Experimental results on the constructed RGBD dataset demonstrate the efficacy of the proposed method in comparison with eight established methods, encompassing both single-modal and multi-modal methods.
The flow behavior of molten steel in the thin slab mold under high casting speed conditions was investigated, with a focus on the multi-mode continuous casting and rolling mold. A steel-slag two-phase flow model was established using large eddy simulation, the volume of fluid, and magnetohydrodynamics methods through numerical simulation. The maximum flow velocity and wave height at the steel-slag interface within the mold are critical evaluation criteria for analyzing asymmetric flow under varying casting speeds and electromagnetic braking. The results indicate that the asymmetric flows within the mold do not occur synchronously. The severity of the asymmetric flow correlates with the velocity difference across the steel-slag interface. A greater biased flow prolongs the time required to revert to a steady state. When the magnetic field intensity is set to 0.24 T and the magnetic pole position is at 390 mm from the steel-slag interface, this configuration can reduce the velocity of the steel-slag interface, thereby mitigating the asymmetric flow. Additionally, it can diminish the velocity, impact depth, and impact intensity on the narrow face of the jet, thus improving the distribution of velocity and turbulent kinetic energy within the mold. This configuration prolongs the time required for the steel-slag interface to transition from a stable state to its maximum velocity and shortens the time for the interface to return to stability from an unstable state. Moreover, it ensures the positional stability of the steel-slag interface, confining its position within −3 mm.
To address the challenge of achieving decentralized, scalable, and adaptive control for large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, 61% reduced stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Objective To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
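The metrics listed in the abstract above are related by standard formulas, so the reported F1 score can be checked against the reported precision and recall. The sketch below does that arithmetic; the helper names are illustrative, and because the inputs are the rounded values printed in the abstract, this is only a consistency check, not a re-computation from the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (already rounded to three decimals).
reported_f1 = f1_from(0.937, 0.929)
print(round(reported_f1, 3))  # prints 0.933, matching the reported F1 score
```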
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation and utilization of its resources. However, the existing methods for classifying mineral particles do not fully utilize these multi-modal features, thereby limiting the classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and challenges in spatiotemporal feature extraction. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. Initially, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from the two types of polarized images to ensure feature alignment. Subsequently, a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images. Ultimately, the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision. Quantitative and qualitative experimental results indicate that, compared with the current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics. It also demonstrates better stability than the existing models.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of the two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, the single- and multi-horizon FTP approaches are investigated to support the high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
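The abstract above names Bayesian personalized ranking (BPR) as the integrating objective but does not say how PSGNet scores an outfit. As a hedged illustration of the standard BPR pairwise loss only, the sketch below takes two already-computed scores for one user: one for an outfit the user interacted with and one for a sampled outfit the user did not. The function name and the scores are hypothetical.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Standard BPR pairwise loss: -log(sigmoid(pos_score - neg_score)).

    Written in the equivalent softplus form log(1 + exp(-diff)) and split by
    sign so that exp() never receives a large positive argument.
    """
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical scores: the loss shrinks as the preferred outfit is ranked
# further above the non-preferred one.
well_ranked = bpr_loss(2.0, 0.0)
badly_ranked = bpr_loss(0.0, 2.0)
```

Minimising this loss pushes the model to score observed outfits above unobserved ones for each user, which is a ranking objective rather than a rating-prediction one.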
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Carbon dots (CDs)-based composites have shown impressive performance in the fields of information encryption and sensing. However, it remains a great challenge to simultaneously implement multi-mode luminescence and room-temperature phosphorescence (RTP) detection in a single system because of the formidable synthesis. Herein, a multifunctional composite, Eu&CDs@pRHO, has been designed by a co-assembly strategy and prepared via a facile calcination and impregnation treatment. Eu&CDs@pRHO exhibits intense fluorescence (FL) and RTP coming from two individual luminous centers, Eu³⁺ in the free pores and CDs in the interrupted structure of the RHO zeolite. Unique four-mode color outputs, including pink (Eu³⁺, ex. 254 nm), light violet (CDs, ex. 365 nm), blue (CDs, 254 nm off), and green (CDs, 365 nm off), could be realized; on this basis, a preliminary application in advanced information encoding has been demonstrated. Given the free pores of the matrix and the stable RTP of the confined CDs in water, visual RTP detection of Fe³⁺ ions is achieved with a detection limit as low as 9.8 μmol/L. This work opens up a new perspective for the strategic amalgamation of luminous guests with porous zeolite to construct advanced functional materials.
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
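The gains in the abstract above are stated in Hits@1 and MRR, two standard ranking metrics for entity alignment. The following minimal sketch shows how both are computed from the rank at which each correct counterpart entity appears (1 means ranked first). The function names and the rank list are illustrative and are not data from the paper.

```python
def hits_at_k(ranks, k=1):
    """Fraction of queries whose correct entity appears in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; 1.0 means every answer is ranked first."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Illustrative ranks for four source entities (hypothetical values).
ranks = [1, 2, 1, 4]
print(hits_at_k(ranks, k=1))        # prints 0.5
print(mean_reciprocal_rank(ranks))  # prints 0.6875
```

Hits@1 only rewards exact top-ranked matches, while MRR also gives partial credit when the correct entity is near the top, which is why the two metrics can move by different amounts.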
BACKGROUND Stress ulcers are common complications in critically ill patients, with a higher incidence observed in older patients following gastrointestinal surgery. This study aimed to develop and evaluate the effectiveness of a multi-modal intervention protocol to prevent stress ulcers in this high-risk population. AIM To assess the impact of a multi-modal intervention on preventing stress ulcers in older intensive care unit (ICU) patients postoperatively. METHODS A randomized controlled trial involving critically ill patients (aged ≥ 65 years) admitted to the ICU after gastrointestinal surgery was conducted. Patients were randomly assigned to either the intervention group, which received a multi-modal stress ulcer prevention protocol, or the control group, which received standard care. The primary outcome measure was the incidence of stress ulcers. The secondary outcomes included ulcer healing time, complication rates, and length of hospital stay. RESULTS A total of 200 patients (100 in each group) were included in this study. The intervention group exhibited a significantly lower incidence of stress ulcers than the control group (15% vs 30%, P < 0.01). Additionally, the intervention group demonstrated shorter ulcer healing times (mean 5.2 vs 7.8 days, P < 0.05), lower complication rates (10% vs 22%, P < 0.05), and reduced length of hospital stay (mean 12.3 vs 15.7 days, P < 0.05). CONCLUSION This multi-modal intervention protocol significantly reduced the incidence of stress ulcers and improved clinical outcomes in critically ill older patients after gastrointestinal surgery. This comprehensive approach may provide a valuable strategy for managing high-risk populations in intensive care settings.
The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors, such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting the overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection of the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
Acute Bilirubin Encephalopathy (ABE) is a significant threat to neonates and leads to disability and high mortality rates. Detecting and treating ABE promptly is important to prevent further complications and long-term issues. Recent studies have explored ABE diagnosis. However, they often face limitations in classification due to reliance on a single modality of Magnetic Resonance Imaging (MRI). To tackle this problem, the authors propose a Tri-M2MT model for precise ABE detection using tri-modality MRI scans. The scans include T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and apparent diffusion coefficient maps to obtain in-depth information. Initially, the tri-modality MRI scans are collected and preprocessed using an Advanced Gaussian Filter for noise reduction and Z-score normalisation for data standardisation. An Advanced Capsule Network was utilised to extract relevant features, with the Snake Optimization Algorithm selecting optimal features based on feature correlation, with the aim of minimising complexity and enhancing detection accuracy. Furthermore, a multi-transformer approach was used to fuse features and identify feature correlations effectively. Finally, accurate ABE diagnosis is achieved through the utilisation of a SoftMax layer. The performance of the proposed Tri-M2MT model is evaluated across various metrics, including accuracy, specificity, sensitivity, F1-score, and ROC curve analysis, and the proposed methodology provides better performance compared to existing methodologies.
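Z-score normalisation, named in the preprocessing step of the abstract above, rescales intensities to zero mean and unit standard deviation. The sketch below shows the generic operation on a flat list of values. Whether the authors normalise per scan, per modality, or over the whole dataset is not stated, so the per-list scope and the function name here are assumptions.

```python
import statistics


def z_score_normalise(values):
    """Return (v - mean) / std for each value, using the population std.

    A constant input has zero spread, so it is mapped to all zeros instead
    of dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Illustrative intensities (hypothetical): mean is 5 and population std is 2.
print(z_score_normalise([2, 4, 4, 4, 5, 5, 7, 9]))
# prints [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Putting T1WI, T2WI, and diffusion maps on a common scale in this way keeps one modality's raw intensity range from dominating the others during feature extraction.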
In recent years, the analysis of encrypted network traffic has gained momentum due to the widespread use of the Transport Layer Security and Quick UDP Internet Connections protocols, which complicate and prolong the analysis process. Classification models face challenges in understanding and classifying unknown traffic because of issues related to interpretability and the representation of traffic data. To tackle these complexities, multi-modal representation learning can be employed to extract meaningful features and represent them in a lower-dimensional latent space. Recently, auto-encoder-based multi-modal representation techniques have shown superior performance in representing network traffic. By combining the advantages of multi-modal representation with efficient classifiers, we can develop robust network traffic classifiers. In this paper, we propose a novel multi-modal encoder-decoder model to create unified representations of network traffic, paired with a robust one-dimensional convolutional neural network (1D-CNN) classifier for effective traffic classification. The proposed model utilizes the ISCX Virtual Private Network-non Virtual Private Network 2016 datasets to extract general multi-modal representations and to train both shallow and deep learning models, such as Random Forest and the 1D-CNN model, for traffic classification. We compare these learning approaches based on the multi-modal representations generated from the autoencoder and the early feature fusion technique. For the classification task, both the Random Forest and 1D-CNN models, when trained on multi-modal representations, achieve over 90% accuracy on a highly imbalanced dataset.
Prostate cancer (PCa) is characterized by high incidence and a propensity for metastasis, presenting significant challenges in clinical diagnosis and treatment. Tumor microenvironment (TME)-responsive nanomaterials offer a promising prospect for imaging-guided precision therapy. Because tumor-derived alkaline phosphatase (ALP) is over-expressed in metastatic PCa, there is a strong opportunity to develop an ALP-responsive theranostic system for the TME. Herein, a self-assembling, ALP-responsive aggregation-induced emission luminogen (AIEgen) nanoprobe, AMNF, was designed to enhance the diagnosis and treatment of metastatic PCa. The nanoprobe self-aggregated in the presence of ALP, resulting in aggregation-induced fluorescence, enhanced accumulation, and prolonged retention at the tumor site. In terms of detection, the fluorescence (FL)/computed tomography (CT)/magnetic resonance (MR) multi-mode imaging effect of the nanoprobe was significantly improved post-aggregation, enabling precise diagnosis through the combination of multiple imaging modes. Enhanced CT/MR imaging can assist preoperative tumor diagnosis, and enhanced FL imaging can provide "intraoperative visual navigation", showing potential application value in clinical tumor detection and surgical guidance. In terms of treatment, AMNF showed strong absorption in the near-infrared region after aggregation, which improved the photothermal treatment effect. Overall, our work developed an effective aggregation-enhanced theranostic strategy for ALP-related cancers.
Funding: Funded by the Natural Science Foundation of Chongqing Municipality, grant number CSTB2022NSCQ-MSX0503.
Abstract: Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation to align with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. These fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
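The idea of learning fusion weights between two modalities with attention can be sketched in a few lines. The code below is a generic illustration, not GaitMAFF's actual module: it assumes each modality is already a feature vector of equal length, scores each one with a hypothetical scoring vector (`w_sil`, `w_ske`, which would be learned in practice), and turns the two scores into weights with a softmax.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(sil_feat, ske_feat, w_sil, w_ske):
    """Fuse silhouette and skeleton features with input-dependent weights.

    Each modality gets a scalar score (dot product with its scoring vector);
    the softmax of the two scores gives fusion weights that sum to 1.
    """
    score_sil = sum(w * f for w, f in zip(w_sil, sil_feat))
    score_ske = sum(w * f for w, f in zip(w_ske, ske_feat))
    a_sil, a_ske = softmax([score_sil, score_ske])
    fused = [a_sil * s + a_ske * k for s, k in zip(sil_feat, ske_feat)]
    return fused, (a_sil, a_ske)

# Illustrative features and scoring vectors (all values are made up).
sil = [1.0, 0.0, 2.0]
ske = [0.0, 1.0, 2.0]

# Equal scores give equal weights.
fused_equal, weights_equal = attention_fuse(sil, ske, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

# A higher silhouette score shifts weight toward the silhouette branch.
fused_sil, weights_sil = attention_fuse(sil, ske, [1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

Because the weights depend on the input features, the balance between modalities can change from sample to sample, which is the property the abstract describes as adaptive weighting.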
Funding: Supported by the National Key Research and Development Program of China (Research Grant Number: 2023YFC3603600).
Abstract: Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective, limited, and struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We surveyed studies that integrated two or more modalities and summarized the fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping by leveraging complementary information and reducing modality-specific biases. Multimodal approaches significantly enhance diagnostic accuracy and comprehensiveness, enabling early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
Funding: Supported by the National Natural Science Foundation of China (Nos. 21906124, 32302202); the Natural Science Foundation of Hubei Province (No. 2017CFB220); and the Natural Science Foundation of Shandong Province (No. ZR2023MH278).
Abstract: Metal organic frameworks (MOFs), assembled through coordination bonds, suffer from poor stability, which limits their application as stationary phases, whereas covalent organic frameworks (COFs), assembled through covalent bonds, exhibit excellent structural stability. It has been shown that stationary phases prepared by combining MOF and COF can make up for the poor stability of MOF@SiO₂, and that MOF/COF composites have superior chromatographic separation performance. However, the traditional methods for preparing COF/MOF-based stationary phases generally rely on solvothermal synthesis. In this study, a green and low-cost synthesis method was proposed for the preparation of a MOF/COF@SiO₂ stationary phase. Firstly, COF@SiO₂ was prepared in a choline chloride/ethylene glycol-based deep eutectic solvent (DES). Secondly, another acid-base tunable DES, prepared by mixing p-toluenesulfonic acid (PTSA) and 2-methylimidazole in different proportions, was introduced as the reaction solvent and reactant for rapid synthesis of MOF/COF@SiO₂. In contrast to the toxic transition metal-based MOFs selected in most previous studies, a lightweight and non-toxic s-block metal (calcium)-based MOF was employed in this study. PTSA and calcium form a calcium/oxygen-containing organic acid framework in acidic DES, which assembles with terephthalic acid dissolved in basic DES to form the MOF. The strong hydrogen bonding effect of DES facilitates rapid assembly of Ca-MOF. The obtained Ca-MOF/COF@SiO₂ can be used for multi-mode chromatography to efficiently separate multiple isomeric, hydrophilic, and hydrophobic analytes. The synthesis method of Ca-MOF/COF@SiO₂ is green and mild; in particular, the use of acid-base tunable DES promotes the rapid synthesis of non-toxic Ca-MOF/COF@silica composites, which offers an innovative approach to the green synthesis of novel MOF/COF stationary phases and extends their applications in the field of chromatography.
Funding: Funded by the Fanying Special Program of the National Natural Science Foundation of China, grant number 62341307; the Scientific Research Project of Jiangxi Provincial Department of Education, grant number GJJ200839; and the Doctoral Startup Fund of Jiangxi University of Technology, grant number 205200100402.
Abstract: In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
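A gate that controls how much an auxiliary modality contributes to dominant textual features can be sketched as below. This is a generic element-wise sigmoid gate, not the paper's gated-attention encoder; the function name `gated_complement` and the parameters `w_text`, `w_aux`, and `bias` are hypothetical stand-ins for weights that would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_complement(text_feat, aux_feat, w_text, w_aux, bias):
    """Add an auxiliary modality to text features through a per-dimension gate.

    gate_i = sigmoid(w_text_i * t_i + w_aux_i * a_i + bias_i)
    out_i  = t_i + gate_i * a_i
    A gate near 0 suppresses the auxiliary signal; near 1 passes it through.
    """
    gates = [
        sigmoid(wt * t + wa * a + b)
        for t, a, wt, wa, b in zip(text_feat, aux_feat, w_text, w_aux, bias)
    ]
    fused = [t + g * a for t, g, a in zip(text_feat, gates, aux_feat)]
    return fused, gates

# Illustrative feature vectors (values are made up).
text = [0.8, -0.2, 0.5]
audio = [0.1, 0.9, -0.4]
zeros = [0.0, 0.0, 0.0]

# Zero weights and bias give a gate of 0.5 in every dimension.
fused_half, gates_half = gated_complement(text, audio, zeros, zeros, zeros)

# A strongly negative bias closes the gate, leaving the text features almost unchanged.
fused_closed, gates_closed = gated_complement(text, audio, zeros, zeros, [-50.0] * 3)
```

Keeping the text features as an additive base reflects the abstract's description of text as the dominant modality, with audio and visual cues acting as gated complements.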
Funding: Funded by the Beijing Natural Science Foundation, grant number L241078.
Abstract: The fasteners employed in railway tracks are susceptible to defects arising from their intricate composition. Foreign objects are frequently observed on the track bed in an open environment. These two types of defects pose potential threats to high-speed trains, thus necessitating timely and accurate track inspection. Most existing automatic inspection methods rely on visible light data alone, and their performance is affected by complex environments. Furthermore, due to the single information dimension, the detection accuracy of defects in similar, occluded, and small object categories is low. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and challenging object enhanced perception. First, in light of the variances in the representation dimensions of multi-modal information, this paper proposes a dynamic weighted multi-modal feature fusion module. The fused multi-modal features are assigned weights and then multiplied with the extracted single-modal features at multiple levels, achieving adaptive adjustment of the response degree of fusion features. Second, a novel stepwise multi-scale convolution feature aggregation module is proposed for challenging objects. The proposed method employs depthwise separable convolution and cross-scale aggregation operations with different receptive fields to enhance feature extraction and reuse, thereby reducing the progressive loss of effective information. Experimental results on the constructed RGBD dataset demonstrate the efficacy of the proposed method in comparison with eight established methods, covering both single-modal and multi-modal methods.
Funding: The authors acknowledge support from the National Natural Science Foundation of China (Grant Nos. 52174313 and 52304350) and thank all members of the Hebei High Quality Steel Continuous Casting Engineering Technology Research Center at North China University of Science and Technology, Tangshan, China.
Abstract: The flow behavior of molten steel in the thin slab mold under high casting speed conditions was investigated, with a focus on the multi-mode continuous casting and rolling mold. A steel-slag two-phase flow model was established using large eddy simulation, the volume of fluid, and magnetohydrodynamics methods through numerical simulation. The maximum flow velocity and wave height at the steel-slag interface within the mold are critical evaluation criteria for analyzing asymmetric flow under varying casting speeds and electromagnetic braking. The results indicate that the asymmetric flows within the mold do not occur synchronously. The severity of the asymmetric flow correlates with the velocity difference across the steel-slag interface. A greater biased flow prolongs the time required to revert to a steady state. When the magnetic field intensity is set to 0.24 T and the magnetic pole is positioned 390 mm from the steel-slag interface, this configuration can reduce the velocity of the steel-slag interface, thereby mitigating the asymmetric flow. Additionally, it can diminish the velocity, impact depth, and impact intensity of the jet on the narrow face, thus improving the distribution of velocity and turbulent kinetic energy within the mold. This configuration prolongs the time required for the steel-slag interface to transition from a stable state to its maximum velocity and shortens the time for the interface to return to stability from an unstable state. Moreover, it ensures the positional stability of the steel-slag interface, confining its position within −3 mm.
Funding: Supported by the National Natural Science Foundation of China (No. 62350048).
Abstract: To address the challenge of achieving decentralized, scalable, and adaptive control for large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, 61% reduced stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
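The baseline decision-level fusion strategies named in this abstract (averaging, fixed weights, voting) and the F1 score can be sketched generically. The code below is an illustration only, not the AWMDF model; the per-modality probabilities and the fixed weights are made-up values. The F1 formula, the harmonic mean of precision and recall, does reproduce the reported F1 of 0.933 from the reported precision of 0.937 and recall of 0.929.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def average_fusion(probs):
    """Average the per-modality positive-class probabilities."""
    return sum(probs) / len(probs)

def fixed_weight_fusion(probs, weights):
    """Weighted average with fixed, hand-chosen weights."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

def majority_vote(probs, threshold=0.5):
    """Each modality votes positive if its probability reaches the threshold."""
    votes = sum(1 for p in probs if p >= threshold)
    return votes * 2 > len(probs)

# Hypothetical positive-class probabilities from three modality-specific models.
modality_probs = [0.9, 0.6, 0.3]

avg = average_fusion(modality_probs)
weighted = fixed_weight_fusion(modality_probs, [3, 1, 1])
vote = majority_vote(modality_probs)

# Check the abstract's reported metrics for internal consistency.
reported_f1 = round(f1_score(0.937, 0.929), 3)
```

An adaptive scheme differs from these baselines in that the weights are predicted per sample rather than fixed in advance.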
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62071315 and 62271336).
Abstract: The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring effective exploitation and utilization of its resources. However, the existing methods for classifying mineral particles do not fully utilize these multi-modal features, thereby limiting the classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and challenges in spatiotemporal feature extraction. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. Initially, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from the two types of polarized images to ensure feature alignment. Subsequently, a cross-polarized intra-modal feature fusion module is designed to refine the spatiotemporal features from the extracted features of the cross-polarized sequence images. Ultimately, the inter-modal feature fusion module integrates the two types of modal features to enhance the classification precision. Quantitative and qualitative experimental results indicate that, compared with the current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in terms of mineral particle multi-modal feature learning and four classification evaluation metrics. It also demonstrates better stability than the existing models.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161); the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476); the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046); the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030); and the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, the existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of the ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of the two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, the ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support the high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct the benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendations do not fully exploit user style preferences. Typically, users prefer particular styles such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify the personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
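Bayesian personalized ranking, which this abstract uses to combine style and compatibility, trains a model so that an item the user preferred scores higher than one they did not. The standard pairwise loss is the negative log-sigmoid of the score difference. The sketch below shows only that loss on hypothetical scores; it is not PSGNet's scoring function, and the regularisation term used in full BPR training is omitted.

```python
import math

def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -log(sigmoid(pos_score - neg_score)).

    Written as softplus(-diff) in a numerically stable form.
    """
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def mean_bpr_loss(pairs):
    """Average the loss over (preferred, non-preferred) score pairs."""
    return sum(bpr_loss(p, n) for p, n in pairs) / len(pairs)

# Hypothetical model scores for (preferred outfit, non-preferred outfit) pairs.
well_ranked = [(2.0, 0.5), (1.5, -1.0)]
badly_ranked = [(0.5, 2.0), (-1.0, 1.5)]

loss_good = mean_bpr_loss(well_ranked)
loss_bad = mean_bpr_loss(badly_ranked)
```

The loss is small when preferred outfits are already ranked above non-preferred ones and grows roughly linearly when the order is reversed, so minimising it pushes the ranking in the right direction.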
Funding: Supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Funding: Supported by the National Natural Science Foundation of China (No. 22288101) and the 111 Project (No. B17020).
Abstract: Carbon dots (CDs)-based composites have shown impressive performance in the fields of information encryption and sensing. However, simultaneously achieving multi-mode luminescence and room-temperature phosphorescence (RTP) detection in a single system remains a great challenge because of the difficult synthesis. Herein, a multifunctional composite, Eu&CDs@p RHO, has been designed by a co-assembly strategy and prepared via a facile calcination and impregnation treatment. Eu&CDs@p RHO exhibits intense fluorescence (FL) and RTP coming from two individual luminous centers, Eu³⁺ in the free pores and CDs in the interrupted structure of RHO zeolite. Unique four-mode color outputs, including pink (Eu³⁺, ex. 254 nm), light violet (CDs, ex. 365 nm), blue (CDs, 254 nm off), and green (CDs, 365 nm off), could be realized. On this basis, a preliminary application of advanced information encoding has been demonstrated. Given the free pores of the matrix and the stable RTP in water of the confined CDs, a visual RTP detection of Fe³⁺ ions is achieved with a detection limit as low as 9.8 μmol/L. This work has opened up a new perspective for the strategic combination of luminous guests with porous zeolite to construct advanced functional materials.
Funding: Partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); and the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
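Hits@1 and MRR, the two metrics reported here, are both computed from the rank at which the correct counterpart entity appears in each ranked candidate list. The sketch below uses the standard definitions on a small made-up list of ranks; the ranks are illustrative and are not results from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical 1-based ranks of the correct match for five test entities.
ranks = [1, 3, 2, 1, 10]

hits1 = hits_at_k(ranks, 1)
hits3 = hits_at_k(ranks, 3)
mrr = mean_reciprocal_rank(ranks)
```

Hits@1 only rewards exact top-ranked matches, while MRR also gives partial credit for near misses, which is why the two are usually reported together.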
Abstract: BACKGROUND: Stress ulcers are common complications in critically ill patients, with a higher incidence observed in older patients following gastrointestinal surgery. This study aimed to develop and evaluate the effectiveness of a multi-modal intervention protocol to prevent stress ulcers in this high-risk population. AIM: To assess the impact of a multi-modal intervention on preventing stress ulcers in older intensive care unit (ICU) patients postoperatively. METHODS: A randomized controlled trial involving critically ill patients (aged ≥ 65 years) admitted to the ICU after gastrointestinal surgery was conducted. Patients were randomly assigned to either the intervention group, which received a multi-modal stress ulcer prevention protocol, or the control group, which received standard care. The primary outcome measure was the incidence of stress ulcers. The secondary outcomes included ulcer healing time, complication rates, and length of hospital stay. RESULTS: A total of 200 patients (100 in each group) were included in this study. The intervention group exhibited a significantly lower incidence of stress ulcers than the control group (15% vs 30%, P < 0.01). Additionally, the intervention group demonstrated shorter ulcer healing times (mean 5.2 vs 7.8 days, P < 0.05), lower complication rates (10% vs 22%, P < 0.05), and reduced length of hospital stay (mean 12.3 vs 15.7 days, P < 0.05). CONCLUSION: This multi-modal intervention protocol significantly reduced the incidence of stress ulcers and improved clinical outcomes in critically ill older patients after gastrointestinal surgery. This comprehensive approach may provide a valuable strategy for managing high-risk populations in intensive care settings.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61472256, 61170277) and the Hujiang Foundation (No. A14006).
Abstract: The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors, such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting the overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection of the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
文摘Acute Bilirubin Encephalopathy(ABE)is a significant threat to neonates and it leads to disability and high mortality rates.Detecting and treating ABE promptly is important to prevent further complications and long-term issues.Recent studies have explored ABE diagnosis.However,they often face limitations in classification due to reliance on a single modality of Magnetic Resonance Imaging(MRI).To tackle this problem,the authors propose a Tri-M2MT model for precise ABE detection by using tri-modality MRI scans.The scans include T1-weighted imaging(T1WI),T2-weighted imaging(T2WI),and apparent diffusion coefficient maps to get indepth information.Initially,the tri-modality MRI scans are collected and preprocessesed by using an Advanced Gaussian Filter for noise reduction and Z-score normalisation for data standardisation.An Advanced Capsule Network was utilised to extract relevant features by using Snake Optimization Algorithm to select optimal features based on feature correlation with the aim of minimising complexity and enhancing detection accuracy.Furthermore,a multi-transformer approach was used for feature fusion and identify feature correlations effectively.Finally,accurate ABE diagnosis is achieved through the utilisation of a SoftMax layer.The performance of the proposed Tri-M2MT model is evaluated across various metrics,including accuracy,specificity,sensitivity,F1-score,and ROC curve analysis,and the proposed methodology provides better performance compared to existing methodologies.
Abstract: In recent years, the analysis of encrypted network traffic has gained momentum due to the widespread use of the Transport Layer Security and Quick UDP Internet Connections protocols, which complicate and prolong the analysis process. Classification models face challenges in understanding and classifying unknown traffic because of issues related to interpretability and the representation of traffic data. To tackle these complexities, multi-modal representation learning can be employed to extract meaningful features and represent them in a lower-dimensional latent space. Recently, auto-encoder-based multi-modal representation techniques have shown superior performance in representing network traffic. By combining the advantages of multi-modal representation with efficient classifiers, we can develop robust network traffic classifiers. In this paper, we propose a novel multi-modal encoder-decoder model to create unified representations of network traffic, paired with a robust 1D-CNN (one-dimensional convolutional neural network) classifier for effective traffic classification. The proposed model uses the ISCX VPN-nonVPN 2016 dataset to extract general multi-modal representations and to train both shallow and deep learning models, such as Random Forest and the 1D-CNN model, for traffic classification. We compare these learning approaches based on the multi-modal representations generated from the autoencoder and the early feature fusion technique. For the classification task, both the Random Forest and 1D-CNN models, when trained on multi-modal representations, achieve over 90% accuracy on a highly imbalanced dataset.
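Early feature fusion, the baseline this abstract compares against the autoencoder representations, usually means concatenating each sample's per-modality feature vectors before classification. The sketch below shows only that step under this assumption; the function name `early_fusion` and the example modalities (payload bytes, flow statistics) are hypothetical and not taken from the paper.

```python
def early_fusion(*modality_features):
    """Concatenate per-sample feature vectors from each modality.

    Each argument is a list of feature vectors, one per sample, and all
    arguments must describe the same samples in the same order.
    """
    if not modality_features:
        raise ValueError("at least one modality is required")
    n_samples = len(modality_features[0])
    if any(len(m) != n_samples for m in modality_features):
        raise ValueError("all modalities must describe the same samples")
    # zip(*...) groups the vectors belonging to one sample; flatten each group.
    return [[x for vec in row for x in vec] for row in zip(*modality_features)]


# Hypothetical example: two flows, with payload-byte and flow-statistic features.
payload_bytes = [[0.1, 0.2], [0.3, 0.4]]
flow_stats = [[5.0], [6.0]]
fused = early_fusion(payload_bytes, flow_stats)
```

The fused vectors could then be passed to any classifier. A learned encoder-decoder differs in that it compresses the modalities into a shared latent space instead of simply stacking them.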
Funding: Supported by the Natural Science Foundation of Jilin Province (No. SKL202302002), the Key Research and Development Project of the Jilin Provincial Science and Technology Department (No. 20210204142YY), the Science and Technology Development Program of Jilin Province (No. 2020122256JC), the Beijing Kechuang Medical Development Foundation Fund of China (No. KC2023-JX-0186BQ079), and the Talent Reserve Program (TRP) of the First Hospital of Jilin University (No. JDYY-TRP-2024007).
Abstract: Prostate cancer (PCa) is characterized by high incidence and a propensity to metastasize, presenting significant challenges in clinical diagnosis and treatment. Tumor microenvironment (TME)-responsive nanomaterials offer a promising route to imaging-guided precision therapy. Because tumor-derived alkaline phosphatase (ALP) is over-expressed in metastatic PCa, there is a strong opportunity to develop an ALP-responsive theranostic system for the TME. Herein, a self-assembling, ALP-responsive aggregation-induced emission luminogen (AIEgen) nanoprobe, AMNF, was designed to enhance the diagnosis and treatment of metastatic PCa. The nanoprobe self-aggregated in the presence of ALP, resulting in aggregation-induced fluorescence, enhanced accumulation, and prolonged retention at the tumor site. In terms of detection, the fluorescence (FL)/computed tomography (CT)/magnetic resonance (MR) multi-mode imaging effect of the nanoprobe was significantly improved after aggregation, enabling precise diagnosis by combining multiple imaging modes. Enhanced CT/MR imaging can assist preoperative tumor diagnosis, and enhanced FL imaging can enable "intraoperative visual navigation", showing its potential value in clinical tumor detection and surgical guidance. In terms of treatment, AMNF showed strong absorption in the near-infrared region after aggregation, which improved the photothermal treatment effect. Overall, our work developed an effective aggregation-enhanced theranostic strategy for ALP-related cancers.