With the growing elderly population and rising health care costs, service robots are playing an increasingly important role in aiding the disabled and the elderly. Many researchers worldwide have paid close attention to healthcare robots and rehabilitation robots. To achieve natural and harmonious communication between the user and a service robot, the robot's information perception/feedback ability and interaction ability have become increasingly important among the many key issues.
For the analysis of spinal and disc diseases, automated tissue segmentation of the lumbar spine is vital. Because the targets are continuous and concentrated in location, rich in edge features, and vary between individuals, conventional automatic segmentation methods perform poorly. Since deep learning has proven successful in medical image segmentation in recent years, it has been applied to this task in a number of ways. However, deep learning methods rarely exploit the multi-scale and multi-modal features of lumbar tissues. Because medical images are scarce, it is crucial to effectively fuse data from different acquisition modes during model training to alleviate the problem of insufficient samples. In this paper, we propose a novel multi-modality hierarchical fusion network (MHFN) that improves lumbar spine segmentation by learning robust feature representations from multi-modality magnetic resonance images. An adaptive group fusion module (AGFM) is introduced to fuse features from the different modes and extract potentially valuable cross-modality features. Furthermore, to combine cross-modality features from low to high levels, we design a hierarchical fusion structure based on AGFM. Experimental results on multi-modality MR images of the lumbar spine show that AGFM is more effective than other feature fusion methods. To further evaluate segmentation accuracy, we compare our network with baseline fusion structures. Compared to the baseline fusion structures (input-level: 76.27%, layer-level: 78.10%, decision-level: 79.14%), our network segmented fractured vertebrae more accurately (85.05%).
Listening is the breakthrough point in conquering the "castle" of English: it is not only a requirement of English tests but also a practical application of English knowledge and an embodiment of comprehensive English ability. Listening teaching plays a crucial role in foreign language teaching; however, its results have been unsatisfactory. In recent years, multi-modality theory has attracted the attention of many researchers. Given the particular nature of listening teaching, there is an urgent need to apply multi-modality theory to English listening teaching, which can produce very good teaching results.
A new coarse-to-fine strategy was proposed for nonrigid registration of computed tomography (CT) and magnetic resonance (MR) images of the liver. This hierarchical framework consisted of an affine transformation and a B-spline free-form deformation (FFD). The affine transformation performed a rough registration targeting the mismatch between the CT and MR images, and the B-spline FFD transformation performed a finer registration by correcting local motion deformation. Normalized mutual information (NMI) was used as the similarity measure, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method was used for optimization. The algorithm was applied to fully automated registration of liver CT and MR images from three subjects. The results demonstrate that the proposed method not only significantly improves registration accuracy but also reduces running time, making it effective and efficient for nonrigid registration.
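For readers unfamiliar with the similarity measure, the following is a minimal sketch of normalized mutual information in one common form, NMI = (H(A) + H(B)) / H(A, B), computed from a joint intensity histogram. The abstract does not say which NMI variant, binning scheme, or interpolation the authors used, so the pre-binned integer inputs and the function names here are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a histogram given as bin counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI = (H(A) + H(B)) / H(A, B) for two images given as equal-length
    sequences of pre-binned intensities (one entry per overlapping voxel).

    The value lies in [1, 2]: 1 when the intensities are statistically
    independent, 2 when each image's intensities fully determine the other's."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)  # joint-histogram entropy
    if h_ab == 0:
        raise ValueError("NMI is undefined when both images are constant")
    return (h_a + h_b) / h_ab


# Toy "MR" and "CT" bins: a one-to-one intensity mapping vs. no relationship.
print(normalized_mutual_information([0, 0, 1, 1], [5, 5, 7, 7]))
print(normalized_mutual_information([0, 0, 1, 1], [5, 7, 5, 7]))
```

In a registration loop, an optimizer (L-BFGS in the abstract) would adjust the affine or B-spline parameters to maximize this value after resampling the moving image onto the fixed grid; that machinery is omitted here. Because NMI rewards consistent intensity relationships rather than equal intensities, it suits CT/MR pairs whose gray levels differ.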
In this work, we propose a new variational model for multi-modal image registration and present an efficient numerical implementation. The model minimizes a new functional that uses reformulated normalized gradients of the images as the fidelity term and higher-order derivatives as the regularizer. A key feature of the model is its ability to guarantee a diffeomorphic transformation, which is achieved by a control term motivated by quasi-conformal maps and the Beltrami coefficient. We establish the existence of a solution to this model. To solve the model numerically, we design a Gauss-Newton method for the resulting discrete optimization problem and prove its convergence; a multilevel technique is employed to speed up initialization and avoid likely local minima of the underlying functional. Finally, numerical experiments demonstrate that the new model delivers good performance for multi-modal image registration while generating an accurate diffeomorphic transformation.
Background: In recent years, deep convolutional neural networks (CNNs) have achieved great success in medical imaging. However, it is difficult to obtain accurate pathological information for clinical diagnosis and treatment from single-modality medical images alone. This study aims to provide an efficient multimodality whole heart segmentation method for the diagnosis of coronary heart disease. Methods: We propose SFAM-TransUnet, a novel deep learning framework for multimodality whole heart segmentation that combines CNNs and transformers. First, the method integrates CNNs and vision transformers (ViTs) into a unified fusion framework. Specifically, a shallow feature fusion module is designed to connect MRI and CT images, providing a powerful and efficient multimodality fusion backbone for semantic segmentation. Furthermore, we propose a fusion ViT (FViT) module, comprising self-attention (SA) and adaptive mutual boost attention (Ada-MBA), to enhance contextual information within and across modalities. The Ada-MBA module assigns attention to semantic perception regions by computing SA and cross-attention, which improves the model's ability to understand context from different modalities. Extensive experiments were conducted on the clinical Multi-Modality Whole Heart Segmentation datasets. Results: We improved the whole heart segmentation Dice similarity coefficients (DSCs) to 0.902 (AA), 0.920 (LV-blood), 0.863 (LA-blood), and 0.837 (LV-myo); the Hausdorff distances (HDs) to 9.886 (AA), 9.947 (LV-blood), 11.911 (LA-blood), and 13.599 (LV-myo); the PSNR values to 33.577 (AA), 30.091 (LV-blood), 32.055 (LA-blood), and 29.837 (LV-myo); and the SSIM values to 0.901 (AA), 0.818 (LV-blood), 0.765 (LA-blood), and 0.743 (LV-myo). These results demonstrate that SFAM-TransUnet outperforms various alternative methods. Conclusions: We propose SFAM-TransUnet, an efficient framework tailored for whole heart segmentation that combines CNNs and transformers. It provides a powerful multimodality fusion network that improves the performance of whole heart semantic segmentation. These results demonstrate the efficacy of SFAM-TransUnet in integrating relevant information between different modalities in multimodal tasks.
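The DSC figures above are per-structure overlap scores. As a hedged illustration of how such a score is typically computed, the sketch below evaluates DSC = 2|P ∩ T| / (|P| + |T|) separately for each label in flattened label maps; the integer label encoding and the function names are hypothetical, not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """DSC = 2|P ∩ T| / (|P| + |T|) for two binary masks given as flat sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention: two empty masks are treated as perfect agreement.
    return 1.0 if size_sum == 0 else 2.0 * overlap / size_sum


def per_structure_dice(pred_labels, true_labels, labels):
    """DSC for each structure label in flattened multi-class label maps."""
    return {
        lab: dice_coefficient([p == lab for p in pred_labels],
                              [t == lab for t in true_labels])
        for lab in labels
    }


# Toy 6-voxel label maps; labels 1 and 2 are hypothetical structure IDs
# (for example AA and LV-blood).
pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 2, 2, 2, 0]
print(per_structure_dice(pred, truth, [1, 2]))
```

Scoring each structure separately, as the abstract does for AA, LV-blood, LA-blood, and LV-myo, keeps a large structure from masking poor overlap on a small one.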
Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions under diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation aligned with the silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. The fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing the robustness and accuracy of gait recognition in complex scenes and providing a practical solution for real-world applications.
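The abstract describes the fusion module only at a high level. One generic way to learn per-sample fusion weights between two aligned modalities is a softmax over per-modality relevance scores; the sketch below shows that idea with plain lists. The scoring vectors `w_sil` and `w_skel` stand in for learned parameters and, like the function names, are assumptions rather than GaitMAFF's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(silhouette_feat, skeleton_feat, w_sil, w_skel):
    """Fuse two aligned feature vectors with input-dependent weights.

    Each modality gets a relevance score (dot product with its own scoring
    vector, a stand-in for learned layers); softmax turns the two scores
    into weights that sum to 1, and the output is the weighted sum."""
    s_sil = sum(x * w for x, w in zip(silhouette_feat, w_sil))
    s_skel = sum(x * w for x, w in zip(skeleton_feat, w_skel))
    alpha_sil, alpha_skel = softmax([s_sil, s_skel])
    fused = [alpha_sil * a + alpha_skel * b
             for a, b in zip(silhouette_feat, skeleton_feat)]
    return fused, (alpha_sil, alpha_skel)


# Hypothetical 2-D features; the silhouette scorer responds strongly here.
fused, weights = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [10.0, 0.0], [0.0, 0.0])
print(weights)  # silhouette weight close to 1, skeleton weight close to 0
```

Because the weights are recomputed for every input, a trained scorer could shift weight toward the skeleton branch when the silhouette is degraded (for example by a coat), which is the kind of condition-dependent weighting the abstract targets.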
Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder, and early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective and limited, and they struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We examined studies that integrated two or more modalities and summarized their fusion levels, learning paradigms, tasks, datasets, and metrics. By leveraging complementary information and reducing modality-specific biases, multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping. They significantly enhance diagnostic accuracy and comprehensiveness, enabling early ASD screening, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
Metal organic frameworks (MOFs) assembled through coordination bonds suffer from poor stability, which limits their application as stationary phases, whereas covalent organic frameworks (COFs) assembled through covalent bonds exhibit excellent structural stability. Previous studies have shown that stationary phases prepared by combining MOFs and COFs can compensate for the poor stability of MOF@SiO₂, and that MOF/COF composites offer superior chromatographic separation performance. However, COF/MOF-based stationary phases have traditionally been prepared by solvothermal synthesis. In this study, a green, low-cost method was proposed for preparing a MOF/COF@SiO₂ stationary phase. First, COF@SiO₂ was prepared in a choline chloride/ethylene glycol-based deep eutectic solvent (DES). Second, an acid-base tunable DES, prepared by mixing p-toluenesulfonic acid (PTSA) and 2-methylimidazole in different proportions, was introduced as both the reaction solvent and a reactant for the rapid synthesis of MOF/COF@SiO₂. Whereas most previous studies selected toxic transition metal-based MOFs, this study employed a lightweight, non-toxic s-block metal (calcium)-based MOF. In the acidic DES, PTSA and calcium form a calcium/oxygen-containing organic acid framework, which assembles with terephthalic acid dissolved in the basic DES to form the MOF. The strong hydrogen bonding of the DES facilitates rapid assembly of the Ca-MOF. The resulting Ca-MOF/COF@SiO₂ can be used in multi-mode chromatography to efficiently separate multiple isomeric, hydrophilic, and hydrophobic analytes. The synthesis of Ca-MOF/COF@SiO₂ is green and mild; in particular, the acid-base tunable DES promotes rapid synthesis of non-toxic Ca-MOF/COF@silica composites. This offers an innovative, green approach to synthesizing novel MOF/COF stationary phases and extends their applications in chromatography.
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality, and they lack effective mechanisms for modeling the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of the visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gate-enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a contrastive learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
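As a minimal illustration of the gating idea (not the paper's exact formulation, which the abstract does not give), a per-dimension sigmoid gate can decide how much of an auxiliary audio or visual feature to blend into the dominant text feature. The parameters `w_t`, `w_o`, and `bias` are hypothetical stand-ins for learned weights.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(text, other, w_t, w_o, bias):
    """Blend an auxiliary modality into the text feature, one dimension at a time.

    For each dimension i the gate is g_i = sigmoid(w_t[i]*t_i + w_o[i]*o_i + bias[i]);
    the output keeps g_i of the text value and takes (1 - g_i) from the other
    modality, so a gate near 1 effectively ignores a noisy audio/visual input."""
    fused = []
    for t, o, wt, wo, b in zip(text, other, w_t, w_o, bias):
        g = sigmoid(wt * t + wo * o + b)
        fused.append(g * t + (1.0 - g) * o)
    return fused


# With all-zero parameters every gate is 0.5, giving a plain average.
print(gated_fusion([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # [1.0, 2.0]
```

Because the gate depends on both inputs, a trained model can learn to lower the auxiliary contribution when that modality is noisy, which is the robustness property the abstract attributes to gating.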
The fasteners used in railway tracks are susceptible to defects because of their intricate composition, and foreign objects are frequently found on the track bed in an open environment. Both types of defects pose potential threats to high-speed trains, so timely and accurate track inspection is necessary. Most existing automatic inspection methods rely on visible-light data alone, so their performance is affected by complex environments. Furthermore, because this single source provides only one dimension of information, detection accuracy is low for similar, occluded, and small objects. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and enhanced perception of challenging objects. First, given the differences in representation dimensions across multimodal information, we propose a dynamic weighted multi-modal feature fusion module. The fused multi-modal features are assigned weights and then multiplied with the extracted single-modal features at multiple levels, adaptively adjusting the response of the fused features. Second, a novel stepwise multi-scale convolution feature aggregation module is proposed for challenging objects. It employs depthwise separable convolution and cross-scale aggregation over different receptive fields to enhance feature extraction and reuse, thereby reducing the progressive loss of useful information. Extensive experiments on the constructed RGB-D dataset demonstrate the efficacy of the proposed method compared with eight established methods, including both single-modal and multi-modal methods.
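The abstract cites depthwise separable convolution as a way to extract and reuse features cheaply. The worked arithmetic below compares weight counts with a standard convolution; the kernel size and channel counts are illustrative assumptions, not the paper's layer configuration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in one k x k standard convolution layer (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) followed by a
    1 x 1 pointwise conv that mixes channels (biases ignored)."""
    return k * k * c_in + c_in * c_out


# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, round(sep / std, 3))  # 73728 8768 0.119
```

The ratio simplifies to 1/c_out + 1/k², roughly an 8x reduction for this layer, which is why separable convolutions are a common choice for lightweight multi-scale branches.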
The flow behavior of molten steel in a thin slab mold under high casting speed conditions was investigated, focusing on the multi-mode continuous casting and rolling mold. A steel-slag two-phase flow model was established through numerical simulation using large eddy simulation, volume of fluid, and magnetohydrodynamics methods. The maximum flow velocity and wave height at the steel-slag interface within the mold served as the key criteria for analyzing asymmetric flow under varying casting speeds and electromagnetic braking. The results indicate that asymmetric flows within the mold do not occur synchronously. The severity of the asymmetric flow correlates with the velocity difference across the steel-slag interface, and a more strongly biased flow takes longer to return to a steady state. With a magnetic field intensity of 0.24 T and the magnetic pole positioned 390 mm from the steel-slag interface, the velocity at the steel-slag interface is reduced, mitigating the asymmetric flow. This configuration also reduces the velocity, impact depth, and impact intensity of the jet on the narrow face, improving the distribution of velocity and turbulent kinetic energy within the mold. It prolongs the time required for the steel-slag interface to go from a stable state to its maximum velocity and shortens the time for the interface to return to stability from an unstable state. Moreover, it ensures the positional stability of the steel-slag interface, confining its position to within −3 mm.
To address the challenge of achieving decentralized, scalable, and adaptive control of large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we proposed a hybrid framework that integrates adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, a 61% reduction in stability root mean square error (RMSE), 88% fewer collisions, and success rates of 85.6%–92.3% in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Key innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Developing versatile theranostic agents that simultaneously integrate therapeutic and diagnostic features remains an urgent clinical need. Herein, we prepared uniform PEGylated poly(lactic-co-glycolic acid) (PLGA) microcapsules (PB@(Fe₃O₄@PEG-PLGA) MCs), with superparamagnetic Fe₃O₄ nanoparticles (NPs) embedded in the shell and Prussian blue (PB) NPs built into the cavity, via a premix membrane emulsification (PME) method. Owing to their suitable geometry and capacity for multiple loads, these MCs could serve as efficient multi-modality contrast agents that simultaneously enhance ultrasound (US), MR, and photoacoustic tomography (PAT) imaging contrast. The built-in PB NPs gave the MCs excellent photothermal conversion properties, and the embedded Fe₃O₄ NPs enabled magnetic localization for a targeted drug delivery system. Notably, after further in situ encapsulation of the antitumor drug doxorubicin (DOX), the (PB+DOX)@(Fe₃O₄@PEG-PLGA) MCs offered distinct advantages for near-infrared (NIR)-responsive drug delivery and magnetically guided chemo-photothermal synergistic therapy of osteosarcoma. In vitro and in vivo studies revealed that these biocompatible (PB+DOX)@(Fe₃O₄@PEG-PLGA) MCs effectively targeted tumor tissue, with a superior therapeutic effect against osteosarcoma invasion and in alleviating osteolytic lesions. They could therefore be developed into a smart platform integrating multi-modality imaging with highly effective synergistic therapy.
Self-mixing interferometry (SMI) is an attractive sensing scheme that typically relies on mono-modal operation of the laser diode employed. However, the laser's modality can change when operating conditions change, so multi-modality must be detected in SMI signals to avoid erroneous metric measurements. Processing multi-modal SMI signals is typically difficult because such signals are diverse and complex. The proposed techniques significantly ease this task by identifying the modal state of SMI signals with a 100% success rate, so that interferometric fringes can be correctly interpreted for metric sensing applications.
In the metaverse, a digital-twin smart home is a vital platform for immersive communication between the physical and virtual worlds. Triboelectric nanogenerator (TENG) sensors contribute substantially to smart-home monitoring; however, TENG deployment is hindered by unstable output under environmental changes. Herein, we develop a digital-twin smart home using a robust all-TENG-based information mat (InfoMat), which consists of an in-home mat array and an entry mat. An interdigital electrode design enables an environment-insensitive ratiometric readout from the mat array that cancels commonly experienced environmental variations. Arbitrary position sensing is also achieved thanks to the interval arrangement of the mat pixels. Meanwhile, the two-channel entry mat generates multi-modality information that raises 10-user identification accuracy from 93% to 99% compared with the one-channel case. Furthermore, the digital-twin smart home is visualized by projecting smart-home information into virtual reality in real time, including access authorization, position, walking trajectory, and dynamic activities/sports.
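The abstract attributes the mat array's environmental robustness to a ratiometric readout. Assuming, as a simplified model of our own rather than the paper's, that humidity or other ambient changes scale the outputs of both interdigital electrodes of a pixel by the same factor k, a ratio of the two outputs cancels k. The function name and the two-electrode model are hypothetical.

```python
def ratiometric_readout(v_a, v_b):
    """Share of the total signal carried by electrode A: v_a / (v_a + v_b).

    Under the assumed model v_a = k * a and v_b = k * b, where k is a common
    environmental gain, the result equals a / (a + b), independent of k."""
    total = v_a + v_b
    if total == 0:
        raise ValueError("no signal on either electrode")
    return v_a / total


# The same contact event measured in dry and humid air (k = 1.0 vs. k = 0.4):
dry = ratiometric_readout(3.0, 1.0)
humid = ratiometric_readout(3.0 * 0.4, 1.0 * 0.4)
print(dry, round(humid, 6))
```

Raw amplitudes would drop by 60% in the humid case under this model, while the ratio stays the same, so a fixed decision threshold on the ratio keeps working.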
Autonomous and self-driving vehicles have become the most popular choice for customers because of their convenience. Real-time vehicle angle prediction is one of the most prevalent topics in the autonomous driving industry. However, existing vehicle angle prediction methods use only single-modal data, such as camera images, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel and more efficient vehicle angle prediction strategy that performs multi-modal prediction. Specifically, Emma exploits both images and inertial measurement unit (IMU) signals, using a fusion network for multi-modal data fusion and vehicle angle prediction. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results demonstrate that Emma achieves an overall accuracy of 97.5% in predicting three vehicle angle parameters (yaw, pitch, and roll), outperforming traditional single-modality approaches by approximately 16.7%-36.8%. Additionally, the few-shot learning module shows promising adaptability, with overall accuracies of 79.8% and 88.3% in the 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on an Arduino UNO board.
Precise prediction of molecular properties is essential for advances in drug development, particularly in virtual screening and compound optimization. Many recently introduced deep learning-based methods have shown remarkable potential for enhancing Molecular Property Prediction (MPP), especially by improving accuracy and providing insights into molecular structures. Yet two critical questions arise: does integrating domain knowledge improve the accuracy of molecular property prediction, and does multi-modal data fusion yield more precise results than single-source methods? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured as reductions in root mean square error (RMSE), reach up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), reach up to 1.7%. Additionally, as measured by ROC-AUC, augmenting 2D graphs with 3D information improves classification performance by up to 13.2%, and enriching 2D graphs with 1D SMILES boosts multi-modal learning performance on regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advances in drug discovery.
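The abstract reports gains as percentage changes in RMSE and ROC-AUC. The sketch below shows how those metrics and a relative change are commonly computed; it assumes the percentages are relative to the baseline score, which the abstract does not state, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


print(rmse([0.0, 0.0], [3.0, 4.0]))                  # sqrt(12.5), about 3.536
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))   # 0.75
print(relative_change(1.00, 0.96))                    # about -4.0, i.e. a 4% RMSE reduction
```

The pairwise AUC loop is quadratic in the number of examples; it is written this way to make the ranking interpretation explicit, and a sort-based version is preferable for large benchmarks.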
Funding for the MHFN lumbar spine segmentation study: supported in part by Technology Innovation 2030 under Grant 2022ZD0211700.
Funding for the liver CT/MR nonrigid registration study: Project (61240010) supported by the National Natural Science Foundation of China; Project (20070007070) supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China.
Funding: supported by the following:
- the Henan Province Science and Technology Research Project (Grant 252102311276);
- the Henan Province Key Scientific Research Projects of Universities (Grant 25B520002);
- the Fund of the Institute of Complexity Science from Henan University of Technology (Grant CSKFJJ-2025-13);
- the 2023 Research Nursery Engineering Project of Henan University of Chinese Medicine (Grant MP2023-10).
Abstract: Background: In recent years, deep convolutional neural networks (CNNs) have achieved great success in medical imaging. However, single-modality medical images alone make it difficult to obtain accurate pathological information for clinical diagnosis and treatment. This study aims to provide an efficient multimodality whole heart segmentation method for the diagnosis of coronary heart disease.

Methods: We propose SFAM-TransUnet, a novel deep learning framework for multimodality whole heart segmentation that combines CNNs and transformers. The method integrates CNNs and vision transformers (ViTs) into a unified fusion framework. Specifically, a shallow feature fusion module is designed to connect MRI and CT images, which provides a powerful and efficient multimodality fusion backbone for semantic segmentation. Furthermore, we propose a fusion ViT (FViT) module that comprises self-attention (SA) and adaptive mutual boost attention (Ada-MBA). This module enhances contextual information within and across modalities. The Ada-MBA module assigns attention to semantic perception regions by computing SA and cross-attention, which improves the ability to understand context from different modalities. Extensive experiments were conducted on the clinical Multi-Modality Whole Heart Segmentation datasets.

Results: SFAM-TransUnet improved the whole heart segmentation metrics as follows:
- Dice similarity coefficients (DSCs): 0.902 (AA), 0.920 (LV-blood), 0.863 (LA-blood), and 0.837 (LV-myo);
- Hausdorff distances (HDs): 9.886 (AA), 9.947 (LV-blood), 11.911 (LA-blood), and 13.599 (LV-myo);
- PSNR values: 33.577 (AA), 30.091 (LV-blood), 32.055 (LA-blood), and 29.837 (LV-myo);
- SSIM values: 0.901 (AA), 0.818 (LV-blood), 0.765 (LA-blood), and 0.743 (LV-myo).

These results demonstrate that SFAM-TransUnet outperforms various alternative methods.

Conclusions: We propose SFAM-TransUnet, an efficient framework for whole heart segmentation that combines CNNs and transformers. It provides a powerful multimodality fusion network that improves the performance of whole heart semantic segmentation. These results demonstrate the efficacy of SFAM-TransUnet in integrating relevant information across modalities in multimodal tasks.
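The DSC and HD values reported above are standard overlap and boundary-distance metrics. Below is a minimal Python sketch of both. It assumes that masks are given as sets of voxel indices and that boundaries are given as lists of point coordinates. It computes the plain (maximum) Hausdorff distance by brute force. Segmentation studies sometimes report a 95th-percentile variant, or compute HD on surface points in millimetres. The abstract does not state which variant was used.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient 2|P & T| / (|P| + |T|) for masks given as sets of voxel indices."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force, O(|a||b|))."""
    def directed(p, q):
        # Largest distance from any point in p to its nearest neighbour in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))
```

For example, the masks {1, 2, 3} and {2, 3, 4} share two voxels, which gives a DSC of 2·2/6 ≈ 0.667.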
Funding: funded by the Natural Science Foundation of Chongqing Municipality, grant number CSTB2022NSCQ-MSX0503.
Abstract: Gait recognition is a key biometric for long-distance identification. However, its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. Combining silhouette and skeleton data is a promising direction. Even so, effectively fusing these heterogeneous modalities, and adaptively weighting their contributions under diverse conditions, remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense SkeletonMap representation aligned with the silhouettes. It then employs an attention-based module to dynamically learn the fusion weights between the two modalities. A powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) processes the fused features to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance. Its average Rank-1 accuracy is 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information. It significantly enhances gait recognition robustness and accuracy in complex scenes and provides a practical solution for real-world applications.
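Attention-based fusion weights of the kind described above are often computed by scoring each modality and normalizing the scores with a softmax. The following Python sketch shows that general pattern for two feature vectors, such as a silhouette vector and a SkeletonMap vector. The scoring rule, a dot product with a learned vector `score_weights`, is a hypothetical stand-in. It is not GaitMAFF's actual module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, score_weights):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: list of equal-length vectors, one per modality.
    score_weights: hypothetical learned vector; each modality's score is its dot product with it.
    Returns (fused_vector, modality_weights).
    """
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in features]
    alphas = softmax(scores)  # weights are positive and sum to 1
    dim = len(features[0])
    fused = [sum(a * feat[i] for a, feat in zip(alphas, features)) for i in range(dim)]
    return fused, alphas
```

Because the weights are recomputed for every input, a modality that is less informative under a given condition, such as heavy clothing, can in principle receive a smaller weight.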
Funding: supported by the National Key Research and Development Program of China (Research Grant Number: 2023YFC3603600).
Abstract: Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective and limited, and they struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis combines behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature. We highlight artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. We examined studies that integrated two or more modalities and summarized their fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping. They do so by leveraging complementary information and reducing modality-specific biases. These approaches significantly enhance diagnostic accuracy and comprehensiveness. They enable early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
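The "fusion levels" summarized in reviews like this one usually distinguish at least two kinds of fusion:
- early (feature-level) fusion, which joins modality features before a single model is trained;
- late (decision-level) fusion, which combines the outputs of per-modality models.

The Python sketch below illustrates both in their simplest forms: concatenation and weighted averaging of class probabilities. The modalities, the weights, and the function names are illustrative assumptions. They are not taken from any specific surveyed study.

```python
def early_fusion(modal_features):
    """Feature-level fusion: concatenate per-modality feature vectors into one vector."""
    fused = []
    for feats in modal_features:
        fused.extend(feats)
    return fused


def late_fusion(modal_probs, weights=None):
    """Decision-level fusion: weighted average of per-modality class probabilities.

    modal_probs: one probability vector per modality (same number of classes).
    weights: optional per-modality weights; defaults to a uniform average.
    """
    n = len(modal_probs)
    weights = weights or [1.0 / n] * n
    num_classes = len(modal_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, modal_probs)) for c in range(num_classes)]
```

Intermediate fusion, in which learned representations are merged inside a network, sits between these two extremes.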
Funding: supported by the National Natural Science Foundation of China (Nos. 21906124, 32302202), the Natural Science Foundation of Hubei Province (No. 2017CFB220), and the Natural Science Foundation of Shandong Province (No. ZR2023MH278).
Abstract: Metal-organic frameworks (MOFs) are assembled through coordination bonds and suffer from poor stability, which limits their application as stationary phases. Covalent organic frameworks (COFs), assembled through covalent bonds, instead exhibit excellent structural stability. Stationary phases that combine MOFs and COFs have been shown to compensate for the poor stability of MOF@SiO_(2), and MOF/COF composites offer superior chromatographic separation performance. However, MOF/COF-based stationary phases have traditionally been prepared by solvothermal synthesis. In this study, a green and low-cost method was proposed for preparing a MOF/COF@SiO_(2) stationary phase.

First, COF@SiO_(2) was prepared in a choline chloride/ethylene glycol-based deep eutectic solvent (DES). Second, a second, acid-base tunable DES was prepared by mixing p-toluenesulfonic acid (PTSA) and 2-methylimidazole in different proportions. This DES served as both reaction solvent and reactant for the rapid synthesis of MOF/COF@SiO_(2). Most previous studies selected toxic transition metal-based MOFs. In contrast, this study employed a lightweight, non-toxic MOF based on an s-block metal (calcium). PTSA and calcium form a calcium/oxygen-containing organic acid framework in the acidic DES. This framework assembles with terephthalic acid dissolved in the basic DES to form the MOF. The strong hydrogen bonding of the DES facilitates rapid assembly of the Ca-MOF.

The resulting Ca-MOF/COF@SiO_(2) can be used for multi-mode chromatography to efficiently separate multiple isomeric, hydrophilic, and hydrophobic analytes. The synthesis of Ca-MOF/COF@SiO_(2) is green and mild. In particular, the acid-base tunable DES promotes rapid synthesis of non-toxic Ca-MOF/COF@silica composites. This work offers an innovative approach to the green synthesis of novel MOF/COF stationary phases and extends their applications in chromatography.
Funding: funded by the following:
- the Fanying Special Program of the National Natural Science Foundation of China (grant number 62341307);
- the Scientific Research Project of Jiangxi Provincial Department of Education (grant number GJJ200839);
- the Doctoral Startup Fund of Jiangxi University of Technology (grant number 205200100402).
Abstract: In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts. At the same time, modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust the strength of cross-modal complementarity to optimize fusion quality. They also lack effective mechanisms for modeling the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, which substantially improves noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating, which suppresses intra-modal noise and enhances contextual representation. Second, we design a gated-attention cross-modal encoder. It dynamically calibrates how much the visual and audio modalities contribute to the dominant textual features, and it eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module. This module explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating, and it improves continuity modeling with a contrastive learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
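One simple way to let a gate control how much an auxiliary modality (audio or visual) adds to dominant textual features is a per-dimension sigmoid gate applied to a residual sum. The Python sketch below implements that generic pattern. The parameter names and the residual form `t + g*a` are illustrative assumptions. They are not the paper's exact gated-attention encoder.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_residual_fuse(text, aux, w_text, w_aux, bias):
    """Per-dimension gate g = sigmoid(w_text*t + w_aux*a + b); output t + g*a.

    text: dominant (textual) feature vector; aux: audio or visual feature vector.
    w_text, w_aux, bias: hypothetical learned per-dimension parameters.
    """
    fused = []
    for t, a, wt, wa, b in zip(text, aux, w_text, w_aux, bias):
        g = sigmoid(wt * t + wa * a + b)  # g near 0 suppresses a noisy auxiliary value
        fused.append(t + g * a)
    return fused
```

When the gate saturates near 0, the output falls back to the text features alone. This is one way such a gate can limit the influence of unimodal noise.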
Funding: funded by the Beijing Natural Science Foundation, grant number L241078.
Abstract: The fasteners used in railway tracks are susceptible to defects because of their intricate composition. In addition, foreign objects are frequently found on the track bed, which is exposed to the open environment. Both types of defects pose potential threats to high-speed trains, so timely and accurate track inspection is necessary. Most existing automatic inspection methods rely solely on visible-light data, so complex environments affect their performance. Furthermore, because these methods draw on a single source of information, their accuracy in detecting similar, occluded, and small objects is low. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and enhanced perception of challenging objects. First, because multimodal information differs in its representation dimensions, we propose a dynamic weighted multi-modal feature fusion module. This module assigns weights to the fused multi-modal features and then multiplies them with the extracted single-modal features at multiple levels. This adaptively adjusts how strongly the fused features respond. Second, we propose a novel stepwise multi-scale convolution feature aggregation module for challenging objects. It employs depthwise separable convolutions and cross-scale aggregation over different receptive fields to enhance feature extraction and reuse. This reduces the progressive loss of effective information. Extensive experiments on the constructed RGBD dataset demonstrate the efficacy of the proposed method compared with eight established single-modal and multi-modal methods.
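Depthwise separable convolution, mentioned above, splits a standard k×k convolution into two steps:
- a per-channel (depthwise) k×k convolution;
- a 1×1 (pointwise) convolution that mixes channels.

The arithmetic below shows why this reduces parameters. The layer sizes in the example are illustrative assumptions, not the paper's configuration.

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise
```

For k = 3, 32 input channels, and 64 output channels (no bias), a standard convolution has 3·3·32·64 = 18,432 weights. The depthwise separable version has 3·3·32 + 32·64 = 2,336, roughly 7.9 times fewer.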
Funding: The authors acknowledge support from the National Natural Science Foundation of China (Grant Nos. 52174313 and 52304350). They thank all members of the Hebei High Quality Steel Continuous Casting Engineering Technology Research Center at North China University of Science and Technology, Tangshan, China.
Abstract: This study investigated the flow behavior of molten steel in a thin slab mold at high casting speeds, focusing on the multi-mode continuous casting and rolling mold. A steel-slag two-phase flow model was established through numerical simulation using large eddy simulation, the volume of fluid method, and magnetohydrodynamics. Two quantities serve as the key criteria for analyzing asymmetric flow under varying casting speeds and electromagnetic braking: the maximum flow velocity and the wave height at the steel-slag interface within the mold.

The results indicate that asymmetric flows within the mold do not occur synchronously. The severity of the asymmetric flow correlates with the velocity difference across the steel-slag interface. A greater biased flow prolongs the time required to return to a steady state. The tested configuration uses a magnetic field intensity of 0.24 T, with the magnetic pole positioned 390 mm from the steel-slag interface. This configuration reduces the velocity at the steel-slag interface, thereby mitigating the asymmetric flow. It also reduces the velocity of the jet and its impact depth and impact intensity on the narrow face. As a result, it improves the distribution of velocity and turbulent kinetic energy within the mold. The same configuration affects the interface dynamics in two ways. It prolongs the time the steel-slag interface takes to go from a stable state to its maximum velocity. It also shortens the time the interface takes to return to stability from an unstable state. Moreover, it ensures the positional stability of the steel-slag interface, confining its position within −3 mm.
Funding: supported by the National Natural Science Foundation of China (No. 62350048).
Abstract: Large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments face obstacles and wind perturbations. Achieving decentralized, scalable, and adaptive control for such swarms is a challenge. To address it, we propose a hybrid framework for robust autonomous and formation control. The framework integrates adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration. It uses meta-learning to optimize RL policies for real-time adaptation, and it fuses sensor data for precise state estimation. It also enhances PFO with learned leader-follower dynamics and exploration rewards, so that the swarm maintains cohesive formations while exploring uncertain areas. For swarms of 10–30 UAVs, the framework outperforms standard multi-agent RL, pure PFO, and single-modality RL:
- 34% faster convergence;
- a 61% reduction in stability root mean square error (RMSE);
- 88% fewer collisions;
- success rates of 85.6%–92.3% in target detection and encirclement.

Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. The innovations are meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control. Together, they advance real-world multi-UAV swarm autonomy and coordination.
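The abstract reports a 61% reduction in "stability RMSE" but does not define the quantity being averaged. The Python sketch below assumes it is the RMSE of each UAV's 3D position error relative to its desired formation slot. It also shows how a percentage reduction relative to a baseline is computed. Both the error definition and the numbers in the usage note are illustrative assumptions.

```python
import math


def formation_rmse(actual, desired):
    """RMSE of per-UAV 3D position errors w.r.t. desired formation slots (assumed definition)."""
    if len(actual) != len(desired) or not actual:
        raise ValueError("need one desired slot per UAV and at least one UAV")
    squared = [math.dist(a, d) ** 2 for a, d in zip(actual, desired)]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline, new):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - new) / baseline
```

For instance, a hypothetical baseline RMSE of 10.0 m that drops to 3.9 m corresponds to a 61% reduction.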
Funding: This work is supported by the National Natural Science Foundation of China (51973226, 51773004, 51920105006, and 81630056), the National Key Basic Research Program of China (2014CB542202), and the Youth Innovation Promotion Association CAS (No. 2019031).
Abstract: Developing versatile theranostic agents that integrate both therapeutic and diagnostic features remains an urgent clinical need. Herein, we prepared uniform PEGylated poly(lactic-co-glycolic acid) (PLGA) microcapsules, PB@(Fe_(3)O_(4)@PEG-PLGA) MCs, by a premix membrane emulsification (PME) method. These microcapsules carry superparamagnetic Fe_(3)O_(4) nanoparticles (NPs) embedded in the shell and Prussian blue (PB) NPs built into the cavity. Owing to their suitable geometry and multiple loading capacity, these MCs can serve as efficient multi-modality contrast agents. They simultaneously enhance the contrast of ultrasound (US), magnetic resonance (MR), and photoacoustic tomography (PAT) imaging. The built-in PB NPs give the MCs excellent photothermal conversion properties, and the embedded Fe_(3)O_(4) NPs enable magnetic guidance for targeted drug delivery. Notably, the antitumor drug doxorubicin (DOX) was then encapsulated in situ to give (PB+DOX)@(Fe_(3)O_(4)@PEG-PLGA) MCs. These offer distinctive advantages for near-infrared (NIR)-responsive drug delivery and for magnetically guided, synergistic chemo-photothermal osteosarcoma therapy. In vitro and in vivo studies revealed that these biocompatible (PB+DOX)@(Fe_(3)O_(4)@PEG-PLGA) MCs could effectively target tumor tissue. They showed a superior therapeutic effect against osteosarcoma invasion and in alleviating osteolytic lesions. These MCs could be developed into a smart platform that integrates multi-modality imaging with synergistic, highly efficacious therapy.
Abstract: Self-mixing interferometry (SMI) is an attractive sensing scheme that typically relies on mono-modal operation of the employed laser diode. However, the laser's modal state can change when operating conditions change. Detecting the occurrence of multi-modality in SMI signals is therefore necessary to avoid erroneous metric measurements. Processing multi-modal SMI signals is typically difficult because such signals are diverse and complex. The proposed techniques significantly ease this task by identifying the modal state of SMI signals with a 100% success rate. As a result, interferometric fringes can be correctly interpreted in metric sensing applications.
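In conventional mono-modal SMI, each interferometric fringe corresponds to a target displacement of half the emission wavelength, so counting fringes gives displacement directly. The Python sketch below shows that conversion. It also shows how a miscounted fringe total becomes a metric error. The hypothetical scenario is one in which an unnoticed change in laser modality distorts the fringes. The 785 nm wavelength and the fringe counts are illustrative assumptions.

```python
def displacement_from_fringes(fringe_count, wavelength_nm):
    """Target displacement (nm) implied by a mono-modal SMI fringe count: N * lambda / 2."""
    if fringe_count < 0 or wavelength_nm <= 0:
        raise ValueError("fringe count must be >= 0 and wavelength must be > 0")
    return fringe_count * wavelength_nm / 2.0


def relative_error_percent(measured, true_value):
    """Absolute relative error of a measurement, in percent."""
    return 100.0 * abs(measured - true_value) / true_value
```

With a 785 nm laser, 10 fringes correspond to 3,925 nm. If the signal were misread as 12 fringes, the result would be 4,710 nm, a 20% error. This illustrates why the modal state must be identified before fringes are interpreted.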
Funding: This work is supported by the Collaborative Research Project under the SIMTech-NUS Joint Laboratory, “SIMTech-NUS Joint Lab on Large-area Flexible Hybrid Electronics”, and the National Key Research and Development Program of China (Grant No. 2019YFB2004800, Project No. R-2020-S-002).
Abstract: In the metaverse, a digital-twin smart home is a vital platform for immersive communication between the physical and virtual worlds. Triboelectric nanogenerator (TENG) sensors contribute substantially to smart-home monitoring. However, TENG deployment is hindered by unstable output under environmental changes. Herein, we develop a digital-twin smart home using a robust information mat (InfoMat) built entirely on TENGs. The InfoMat consists of an in-home mat array and an entry mat. The interdigital electrode design enables an environment-insensitive ratiometric readout from the mat array, which cancels commonly experienced environmental variations. Arbitrary position sensing is also achieved through the interval arrangement of the mat pixels. Concurrently, the two-channel entry mat generates multi-modality information. This raises the 10-user identification accuracy from 93% in the one-channel case to 99%. Furthermore, the digital-twin smart home is visualized by projecting smart-home information into virtual reality in real time. This information includes access authorization, position, walking trajectory, and dynamic activities/sports, among others.
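A ratiometric readout divides one channel's output by another's. Any factor that scales both channels equally therefore cancels: (k·a)/(k·b) = a/b. The Python sketch below assumes that environmental changes such as humidity act as a common multiplicative gain k on both interdigital electrodes of a pixel. This idealized model is an assumption made for illustration.

```python
def ratiometric_readout(v_a, v_b):
    """Ratio of the outputs of two interdigital electrodes under the same mat pixel."""
    if v_b == 0:
        raise ZeroDivisionError("reference channel output is zero")
    return v_a / v_b


def with_environmental_gain(v_a, v_b, k):
    """Hypothetical model: the environment scales both electrode outputs by the same factor k."""
    return k * v_a, k * v_b
```

For example, raw outputs of 2.0 V and 4.0 V give a ratio of 0.5. If humidity scales both to 60% (1.2 V and 2.4 V), the raw amplitudes change, but the ratio stays 0.5.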
Funding: This work was supported by the National Natural Science Foundation of China (No. 62101471). It was partially supported by the Shenzhen Research Institute of City University of Hong Kong and by the following:
- the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CityU 21201420);
- the Shenzhen Science and Technology Funding Fundamental Research Program (No. 2021Szvup126);
- the National Natural Science Foundation of Shandong Province (No. ZR2021LZH010);
- the Changsha International and Regional Science and Technology Cooperation Program (No. kh2201023);
- the Chow Sang Sang Group Research Fund sponsored by Chow Sang Sang Holdings International Limited (No. 9229062);
- CityU MFPRC (No. 9680333), CityU SIRG (No. 7020057), CityU APRC (No. 9610485), CityU ARG (No. 9667225), and CityU SRG-Fd (No. 7005666).
Abstract: Autonomous and self-driving vehicles have become increasingly popular with customers because of their convenience. Vehicle angle prediction, meaning the real-time prediction of a vehicle's orientation angles, is one of the most prevalent topics in the autonomous driving industry. However, existing vehicle angle prediction methods use only single-modal data, such as camera images, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel and more efficient vehicle angle prediction strategy that performs multi-modal prediction. Specifically, Emma exploits both images and inertial measurement unit (IMU) signals. It uses a fusion network for multi-modal data fusion and vehicle angle prediction. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results show that Emma achieves an overall accuracy of 97.5% in predicting three vehicle angle parameters (yaw, pitch, and roll). This outperforms traditional single-modality methods by approximately 16.7%–36.8%. Additionally, the few-shot learning module shows promising adaptive ability, with overall accuracies of 79.8% and 88.3% in the 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on an Arduino UNO board.
Abstract: Precise prediction of molecular properties is essential for advances in drug development, particularly in virtual screening and compound optimization. Numerous recently introduced deep learning-based methods have shown remarkable potential for enhancing Molecular Property Prediction (MPP). They have especially improved accuracy and insight into molecular structures. Yet two critical questions arise. First, does integrating domain knowledge improve the accuracy of molecular property prediction? Second, does multi-modal data fusion yield more precise results than methods that use a single data source? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. For regression, root mean square error (RMSE) is reduced by up to 4.0%. For classification, the area under the receiver operating characteristic curve (ROC-AUC) improves by up to 1.7%. We also find two effects of multi-modal augmentation:
- augmenting 2D graphs with 3D information improves classification performance (ROC-AUC) by up to 13.2%;
- enriching 2D graphs with 1D SMILES boosts multi-modal learning performance on regression tasks by up to 9.1%.

These two consolidated insights offer crucial guidance for future advances in drug discovery.
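The ROC-AUC figures above can be understood through the equivalent rank statistic: the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as one half. The Python sketch below computes that quantity by brute force in O(positives × negatives). It also computes a relative improvement over a baseline. The abstract does not say whether its percentages are relative or absolute differences, so treating them as relative is an assumption.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(baseline, new, higher_is_better=True):
    """Percent improvement over a baseline; use higher_is_better=False for RMSE."""
    delta = new - baseline if higher_is_better else baseline - new
    return 100.0 * delta / baseline
```

For labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ordered correctly, so the ROC-AUC is 0.75.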