In the metaverse, a digital-twin smart home is a vital platform for immersive communication between the physical and virtual worlds. Triboelectric nanogenerator (TENG) sensors contribute substantially to smart-home monitoring. However, TENG deployment is hindered by unstable output under environmental changes. Herein, we develop a digital-twin smart home using a robust all-TENG-based information mat (InfoMat), which consists of an in-home mat array and an entry mat. The interdigital electrode design allows an environment-insensitive ratiometric readout from the mat array that cancels commonly experienced environmental variations. Arbitrary position sensing is also achieved because of the interval arrangement of the mat pixels. Concurrently, the two-channel entry mat generates multi-modality information that raises the 10-user identification accuracy from 93% to 99% compared with the one-channel case. Furthermore, the digital-twin smart home is visualized by projecting smart-home information into virtual reality in real time, including access authorization, position, walking trajectory, dynamic activities/sports, and so on.
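The ratiometric idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that an environmental change such as humidity scales both electrode outputs by the same factor; that signal model, the function name, and the numbers are hypothetical and are not taken from the InfoMat design.

```python
def ratiometric_readout(signal_a, signal_b):
    """Return signal_a / signal_b, which is unchanged by a gain common to both."""
    if signal_b == 0:
        raise ValueError("reference signal must be non-zero")
    return signal_a / signal_b


# Hypothetical electrode outputs (arbitrary units) under dry conditions ...
dry = ratiometric_readout(2.0, 1.0)
# ... and with both outputs attenuated by the same assumed factor of 0.6.
humid = ratiometric_readout(2.0 * 0.6, 1.0 * 0.6)
```

Under this assumption the common factor cancels, so `dry` and `humid` agree even though each raw signal dropped by 40%.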
The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing Molecular Property Prediction (MPP), especially in improving accuracy and insight into molecular structures. Yet two critical questions arise: does the integration of domain knowledge improve the accuracy of molecular property prediction, and does multi-modal data fusion yield more precise results than methods that use a single data source? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured by reductions in Root Mean Square Error (RMSE), are up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. Additionally, we find that, as measured by ROC-AUC, augmenting 2D graphs with 3D information improves performance on classification tasks by up to 13.2%, and enriching 2D graphs with 1D SMILES boosts multi-modal learning performance on regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advancements in drug discovery.
Autonomous and self-driving vehicles have become a popular choice for customers because of their convenience. Real-time vehicle angle prediction is one of the most prevalent topics in the autonomous driving industry. However, existing methods of vehicle angle prediction use only single-modal data, such as images captured by a camera, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel vehicle angle prediction strategy that achieves multi-modal prediction and is more efficient. Specifically, Emma exploits both images and inertial measurement unit (IMU) signals with a fusion network for multi-modal data fusion and vehicle angle prediction. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results demonstrate that Emma achieves an overall accuracy of 97.5% in predicting three vehicle angle parameters (yaw, pitch, and roll), which outperforms traditional single-modality methods by approximately 16.7%-36.8%. Additionally, the few-shot learning module shows promising adaptive ability, with overall accuracies of 79.8% and 88.3% in the 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on the Arduino UNO board.
With the increasing prevalence of Android software, protecting it against malicious threats has become a critical concern. Traditional malware detection methods, tailored for static environments, often fail to adapt to evolving threats in dynamic environments. To address the challenge of detecting evolving malware, we introduce DMDroid, a novel multi-modal fusion-based framework for malware analysis and detection. DMDroid leverages an array of feature extraction technologies and advanced deep learning models to analyze data, enhanced by a multi-head attention mechanism. This mechanism optimizes the integration of diverse static features from graph-based and image-based modalities, including permissions, API calls, opcodes, and bytecode sequences, prioritizing critical features to effectively detect new and evolving malware threats. We evaluate DMDroid in various realistic environments. Experiments show that, compared to the Bai, Drebin, and MaMa-pkg detectors, DMDroid can improve the detection accuracy by 117.56%, 122.11%, and 119.47%, respectively. Compared to a unimodal approach, DMDroid can enhance the accuracy, macro-averaged F1 score, and weighted-averaged F1 score by 143.25%, 75.84%, and 279.22%, respectively. The prototype can help to improve the quality and security of Android malware analysis and detection.
Acute Bilirubin Encephalopathy (ABE) is a significant threat to neonates, leading to disability and high mortality rates. Detecting and treating ABE promptly is important to prevent further complications and long-term issues. Recent studies have explored ABE diagnosis. However, they often face limitations in classification due to reliance on a single modality of Magnetic Resonance Imaging (MRI). To tackle this problem, the authors propose a Tri-M2MT model for precise ABE detection using tri-modality MRI scans. The scans include T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and apparent diffusion coefficient maps to obtain in-depth information. Initially, the tri-modality MRI scans are collected and preprocessed using an Advanced Gaussian Filter for noise reduction and Z-score normalisation for data standardisation. An Advanced Capsule Network is used to extract relevant features, and the Snake Optimization Algorithm selects optimal features based on feature correlation, with the aim of minimising complexity and enhancing detection accuracy. Furthermore, a multi-transformer approach is used to fuse features and identify feature correlations effectively. Finally, accurate ABE diagnosis is achieved through a SoftMax layer. The performance of the proposed Tri-M2MT model is evaluated across various metrics, including accuracy, specificity, sensitivity, F1-score, and ROC curve analysis, and the proposed methodology performs better than existing methodologies.
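Of the preprocessing steps named above, Z-score normalisation is simple enough to sketch. The snippet below is a generic illustration over a flat list of intensities, not the authors' pipeline; the function name and the choice of the population standard deviation are assumptions.

```python
import statistics


def z_score_normalise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot normalise a constant signal")
    return [(v - mean) / std for v in values]


# Hypothetical voxel intensities from one scan (mean 5, standard deviation 2).
normalised = z_score_normalise([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Applying the same transform to each modality separately puts scans with different intensity ranges on a comparable scale before fusion.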
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). The AWMDF model outperformed the model ablation variants, the data ablation variants, and various traditional machine learning models. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
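As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the precision and recall quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the AWMDF model.
recomputed_f1 = round(f1_score(0.937, 0.929), 3)
print(recomputed_f1)  # 0.933
```

The recomputed value matches the reported F1 score of 0.933.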
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks that use multi-modal data in real-time ATC environments. To address this gap, the ISA task is refined, by analyzing the situation awareness procedure of the ATCOs, into the processing of two primary elements: spoken instructions and flight trajectories. Subsequently, ISA is further formulated as Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 advanced unimodal/multi-modal methods with extensive evaluation metrics are introduced to conduct the benchmarks on a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal-style-guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former serves as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
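For readers unfamiliar with Bayesian personalized ranking, the generic BPR objective can be sketched as follows. This is the standard pairwise loss, -log sigmoid(score_pos - score_neg), averaged over pairs; it is not PSGNet's scoring function, which the abstract does not specify, and the function name and example scores are hypothetical.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired preferred/non-preferred scores."""
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        x = neg - pos
        # -log(sigmoid(pos - neg)) equals softplus(x) = log(1 + exp(x)),
        # evaluated here in a numerically stable form.
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pos_scores)


# Hypothetical scores: outfits the user chose versus sampled alternatives.
well_ranked = bpr_loss([3.0, 2.5], [1.0, 0.5])
badly_ranked = bpr_loss([1.0, 0.5], [3.0, 2.5])
```

The loss shrinks as preferred outfits are scored further above the alternatives, which is what drives the ranking during training.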
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, combining data from multiple modalities is difficult because of disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Carbon dots (CDs)-based composites have shown impressive performance in the fields of information encryption and sensing. However, it remains a great challenge to simultaneously implement multi-mode luminescence and room-temperature phosphorescence (RTP) detection in a single system because of the formidable synthesis. Herein, a multifunctional composite, Eu&CDs@p RHO, has been designed by a co-assembly strategy and prepared via a facile calcination and impregnation treatment. Eu&CDs@p RHO exhibits intense fluorescence (FL) and RTP coming from two individual luminous centers, Eu^(3+) in the free pores and CDs in the interrupted structure of the RHO zeolite. Unique four-mode color outputs, including pink (Eu^(3+), ex. 254 nm), light violet (CDs, ex. 365 nm), blue (CDs, 254 nm off), and green (CDs, 365 nm off), can be realized; on this basis, a preliminary application in advanced information encoding has been demonstrated. Given the free pores of the matrix and the stable RTP of the confined CDs in water, visual RTP detection of Fe^(3+) ions is achieved with a detection limit as low as 9.8 μmol/L. This work opens up a new perspective for the strategic combination of luminous guests with porous zeolites to construct advanced functional materials.
Abstract: With the growth of the elderly population and rising health care costs, the role of service robots in aiding the disabled and the elderly is becoming important. Many researchers around the world have paid much attention to healthcare robots and rehabilitation robots. To achieve natural and harmonious communication between the user and a service robot, the information perception/feedback and interaction abilities of service robots have become key issues.
Funding: Supported in part by Technology Innovation 2030 under Grant 2022ZD0211700.
Abstract: For the analysis of spinal and disc diseases, automated tissue segmentation of the lumbar spine is vital. Due to the continuous and concentrated location of the target, the abundance of edge features, and individual differences, conventional automatic segmentation methods perform poorly. Since deep learning has proved successful in medical image segmentation over the past few years, it has been applied to this task in a number of ways. The multi-scale and multi-modal features of lumbar tissues, however, are rarely explored by deep learning methods. Because medical images are in short supply, it is crucial to effectively fuse data from various acquisition modes for model training to alleviate the problem of insufficient samples. In this paper, we propose a novel multi-modality hierarchical fusion network (MHFN) for improving lumbar spine segmentation by learning robust feature representations from multi-modality magnetic resonance images. An adaptive group fusion module (AGFM) is introduced to fuse features from various modes and extract potentially valuable cross-modality features. Furthermore, to combine cross-modality features from low to high levels, we design a hierarchical fusion structure based on AGFM. Experimental results on multi-modality MR images of the lumbar spine show that AGFM is more effective than other feature fusion methods. To further enhance segmentation accuracy, we compare our network with baseline fusion structures. Compared to the baseline fusion structures (input-level: 76.27%, layer-level: 78.10%, decision-level: 79.14%), our network was able to segment fractured vertebrae more accurately (85.05%).
Abstract: Listening is the breakthrough point for conquering the castle of English; it is not only a requirement of English tests, but also the practical use of English knowledge and the embodiment of comprehensive English ability. Listening teaching plays a crucial role in foreign language teaching. However, the effect of listening teaching is unsatisfactory. In recent years, multi-modality theory has attracted the attention of many researchers. In view of the particularity of listening teaching, it is urgent to apply multi-modality theory to English listening teaching, which is expected to produce very good teaching results.
Funding: Project (61240010) supported by the National Natural Science Foundation of China; Project (20070007070) supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China.
Abstract: A new coarse-to-fine strategy was proposed for nonrigid registration of computed tomography (CT) and magnetic resonance (MR) images of the liver. This hierarchical framework consisted of an affine transformation and a B-spline free-form deformation (FFD). The affine transformation performed a rough registration targeting the mismatch between the CT and MR images. The B-spline FFD transformation performed a finer registration by correcting local motion deformation. In the registration algorithm, the normalized mutual information (NMI) was used as the similarity measure, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method was applied for the optimization process. The algorithm was applied to the fully automated registration of liver CT and MR images in three subjects. The results demonstrate that the proposed method not only significantly improves the registration accuracy but also reduces the running time, making it effective and efficient for nonrigid registration.
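To make the similarity measure concrete, here is a minimal sketch of normalized mutual information in the common form NMI = (H(A) + H(B)) / H(A, B), estimated from a joint intensity histogram. The abstract does not say which NMI variant or binning the authors used, so the formula choice, bin count, intensity range, and function name are assumptions, and the inputs are flat intensity lists rather than real image volumes.

```python
import math
from collections import Counter


def normalized_mutual_information(a, b, bins=8, lo=0.0, hi=256.0):
    """NMI = (H(A) + H(B)) / H(A, B) for two equally sized intensity lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    width = (hi - lo) / bins

    def bin_index(value):
        # Clamp so out-of-range values fall into the first or last bin.
        return min(bins - 1, max(0, int((value - lo) / width)))

    pairs = [(bin_index(x), bin_index(y)) for x, y in zip(a, b)]
    n = len(pairs)

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a = entropy(Counter(i for i, _ in pairs))
    h_b = entropy(Counter(j for _, j in pairs))
    h_ab = entropy(Counter(pairs))
    if h_ab == 0.0:
        return 2.0  # convention chosen here for two constant images
    return (h_a + h_b) / h_ab


# Hypothetical intensities: the second "modality" has inverted contrast.
ct = [0, 40, 80, 120, 160, 200, 240, 250]
mr_aligned = [255 - v for v in ct]
mr_misaligned = list(reversed(mr_aligned))

nmi_aligned = normalized_mutual_information(ct, mr_aligned)
nmi_misaligned = normalized_mutual_information(ct, mr_misaligned)
```

Because NMI depends only on how consistently intensities co-occur, not on their actual values, the inverted-contrast pair still reaches the maximum of 2 when aligned and drops when misaligned. That property is what makes the measure suitable for CT-MR registration.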
Abstract: In this work, we propose a new variational model for multi-modal image registration and present an efficient numerical implementation. The model minimizes a new functional that uses reformulated normalized gradients of the images as the fidelity term and higher-order derivatives as the regularizer. A key feature of the model is its ability to guarantee a diffeomorphic transformation, which is achieved by a control term motivated by the quasi-conformal map and the Beltrami coefficient. The existence of a solution of this model is established. To solve the model numerically, we design a Gauss-Newton method for the resulting discrete optimization problem and prove its convergence; a multilevel technique is employed to speed up the initialization and avoid likely local minima of the underlying functional. Finally, numerical experiments demonstrate that this new model can deliver good performance for multi-modal image registration and simultaneously generate an accurate diffeomorphic transformation.
Funding: This work was supported by the National Natural Science Foundation of China (51973226, 51773004, 51920105006 and 81630056), the National Key Basic Research Program of China (2014CB542202), and the Youth Innovation Promotion Association CAS (No. 2019031).
Abstract: The development of versatile theranostic agents that integrate therapeutic and diagnostic features remains an urgent clinical need. Herein, we aimed to prepare uniform PEGylated poly(lactic-co-glycolic acid) (PLGA) microcapsules (PB@(Fe₃O₄@PEG-PLGA) MCs), with superparamagnetic Fe₃O₄ nanoparticles (NPs) embedded in the shell and Prussian blue (PB) NPs built into the cavity, via a premix membrane emulsification (PME) method. Owing to their suitable geometry and multiple loading capacity, these MCs could be used as efficient multi-modality contrast agents to simultaneously enhance the contrast of ultrasound (US), magnetic resonance (MR), and photoacoustic tomography (PAT) imaging. The built-in PB NPs gave the MCs an excellent photothermal conversion property, and the embedded Fe₃O₄ NPs enabled magnetic localization for building a targeted drug delivery system. Notably, after further in-situ encapsulation of the antitumor drug DOX, the (PB+DOX)@(Fe₃O₄@PEG-PLGA) MCs offered additional advantages in achieving near-infrared (NIR)-responsive drug delivery and magnetically guided chemo-photothermal synergistic osteosarcoma therapy. In vitro and in vivo studies revealed that these biocompatible (PB+DOX)@(Fe₃O₄@PEG-PLGA) MCs could effectively target tumor tissue, with a superior therapeutic effect against osteosarcoma invasion and in alleviating osteolytic lesions. They could be developed into a smart platform that integrates multi-modality imaging with synergistic, high-efficacy therapy.
Abstract: Self-mixing interferometry (SMI) is an attractive sensing scheme that typically relies on mono-modal operation of the laser diode employed. However, the laser modality can change when operating conditions change, so detecting the occurrence of multi-modality in SMI signals is necessary to avoid erroneous metric measurements. Processing multi-modal SMI signals is typically difficult because of the diverse and complex nature of such signals. The proposed techniques can significantly ease this task by identifying the modal state of SMI signals with a 100% success rate, so that interferometric fringes can be correctly interpreted for metric sensing applications.
Funding: This work was supported by the Collaborative Research Project under the SIMTech-NUS Joint Laboratory, “SIMTech-NUS Joint Lab on Large-area Flexible Hybrid Electronics”, and the National Key Research and Development Program of China (Grant No. 2019YFB2004800, Project No. R-2020-S-002).
Abstract: In the metaverse, a digital-twin smart home is a vital platform for immersive communication between the physical and virtual worlds. Triboelectric nanogenerator (TENG) sensors contribute substantially to smart-home monitoring. However, TENG deployment is hindered by unstable output under environmental changes. Herein, we develop a digital-twin smart home using a robust all-TENG-based information mat (InfoMat), which consists of an in-home mat array and an entry mat. The interdigital electrode design allows an environment-insensitive ratiometric readout from the mat array that cancels commonly experienced environmental variations. Arbitrary position sensing is also achieved because of the interval arrangement of the mat pixels. Concurrently, the two-channel entry mat generates multi-modality information that raises 10-user identification accuracy from 93% to 99% compared with the one-channel case. Furthermore, a digital-twin smart home is visualized by projecting information from the smart home into virtual reality in real time, including access authorization, position, walking trajectory, and dynamic activities or sports.
Abstract: The precise prediction of molecular properties is essential for advances in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential for enhancing Molecular Property Prediction (MPP), especially in improving accuracy and insight into molecular structures. Yet two critical questions arise: does integrating domain knowledge improve the accuracy of molecular property prediction, and does multi-modal data fusion yield more precise results than single-source methods? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured by reductions in Root Mean Square Error (RMSE), are up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. Additionally, we find that, as measured by ROC-AUC, augmenting 2D graphs with 3D information improves performance on classification tasks by up to 13.2%, and enriching 2D graphs with 1D SMILES boosts multi-modal learning performance on regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advances in drug discovery.
Funding: Supported by the National Natural Science Foundation of China (No. 62101471); partially supported by the Shenzhen Research Institute of City University of Hong Kong; the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CityU 21201420); Shenzhen Science and Technology Funding Fundamental Research Program (No. 2021Szvup126); National Natural Science Foundation of Shandong Province (No. ZR2021LZH010); Changsha International and Regional Science and Technology Cooperation Program (No. kh2201023); Chow Sang Sang Group Research Fund sponsored by Chow Sang Sang Holdings International Limited (No. 9229062); CityU MFPRC (No. 9680333); CityU SIRG (No. 7020057); CityU APRC (No. 9610485); CityU ARG (No. 9667225); CityU SRG-Fd (No. 7005666).
Abstract: Autonomous driving and self-driving vehicles have become a popular choice among customers because of their convenience. Real-time vehicle angle prediction is one of the most prevalent topics in the autonomous driving industry. However, existing vehicle angle prediction methods rely on single-modal data, such as images captured by a camera, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel vehicle angle prediction strategy that achieves multi-modal prediction and is more efficient. Specifically, Emma exploits both images and inertial measurement unit (IMU) signals, using a fusion network for multi-modal data fusion and vehicle angle prediction. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results demonstrate that Emma achieves 97.5% overall accuracy in predicting three vehicle angle parameters (yaw, pitch, and roll), outperforming traditional single-modality methods by approximately 16.7%-36.8%. Additionally, the few-shot learning module shows promising adaptive ability, with overall accuracy of 79.8% and 88.3% in the 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on an Arduino UNO board.
Funding: Supported by a sub-project of the National Key Research and Development Program of the Ministry of Science and Technology (Grant No. 2022YFB4501700).
Abstract: With the increasing prevalence of Android software, protecting it against malicious threats has become a critical concern. Traditional malware detection methods, tailored for static environments, often fail to adapt to evolving threats in dynamic environments. To address the challenge of detecting evolving malware, we introduce DMDroid, a novel multi-modal fusion-based framework for malware analysis and detection. DMDroid leverages an array of feature extraction technologies and advanced deep learning models to analyze data, enhanced by a multi-head attention mechanism. This mechanism optimizes the integration of diverse static features from graph-based and image-based modalities, including permissions, API calls, opcodes, and bytecode sequences, and prioritizes critical features to effectively detect new and evolving malware threats. We evaluate DMDroid in various realistic environments. Experiments show that, compared with the Bai, Drebin, and MaMa-pkg detectors, DMDroid improves detection accuracy by 117.56%, 122.11%, and 119.47%, respectively. Compared with a unimodal approach, DMDroid improves accuracy, macro-averaged F1 score, and weighted-averaged F1 score by 143.25%, 75.84%, and 279.22%, respectively. The prototype can help improve the quality and security of Android malware analysis and detection.
Abstract: Acute Bilirubin Encephalopathy (ABE) is a significant threat to neonates, leading to disability and high mortality rates. Detecting and treating ABE promptly is important to prevent further complications and long-term issues. Recent studies have explored ABE diagnosis. However, they often face limitations in classification because they rely on a single modality of Magnetic Resonance Imaging (MRI). To tackle this problem, the authors propose a Tri-M2MT model for precise ABE detection using tri-modality MRI scans. The scans include T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and apparent diffusion coefficient maps, which together provide in-depth information. Initially, the tri-modality MRI scans are collected and preprocessed using an Advanced Gaussian Filter for noise reduction and Z-score normalisation for data standardisation. An Advanced Capsule Network is used to extract relevant features, with the Snake Optimization Algorithm selecting optimal features based on feature correlation, with the aim of minimising complexity and enhancing detection accuracy. Furthermore, a multi-transformer approach is used to fuse features and identify feature correlations effectively. Finally, ABE diagnosis is produced by a SoftMax layer. The performance of the proposed Tri-M2MT model is evaluated across various metrics, including accuracy, specificity, sensitivity, F1-score, and ROC curve analysis, and the proposed methodology performs better than existing methodologies.
Funding: Construction Program of the Key Discipline of State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). The AWMDF model outperformed the variants in the model ablation and data ablation experiments as well as various traditional machine learning models. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
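As a quick consistency check on the metrics reported above, the F1 score can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the standard harmonic-mean definition and that all three figures refer to the same class and averaging scheme, which the abstract does not spell out; the helper name is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the AWMDF model.
reported_precision = 0.937
reported_recall = 0.929
reported_f1 = 0.933

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
matches_report = round(recomputed_f1, 3) == reported_f1
```

Under that assumption, the recomputed value rounds to the reported 0.933.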
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161); the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476); the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046); the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030); and the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks that use multi-modal data in real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of ATCOs, the ISA task is refined into the processing of two primary elements, i.e., spoken instructions and flight trajectories. The ISA is then further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 advanced unimodal and multi-modal methods, with extensive evaluation metrics, are introduced to benchmark a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal-style-guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former serves as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
Funding: Supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of a patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, combining data from multiple modalities is difficult because of disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities because they lack the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and increase inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, were implemented to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
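The cross-attention step mentioned in the abstract above can be sketched in a few lines. This is a generic scaled dot-product cross-attention in NumPy, not the authors' architecture: it omits the learned query/key/value projections and multiple heads that a real transformer would use, and the function name, token counts, and feature size are hypothetical.

```python
import numpy as np


def cross_attention(query_feats, context_feats):
    """Scaled dot-product cross-attention without learned projections.

    query_feats:   (n_q, d) tokens from one modality (e.g. MRI patches)
    context_feats: (n_c, d) tokens from another modality (e.g. PET patches)
    Returns the fused (n_q, d) features and the (n_q, n_c) attention weights.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ context_feats, weights


# Random stand-ins for per-patch features from two modalities.
rng = np.random.default_rng(0)
mri_tokens = rng.standard_normal((4, 8))
pet_tokens = rng.standard_normal((6, 8))

fused, attn = cross_attention(mri_tokens, pet_tokens)
```

Each output row is a weighted mix of the other modality's tokens, which is how features from one scan can be aligned with features from another.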
Funding: Supported by the National Natural Science Foundation of China (No. 22288101) and the 111 Project (No. B17020).
Abstract: Carbon dot (CD)-based composites have shown impressive performance in information encryption and sensing. However, implementing multi-mode luminescence and room-temperature phosphorescence (RTP) detection simultaneously in a single system remains a great challenge because of the difficult synthesis. Herein, a multifunctional composite, Eu&CDs@p RHO, has been designed by a co-assembly strategy and prepared via a facile calcination and impregnation treatment. Eu&CDs@p RHO exhibits intense fluorescence (FL) and RTP from two individual luminous centers: Eu³⁺ in the free pores and CDs in the interrupted structure of the RHO zeolite. Unique four-mode color outputs, including pink (Eu³⁺, ex. 254 nm), light violet (CDs, ex. 365 nm), blue (CDs, 254 nm off), and green (CDs, 365 nm off), could be realized; on this basis, a preliminary application in advanced information encoding has been demonstrated. Given the free pores of the matrix and the stable RTP of the confined CDs in water, visual RTP detection of Fe³⁺ ions is achieved with a detection limit as low as 9.8 μmol/L. This work opens up a new perspective on the strategic combination of luminous guests with porous zeolites to construct advanced functional materials.