Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screenin...Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screening tool in HF outpatients. In this study, we prospectively evaluated VES-13 against a multimodal screening assessment in detecting frailty and predicting individual risk of adverse prognosis.Methods Frailty was assessed at the initial visit using both a multimodal approach, incorporating Barthel Index, Older American Resources and Services scale, Pfeiffer Test, abbreviated Geriatric Depression Scale, age > 85 years, lacking support systems,and VES-13. Patients scoring ≥ 3 on VES-13 or meeting at least one multimodal criterion were classified as frail. Endpoints included all-cause mortality, a composite of death or HF hospitalization, and recurrent HF hospitalizations.Results A total of 301 patients were evaluated. VES-13 identified 40.2% as frail and the multimodal assessment 33.2%. In Cox regression analyses, frailty identified by VES-13 showed greater prognostic significance than the multimodal assessment for allcause mortality(HR = 3.70 [2.15–6.33], P < 0.001 vs. 2.40 [1.46–4.0], P = 0.001) and the composite endpoint(HR = 3.13 [2.02–4.84], P< 0.001 vs. 1.96 [1.28–2.99], P = 0.002). Recurrent HF hospitalizations were four times more frequent in VES-13 frail patients while two times in those identified as frail by the multimodal assessment. Additionally, stratifying patients by VES-13 tertiles provided robust risk differentiation.Conclusions VES-13, a simple frailty tool, outperformed a comprehensive multimodal assessment and could be easily integrated into routine HF care, highlighting its clinical utility in identifying patients at risk for poor outcomes.展开更多
As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and el...As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and electrochemical characteristics,MXenes have shown great potential in brain-inspired neuromorphic computing electronics,including neuromorphic gas sensors,pressure sensors and photodetectors.This paper provides a forward-looking review of the research progress regarding MXenes in the neuromorphic sensing domain and discussed the critical challenges that need to be resolved.Key bottlenecks such as insufficient long-term stability under environmental exposure,high costs,scalability limitations in large-scale production,and mechanical mismatch in wearable integration hinder their practical deployment.Furthermore,unresolved issues like interfacial compatibility in heterostructures and energy inefficiency in neu-romorphic signal conversion demand urgent attention.The review offers insights into future research directions enhance the fundamental understanding of MXene properties and promote further integration into neuromorphic computing applications through the convergence with various emerging technologies.展开更多
Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),fle...Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),flexible electronics(2010s,stretchable materials),and intelligent systems(2020s-present,AI-driven multimodal sensing).With the innovation of material,processing techniques,and multimodal fusion of stimuli,the application of tactile sensors has been continuously expanding to a diversity of areas,including but not limited to medical care,aerospace,sports and intelligent robots.Currently,researchers are dedicated to develop tactile sensors with emerging mechanisms and structures,pursuing high-sensitivity,high-resolution,and multimodal characteristics and further constructing tactile systems which imitate and approach the performance of human organs.However,challenges in the combination between the theoretical research and the practical applications are still significant.There is a lack of comprehensive understanding in the state of the art of such knowledge transferring from academic work to technical products.Scaled-up production of laboratory materials faces fatal challenges like high costs,small scale,and inconsistent quality.Ambient factors,such as temperature,humidity,and electromagnetic interference,also impair signal reliability.Moreover,tactile sensors must operate across a wide pressure range(0.1 k Pa to several or even dozens of MPa)to meet diverse application needs.Meanwhile,the existing algorithms,data models and sensing systems commonly reveal insufficient precision as well as undesired robustness in data processing,and there is a realistic gap between the designed and the demanded system response speed.In this review,oriented by the design requirements of intelligent tactile sensing systems,we summarize the common sensing mechanisms,inspired structures,key performance,and optimizing strategies,followed by a brief overview of the recent advances in the perspectives of system integration and algorithm implementation,and the possible roadmap of future development of tactile sensors,providing a forward-looking as well as critical discussions in the future industrial applications of flexible tactile sensors.展开更多
Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical applicati...Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical application of multi-omics parameters is still restricted by the expensive and less accessible assays,although they accurately reflect immune status.A comprehensive evaluation framework based on“easy-to-obtain”multi-model clinical parameters is urgently required,incorporating clinical features to establish baseline patient profiles and disease staging;routine blood tests assessing systemic metabolic and functional status;immune cell subsets quantifying subcluster dynamics;imaging features delineating tumor morphology,spatial configuration,and perilesional anatomical relationships;immunohistochemical markers positioning qualitative and quantitative detection of tumor antigens from the cellular and molecular level.This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management conveniently and practically.展开更多
Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and requir...Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and require a high level of expertise.This article proposes an innovative alternative solution that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation(BPMN)diagrams solely from verbal descriptions of the processes to be modeled,utilizing Large Language Models(LLMs)and multimodal Artificial Intelligence(AI).Experimental results,based on video recordings of process explanations provided by an expert from an organization(in this case,the Commercial Courts of a public justice administration),demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams,leading to significant improvements in the speed,accuracy,and accessibility of process modeling.This research makes a substantial contribution to the field of business process modeling,as its methodology is groundbreaking in its use of LLMs and multimodal AI capabilities to handle different types of source material(text and video),combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.展开更多
Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conductin...Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conducting ECG-based studies.From a review of existing studies,two main factors appear to contribute to this problem:the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models.To overcome these limitations,this study proposes a dual-path multimodal framework,termed DM-EHC(Dual-Path Multimodal ECG Heartbeat Classifier),for ECG-based heartbeat classification.The proposed framework links 1D ECG temporal features with 2D time–frequency features.By setting up the dual paths described above,the model can process more dimensions of feature information.The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments.Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias.The model achieved mean precision,recall,and F1 score of 95.14%,92.26%,and 93.65%,respectively.These results indicate that the framework is robust and has potential value in automated arrhythmia classification.展开更多
In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although ...In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although intelligent rescue robots can enter hazardous environments in place of humans,smoke poses major challenges for human detection algorithms.These challenges include the attenuation of visible and infrared signals,complex thermal fields,and interference frombackground objects,all ofwhichmake it difficult to accurately identify trapped individuals.To address this problem,we propose VIF-YOLO,a visible–infrared fusion model for real-time human detection in dense smoke environments.The framework introduces a lightweight multimodal fusion(LMF)module based on learnable low-rank representation blocks to end-to-end integrate visible and infrared images,preserving fine details while enhancing salient features.In addition,an efficient multiscale attention(EMA)mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions.Extensive experiments on our newly constructedmultimodal smoke human detection(MSHD)dataset demonstrate thatVIF-YOLOachievesmAP50 of 99.5%,precision of 99.2%,and recall of 99.3%,outperforming YOLOv10n by a clear margin.Furthermore,when deployed on the NVIDIA Jetson Xavier NX,VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms,validating its real-time capability on edge-computing platforms.These results confirm that VIF-YOLO provides accurate,robust,and fast detection across complex backgrounds and diverse smoke conditions,ensuring reliable and rapid localization of individuals in need of rescue.展开更多
Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing can...Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing cancer detection,diagnosis,and prognostication.A narrative review of literature published from January 2015 to march 2025 was conducted using PubMed,Web of Science,and Scopus.Search terms included"gastrointestinal cancer","artificial intelligence","machine learning","deep learning","radiomics","multimodal detection"and"predictive modeling".Studies were included if they focused on clinically relevant AI applications in GI oncology.AI algorithms for GI cancer detection have achieved high performance across imaging modalities,with endoscopic DL systems reporting accuracies of 85%-97%for polyp detection and segmentation.Radiomics-based models have predicted molecular biomarkers such as programmed cell death ligand 2 expression with area under the curves up to 0.92.Large language models applied to radiology reports demonstrated diagnostic accuracy comparable to junior radiologists(78.9%vs 80.0%),though without incremental value when combined with human interpretation.Multimodal AI approaches integrating imaging,pathology,and clinical data show emerging potential for precision oncology.AI in GI oncology has reached clinically relevant accuracy levels in multiple diagnostic tasks,with multimodal approaches and predictive biomarker modeling offering new opportunities for personalized care.However,broader validation,integration into clinical workflows,and attention to ethical,legal,and social implications remain critical for widespread adoption.展开更多
This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to...This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to both screening and diagnosis.For the clinical adaptation of AI,several challenges remain for its effective translation.The establishment of high-quality clinical databases,such as the National Clinical Database and Japan Endoscopy Database in Japan,which covers almost all cases of esophageal cancer,is essential for validating multimodal AI models.This requires rigorous external validation using diverse datasets,including those from different endoscope manufacturers and image qualities.Furthermore,endoscopists’skills significantly affect diagnostic accuracy,suggesting that AI should serve as a supportive tool rather than a replacement.Addressing these challenges,along with country-specific legal and ethical considerations,will facilitate the successful integration of multimodal AI into the management of esophageal cancer,particularly in endoscopic diagnosis,and contribute to improved patient outcomes.Although this review focused on Japan as a case study,the challenges and solutions described are broadly applicable to other high-incidence regions.展开更多
The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytot...The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytotoxicity.These limitations have catalyzed the development of intelligent stimuli-responsive block copolymers-based bioimaging agents,which was engineered to dynamically respond to endogenous biochemical cues(e.g.,p H gradients,redox potential,enzyme activity,hypoxia environment) or exogenous physical triggers(e.g.,photoirradiation,thermal gradients,ultrasound(US)/magnetic stimuli).Through spatiotemporally controlled structural transformations,stimuli-responsive block copolymers enable precise contrast targeting,activatable signal amplification,and theranostic integration,thereby substantially enhancing signal-to-noise ratios of bioimaging and diagnostic specificity.Hence,this mini-review systematically examines molecular engineering principles for designing p H-,redox-,enzyme-,light-,thermo-,and US/magnetic-responsive polymers,with emphasis on structure-property relationships governing imaging performance modulation.Furthermore,we critically analyze emerging strategies for optical imaging,US synergies,and magnetic resonance imaging(MRI).Multimodal bioimaging has also been elaborated,which could overcome the inherent trade-offs between resolution,penetration depth,and functional specificity in single-modal approaches.By elucidating mechanistic insights and translational challenges,this mini-review aims to establish a design framework of stimuli-responsive block copolymersbased for high fidelity bioimaging agents and accelerate their clinical translation in precise diagnosis and therapy.展开更多
Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-in...Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-interfaced biosensors capable of analyzing sweat and skin biomarkers,providing innovative solutions for diabetes diagnosis and monitoring.This review comprehensively discusses the current developments in noninvasive wearable biosensors,emphasizing simultaneous detection of biochemical biomarkers(such as glucose,cortisol,lactate,branched-chain amino acids,and cytokines)and physiological signals(including heart rate,blood pressure,and sweat rate)for accurate,personalized diabetes management.We explore innovations in multimodal sensor design,materials science,biorecognition elements,and integration techniques,highlighting the importance of advanced data analytics,artificial intelligence-driven predictive algorithms,and closed-loop therapeutic systems.Additionally,the review addresses ongoing challenges in biomarker validation,sensor stability,user compliance,data privacy,and regulatory considerations.A holistic,multimodal approach enabled by these next-generation wearable biosensors holds significant potential for improving patient outcomes and facilitating proactive healthcare interventions in diabetes management.展开更多
To ensure the safe and stable operation of rotating machinery,intelligent fault diagnosis methods hold significant research value.However,existing diagnostic approaches largely rely on manual feature extraction and ex...To ensure the safe and stable operation of rotating machinery,intelligent fault diagnosis methods hold significant research value.However,existing diagnostic approaches largely rely on manual feature extraction and expert experience,which limits their adaptability under variable operating conditions and strong noise environments,severely affecting the generalization capability of diagnostic models.To address this issue,this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning(AutoML).The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations.On this basis,automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration acquisition.Finally,diagnostic decisions are achieved by automatically integrating the predictions of multiple high-performance base models.Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference,validating its efficiency,scalability,and practical value for rotating machinery fault diagnosis.展开更多
It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This stu...It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This study introduces a cohesive architecture that amalgamates requirement development,UML synthesis,and multimodal validation.First,LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements.Then,DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code.Using this dual-LLM pipeline,we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families.Rendering analysis showed that 89.5%of the generated diagrams compile correctly,while invalid cases were detected automatically.To assess quality,we employed a multimodal scoring method that combines Qwen2.5-VL-3B,LLaMA-3.2-11B-Vision-Instruct and Aya-Vision-8B,with weights based on MMMU performance.A study with 94 experts revealed strong alignment between automatic and manual evaluations,yielding a Pearson correlation of r=0.82 and a Fleiss’Kappa of 0.78.This indicates a high degree of concordance between automated metrics and human judgment.Overall,the results demonstrated that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent.More broadly,the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.展开更多
Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and...Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and molecular data,AI enhances the precision and reproducibility of tumor detection,grading,and risk stratification.In prostate cancer,AI-assisted multiparametric Magnetic resonance imaging(MRI)and digital pathology systems improve lesion localization and Gleason scoring.For bladder cancer,deep learning-based cystoscopy and radiomics models from Computed tomography/magnetic resonance imaging(CT/MRI)enable real-time lesion segmentation and non-invasive biomarker prediction,such as Programmed Cell Death-Ligand 1(PD-L1)expression.In RCC,AI,combined with CT/MRI and multi-omics data,aids in subtype classification and prognostic prediction,supporting personalized therapy.However,despite these promising advances,challenges such as data standardization,model generalizability,interpretability,and regulatory compliance hinder AI’s clinical translation.This review outlines the current state of AI in urological cancer diagnosis and prognosis,its technological innovations,and the clinical challenges and opportunities that lie ahead.展开更多
In Human–Robot Interaction(HRI),generating robot trajectories that accurately reflect user intentions while ensuring physical realism remains challenging,especially in unstructured environments.In this study,we devel...In Human–Robot Interaction(HRI),generating robot trajectories that accurately reflect user intentions while ensuring physical realism remains challenging,especially in unstructured environments.In this study,we develop a multimodal framework that integrates symbolic task reasoning with continuous trajectory generation.The approach employs transformer models and adversarial training to map high-level intent to robotic motion.Information from multiple data sources,such as voice traits,hand and body keypoints,visual observations,and recorded paths,is integrated simultaneously.These signals are mapped into a shared representation that supports interpretable reasoning while enabling smooth and realistic motion generation.Based on this design,two different learning strategies are investigated.In the first step,grammar-constrained Linear Temporal Logic(LTL)expressions are created from multimodal human inputs.These expressions are subsequently decoded into robot trajectories.The second method generates trajectories directly from symbolic intent and linguistic data,bypassing an intermediate logical representation.Transformer encoders combine multiple types of information,and autoregressive transformer decoders generate motion sequences.Adding smoothness and speed limits during training increases the likelihood of physical feasibility.To improve the realism and stability of the generated trajectories during training,an adversarial discriminator is also included to guide them toward the distribution of actual robot motion.Tests on the NATSGLD dataset indicate that the complete system exhibits stable training behaviour and performance.In normalised coordinates,the logic-based pipeline has an Average Displacement Error(ADE)of 0.040 and a Final Displacement Error(FDE)of 0.036.The adversarial generator makes substantially more progress,reducing ADE to 0.021 and FDE to 0.018.Visual examination confirms that the generated trajectories closely align with observed motion patterns while preserving smooth temporal dynamics.展开更多
Surgical navigation has evolved significantly through advances in augmented reality,virtual reality,and mixed reality,improving precision and safety across many clinical applications,including neurosurgery,maxillofaci...Surgical navigation has evolved significantly through advances in augmented reality,virtual reality,and mixed reality,improving precision and safety across many clinical applications,including neurosurgery,maxillofacial,spinal,and arthroplasty procedures.By integrating preoperative imaging with real-time intraoperative data,these systems provide dynamic guidance,reduce radiation exposure,and minimize tissue damage.Key challenges persist,including intraoperative registration accuracy,flexible tissue deformation,respiratory compensation,and real-time imaging quality.Emerging solutions include artificial intelligence-driven segmentation,deformation-field modeling,and hybrid registration techniques.Future developments will include lightweight,portable systems,improved non-rigid registration algorithms,and greater clinical adoption.Despite advances in rigid-tissue applications,soft-tissue navigation requires additional innovation to address motion variability and registration reliability,ultimately advancing minimally invasive surgery and precision medicine.展开更多
Humanoid robots hold significant promise for social interaction and emotional companionship.However,their effectiveness hinges on the ability to convey nuanced and authentic emotions.Here,we presented a universal huma...Humanoid robots hold significant promise for social interaction and emotional companionship.However,their effectiveness hinges on the ability to convey nuanced and authentic emotions.Here,we presented a universal humanoid robot head with a facial kinematics model.Using a reinforcement learning framework guided by symmetry assessment,emotion decoupling,and MLLM authenticity evaluation,our system autonomously learns to generate adaptive facial expressions through dynamic landmark adjustments.By transferring the simulation training results to real-world environments,the robot can perform natural and expressive expressions.Another novel feature is the independent regulation of emotion intensity and expression magnitude across emotional categories,which enhances the ability to achieve culturally adaptive and socially resonant robotic expressions significantly.This research advances adaptive humanoid interaction,offering an easier and more efficient pathway toward culturally resonant and psychologically plausible robotic expressions.展开更多
Background:Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed,radiation dose,and image quality,forcing clinicians to work with noisy,incomplete data.Existing reconstructio...Background:Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed,radiation dose,and image quality,forcing clinicians to work with noisy,incomplete data.Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches.Methods:We present LDM-PIR,a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging(MRI),CT,and low-photon imaging.Unlike traditional iterative methods,which are computationally expensive,or task-specific deep learning approaches lacking generalizability,integrates three innovations.A physics-conditioned diffusion framework that embeds acquisition operators(Fourier/Radon transforms)and noise models directly into the reconstruction process.A multi-model architecture that unifies denoising,inpainting,and super-resolution via shared weight conditioning.A lightweight design(2.1M parameters)enabling rapid inference(0.8s/image on GPU).Through self-supervised fine-tuning with measurement consistency losses adapts to new imaging modalities using fewer annotated samples.Results:Achieves state-of-the-art performance on fastMRI(peak signal-to-noise ratio(PSNR):34.04 for single-coil/31.50 for multi-coil)and Lung Image Database Consortium and Image Database Resource Initiative(28.83 PSNR under Poisson noise).Clinical evaluations demonstrate superior preservation of anatomical structures,with SSIM improvements of 8.8%for single-coil and 4.36%for multi-coil MRI over uDPIR.Conclusion:It offers a flexible,efficient,and scalable solution for medical image reconstruction,addressing the challenges of noise,undersampling,and modality generalization.The model’s lightweight design allows for rapid inference,while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets,making it suitable for real-world clinical applications.展开更多
The problem of fake news detection(FND)is becoming increasingly important in the field of natural language processing(NLP)because of the rapid dissemination of misleading information on the web.Large language models(L...The problem of fake news detection(FND)is becoming increasingly important in the field of natural language processing(NLP)because of the rapid dissemination of misleading information on the web.Large language models(LLMs)such as GPT-4.Zero excels in natural language understanding tasks but can still struggle to distinguish between fact and fiction,particularly when applied in the wild.However,a key challenge of existing FND methods is that they only consider unimodal data(e.g.,images),while more detailed multimodal data(e.g.,user behaviour,temporal dynamics)is neglected,and the latter is crucial for full-context understanding.To overcome these limitations,we introduce M3-FND(Multimodal Misinformation Mitigation for False News Detection),a novel methodological framework that integrates LLMs with multimodal data sources to perform context-aware veracity assessments.Our method proposes a hybrid system that combines image-text alignment,user credibility profiling,and temporal pattern recognition,which is also strengthened through a natural feedback loop that provides real-time feedback for correcting downstream errors.We use contextual reinforcement learning to schedule prompt updating and update the classifier threshold based on the latest multimodal input,which enables the model to better adapt to changing misinformation attack strategies.M3-FND is tested on three diverse datasets,FakeNewsNet,Twitter15,andWeibo,which contain both text and visual socialmedia content.Experiments showthatM3-FND significantly outperforms conventional and LLMbased baselines in terms of accuracy,F1-score,and AUC on all benchmarks.Our results indicate the importance of employing multimodal cues and adaptive learning for effective and timely detection of fake news.展开更多
文摘Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screening tool in HF outpatients. In this study, we prospectively evaluated VES-13 against a multimodal screening assessment in detecting frailty and predicting individual risk of adverse prognosis.Methods Frailty was assessed at the initial visit using both a multimodal approach, incorporating Barthel Index, Older American Resources and Services scale, Pfeiffer Test, abbreviated Geriatric Depression Scale, age > 85 years, lacking support systems,and VES-13. Patients scoring ≥ 3 on VES-13 or meeting at least one multimodal criterion were classified as frail. Endpoints included all-cause mortality, a composite of death or HF hospitalization, and recurrent HF hospitalizations.Results A total of 301 patients were evaluated. VES-13 identified 40.2% as frail and the multimodal assessment 33.2%. In Cox regression analyses, frailty identified by VES-13 showed greater prognostic significance than the multimodal assessment for allcause mortality(HR = 3.70 [2.15–6.33], P < 0.001 vs. 2.40 [1.46–4.0], P = 0.001) and the composite endpoint(HR = 3.13 [2.02–4.84], P< 0.001 vs. 1.96 [1.28–2.99], P = 0.002). Recurrent HF hospitalizations were four times more frequent in VES-13 frail patients while two times in those identified as frail by the multimodal assessment. Additionally, stratifying patients by VES-13 tertiles provided robust risk differentiation.Conclusions VES-13, a simple frailty tool, outperformed a comprehensive multimodal assessment and could be easily integrated into routine HF care, highlighting its clinical utility in identifying patients at risk for poor outcomes.
基金supported by the NSFC(12474071)Natural Science Foundation of Shandong Province(ZR2024YQ051,ZR2025QB50)+6 种基金Guangdong Basic and Applied Basic Research Foundation(2025A1515011191)the Shanghai Sailing Program(23YF1402200,23YF1402400)funded by Basic Research Program of Jiangsu(BK20240424)Open Research Fund of State Key Laboratory of Crystal Materials(KF2406)Taishan Scholar Foundation of Shandong Province(tsqn202408006,tsqn202507058)Young Talent of Lifting engineering for Science and Technology in Shandong,China(SDAST2024QTB002)the Qilu Young Scholar Program of Shandong University。
文摘As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and electrochemical characteristics,MXenes have shown great potential in brain-inspired neuromorphic computing electronics,including neuromorphic gas sensors,pressure sensors and photodetectors.This paper provides a forward-looking review of the research progress regarding MXenes in the neuromorphic sensing domain and discussed the critical challenges that need to be resolved.Key bottlenecks such as insufficient long-term stability under environmental exposure,high costs,scalability limitations in large-scale production,and mechanical mismatch in wearable integration hinder their practical deployment.Furthermore,unresolved issues like interfacial compatibility in heterostructures and energy inefficiency in neu-romorphic signal conversion demand urgent attention.The review offers insights into future research directions enhance the fundamental understanding of MXene properties and promote further integration into neuromorphic computing applications through the convergence with various emerging technologies.
基金the financial support of the National Natural Science Foundation of China(NO.52173028)。
文摘Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),flexible electronics(2010s,stretchable materials),and intelligent systems(2020s-present,AI-driven multimodal sensing).With the innovation of material,processing techniques,and multimodal fusion of stimuli,the application of tactile sensors has been continuously expanding to a diversity of areas,including but not limited to medical care,aerospace,sports and intelligent robots.Currently,researchers are dedicated to develop tactile sensors with emerging mechanisms and structures,pursuing high-sensitivity,high-resolution,and multimodal characteristics and further constructing tactile systems which imitate and approach the performance of human organs.However,challenges in the combination between the theoretical research and the practical applications are still significant.There is a lack of comprehensive understanding in the state of the art of such knowledge transferring from academic work to technical products.Scaled-up production of laboratory materials faces fatal challenges like high costs,small scale,and inconsistent quality.Ambient factors,such as temperature,humidity,and electromagnetic interference,also impair signal reliability.Moreover,tactile sensors must operate across a wide pressure range(0.1 k Pa to several or even dozens of MPa)to meet diverse application needs.Meanwhile,the existing algorithms,data models and sensing systems commonly reveal insufficient precision as well as undesired robustness in data processing,and there is a realistic gap between the designed and the demanded system response speed.In this review,oriented by the design requirements of intelligent tactile sensing systems,we summarize the common sensing mechanisms,inspired structures,key performance,and optimizing strategies,followed by a brief overview of the recent advances in the perspectives of system integration and algorithm implementation,and the possible roadmap of future development of tactile sensors,providing a forward-looking as well as critical discussions in the future industrial applications of flexible tactile sensors.
文摘Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical application of multi-omics parameters is still restricted by the expensive and less accessible assays,although they accurately reflect immune status.A comprehensive evaluation framework based on“easy-to-obtain”multi-model clinical parameters is urgently required,incorporating clinical features to establish baseline patient profiles and disease staging;routine blood tests assessing systemic metabolic and functional status;immune cell subsets quantifying subcluster dynamics;imaging features delineating tumor morphology,spatial configuration,and perilesional anatomical relationships;immunohistochemical markers positioning qualitative and quantitative detection of tumor antigens from the cellular and molecular level.This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management conveniently and practically.
基金funded by Fundación CajaCanarias and Fundación Bancaria“la Caixa”,grant number 2023DIG11.
文摘Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and require a high level of expertise.This article proposes an innovative alternative solution that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation(BPMN)diagrams solely from verbal descriptions of the processes to be modeled,utilizing Large Language Models(LLMs)and multimodal Artificial Intelligence(AI).Experimental results,based on video recordings of process explanations provided by an expert from an organization(in this case,the Commercial Courts of a public justice administration),demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams,leading to significant improvements in the speed,accuracy,and accessibility of process modeling.This research makes a substantial contribution to the field of business process modeling,as its methodology is groundbreaking in its use of LLMs and multimodal AI capabilities to handle different types of source material(text and video),combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.
基金supported by the Innovative Human Resource Development for Local Intel-lectualization program through the Institute of Information&Communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(No.IITP-2026-2020-0-01741)the research fund of Hanyang University(HY-2025-1110).
文摘Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conducting ECG-based studies.From a review of existing studies,two main factors appear to contribute to this problem:the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models.To overcome these limitations,this study proposes a dual-path multimodal framework,termed DM-EHC(Dual-Path Multimodal ECG Heartbeat Classifier),for ECG-based heartbeat classification.The proposed framework links 1D ECG temporal features with 2D time–frequency features.By setting up the dual paths described above,the model can process more dimensions of feature information.The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments.Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias.The model achieved mean precision,recall,and F1 score of 95.14%,92.26%,and 93.65%,respectively.These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
基金funded by the National Natural Science Foundation of China under Grant 62306128the Leading Innovation Project of Changzhou Science and Technology Bureau underGrant CQ20230072+2 种基金the Basic Science Research Project of Jiangsu Provincial Department of Education under Grant 23KJD520003the Science and Technology Development Plan Project of Jilin Provinceunder Grant 20240101382JCthe National KeyR esearch and Development Program of China under Grant 2023YFF1105102.
文摘In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although intelligent rescue robots can enter hazardous environments in place of humans,smoke poses major challenges for human detection algorithms.These challenges include the attenuation of visible and infrared signals,complex thermal fields,and interference frombackground objects,all ofwhichmake it difficult to accurately identify trapped individuals.To address this problem,we propose VIF-YOLO,a visible–infrared fusion model for real-time human detection in dense smoke environments.The framework introduces a lightweight multimodal fusion(LMF)module based on learnable low-rank representation blocks to end-to-end integrate visible and infrared images,preserving fine details while enhancing salient features.In addition,an efficient multiscale attention(EMA)mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions.Extensive experiments on our newly constructedmultimodal smoke human detection(MSHD)dataset demonstrate thatVIF-YOLOachievesmAP50 of 99.5%,precision of 99.2%,and recall of 99.3%,outperforming YOLOv10n by a clear margin.Furthermore,when deployed on the NVIDIA Jetson Xavier NX,VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms,validating its real-time capability on edge-computing platforms.These results confirm that VIF-YOLO provides accurate,robust,and fast detection across complex backgrounds and diverse smoke conditions,ensuring reliable and rapid localization of individuals in need of rescue.
文摘Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing cancer detection,diagnosis,and prognostication.A narrative review of literature published from January 2015 to march 2025 was conducted using PubMed,Web of Science,and Scopus.Search terms included"gastrointestinal cancer","artificial intelligence","machine learning","deep learning","radiomics","multimodal detection"and"predictive modeling".Studies were included if they focused on clinically relevant AI applications in GI oncology.AI algorithms for GI cancer detection have achieved high performance across imaging modalities,with endoscopic DL systems reporting accuracies of 85%-97%for polyp detection and segmentation.Radiomics-based models have predicted molecular biomarkers such as programmed cell death ligand 2 expression with area under the curves up to 0.92.Large language models applied to radiology reports demonstrated diagnostic accuracy comparable to junior radiologists(78.9%vs 80.0%),though without incremental value when combined with human interpretation.Multimodal AI approaches integrating imaging,pathology,and clinical data show emerging potential for precision oncology.AI in GI oncology has reached clinically relevant accuracy levels in multiple diagnostic tasks,with multimodal approaches and predictive biomarker modeling offering new opportunities for personalized care.However,broader validation,integration into clinical workflows,and attention to ethical,legal,and social implications remain critical for widespread adoption.
基金Supported by Japan Society for the Promotion of Science,No.24K11935.
文摘This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to both screening and diagnosis.For the clinical adaptation of AI,several challenges remain for its effective translation.The establishment of high-quality clinical databases,such as the National Clinical Database and Japan Endoscopy Database in Japan,which covers almost all cases of esophageal cancer,is essential for validating multimodal AI models.This requires rigorous external validation using diverse datasets,including those from different endoscope manufacturers and image qualities.Furthermore,endoscopists’skills significantly affect diagnostic accuracy,suggesting that AI should serve as a supportive tool rather than a replacement.Addressing these challenges,along with country-specific legal and ethical considerations,will facilitate the successful integration of multimodal AI into the management of esophageal cancer,particularly in endoscopic diagnosis,and contribute to improved patient outcomes.Although this review focused on Japan as a case study,the challenges and solutions described are broadly applicable to other high-incidence regions.
基金supported by the National Natural Science Foundation of China (Nos.22208218,22078196,and 22278268)the Natural Science Foundation of Shanghai (No.22ZR1460400)Collaborative Innovation Center of Fragrance Flavour and Cosmetics,and Collaborative Innovation Project of Shanghai Institute of Technology (No.XTCX2023-07)。
文摘The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytotoxicity.These limitations have catalyzed the development of intelligent stimuli-responsive block copolymers-based bioimaging agents,which was engineered to dynamically respond to endogenous biochemical cues(e.g.,p H gradients,redox potential,enzyme activity,hypoxia environment) or exogenous physical triggers(e.g.,photoirradiation,thermal gradients,ultrasound(US)/magnetic stimuli).Through spatiotemporally controlled structural transformations,stimuli-responsive block copolymers enable precise contrast targeting,activatable signal amplification,and theranostic integration,thereby substantially enhancing signal-to-noise ratios of bioimaging and diagnostic specificity.Hence,this mini-review systematically examines molecular engineering principles for designing p H-,redox-,enzyme-,light-,thermo-,and US/magnetic-responsive polymers,with emphasis on structure-property relationships governing imaging performance modulation.Furthermore,we critically analyze emerging strategies for optical imaging,US synergies,and magnetic resonance imaging(MRI).Multimodal bioimaging has also been elaborated,which could overcome the inherent trade-offs between resolution,penetration depth,and functional specificity in single-modal approaches.By elucidating mechanistic insights and translational challenges,this mini-review aims to establish a design framework of stimuli-responsive block copolymersbased for high fidelity bioimaging agents and accelerate their clinical translation in precise diagnosis and therapy.
文摘Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-interfaced biosensors capable of analyzing sweat and skin biomarkers,providing innovative solutions for diabetes diagnosis and monitoring.This review comprehensively discusses the current developments in noninvasive wearable biosensors,emphasizing simultaneous detection of biochemical biomarkers(such as glucose,cortisol,lactate,branched-chain amino acids,and cytokines)and physiological signals(including heart rate,blood pressure,and sweat rate)for accurate,personalized diabetes management.We explore innovations in multimodal sensor design,materials science,biorecognition elements,and integration techniques,highlighting the importance of advanced data analytics,artificial intelligence-driven predictive algorithms,and closed-loop therapeutic systems.Additionally,the review addresses ongoing challenges in biomarker validation,sensor stability,user compliance,data privacy,and regulatory considerations.A holistic,multimodal approach enabled by these next-generation wearable biosensors holds significant potential for improving patient outcomes and facilitating proactive healthcare interventions in diabetes management.
基金supported in part by the National Natural Science Foundation of China under Grants 52475102 and 52205101in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515240021+1 种基金in part by the Young Talent Support Project of Guangzhou Association for Science and Technology(QT-2024-28)in part by the Youth Development Initiative of Guangdong Association for Science and Technology(SKXRC2025254).
文摘To ensure the safe and stable operation of rotating machinery,intelligent fault diagnosis methods hold significant research value.However,existing diagnostic approaches largely rely on manual feature extraction and expert experience,which limits their adaptability under variable operating conditions and strong noise environments,severely affecting the generalization capability of diagnostic models.To address this issue,this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning(AutoML).The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations.On this basis,automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration acquisition.Finally,diagnostic decisions are achieved by automatically integrating the predictions of multiple high-performance base models.Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference,validating its efficiency,scalability,and practical value for rotating machinery fault diagnosis.
基金supported by the DH2025-TN07-07 project conducted at the Thai Nguyen University of Information and Communication Technology,Thai Nguyen,Vietnam,with additional support from the AI in Software Engineering Lab.
文摘It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This study introduces a cohesive architecture that amalgamates requirement development,UML synthesis,and multimodal validation.First,LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements.Then,DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code.Using this dual-LLM pipeline,we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families.Rendering analysis showed that 89.5%of the generated diagrams compile correctly,while invalid cases were detected automatically.To assess quality,we employed a multimodal scoring method that combines Qwen2.5-VL-3B,LLaMA-3.2-11B-Vision-Instruct and Aya-Vision-8B,with weights based on MMMU performance.A study with 94 experts revealed strong alignment between automatic and manual evaluations,yielding a Pearson correlation of r=0.82 and a Fleiss’Kappa of 0.78.This indicates a high degree of concordance between automated metrics and human judgment.Overall,the results demonstrated that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent.More broadly,the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.
基金supported by grants from the Hangzhou Key Project for Agricultural and Social Development under Grant No.20231203A12(JZ)the General Program of the Scientific Research Special Project for Post-Marketing Clinical Research of Innovative Drugs,Development Center for Medical Science&Technology,National Health Commission of the People’s Republic of China under Grant No.WKZX2024CX104202(JZ).
文摘Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and molecular data,AI enhances the precision and reproducibility of tumor detection,grading,and risk stratification.In prostate cancer,AI-assisted multiparametric Magnetic resonance imaging(MRI)and digital pathology systems improve lesion localization and Gleason scoring.For bladder cancer,deep learning-based cystoscopy and radiomics models from Computed tomography/magnetic resonance imaging(CT/MRI)enable real-time lesion segmentation and non-invasive biomarker prediction,such as Programmed Cell Death-Ligand 1(PD-L1)expression.In RCC,AI,combined with CT/MRI and multi-omics data,aids in subtype classification and prognostic prediction,supporting personalized therapy.However,despite these promising advances,challenges such as data standardization,model generalizability,interpretability,and regulatory compliance hinder AI’s clinical translation.This review outlines the current state of AI in urological cancer diagnosis and prognosis,its technological innovations,and the clinical challenges and opportunities that lie ahead.
基金The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number(PSAU/2024/01/32082).
文摘In Human–Robot Interaction(HRI),generating robot trajectories that accurately reflect user intentions while ensuring physical realism remains challenging,especially in unstructured environments.In this study,we develop a multimodal framework that integrates symbolic task reasoning with continuous trajectory generation.The approach employs transformer models and adversarial training to map high-level intent to robotic motion.Information from multiple data sources,such as voice traits,hand and body keypoints,visual observations,and recorded paths,is integrated simultaneously.These signals are mapped into a shared representation that supports interpretable reasoning while enabling smooth and realistic motion generation.Based on this design,two different learning strategies are investigated.In the first step,grammar-constrained Linear Temporal Logic(LTL)expressions are created from multimodal human inputs.These expressions are subsequently decoded into robot trajectories.The second method generates trajectories directly from symbolic intent and linguistic data,bypassing an intermediate logical representation.Transformer encoders combine multiple types of information,and autoregressive transformer decoders generate motion sequences.Adding smoothness and speed limits during training increases the likelihood of physical feasibility.To improve the realism and stability of the generated trajectories during training,an adversarial discriminator is also included to guide them toward the distribution of actual robot motion.Tests on the NATSGLD dataset indicate that the complete system exhibits stable training behaviour and performance.In normalised coordinates,the logic-based pipeline has an Average Displacement Error(ADE)of 0.040 and a Final Displacement Error(FDE)of 0.036.The adversarial generator makes substantially more progress,reducing ADE to 0.021 and FDE to 0.018.Visual examination confirms that the generated trajectories closely align with observed motion patterns while preserving smooth temporal dynamics.
基金Supported by the National Natural Science Foundation of China(NSFC)under Grants 62025104,62422102,62331005,62301034,and U22A2052the Beijing Natural Science Foundation-Daxing Innovation Joint Fund(L256040).
文摘Surgical navigation has evolved significantly through advances in augmented reality,virtual reality,and mixed reality,improving precision and safety across many clinical applications,including neurosurgery,maxillofacial,spinal,and arthroplasty procedures.By integrating preoperative imaging with real-time intraoperative data,these systems provide dynamic guidance,reduce radiation exposure,and minimize tissue damage.Key challenges persist,including intraoperative registration accuracy,flexible tissue deformation,respiratory compensation,and real-time imaging quality.Emerging solutions include artificial intelligence-driven segmentation,deformation-field modeling,and hybrid registration techniques.Future developments will include lightweight,portable systems,improved non-rigid registration algorithms,and greater clinical adoption.Despite advances in rigid-tissue applications,soft-tissue navigation requires additional innovation to address motion variability and registration reliability,ultimately advancing minimally invasive surgery and precision medicine.
基金supported by the National Natural Science Foundation of China(Grant No.52405041)the Major Program of the Zhejiang Provincial Natural Science Foundation of China(Grant No.LD25E050001)the Key R&D Program of Zhejiang Province(Grant No.2025C01186)。
文摘Humanoid robots hold significant promise for social interaction and emotional companionship.However,their effectiveness hinges on the ability to convey nuanced and authentic emotions.Here,we presented a universal humanoid robot head with a facial kinematics model.Using a reinforcement learning framework guided by symmetry assessment,emotion decoupling,and MLLM authenticity evaluation,our system autonomously learns to generate adaptive facial expressions through dynamic landmark adjustments.By transferring the simulation training results to real-world environments,the robot can perform natural and expressive expressions.Another novel feature is the independent regulation of emotion intensity and expression magnitude across emotional categories,which enhances the ability to achieve culturally adaptive and socially resonant robotic expressions significantly.This research advances adaptive humanoid interaction,offering an easier and more efficient pathway toward culturally resonant and psychologically plausible robotic expressions.
文摘Background:Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed,radiation dose,and image quality,forcing clinicians to work with noisy,incomplete data.Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches.Methods:We present LDM-PIR,a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging(MRI),CT,and low-photon imaging.Unlike traditional iterative methods,which are computationally expensive,or task-specific deep learning approaches lacking generalizability,integrates three innovations.A physics-conditioned diffusion framework that embeds acquisition operators(Fourier/Radon transforms)and noise models directly into the reconstruction process.A multi-model architecture that unifies denoising,inpainting,and super-resolution via shared weight conditioning.A lightweight design(2.1M parameters)enabling rapid inference(0.8s/image on GPU).Through self-supervised fine-tuning with measurement consistency losses adapts to new imaging modalities using fewer annotated samples.Results:Achieves state-of-the-art performance on fastMRI(peak signal-to-noise ratio(PSNR):34.04 for single-coil/31.50 for multi-coil)and Lung Image Database Consortium and Image Database Resource Initiative(28.83 PSNR under Poisson noise).Clinical evaluations demonstrate superior preservation of anatomical structures,with SSIM improvements of 8.8%for single-coil and 4.36%for multi-coil MRI over uDPIR.Conclusion:It offers a flexible,efficient,and scalable solution for medical image reconstruction,addressing the challenges of noise,undersampling,and modality generalization.The model’s lightweight design allows for rapid inference,while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets,making it suitable for real-world clinical applications.
文摘The problem of fake news detection(FND)is becoming increasingly important in the field of natural language processing(NLP)because of the rapid dissemination of misleading information on the web.Large language models(LLMs)such as GPT-4.Zero excels in natural language understanding tasks but can still struggle to distinguish between fact and fiction,particularly when applied in the wild.However,a key challenge of existing FND methods is that they only consider unimodal data(e.g.,images),while more detailed multimodal data(e.g.,user behaviour,temporal dynamics)is neglected,and the latter is crucial for full-context understanding.To overcome these limitations,we introduce M3-FND(Multimodal Misinformation Mitigation for False News Detection),a novel methodological framework that integrates LLMs with multimodal data sources to perform context-aware veracity assessments.Our method proposes a hybrid system that combines image-text alignment,user credibility profiling,and temporal pattern recognition,which is also strengthened through a natural feedback loop that provides real-time feedback for correcting downstream errors.We use contextual reinforcement learning to schedule prompt updating and update the classifier threshold based on the latest multimodal input,which enables the model to better adapt to changing misinformation attack strategies.M3-FND is tested on three diverse datasets,FakeNewsNet,Twitter15,andWeibo,which contain both text and visual socialmedia content.Experiments showthatM3-FND significantly outperforms conventional and LLMbased baselines in terms of accuracy,F1-score,and AUC on all benchmarks.Our results indicate the importance of employing multimodal cues and adaptive learning for effective and timely detection of fake news.