Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screenin...Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screening tool in HF outpatients. In this study, we prospectively evaluated VES-13 against a multimodal screening assessment in detecting frailty and predicting individual risk of adverse prognosis.Methods Frailty was assessed at the initial visit using both a multimodal approach, incorporating Barthel Index, Older American Resources and Services scale, Pfeiffer Test, abbreviated Geriatric Depression Scale, age > 85 years, lacking support systems,and VES-13. Patients scoring ≥ 3 on VES-13 or meeting at least one multimodal criterion were classified as frail. Endpoints included all-cause mortality, a composite of death or HF hospitalization, and recurrent HF hospitalizations.Results A total of 301 patients were evaluated. VES-13 identified 40.2% as frail and the multimodal assessment 33.2%. In Cox regression analyses, frailty identified by VES-13 showed greater prognostic significance than the multimodal assessment for allcause mortality(HR = 3.70 [2.15–6.33], P < 0.001 vs. 2.40 [1.46–4.0], P = 0.001) and the composite endpoint(HR = 3.13 [2.02–4.84], P< 0.001 vs. 1.96 [1.28–2.99], P = 0.002). Recurrent HF hospitalizations were four times more frequent in VES-13 frail patients while two times in those identified as frail by the multimodal assessment. Additionally, stratifying patients by VES-13 tertiles provided robust risk differentiation.Conclusions VES-13, a simple frailty tool, outperformed a comprehensive multimodal assessment and could be easily integrated into routine HF care, highlighting its clinical utility in identifying patients at risk for poor outcomes.展开更多
Multimodal emotion recognition has emerged as a key research area for enabling human-centered artificial intelligence,supported by the rapid progress in vision,audio,language,and physiological modeling.Existing approa...Multimodal emotion recognition has emerged as a key research area for enabling human-centered artificial intelligence,supported by the rapid progress in vision,audio,language,and physiological modeling.Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms,yet the field remains fragmented due to differences in feature alignment,temporal synchronization,modality reliability,and robustness to noise or missing inputs.This survey provides a comprehensive analysis of MER research from 2021 to 2025,consolidating advances in modality-specific representation learning,cross-modal feature construction,and early,late,and hybrid fusion paradigms.We systematically review visual,acoustic,textual,and sensor-based embeddings,highlighting howpre-trained encoders,self-supervised learning,and large languagemodels have reshaped the representational foundations ofMER.We further categorize fusion strategies by interaction depth and architectural design,examining how attention mechanisms,cross-modal transformers,adaptive gating,and multimodal large language models redefine the integration of affective signals.Finally,we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability,generalization,and interpretability.This survey aims to provide a unified perspective onmultimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.展开更多
Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical applicati...Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical application of multi-omics parameters is still restricted by the expensive and less accessible assays,although they accurately reflect immune status.A comprehensive evaluation framework based on“easy-to-obtain”multi-model clinical parameters is urgently required,incorporating clinical features to establish baseline patient profiles and disease staging;routine blood tests assessing systemic metabolic and functional status;immune cell subsets quantifying subcluster dynamics;imaging features delineating tumor morphology,spatial configuration,and perilesional anatomical relationships;immunohistochemical markers positioning qualitative and quantitative detection of tumor antigens from the cellular and molecular level.This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management conveniently and practically.展开更多
Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),fle...Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),flexible electronics(2010s,stretchable materials),and intelligent systems(2020s-present,AI-driven multimodal sensing).With the innovation of material,processing techniques,and multimodal fusion of stimuli,the application of tactile sensors has been continuously expanding to a diversity of areas,including but not limited to medical care,aerospace,sports and intelligent robots.Currently,researchers are dedicated to develop tactile sensors with emerging mechanisms and structures,pursuing high-sensitivity,high-resolution,and multimodal characteristics and further constructing tactile systems which imitate and approach the performance of human organs.However,challenges in the combination between the theoretical research and the practical applications are still significant.There is a lack of comprehensive understanding in the state of the art of such knowledge transferring from academic work to technical products.Scaled-up production of laboratory materials faces fatal challenges like high costs,small scale,and inconsistent quality.Ambient factors,such as temperature,humidity,and electromagnetic interference,also impair signal reliability.Moreover,tactile sensors must operate across a wide pressure range(0.1 k Pa to several or even dozens of MPa)to meet diverse application needs.Meanwhile,the existing algorithms,data models and sensing systems commonly reveal insufficient precision as well as undesired robustness in data processing,and there is a realistic gap between the designed and the demanded system response speed.In this review,oriented by the design requirements of intelligent tactile sensing systems,we summarize the common sensing mechanisms,inspired structures,key performance,and optimizing strategies,followed by a brief overview of the recent advances in the perspectives of system integration and algorithm implementation,and the possible roadmap of future development of tactile sensors,providing a forward-looking as well as critical discussions in the future industrial applications of flexible tactile sensors.展开更多
As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and el...As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and electrochemical characteristics,MXenes have shown great potential in brain-inspired neuromorphic computing electronics,including neuromorphic gas sensors,pressure sensors and photodetectors.This paper provides a forward-looking review of the research progress regarding MXenes in the neuromorphic sensing domain and discussed the critical challenges that need to be resolved.Key bottlenecks such as insufficient long-term stability under environmental exposure,high costs,scalability limitations in large-scale production,and mechanical mismatch in wearable integration hinder their practical deployment.Furthermore,unresolved issues like interfacial compatibility in heterostructures and energy inefficiency in neu-romorphic signal conversion demand urgent attention.The review offers insights into future research directions enhance the fundamental understanding of MXene properties and promote further integration into neuromorphic computing applications through the convergence with various emerging technologies.展开更多
Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and requir...Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and require a high level of expertise.This article proposes an innovative alternative solution that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation(BPMN)diagrams solely from verbal descriptions of the processes to be modeled,utilizing Large Language Models(LLMs)and multimodal Artificial Intelligence(AI).Experimental results,based on video recordings of process explanations provided by an expert from an organization(in this case,the Commercial Courts of a public justice administration),demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams,leading to significant improvements in the speed,accuracy,and accessibility of process modeling.This research makes a substantial contribution to the field of business process modeling,as its methodology is groundbreaking in its use of LLMs and multimodal AI capabilities to handle different types of source material(text and video),combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.展开更多
Spectrum sensing is an indispensable core part of cognitive radio dynamic spectrum access(DSA)and a key approach to alleviating spectrum scarcity in the Internet of Things(IoT).The key issue in practical IoT networks ...Spectrum sensing is an indispensable core part of cognitive radio dynamic spectrum access(DSA)and a key approach to alleviating spectrum scarcity in the Internet of Things(IoT).The key issue in practical IoT networks is robust sensing under the coexistence of low signal-to-noise ratios(SNRs)and non-Gaussian impulsive noise,where observations may be distorted differently across feature modalities,making conventional fusion unstable and degrading detection reliability.To address this challenge,the generalized Gaussian distribution(GGD)is adopted as the noise model,and a multimodal fusion framework termed BCAM-Net(bidirectional cross-attention multimodal network)is proposed.BCAM-Net adopts a parallel dual-branch architecture:a time-frequency branch that leverages the continuous wavelet transform(CWT)to extract time-frequency representations,and a temporal branch that learns long-range dependencies from raw signals.BCAM-Net utilizes a bidirectional cross-attention mechanism to achieve deep alignment and mutual calibration of temporal and time-frequency features,generating a fused representation that is highly robust to complex noise.Simulation results show that,under GGD noise with shape parameterβ=0.5,BCAM-Net achieves high detection probabilities in the low-SNR regime and outperforms representative baselines.At a false alarm probability Pf=0.1 and SNR of−14 dB,it attains a detection probability of 0.9020,exceeding the CNN-Transformer,WT-ResNet,TFCFN,and conventional CNN benchmarks by 5.75%,6.98%,33.3%,and 21.1%,respectively.These results indicate that BCAM-Net can effectively improve spectrum sensing performance in low-SNR impulsive-noise scenarios,and provides a lightweight,high-performance solution for practical cognitive radio spectrum sensing.展开更多
Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conductin...Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conducting ECG-based studies.From a review of existing studies,two main factors appear to contribute to this problem:the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models.To overcome these limitations,this study proposes a dual-path multimodal framework,termed DM-EHC(Dual-Path Multimodal ECG Heartbeat Classifier),for ECG-based heartbeat classification.The proposed framework links 1D ECG temporal features with 2D time–frequency features.By setting up the dual paths described above,the model can process more dimensions of feature information.The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments.Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias.The model achieved mean precision,recall,and F1 score of 95.14%,92.26%,and 93.65%,respectively.These results indicate that the framework is robust and has potential value in automated arrhythmia classification.展开更多
Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propo...Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propose SparseMoE-MFN,a novel unified framework that integrates sparse attention with a sparse-activated Mixture of-Experts(MoE)architecture.This framework aims to enhance the efficiency,inferential depth,and interpretability of multimodal fake news detection.Sparse MoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing.The sparse attention module adaptively filters irrelevant tokens and focuses on key regions,reducing computational costs and noise.The sparse MoE module dynamically routes inputs to specialized experts(visual,language,cross-modal alignment)based on content heterogeneity.This expert specialization design boosts computational efficiency and semantic adaptability,enabling precise processing of complex content and improving performance on ambiguous categories.Evaluated on the large-scale,multilingualMR2 dataset,SparseMoEMFN achieves state-of-the-art performance.It obtains an accuracy of 86.7%and a macro-averaged F1 score of 0.859,outperforming strong baselines like MiniGPT-4 by 3.4%and 3.2%,respectively.Notably,it shows significant advantages in the“unverified”category.Furthermore,SparseMoE-MFN demonstrates superior computational efficiency,with an average inference latency of 89.1 ms and 95.4 GFLOPs,substantially lower than existing models.Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy,generalization,and efficiency.展开更多
In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although ...In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although intelligent rescue robots can enter hazardous environments in place of humans,smoke poses major challenges for human detection algorithms.These challenges include the attenuation of visible and infrared signals,complex thermal fields,and interference frombackground objects,all ofwhichmake it difficult to accurately identify trapped individuals.To address this problem,we propose VIF-YOLO,a visible–infrared fusion model for real-time human detection in dense smoke environments.The framework introduces a lightweight multimodal fusion(LMF)module based on learnable low-rank representation blocks to end-to-end integrate visible and infrared images,preserving fine details while enhancing salient features.In addition,an efficient multiscale attention(EMA)mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions.Extensive experiments on our newly constructedmultimodal smoke human detection(MSHD)dataset demonstrate thatVIF-YOLOachievesmAP50 of 99.5%,precision of 99.2%,and recall of 99.3%,outperforming YOLOv10n by a clear margin.Furthermore,when deployed on the NVIDIA Jetson Xavier NX,VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms,validating its real-time capability on edge-computing platforms.These results confirm that VIF-YOLO provides accurate,robust,and fast detection across complex backgrounds and diverse smoke conditions,ensuring reliable and rapid localization of individuals in need of rescue.展开更多
Traditional artificial intelligence(AI)-based methods for breast cancer diagnosis often rely on a single modality,such as ultrasound images.With the rise of multimodal approaches,multiple data sources,including imagin...Traditional artificial intelligence(AI)-based methods for breast cancer diagnosis often rely on a single modality,such as ultrasound images.With the rise of multimodal approaches,multiple data sources,including imaging from diverse medical modalities,structured clinical information,and unstructured medical reports,are increasingly integrated to provide richer and more informative signals for model training.This survey reviews the data modalities employed in AI-based breast cancer research,examines common multimodal combinations and fusion strategies,and discusses their applications across clinical tasks such as diagnosis,treatment planning,and outcome prediction.By consolidating current literature and identifying critical gaps,this survey aims to guide future research toward the development of reliable,clinically relevant multimodal AI systems for use in breast cancer management.展开更多
This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to...This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to both screening and diagnosis.For the clinical adaptation of AI,several challenges remain for its effective translation.The establishment of high-quality clinical databases,such as the National Clinical Database and Japan Endoscopy Database in Japan,which covers almost all cases of esophageal cancer,is essential for validating multimodal AI models.This requires rigorous external validation using diverse datasets,including those from different endoscope manufacturers and image qualities.Furthermore,endoscopists’skills significantly affect diagnostic accuracy,suggesting that AI should serve as a supportive tool rather than a replacement.Addressing these challenges,along with country-specific legal and ethical considerations,will facilitate the successful integration of multimodal AI into the management of esophageal cancer,particularly in endoscopic diagnosis,and contribute to improved patient outcomes.Although this review focused on Japan as a case study,the challenges and solutions described are broadly applicable to other high-incidence regions.展开更多
Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing can...Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing cancer detection,diagnosis,and prognostication.A narrative review of literature published from January 2015 to march 2025 was conducted using PubMed,Web of Science,and Scopus.Search terms included"gastrointestinal cancer","artificial intelligence","machine learning","deep learning","radiomics","multimodal detection"and"predictive modeling".Studies were included if they focused on clinically relevant AI applications in GI oncology.AI algorithms for GI cancer detection have achieved high performance across imaging modalities,with endoscopic DL systems reporting accuracies of 85%-97%for polyp detection and segmentation.Radiomics-based models have predicted molecular biomarkers such as programmed cell death ligand 2 expression with area under the curves up to 0.92.Large language models applied to radiology reports demonstrated diagnostic accuracy comparable to junior radiologists(78.9%vs 80.0%),though without incremental value when combined with human interpretation.Multimodal AI approaches integrating imaging,pathology,and clinical data show emerging potential for precision oncology.AI in GI oncology has reached clinically relevant accuracy levels in multiple diagnostic tasks,with multimodal approaches and predictive biomarker modeling offering new opportunities for personalized care.However,broader validation,integration into clinical workflows,and attention to ethical,legal,and social implications remain critical for widespread adoption.展开更多
Audio-visual speaker tracking aims to determine the locations of multiple speakers in the scene by leveraging signals captured from multisensor platforms.Multimodal fusion methods can improve both the accuracy and rob...Audio-visual speaker tracking aims to determine the locations of multiple speakers in the scene by leveraging signals captured from multisensor platforms.Multimodal fusion methods can improve both the accuracy and robustness of speaker tracking.However,in complex multispeaker tracking scenarios,critical challenges such as cross-modal feature discrepancy,weak sound source localisation ambiguity and frequent identity switch errors remain unresolved,which severely hinder the modelling of speaker identity consistency and consequently lead to degraded tracking accuracy and unstable tracking trajectories.To this end,this paper proposes a multimodal multispeaker tracking network using audio-visual contrastive learning(AVCLNet).By integrating heterogeneous modal representations into a unified space through audio-visual contrastive learning,which facilitates cross-modal feature alignment,mitigates cross-modal feature bias and enhances identity-consistent representations.In the audio-visual measurement stage,we design a vision-guided weak sound source weighted enhancement method,which leverages visual cues to establish cross-modal mappings and employs a spatiotemporal dynamic weighted mechanism to improve the detectability of weak sound sources.Furthermore,in the data association phase,a dual geometric constraint strategy is introduced by combining the 2D and 3D spatial geometric information,reducing frequent identity switch errors.Experiments on the AV16.3 and CAV3D datasets show that AVCLNet outperforms state-of-the-art methods,demonstrating superior robustness in multispeaker scenarios.展开更多
Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-in...Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-interfaced biosensors capable of analyzing sweat and skin biomarkers,providing innovative solutions for diabetes diagnosis and monitoring.This review comprehensively discusses the current developments in noninvasive wearable biosensors,emphasizing simultaneous detection of biochemical biomarkers(such as glucose,cortisol,lactate,branched-chain amino acids,and cytokines)and physiological signals(including heart rate,blood pressure,and sweat rate)for accurate,personalized diabetes management.We explore innovations in multimodal sensor design,materials science,biorecognition elements,and integration techniques,highlighting the importance of advanced data analytics,artificial intelligence-driven predictive algorithms,and closed-loop therapeutic systems.Additionally,the review addresses ongoing challenges in biomarker validation,sensor stability,user compliance,data privacy,and regulatory considerations.A holistic,multimodal approach enabled by these next-generation wearable biosensors holds significant potential for improving patient outcomes and facilitating proactive healthcare interventions in diabetes management.展开更多
The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytot...The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytotoxicity.These limitations have catalyzed the development of intelligent stimuli-responsive block copolymers-based bioimaging agents,which was engineered to dynamically respond to endogenous biochemical cues(e.g.,p H gradients,redox potential,enzyme activity,hypoxia environment) or exogenous physical triggers(e.g.,photoirradiation,thermal gradients,ultrasound(US)/magnetic stimuli).Through spatiotemporally controlled structural transformations,stimuli-responsive block copolymers enable precise contrast targeting,activatable signal amplification,and theranostic integration,thereby substantially enhancing signal-to-noise ratios of bioimaging and diagnostic specificity.Hence,this mini-review systematically examines molecular engineering principles for designing p H-,redox-,enzyme-,light-,thermo-,and US/magnetic-responsive polymers,with emphasis on structure-property relationships governing imaging performance modulation.Furthermore,we critically analyze emerging strategies for optical imaging,US synergies,and magnetic resonance imaging(MRI).Multimodal bioimaging has also been elaborated,which could overcome the inherent trade-offs between resolution,penetration depth,and functional specificity in single-modal approaches.By elucidating mechanistic insights and translational challenges,this mini-review aims to establish a design framework of stimuli-responsive block copolymersbased for high fidelity bioimaging agents and accelerate their clinical translation in precise diagnosis and therapy.展开更多
Foundation models are reshaping artificial intelligence,yet their deployment in specialised domains such as agricultural question answering(AQA)still faces challenges including data scarcity and barriers to domainspec...Foundation models are reshaping artificial intelligence,yet their deployment in specialised domains such as agricultural question answering(AQA)still faces challenges including data scarcity and barriers to domainspecific knowledge.To systematically review recent progress in this area,this paper adopts a task–paradigmperspective and examines applications across three major AQA task families.For text-based QA,we analyse the strengths and limitations of retrieval-based,generative,and hybrid approaches built on large languagemodels,revealing a clear trend toward hybrid paradigms that balance precision and flexibility.For visual diagnosis,we discuss techniques such as crossmodal alignment and prompt-driven generation,which are pushing systems beyond simple pest and disease recognition toward deeper causal reasoning.Formultimodal reasoning,we show how the fusion of heterogeneous data—including text,images,speech,and sensor streams—enables comprehensive decision-making for diagnosis,monitoring,and yield prediction.To address the lack of unified benchmarks,we further propose a standardised evaluation protocol and a diagnostic taxonomy specifically designed to characterise agriculture-specific errors.Finally,we outline a concreteAQA roadmap that emphasises safety alignment,hallucination control,and lightweight deployment,aiming to guide future systems toward greater efficiency,trustworthiness,and sustainability.展开更多
The rapid evolution of the autonomous driving industry has led to a surge in electronic units and applications,resulting in increased in-vehicle data traffic and higher demands for communication efficiency and securit...The rapid evolution of the autonomous driving industry has led to a surge in electronic units and applications,resulting in increased in-vehicle data traffic and higher demands for communication efficiency and security.Meanwhile,safe driving necessitates further development of in-vehicle thermal management systems,as traditional point-type sensors face deployment challenges due to their limited monitoring range.All-glass multimode fibers(AG-MMFs)emerge as an ideal solution for sensing and transmission.An integrated sensing and communication(ISAC)system based on AG-MMFs has been proposed and experimentally validated for stable and efficient operation across a broad temperature range from-18°C to 122°C,while maintaining strong tolerance to typical vehicle vibrations and connector misalignments.Utilizing a single commercial OM4 fiber,we achieve error-free PAM-4 transmission up to 100 Gb∕s with the aid of forward error correction and precise real-time temperature monitoring over 100 m at the same time.Furthermore,by adopting a looped link structure and a neural network-based denoising algorithm,temperature measuring maintains an average uncertainty and a spatial resolution of 0.1°C and 0.5 m,respectively,even under extreme conditions.Exhibiting such outstanding performance in both transmission and sensing,the ISAC architecture successfully addresses the growing demands for high-capacity in-vehicle networks and distributed thermal monitoring of critical components,while paving the theoretical foundation for“fiber to vehicle.”展开更多
Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and...Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and molecular data,AI enhances the precision and reproducibility of tumor detection,grading,and risk stratification.In prostate cancer,AI-assisted multiparametric Magnetic resonance imaging(MRI)and digital pathology systems improve lesion localization and Gleason scoring.For bladder cancer,deep learning-based cystoscopy and radiomics models from Computed tomography/magnetic resonance imaging(CT/MRI)enable real-time lesion segmentation and non-invasive biomarker prediction,such as Programmed Cell Death-Ligand 1(PD-L1)expression.In RCC,AI,combined with CT/MRI and multi-omics data,aids in subtype classification and prognostic prediction,supporting personalized therapy.However,despite these promising advances,challenges such as data standardization,model generalizability,interpretability,and regulatory compliance hinder AI’s clinical translation.This review outlines the current state of AI in urological cancer diagnosis and prognosis,its technological innovations,and the clinical challenges and opportunities that lie ahead.展开更多
It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This stu...It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This study introduces a cohesive architecture that amalgamates requirement development,UML synthesis,and multimodal validation.First,LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements.Then,DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code.Using this dual-LLM pipeline,we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families.Rendering analysis showed that 89.5%of the generated diagrams compile correctly,while invalid cases were detected automatically.To assess quality,we employed a multimodal scoring method that combines Qwen2.5-VL-3B,LLaMA-3.2-11B-Vision-Instruct and Aya-Vision-8B,with weights based on MMMU performance.A study with 94 experts revealed strong alignment between automatic and manual evaluations,yielding a Pearson correlation of r=0.82 and a Fleiss’Kappa of 0.78.This indicates a high degree of concordance between automated metrics and human judgment.Overall,the results demonstrated that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent.More broadly,the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.展开更多
文摘Background Frailty is common and significantly impacts prognosis in heart failure(HF). The Vulnerable Elders Survey-13(VES-13), widely used in oncogeriatrics and public health, remains unexplored as a frailty screening tool in HF outpatients. In this study, we prospectively evaluated VES-13 against a multimodal screening assessment in detecting frailty and predicting individual risk of adverse prognosis.Methods Frailty was assessed at the initial visit using both a multimodal approach, incorporating Barthel Index, Older American Resources and Services scale, Pfeiffer Test, abbreviated Geriatric Depression Scale, age > 85 years, lacking support systems,and VES-13. Patients scoring ≥ 3 on VES-13 or meeting at least one multimodal criterion were classified as frail. Endpoints included all-cause mortality, a composite of death or HF hospitalization, and recurrent HF hospitalizations.Results A total of 301 patients were evaluated. VES-13 identified 40.2% as frail and the multimodal assessment 33.2%. In Cox regression analyses, frailty identified by VES-13 showed greater prognostic significance than the multimodal assessment for allcause mortality(HR = 3.70 [2.15–6.33], P < 0.001 vs. 2.40 [1.46–4.0], P = 0.001) and the composite endpoint(HR = 3.13 [2.02–4.84], P< 0.001 vs. 1.96 [1.28–2.99], P = 0.002). Recurrent HF hospitalizations were four times more frequent in VES-13 frail patients while two times in those identified as frail by the multimodal assessment. Additionally, stratifying patients by VES-13 tertiles provided robust risk differentiation.Conclusions VES-13, a simple frailty tool, outperformed a comprehensive multimodal assessment and could be easily integrated into routine HF care, highlighting its clinical utility in identifying patients at risk for poor outcomes.
基金supported by the Institute of Information&Communications Technology Planning&Evaluation grant funded by the Korea government(MSIT)(No.RS-2021-II211341,AI Graduate School Support Program,Chung-Ang University)in part by the Institute of Information and Communications Technology Planning and Evaluation grant funded by the Korea government(MSIT)(Development of Integrated Development Framework that Supports Automatic Neural Network Generation and Deployment Optimized for Runtime Environment,Grant No.2021-0-00766).
文摘Multimodal emotion recognition has emerged as a key research area for enabling human-centered artificial intelligence,supported by the rapid progress in vision,audio,language,and physiological modeling.Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms,yet the field remains fragmented due to differences in feature alignment,temporal synchronization,modality reliability,and robustness to noise or missing inputs.This survey provides a comprehensive analysis of MER research from 2021 to 2025,consolidating advances in modality-specific representation learning,cross-modal feature construction,and early,late,and hybrid fusion paradigms.We systematically review visual,acoustic,textual,and sensor-based embeddings,highlighting howpre-trained encoders,self-supervised learning,and large languagemodels have reshaped the representational foundations ofMER.We further categorize fusion strategies by interaction depth and architectural design,examining how attention mechanisms,cross-modal transformers,adaptive gating,and multimodal large language models redefine the integration of affective signals.Finally,we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability,generalization,and interpretability.This survey aims to provide a unified perspective onmultimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.
文摘Hepatocellular carcinoma presents with three distinct immune phenotypes,including immune-desert,immune-excluded,and immune-inflamed,indicating various treatment responses and prognostic outcomes.The clinical application of multi-omics parameters is still restricted by the expensive and less accessible assays,although they accurately reflect immune status.A comprehensive evaluation framework based on“easy-to-obtain”multi-model clinical parameters is urgently required,incorporating clinical features to establish baseline patient profiles and disease staging;routine blood tests assessing systemic metabolic and functional status;immune cell subsets quantifying subcluster dynamics;imaging features delineating tumor morphology,spatial configuration,and perilesional anatomical relationships;immunohistochemical markers positioning qualitative and quantitative detection of tumor antigens from the cellular and molecular level.This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management conveniently and practically.
基金the financial support of the National Natural Science Foundation of China(NO.52173028)。
文摘Since the first design of tactile sensors was proposed by Harmon in 1982,tactile sensors have evolved through four key phases:industrial applications(1980s,basic pressure detection),miniaturization via MEMS(1990s),flexible electronics(2010s,stretchable materials),and intelligent systems(2020s-present,AI-driven multimodal sensing).With the innovation of material,processing techniques,and multimodal fusion of stimuli,the application of tactile sensors has been continuously expanding to a diversity of areas,including but not limited to medical care,aerospace,sports and intelligent robots.Currently,researchers are dedicated to develop tactile sensors with emerging mechanisms and structures,pursuing high-sensitivity,high-resolution,and multimodal characteristics and further constructing tactile systems which imitate and approach the performance of human organs.However,challenges in the combination between the theoretical research and the practical applications are still significant.There is a lack of comprehensive understanding in the state of the art of such knowledge transferring from academic work to technical products.Scaled-up production of laboratory materials faces fatal challenges like high costs,small scale,and inconsistent quality.Ambient factors,such as temperature,humidity,and electromagnetic interference,also impair signal reliability.Moreover,tactile sensors must operate across a wide pressure range(0.1 k Pa to several or even dozens of MPa)to meet diverse application needs.Meanwhile,the existing algorithms,data models and sensing systems commonly reveal insufficient precision as well as undesired robustness in data processing,and there is a realistic gap between the designed and the demanded system response speed.In this review,oriented by the design requirements of intelligent tactile sensing systems,we summarize the common sensing mechanisms,inspired structures,key performance,and optimizing strategies,followed by a brief overview of the recent advances in the perspectives of system integration and algorithm implementation,and the possible roadmap of future development of tactile sensors,providing a forward-looking as well as critical discussions in the future industrial applications of flexible tactile sensors.
基金supported by the NSFC(12474071)Natural Science Foundation of Shandong Province(ZR2024YQ051,ZR2025QB50)+6 种基金Guangdong Basic and Applied Basic Research Foundation(2025A1515011191)the Shanghai Sailing Program(23YF1402200,23YF1402400)funded by Basic Research Program of Jiangsu(BK20240424)Open Research Fund of State Key Laboratory of Crystal Materials(KF2406)Taishan Scholar Foundation of Shandong Province(tsqn202408006,tsqn202507058)Young Talent of Lifting engineering for Science and Technology in Shandong,China(SDAST2024QTB002)the Qilu Young Scholar Program of Shandong University。
文摘As emerging two-dimensional(2D)materials,carbides and nitrides(MXenes)could be solid solutions or organized structures made up of multi-atomic layers.With remarkable and adjustable electrical,optical,mechanical,and electrochemical characteristics,MXenes have shown great potential in brain-inspired neuromorphic computing electronics,including neuromorphic gas sensors,pressure sensors and photodetectors.This paper provides a forward-looking review of the research progress regarding MXenes in the neuromorphic sensing domain and discussed the critical challenges that need to be resolved.Key bottlenecks such as insufficient long-term stability under environmental exposure,high costs,scalability limitations in large-scale production,and mechanical mismatch in wearable integration hinder their practical deployment.Furthermore,unresolved issues like interfacial compatibility in heterostructures and energy inefficiency in neu-romorphic signal conversion demand urgent attention.The review offers insights into future research directions enhance the fundamental understanding of MXene properties and promote further integration into neuromorphic computing applications through the convergence with various emerging technologies.
基金funded by Fundación CajaCanarias and Fundación Bancaria“la Caixa”,grant number 2023DIG11.
文摘Business Process Modelling(BPM)is essential for analyzing,improving,and automating the flow of information within organizations,but traditional approaches based on manual interpretation are slow,error-prone,and require a high level of expertise.This article proposes an innovative alternative solution that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation(BPMN)diagrams solely from verbal descriptions of the processes to be modeled,utilizing Large Language Models(LLMs)and multimodal Artificial Intelligence(AI).Experimental results,based on video recordings of process explanations provided by an expert from an organization(in this case,the Commercial Courts of a public justice administration),demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams,leading to significant improvements in the speed,accuracy,and accessibility of process modeling.This research makes a substantial contribution to the field of business process modeling,as its methodology is groundbreaking in its use of LLMs and multimodal AI capabilities to handle different types of source material(text and video),combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.
基金supported in part by JSPS Grants-in-Aid for Scientific Research 25K07742 and 25K23457.
文摘Spectrum sensing is an indispensable core part of cognitive radio dynamic spectrum access(DSA)and a key approach to alleviating spectrum scarcity in the Internet of Things(IoT).The key issue in practical IoT networks is robust sensing under the coexistence of low signal-to-noise ratios(SNRs)and non-Gaussian impulsive noise,where observations may be distorted differently across feature modalities,making conventional fusion unstable and degrading detection reliability.To address this challenge,the generalized Gaussian distribution(GGD)is adopted as the noise model,and a multimodal fusion framework termed BCAM-Net(bidirectional cross-attention multimodal network)is proposed.BCAM-Net adopts a parallel dual-branch architecture:a time-frequency branch that leverages the continuous wavelet transform(CWT)to extract time-frequency representations,and a temporal branch that learns long-range dependencies from raw signals.BCAM-Net utilizes a bidirectional cross-attention mechanism to achieve deep alignment and mutual calibration of temporal and time-frequency features,generating a fused representation that is highly robust to complex noise.Simulation results show that,under GGD noise with shape parameterβ=0.5,BCAM-Net achieves high detection probabilities in the low-SNR regime and outperforms representative baselines.At a false alarm probability Pf=0.1 and SNR of−14 dB,it attains a detection probability of 0.9020,exceeding the CNN-Transformer,WT-ResNet,TFCFN,and conventional CNN benchmarks by 5.75%,6.98%,33.3%,and 21.1%,respectively.These results indicate that BCAM-Net can effectively improve spectrum sensing performance in low-SNR impulsive-noise scenarios,and provides a lightweight,high-performance solution for practical cognitive radio spectrum sensing.
基金supported by the Innovative Human Resource Development for Local Intel-lectualization program through the Institute of Information&Communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(No.IITP-2026-2020-0-01741)the research fund of Hanyang University(HY-2025-1110).
文摘Arrhythmias are a frequently occurring phenomenon in clinical practice,but how to accurately dis-tinguish subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community when conducting ECG-based studies.From a review of existing studies,two main factors appear to contribute to this problem:the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models.To overcome these limitations,this study proposes a dual-path multimodal framework,termed DM-EHC(Dual-Path Multimodal ECG Heartbeat Classifier),for ECG-based heartbeat classification.The proposed framework links 1D ECG temporal features with 2D time–frequency features.By setting up the dual paths described above,the model can process more dimensions of feature information.The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments.Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias.The model achieved mean precision,recall,and F1 score of 95.14%,92.26%,and 93.65%,respectively.These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
基金supported by the National Social Science Fund of China(20BXW101).
文摘Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise,inter-modal imbalance,computational bottlenecks,and semantic ambiguity.To address these issues,we propose SparseMoE-MFN,a novel unified framework that integrates sparse attention with a sparse-activated Mixture of-Experts(MoE)architecture.This framework aims to enhance the efficiency,inferential depth,and interpretability of multimodal fake news detection.Sparse MoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing.The sparse attention module adaptively filters irrelevant tokens and focuses on key regions,reducing computational costs and noise.The sparse MoE module dynamically routes inputs to specialized experts(visual,language,cross-modal alignment)based on content heterogeneity.This expert specialization design boosts computational efficiency and semantic adaptability,enabling precise processing of complex content and improving performance on ambiguous categories.Evaluated on the large-scale,multilingualMR2 dataset,SparseMoEMFN achieves state-of-the-art performance.It obtains an accuracy of 86.7%and a macro-averaged F1 score of 0.859,outperforming strong baselines like MiniGPT-4 by 3.4%and 3.2%,respectively.Notably,it shows significant advantages in the“unverified”category.Furthermore,SparseMoE-MFN demonstrates superior computational efficiency,with an average inference latency of 89.1 ms and 95.4 GFLOPs,substantially lower than existing models.Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy,generalization,and efficiency.
基金funded by the National Natural Science Foundation of China under Grant 62306128the Leading Innovation Project of Changzhou Science and Technology Bureau underGrant CQ20230072+2 种基金the Basic Science Research Project of Jiangsu Provincial Department of Education under Grant 23KJD520003the Science and Technology Development Plan Project of Jilin Provinceunder Grant 20240101382JCthe National KeyR esearch and Development Program of China under Grant 2023YFF1105102.
文摘In fire rescue scenarios,traditional manual operations are highly dangerous,as dense smoke,low visibility,extreme heat,and toxic gases not only hinder rescue efficiency but also endanger firefighters’safety.Although intelligent rescue robots can enter hazardous environments in place of humans,smoke poses major challenges for human detection algorithms.These challenges include the attenuation of visible and infrared signals,complex thermal fields,and interference frombackground objects,all ofwhichmake it difficult to accurately identify trapped individuals.To address this problem,we propose VIF-YOLO,a visible–infrared fusion model for real-time human detection in dense smoke environments.The framework introduces a lightweight multimodal fusion(LMF)module based on learnable low-rank representation blocks to end-to-end integrate visible and infrared images,preserving fine details while enhancing salient features.In addition,an efficient multiscale attention(EMA)mechanism is incorporated into the YOLOv10n backbone to improve feature representation under low-light conditions.Extensive experiments on our newly constructedmultimodal smoke human detection(MSHD)dataset demonstrate thatVIF-YOLOachievesmAP50 of 99.5%,precision of 99.2%,and recall of 99.3%,outperforming YOLOv10n by a clear margin.Furthermore,when deployed on the NVIDIA Jetson Xavier NX,VIF-YOLO attains 40.6 FPS with an average inference latency of 24.6 ms,validating its real-time capability on edge-computing platforms.These results confirm that VIF-YOLO provides accurate,robust,and fast detection across complex backgrounds and diverse smoke conditions,ensuring reliable and rapid localization of individuals in need of rescue.
文摘Traditional artificial intelligence(AI)-based methods for breast cancer diagnosis often rely on a single modality,such as ultrasound images.With the rise of multimodal approaches,multiple data sources,including imaging from diverse medical modalities,structured clinical information,and unstructured medical reports,are increasingly integrated to provide richer and more informative signals for model training.This survey reviews the data modalities employed in AI-based breast cancer research,examines common multimodal combinations and fusion strategies,and discusses their applications across clinical tasks such as diagnosis,treatment planning,and outcome prediction.By consolidating current literature and identifying critical gaps,this survey aims to guide future research toward the development of reliable,clinically relevant multimodal AI systems for use in breast cancer management.
基金Supported by Japan Society for the Promotion of Science,No.24K11935.
文摘This review comprehensively summarized the potential of artificial intelligence(AI)in the management of esophageal cancer.It highlighted the significance of AI-assisted endoscopy in Japan where endoscopy is central to both screening and diagnosis.For the clinical adaptation of AI,several challenges remain for its effective translation.The establishment of high-quality clinical databases,such as the National Clinical Database and Japan Endoscopy Database in Japan,which covers almost all cases of esophageal cancer,is essential for validating multimodal AI models.This requires rigorous external validation using diverse datasets,including those from different endoscope manufacturers and image qualities.Furthermore,endoscopists’skills significantly affect diagnostic accuracy,suggesting that AI should serve as a supportive tool rather than a replacement.Addressing these challenges,along with country-specific legal and ethical considerations,will facilitate the successful integration of multimodal AI into the management of esophageal cancer,particularly in endoscopic diagnosis,and contribute to improved patient outcomes.Although this review focused on Japan as a case study,the challenges and solutions described are broadly applicable to other high-incidence regions.
文摘Gastrointestinal(GI)cancers remain a leading cause of cancer-related morbidity and mortality worldwide.Artificial intelligence(AI),particularly machine learning and deep learning(DL),has shown promise in enhancing cancer detection,diagnosis,and prognostication.A narrative review of literature published from January 2015 to march 2025 was conducted using PubMed,Web of Science,and Scopus.Search terms included"gastrointestinal cancer","artificial intelligence","machine learning","deep learning","radiomics","multimodal detection"and"predictive modeling".Studies were included if they focused on clinically relevant AI applications in GI oncology.AI algorithms for GI cancer detection have achieved high performance across imaging modalities,with endoscopic DL systems reporting accuracies of 85%-97%for polyp detection and segmentation.Radiomics-based models have predicted molecular biomarkers such as programmed cell death ligand 2 expression with area under the curves up to 0.92.Large language models applied to radiology reports demonstrated diagnostic accuracy comparable to junior radiologists(78.9%vs 80.0%),though without incremental value when combined with human interpretation.Multimodal AI approaches integrating imaging,pathology,and clinical data show emerging potential for precision oncology.AI in GI oncology has reached clinically relevant accuracy levels in multiple diagnostic tasks,with multimodal approaches and predictive biomarker modeling offering new opportunities for personalized care.However,broader validation,integration into clinical workflows,and attention to ethical,legal,and social implications remain critical for widespread adoption.
基金supported by the National Natural Science Foundation of China(62403345)the Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology(2024B1212010006)the Shanxi Provincial Department of Science and Technology Basic Research Project(202403021212174,202403021221074).
文摘Audio-visual speaker tracking aims to determine the locations of multiple speakers in the scene by leveraging signals captured from multisensor platforms.Multimodal fusion methods can improve both the accuracy and robustness of speaker tracking.However,in complex multispeaker tracking scenarios,critical challenges such as cross-modal feature discrepancy,weak sound source localisation ambiguity and frequent identity switch errors remain unresolved,which severely hinder the modelling of speaker identity consistency and consequently lead to degraded tracking accuracy and unstable tracking trajectories.To this end,this paper proposes a multimodal multispeaker tracking network using audio-visual contrastive learning(AVCLNet).By integrating heterogeneous modal representations into a unified space through audio-visual contrastive learning,which facilitates cross-modal feature alignment,mitigates cross-modal feature bias and enhances identity-consistent representations.In the audio-visual measurement stage,we design a vision-guided weak sound source weighted enhancement method,which leverages visual cues to establish cross-modal mappings and employs a spatiotemporal dynamic weighted mechanism to improve the detectability of weak sound sources.Furthermore,in the data association phase,a dual geometric constraint strategy is introduced by combining the 2D and 3D spatial geometric information,reducing frequent identity switch errors.Experiments on the AV16.3 and CAV3D datasets show that AVCLNet outperforms state-of-the-art methods,demonstrating superior robustness in multispeaker scenarios.
文摘Diabetes mellitus represents a major global health issue,driving the need for noninvasive alternatives to traditional blood glucose monitoring methods.Recent advancements in wearable technology have introduced skin-interfaced biosensors capable of analyzing sweat and skin biomarkers,providing innovative solutions for diabetes diagnosis and monitoring.This review comprehensively discusses the current developments in noninvasive wearable biosensors,emphasizing simultaneous detection of biochemical biomarkers(such as glucose,cortisol,lactate,branched-chain amino acids,and cytokines)and physiological signals(including heart rate,blood pressure,and sweat rate)for accurate,personalized diabetes management.We explore innovations in multimodal sensor design,materials science,biorecognition elements,and integration techniques,highlighting the importance of advanced data analytics,artificial intelligence-driven predictive algorithms,and closed-loop therapeutic systems.Additionally,the review addresses ongoing challenges in biomarker validation,sensor stability,user compliance,data privacy,and regulatory considerations.A holistic,multimodal approach enabled by these next-generation wearable biosensors holds significant potential for improving patient outcomes and facilitating proactive healthcare interventions in diabetes management.
基金supported by the National Natural Science Foundation of China (Nos.22208218,22078196,and 22278268)the Natural Science Foundation of Shanghai (No.22ZR1460400)Collaborative Innovation Center of Fragrance Flavour and Cosmetics,and Collaborative Innovation Project of Shanghai Institute of Technology (No.XTCX2023-07)。
文摘The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents,including suboptimal sensitivity,off-target biodistribution,and inherent cytotoxicity.These limitations have catalyzed the development of intelligent stimuli-responsive block copolymers-based bioimaging agents,which was engineered to dynamically respond to endogenous biochemical cues(e.g.,p H gradients,redox potential,enzyme activity,hypoxia environment) or exogenous physical triggers(e.g.,photoirradiation,thermal gradients,ultrasound(US)/magnetic stimuli).Through spatiotemporally controlled structural transformations,stimuli-responsive block copolymers enable precise contrast targeting,activatable signal amplification,and theranostic integration,thereby substantially enhancing signal-to-noise ratios of bioimaging and diagnostic specificity.Hence,this mini-review systematically examines molecular engineering principles for designing p H-,redox-,enzyme-,light-,thermo-,and US/magnetic-responsive polymers,with emphasis on structure-property relationships governing imaging performance modulation.Furthermore,we critically analyze emerging strategies for optical imaging,US synergies,and magnetic resonance imaging(MRI).Multimodal bioimaging has also been elaborated,which could overcome the inherent trade-offs between resolution,penetration depth,and functional specificity in single-modal approaches.By elucidating mechanistic insights and translational challenges,this mini-review aims to establish a design framework of stimuli-responsive block copolymersbased for high fidelity bioimaging agents and accelerate their clinical translation in precise diagnosis and therapy.
基金supported by the Ningxia Natural Science Foundation(2025AAC050001)the Scientific Research Startup Project for Full-Time Introduced High-Level Talents in Ningxia(2024BEH04130)+2 种基金the National Natural Science Foundation of China(32460444)the Ningxia Hui Autonomous Region Key Research and Development Program(2024BBF0101302,2023BDE02001)Supported by the Special Fund for Basic Research Business of Central Universities of North Minzu University(2025BG234,2023ZRLG12).
文摘Foundation models are reshaping artificial intelligence,yet their deployment in specialised domains such as agricultural question answering(AQA)still faces challenges including data scarcity and barriers to domainspecific knowledge.To systematically review recent progress in this area,this paper adopts a task–paradigmperspective and examines applications across three major AQA task families.For text-based QA,we analyse the strengths and limitations of retrieval-based,generative,and hybrid approaches built on large languagemodels,revealing a clear trend toward hybrid paradigms that balance precision and flexibility.For visual diagnosis,we discuss techniques such as crossmodal alignment and prompt-driven generation,which are pushing systems beyond simple pest and disease recognition toward deeper causal reasoning.Formultimodal reasoning,we show how the fusion of heterogeneous data—including text,images,speech,and sensor streams—enables comprehensive decision-making for diagnosis,monitoring,and yield prediction.To address the lack of unified benchmarks,we further propose a standardised evaluation protocol and a diagnostic taxonomy specifically designed to characterise agriculture-specific errors.Finally,we outline a concreteAQA roadmap that emphasises safety alignment,hallucination control,and lightweight deployment,aiming to guide future systems toward greater efficiency,trustworthiness,and sustainability.
基金supported by the National Key R&D Program of China(Grant No.2023YFB2906303)the National Natural Science Foundation of China(Grant No.62225110)+1 种基金the JD Project of Hubei Province(Grant No.2023BAA013)the Innovation Fund of WNLO。
文摘The rapid evolution of the autonomous driving industry has led to a surge in electronic units and applications,resulting in increased in-vehicle data traffic and higher demands for communication efficiency and security.Meanwhile,safe driving necessitates further development of in-vehicle thermal management systems,as traditional point-type sensors face deployment challenges due to their limited monitoring range.All-glass multimode fibers(AG-MMFs)emerge as an ideal solution for sensing and transmission.An integrated sensing and communication(ISAC)system based on AG-MMFs has been proposed and experimentally validated for stable and efficient operation across a broad temperature range from-18°C to 122°C,while maintaining strong tolerance to typical vehicle vibrations and connector misalignments.Utilizing a single commercial OM4 fiber,we achieve error-free PAM-4 transmission up to 100 Gb∕s with the aid of forward error correction and precise real-time temperature monitoring over 100 m at the same time.Furthermore,by adopting a looped link structure and a neural network-based denoising algorithm,temperature measuring maintains an average uncertainty and a spatial resolution of 0.1°C and 0.5 m,respectively,even under extreme conditions.Exhibiting such outstanding performance in both transmission and sensing,the ISAC architecture successfully addresses the growing demands for high-capacity in-vehicle networks and distributed thermal monitoring of critical components,while paving the theoretical foundation for“fiber to vehicle.”
基金supported by grants from the Hangzhou Key Project for Agricultural and Social Development under Grant No.20231203A12(JZ)the General Program of the Scientific Research Special Project for Post-Marketing Clinical Research of Innovative Drugs,Development Center for Medical Science&Technology,National Health Commission of the People’s Republic of China under Grant No.WKZX2024CX104202(JZ).
文摘Artificial intelligence(AI)is transforming the diagnostic landscape of malignant tumors in the urinary system,including prostate cancer,bladder cancer,and renal cell carcinoma(RCC).By integrating imaging,pathology,and molecular data,AI enhances the precision and reproducibility of tumor detection,grading,and risk stratification.In prostate cancer,AI-assisted multiparametric Magnetic resonance imaging(MRI)and digital pathology systems improve lesion localization and Gleason scoring.For bladder cancer,deep learning-based cystoscopy and radiomics models from Computed tomography/magnetic resonance imaging(CT/MRI)enable real-time lesion segmentation and non-invasive biomarker prediction,such as Programmed Cell Death-Ligand 1(PD-L1)expression.In RCC,AI,combined with CT/MRI and multi-omics data,aids in subtype classification and prognostic prediction,supporting personalized therapy.However,despite these promising advances,challenges such as data standardization,model generalizability,interpretability,and regulatory compliance hinder AI’s clinical translation.This review outlines the current state of AI in urological cancer diagnosis and prognosis,its technological innovations,and the clinical challenges and opportunities that lie ahead.
基金supported by the DH2025-TN07-07 project conducted at the Thai Nguyen University of Information and Communication Technology,Thai Nguyen,Vietnam,with additional support from the AI in Software Engineering Lab.
文摘It remains difficult to automate the creation and validation of Unified Modeling Language(UML)dia-grams due to unstructured requirements,limited automated pipelines,and the lack of reliable evaluation methods.This study introduces a cohesive architecture that amalgamates requirement development,UML synthesis,and multimodal validation.First,LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements.Then,DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code.Using this dual-LLM pipeline,we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families.Rendering analysis showed that 89.5%of the generated diagrams compile correctly,while invalid cases were detected automatically.To assess quality,we employed a multimodal scoring method that combines Qwen2.5-VL-3B,LLaMA-3.2-11B-Vision-Instruct and Aya-Vision-8B,with weights based on MMMU performance.A study with 94 experts revealed strong alignment between automatic and manual evaluations,yielding a Pearson correlation of r=0.82 and a Fleiss’Kappa of 0.78.This indicates a high degree of concordance between automated metrics and human judgment.Overall,the results demonstrated that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent.More broadly,the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.