1,032 articles found
1. Multimodal clinical parameters-based immune status associated with the prognosis in patients with hepatocellular carcinoma
Authors: Yu-Zhou Zhang, Yuan-Ze Tang, Yun-Xuan He, Shu-Tong Pan, Hao-Cheng Dai, Yu Liu, Hai-Feng Zhou
Source: World Journal of Gastrointestinal Oncology, 2026, Issue 1, pp. 75-91 (17 pages)
Hepatocellular carcinoma presents with three distinct immune phenotypes (immune-desert, immune-excluded, and immune-inflamed), which indicate different treatment responses and prognostic outcomes. Although multi-omics parameters accurately reflect immune status, their clinical application is still restricted by expensive and less accessible assays. A comprehensive evaluation framework based on "easy-to-obtain" multimodal clinical parameters is urgently required, incorporating clinical features to establish baseline patient profiles and disease staging; routine blood tests assessing systemic metabolic and functional status; immune cell subsets quantifying subcluster dynamics; imaging features delineating tumor morphology, spatial configuration, and perilesional anatomical relationships; and immunohistochemical markers providing qualitative and quantitative detection of tumor antigens at the cellular and molecular level. This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management in a convenient and practical way.
Keywords: Hepatocellular carcinoma; Immune status; Phenotype; Multimodal parameters; Prognosis
2. Transformation of Verbal Descriptions of Process Flows into Business Process Modelling and Notation Models Using Multimodal Artificial Intelligence: Application in Justice
Authors: Silvia Alayón, Carlos Martín, Jesús Torres, Manuel Bacallado, Rosa Aguilar, Guzmán Savirón
Source: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 870-892 (23 pages)
Business Process Modelling (BPM) is essential for analyzing, improving, and automating the flow of information within organizations, but traditional approaches based on manual interpretation are slow, error-prone, and require a high level of expertise. This article proposes an innovative alternative solution that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation (BPMN) diagrams solely from verbal descriptions of the processes to be modeled, utilizing Large Language Models (LLMs) and multimodal Artificial Intelligence (AI). Experimental results, based on video recordings of process explanations provided by an expert from an organization (in this case, the Commercial Courts of a public justice administration), demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams, leading to significant improvements in the speed, accuracy, and accessibility of process modeling. This research makes a substantial contribution to the field of business process modeling, as its methodology is groundbreaking in its use of LLMs and multimodal AI capabilities to handle different types of source material (text and video), combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.
Keywords: Process modelling; verbal description; BPMN; LLM; multimodal AI
3. Multimodal Trajectory Generation for Robotic Motion Planning Using Transformer-Based Fusion and Adversarial Learning
Authors: Shtwai Alsubai, Ahmad Almadhor, Abdullah Al Hejaili, Najib Ben Aoun, Tahani Alsubait, Vincent Karovic
Source: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 848-869 (22 pages)
In Human–Robot Interaction (HRI), generating robot trajectories that accurately reflect user intentions while ensuring physical realism remains challenging, especially in unstructured environments. In this study, we develop a multimodal framework that integrates symbolic task reasoning with continuous trajectory generation. The approach employs transformer models and adversarial training to map high-level intent to robotic motion. Information from multiple data sources, such as voice traits, hand and body keypoints, visual observations, and recorded paths, is integrated simultaneously. These signals are mapped into a shared representation that supports interpretable reasoning while enabling smooth and realistic motion generation. Based on this design, two different learning strategies are investigated. In the first, grammar-constrained Linear Temporal Logic (LTL) expressions are created from multimodal human inputs and subsequently decoded into robot trajectories. The second generates trajectories directly from symbolic intent and linguistic data, bypassing an intermediate logical representation. Transformer encoders combine multiple types of information, and autoregressive transformer decoders generate motion sequences. Adding smoothness and speed limits during training increases the likelihood of physical feasibility. To improve the realism and stability of the generated trajectories during training, an adversarial discriminator is also included to guide them toward the distribution of actual robot motion. Tests on the NATSGLD dataset indicate that the complete system exhibits stable training behaviour and performance. In normalised coordinates, the logic-based pipeline has an Average Displacement Error (ADE) of 0.040 and a Final Displacement Error (FDE) of 0.036. The adversarial generator performs substantially better, reducing ADE to 0.021 and FDE to 0.018. Visual examination confirms that the generated trajectories closely align with observed motion patterns while preserving smooth temporal dynamics.
Keywords: multimodal trajectory generation; robotic motion planning; transformer networks; sensor fusion; reinforcement learning; generative adversarial networks
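The ADE and FDE figures quoted in the abstract above are displacement metrics between a generated and a reference trajectory. The following is a minimal sketch of how such metrics are commonly computed, assuming 2D points and Euclidean distance; the function name `displacement_errors` and the sample trajectories are hypothetical and are not taken from the paper.

```python
import math


def displacement_errors(predicted, reference):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    # Point-wise Euclidean distance between the two trajectories.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    ade = sum(distances) / len(distances)  # average over all time steps
    fde = distances[-1]                    # error at the final time step only
    return ade, fde


# Hypothetical trajectories in normalised coordinates (illustrative only).
reference = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
predicted = [(0.0, 0.0), (0.1, 0.03), (0.2, 0.04), (0.3, 0.05)]
ade, fde = displacement_errors(predicted, reference)
print(f"ADE={ade:.3f} FDE={fde:.3f}")
```

ADE averages the error over the whole path, while FDE looks only at the end point, so the two can rank methods differently.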
4. LLM-Powered Multimodal Reasoning for Fake News Detection
Authors: Md. Ahsan Habib, Md. Anwar Hussen Wadud, M. F. Mridha, Md. Jakir Hossen
Source: Computers, Materials & Continua, 2026, Issue 4, pp. 1821-1864 (44 pages)
The problem of fake news detection (FND) is becoming increasingly important in the field of natural language processing (NLP) because of the rapid dissemination of misleading information on the web. Large language models (LLMs) such as GPT-4.Zero excel in natural language understanding tasks but can still struggle to distinguish between fact and fiction, particularly when applied in the wild. However, a key challenge of existing FND methods is that they only consider unimodal data (e.g., images), while more detailed multimodal data (e.g., user behaviour, temporal dynamics) is neglected, and the latter is crucial for full-context understanding. To overcome these limitations, we introduce M3-FND (Multimodal Misinformation Mitigation for False News Detection), a novel methodological framework that integrates LLMs with multimodal data sources to perform context-aware veracity assessments. Our method proposes a hybrid system that combines image-text alignment, user credibility profiling, and temporal pattern recognition, and is strengthened through a natural feedback loop that provides real-time feedback for correcting downstream errors. We use contextual reinforcement learning to schedule prompt updating and to update the classifier threshold based on the latest multimodal input, which enables the model to better adapt to changing misinformation attack strategies. M3-FND is tested on three diverse datasets, FakeNewsNet, Twitter15, and Weibo, which contain both text and visual social media content. Experiments show that M3-FND significantly outperforms conventional and LLM-based baselines in terms of accuracy, F1-score, and AUC on all benchmarks. Our results indicate the importance of employing multimodal cues and adaptive learning for effective and timely detection of fake news.
Keywords: Fake news detection; multimodal learning; large language models; prompt engineering; instruction tuning; reinforcement learning; misinformation mitigation
5. Engineering stimuli-responsive block copolymers for multimodal bioimaging
Authors: Lizhuang Zhong, Ming Liu, Shilong Su, Dongxin Zeng, Jing Hu, Zhiqian Guo
Source: Chinese Chemical Letters, 2026, Issue 1, pp. 116-124 (9 pages)
The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents, including suboptimal sensitivity, off-target biodistribution, and inherent cytotoxicity. These limitations have catalyzed the development of intelligent bioimaging agents based on stimuli-responsive block copolymers, which are engineered to respond dynamically to endogenous biochemical cues (e.g., pH gradients, redox potential, enzyme activity, hypoxic environment) or exogenous physical triggers (e.g., photoirradiation, thermal gradients, ultrasound (US)/magnetic stimuli). Through spatiotemporally controlled structural transformations, stimuli-responsive block copolymers enable precise contrast targeting, activatable signal amplification, and theranostic integration, thereby substantially enhancing the signal-to-noise ratio of bioimaging and diagnostic specificity. Hence, this mini-review systematically examines molecular engineering principles for designing pH-, redox-, enzyme-, light-, thermo-, and US/magnetic-responsive polymers, with emphasis on the structure-property relationships governing imaging performance modulation. Furthermore, we critically analyze emerging strategies for optical imaging, US synergies, and magnetic resonance imaging (MRI). Multimodal bioimaging, which could overcome the inherent trade-offs between resolution, penetration depth, and functional specificity in single-modal approaches, is also discussed. By elucidating mechanistic insights and translational challenges, this mini-review aims to establish a design framework for high-fidelity bioimaging agents based on stimuli-responsive block copolymers and to accelerate their clinical translation in precise diagnosis and therapy.
Keywords: Stimuli-responsive; Block copolymers; Molecular engineering; Multimodal bioimaging; Diagnosis and therapy
6. Multimodal artificial intelligence integrates imaging, endoscopic, and omics data for intelligent decision-making in individualized gastrointestinal tumor treatment
Authors: Hui Nian, Yi-Bin Wu, Yu Bai, Zhi-Long Zhang, Xiao-Huang Tu, Qi-Zhi Liu, De-Hua Zhou, Qian-Cheng Du
Source: Artificial Intelligence in Gastroenterology, 2026, Issue 1, pp. 1-19 (19 pages)
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Keywords: Multimodal artificial intelligence; Gastrointestinal tumors; Individualized therapy; Intelligent diagnosis; Treatment optimization; Prognostic prediction; Data fusion; Deep learning; Precision medicine
7. Multimodal Signal Processing of ECG Signals with Time-Frequency Representations for Arrhythmia Classification
Authors: Yu Zhou, Jiawei Tian, Kyungtae Kang
Source: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 990-1017 (28 pages)
Arrhythmias occur frequently in clinical practice, but accurately distinguishing subtle rhythm abnormalities remains an ongoing difficulty for ECG-based studies. A review of existing studies suggests that two main factors contribute to this problem: the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models. To overcome these limitations, this study proposes a dual-path multimodal framework, termed DM-EHC (Dual-Path Multimodal ECG Heartbeat Classifier), for ECG-based heartbeat classification. The proposed framework links 1D ECG temporal features with 2D time–frequency features. With these two paths, the model can process more dimensions of feature information. The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments. Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias. The model achieved mean precision, recall, and F1 score of 95.14%, 92.26%, and 93.65%, respectively. These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
Keywords: Electrocardiogram; arrhythmia classification; multimodal; time-frequency representation
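The precision, recall, and F1 figures in the abstract above are related through the harmonic mean. As a rough sanity check, the sketch below applies the standard F1 formula to the reported mean precision and recall. The helper name `f1_score` is illustrative, and the abstract does not state how its averages were formed, so any interpretation of the small gap is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall as reported in the abstract, in percent.
mean_precision = 95.14
mean_recall = 92.26

f1_of_means = f1_score(mean_precision, mean_recall)
print(round(f1_of_means, 2))  # 93.68
```

The result, about 93.68%, is close to but not identical to the reported 93.65%. One plausible reason is that the reported F1 is an average of per-class F1 scores rather than the F1 of the averaged precision and recall, though the abstract does not confirm this.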
8. A Novel Unified Framework for Automated Generation and Multimodal Validation of UML Diagrams
Authors: Van-Viet Nguyen, Huu-Khanh Nguyen, Kim-Son Nguyen, Thi Minh-Hue Luong, Duc-Quang Vu, Trung-Nghia Phung, The-Vinh Nguyen
Source: Computer Modeling in Engineering & Sciences, 2026, Issue 1, pp. 1023-1050 (28 pages)
It remains difficult to automate the creation and validation of Unified Modeling Language (UML) diagrams due to unstructured requirements, limited automated pipelines, and the lack of reliable evaluation methods. This study introduces a cohesive architecture that combines requirement development, UML synthesis, and multimodal validation. First, LLaMA-3.2-1B-Instruct is used to generate user-focused requirements. Then, DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code. Using this dual-LLM pipeline, we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families. Rendering analysis showed that 89.5% of the generated diagrams compile correctly, while invalid cases were detected automatically. To assess quality, we employed a multimodal scoring method that combines Qwen2.5-VL-3B, LLaMA-3.2-11B-Vision-Instruct, and Aya-Vision-8B, with weights based on MMMU performance. A study with 94 experts revealed strong alignment between automatic and manual evaluations, yielding a Pearson correlation of r = 0.82 and a Fleiss' kappa of 0.78. This indicates a high degree of concordance between automated metrics and human judgment. Overall, the results demonstrate that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent. More broadly, the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.
Keywords: Automated dataset generation; vision-language models; multimodal validation; software engineering automation; UMLCode
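The scoring method in the abstract above weights several vision-language models by benchmark performance and then compares the combined score with human ratings via Pearson correlation. The sketch below illustrates that general idea only. The model names, benchmark values, and scores are hypothetical placeholders, and making the weights proportional to benchmark score is an assumption; the abstract does not give the actual weighting rule.

```python
import math


def weighted_score(scores, benchmark):
    """Combine per-model scores with weights proportional to benchmark results."""
    total = sum(benchmark[name] for name in scores)
    return sum(scores[name] * benchmark[name] / total for name in scores)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical benchmark results and per-diagram scores (not from the paper).
benchmark = {"model_a": 50.0, "model_b": 30.0, "model_c": 20.0}
diagram_scores = {"model_a": 8.0, "model_b": 6.0, "model_c": 9.0}
combined = weighted_score(diagram_scores, benchmark)  # 0.5*8 + 0.3*6 + 0.2*9

# Hypothetical automatic vs. human scores for a handful of diagrams.
automatic = [7.6, 5.0, 9.1, 6.2]
human = [8.0, 4.5, 9.0, 6.0]
print(round(combined, 2), round(pearson(automatic, human), 3))
```

Normalising the weights to sum to one keeps the combined score on the same scale as the individual model scores.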
9. Multimodal, multifaceted, imaging-based human brain white matter atlas
Authors: Junchen Zhou, Wenxia Li, Shuo Xu, Bharat B. Biswal, Huafu Chen, Jiao Li, Wei Liao
Source: Science Bulletin, 2026, Issue 3, pp. 500-504 (5 pages)
The brain atlas, or parcellation, which delineates spatial partitions, organizes the brain's structure and function [1]. The spatial arrangements of highly heterogeneous landscapes represent specialized functional regions for investigating their interactions. Early efforts to parcellate the mammalian brain, using histological cytoarchitecture and myeloarchitecture, as well as recent in vivo magnetic resonance imaging (MRI) [2,3], have primarily involved cortical areas, subcortical structures, and cerebellar nuclei. Human brain parcellations primarily focus on grey matter (GM) and purposefully exclude white matter (WM), hindering the development of next-generation brain atlases.
Keywords: brain atlas; cerebellar nuclei; human brain p…; in vivo magnetic resonance imaging (MRI); human brain white matter atlas; histological cytoarchitecture; imaging-based; parcellation; multimodal
10. Beyond origin: multimodal AI synthesis to resolve cancers of unknown primary
Authors: Hongru Shen, Xiangchun Li
Source: Cancer Biology & Medicine, 2026, Issue 1, pp. 21-29 (9 pages)
For decades, the central dogma of oncology has been that a cancer's identity is inextricably linked to its anatomical origin. This principle underpins the entire diagnostic and therapeutic framework, from histology-based classification to site-specific treatment guidelines. Yet this framework catastrophically fails for a substantial population of patients diagnosed with cancer of unknown primary (CUP). These patients present with metastatic disease, yet their primary tumors remain elusive despite exhaustive clinical workup [1]. CUP, accounting for 1%-3% of all cancer diagnoses, is an enigma with devastating consequences; the median overall survival is only 2-12 months [2-4]. The inability to pinpoint an origin forces clinicians to rely on broad-spectrum empirical chemotherapy, such as taxane-carboplatin regimens, which have limited efficacy and exclude patients from the promise of targeted therapies and clinical trials [5]. CUP is not only a diagnostic challenge but also an indictment of the siloed approach to understanding malignancy: this cancer highlights the limitations of origin-based diagnostic frameworks. However, the confluence of high-dimensional biological data and advanced artificial intelligence (AI) is now poised to address this long-standing diagnostic limitation and to herald a new era for not only CUP but also oncology as a whole (Figure 1).
Keywords: central dogma of oncology; cancer of unknown primary; high-dimensional biological data; clinical trials; diagnostic framework; artificial intelligence; targeted therapies; multimodal AI synthesis
11. AI-driven integration of multi-omics and multimodal data for precision medicine
Authors: Heng-Rui Liu
Source: Medical Data Mining, 2026, Issue 1, pp. 1-2 (2 pages)
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Keywords: high-throughput transcriptomics; multi-omics; single cell; multimodal learning frameworks; foundation models; omics data modalities; AI-driven; precision medicine
12. Multimodal fluorescent switch: pressure-induced emission and solution-induced optical anti-counterfeiting and logic gates
Authors: Jingtian Wang, Xihan Yu, Kai Wang, Guanjun Xiao, Bo Zou
Source: Science Bulletin, 2026, Issue 3, pp. 490-494 (5 pages)
Multifunctional optical responsive materials have grown increasingly pivotal in addressing the escalating demands of sensing, detection, and anti-counterfeiting applications [1,2]. These materials exhibit distinct visible optical variations upon exposure to external stimuli, such as pressure, temperature, light, solvents, pH fluctuations, or mechanical force. Fluorescent sensing and anti-counterfeiting technologies leveraging these optical responses have emerged as highly promising solutions.
Keywords: sensing; multifunctional optical responsive materials; logic gates; optical responsive materials; multimodal fluorescent switch; optical variations; pressure-induced emission; solution-induced optical anti-counterfeiting
13. Automated Machine Learning for Fault Diagnosis Using Multimodal Mel-Spectrogram and Vibration Data
Authors: Zehao Li, Xuting Zhang, Hongqi Lin, Wu Qin, Junyu Qi, Zhuyun Chen, Qiang Liu
Source: Computer Modeling in Engineering & Sciences, 2026, Issue 2, pp. 471-498 (28 pages)
To ensure the safe and stable operation of rotating machinery, intelligent fault diagnosis methods hold significant research value. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong noise environments, severely affecting the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration acquisition. Finally, diagnostic decisions are achieved by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
Keywords: Automated machine learning; mechanical fault diagnosis; feature engineering; multimodal data
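The fusion step described in the abstract above (Mel time–frequency features from acoustic signals joined with statistical features of vibration signals) can be pictured with a small feature-level sketch. Everything below is an illustrative assumption rather than the authors' pipeline: a single Hann-windowed frame, a naive DFT, eight triangular mel bands using the common 2595·log10(1 + f/700) mel formula, three vibration statistics (RMS, peak, kurtosis), and synthetic sine-wave signals.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def mel_band_energies(frame, sample_rate, n_mels=8):
    """Log energies of one frame in n_mels triangular mel-spaced bands."""
    spectrum = power_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(spectrum))]
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    energies = []
    for b in range(n_mels):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        total = 0.0
        for f, p in zip(freqs, spectrum):
            weight = min((f - lo) / (mid - lo), (hi - f) / (hi - mid))
            if weight > 0.0:  # inside the triangular filter
                total += weight * p
        energies.append(math.log(total + 1e-10))
    return energies


def vibration_statistics(signal):
    """RMS, peak amplitude, and kurtosis of a vibration signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    kurtosis = (sum((x - mean) ** 4 for x in signal) / n) / (var ** 2)
    return [rms, peak, kurtosis]


# Synthetic stand-ins for real acoustic and vibration recordings.
sample_rate, frame_len = 8000, 256
acoustic = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(frame_len)]
vibration = [math.sin(2 * math.pi * 50 * i / sample_rate)
             + 0.1 * math.sin(2 * math.pi * 1200 * i / sample_rate)
             for i in range(frame_len)]

# Feature-level fusion: concatenate the two feature vectors.
fused = mel_band_energies(acoustic, sample_rate) + vibration_statistics(vibration)
print(len(fused))
```

A fused vector like this would then be passed to whichever classifier or AutoML search is used; a full Mel-spectrogram would repeat the band computation over many overlapping frames.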
14. A novel method for composite facial expressions generation based on multimodal reinforcement learning
Authors: Zequan XU, Wei WANG, Qinchuan LI, Jin WANG, Gang CHEN
Source: Science China (Technological Sciences), 2026, Issue 2, pp. 259-271 (13 pages)
Humanoid robots hold significant promise for social interaction and emotional companionship. However, their effectiveness hinges on the ability to convey nuanced and authentic emotions. Here, we present a universal humanoid robot head with a facial kinematics model. Using a reinforcement learning framework guided by symmetry assessment, emotion decoupling, and MLLM authenticity evaluation, our system autonomously learns to generate adaptive facial expressions through dynamic landmark adjustments. By transferring the simulation training results to real-world environments, the robot can perform natural and expressive facial expressions. Another novel feature is the independent regulation of emotion intensity and expression magnitude across emotional categories, which significantly enhances the ability to achieve culturally adaptive and socially resonant robotic expressions. This research advances adaptive humanoid interaction, offering an easier and more efficient pathway toward culturally resonant and psychologically plausible robotic expressions.
Keywords: humanoid robot; composite expressions; multimodal reinforcement learning; human-robot interaction
15. Fatigue Detection with Multimodal Physiological Signals via Uncertainty-Aware Deep Transfer Learning
Authors: Kourosh Kakhi, Hamzeh Asgharnezhad, Abbas Khosravi, Roohallah Alizadehsani, U. Rajendra Acharya
Source: Journal of Bionic Engineering, 2026, Issue 1, pp. 472-487 (16 pages)
Accurate detection of driver fatigue is essential for improving road safety. This study investigates the effectiveness of using multimodal physiological signals for fatigue detection while incorporating uncertainty quantification to enhance the reliability of predictions. Physiological signals, including Electrocardiogram (ECG), Galvanic Skin Response (GSR), and Electroencephalogram (EEG), were transformed into image representations and analyzed using pretrained deep neural networks. The extracted features were classified through a feedforward neural network, and prediction reliability was assessed using uncertainty quantification techniques such as Monte Carlo Dropout (MCD), model ensembles, and combined approaches. Evaluation metrics included standard measures (sensitivity, specificity, precision, and accuracy) along with uncertainty-aware metrics such as uncertainty sensitivity and uncertainty precision. Across all evaluations, ECG-based models consistently demonstrated strong performance. The findings indicate that combining multimodal physiological signals, Transfer Learning (TL), and uncertainty quantification can significantly improve both the accuracy and trustworthiness of fatigue detection systems. This approach supports the development of more reliable driver assistance technologies aimed at preventing fatigue-related accidents.
Keywords: Fatigue detection; Multimodal physiological signals; Deep transfer learning; Uncertainty-aware learning; Driver monitoring
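Monte Carlo Dropout, one of the uncertainty quantification techniques named in the abstract above, keeps dropout switched on at prediction time and summarises many stochastic forward passes. The sketch below shows the mechanism on a deliberately tiny model: a single logistic unit with dropout applied to its inputs. The weights, features, and function names are hypothetical, and the paper's own networks are certainly larger.

```python
import math
import random


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass of a logistic unit with dropout left active."""
    keep = 1.0 - drop_prob
    logit = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            logit += w * x / keep  # inverted-dropout scaling of kept inputs
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(features, weights, bias, drop_prob=0.5, passes=200, seed=0):
    """Mean prediction and its standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    probs = [stochastic_forward(features, weights, bias, drop_prob, rng)
             for _ in range(passes)]
    mean = sum(probs) / passes
    std = math.sqrt(sum((p - mean) ** 2 for p in probs) / passes)
    return mean, std


# Hypothetical feature vector and model parameters (illustrative only).
features = [0.8, -1.2, 0.5]
weights = [1.5, 0.7, -2.0]
bias = 0.1

mean_prob, uncertainty = mc_dropout_predict(features, weights, bias)
print(f"fatigue probability ~ {mean_prob:.2f}, spread ~ {uncertainty:.2f}")
```

A large spread across passes flags a prediction as unreliable, which is the signal that uncertainty-aware metrics build on.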
16. A Multimodal Metaphor Analysis of Chinese Architecture in Ne Zha 2
Authors: QIAO Mengyu, DUAN Rongjuan
Source: Journal of Literature and Art Studies, 2026, Issue 1, pp. 39-42 (4 pages)
From the perspective of Multimodal Metaphor Theory, the architectural scenes in Ne Zha 2 embody highly condensed cultural connotations. Through the synergy of vision, soundscape, and dialect, the film constructs a metaphorical chain of "human order-ethnic oppression-theocratic structure" via the three core architectural spaces. As core signifiers, buildings drive the plot, shape characters, and convey values. The study reveals that animation activates traditional architecture's metaphorical potential through cross-modal mapping, endowing historical symbols with contemporary vitality and providing a paradigm for the creative transformation of traditional culture.
Keywords: multimodal metaphor; Ne Zha 2; Chinese architecture
17. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics (cited by: 1)
Authors: Aya M. Al-Zoghby, Ahmed Ismail Ebada, Aya S. Saleh, Mohammed Abdelhay, Wael A. Awad
Source: Computers, Materials & Continua, 2025, Issue 9, pp. 4155-4193 (39 pages)
Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics, advancing precision medicine by enabling integration and learning from diverse data sources. The exponential growth of high-dimensional healthcare data, encompassing genomic, transcriptomic, and other omics profiles, as well as radiological imaging and histopathological slides, makes this approach increasingly important because, when examined separately, these data sources offer only a fragmented picture of intricate disease processes. Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling, more robust disease characterization, and improved treatment decision-making. This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis. We classify and examine important application domains, such as (1) radiology, where automated report generation and lesion detection are facilitated by image-text integration; (2) histopathology, where fusion models improve tumor classification and grading; and (3) multi-omics, where molecular subtypes and latent biomarkers are revealed through cross-modal learning. We provide an overview of representative research, methodological advancements, and clinical consequences for each domain. Additionally, we critically analyze the fundamental issues preventing wider adoption, including computational complexity (particularly in training scalable, multi-branch networks), data heterogeneity (resulting from modality-specific noise, resolution variations, and inconsistent annotations), and the challenge of maintaining significant cross-modal correlations during fusion. These problems impede interpretability, which is crucial for clinical trust and use, in addition to performance and generalizability. Lastly, we outline important areas for future research, including the development of standardized protocols for harmonizing data, the creation of lightweight and interpretable fusion architectures, the integration of real-time clinical decision support systems, and the promotion of cooperation for federated multimodal learning. Our goal is to provide researchers and clinicians with a concise overview of the field's present state, enduring constraints, and exciting directions for further research.
Keywords: multimodal deep learning; medical diagnostics; multimodal healthcare fusion; healthcare data integration
How female treefrogs weigh unimodal and multimodal sexual displays in the absence and presence of noise
18
Authors: Bicheng Zhua, Runhan Li, Jichao Wang, Jianguo Cui. Current Zoology, 2025, Issue 6, pp. 683-691 (9 pages)
Mate choice plays a pivotal role in wildlife reproduction and population sustainability. The assessment of sexual displays in noise poses a common challenge for wildlife. Multimodal signals are hypothesized to be favored since they improve the accuracy of signal detection and discrimination in noise. We tested whether female treefrogs exhibit a heightened reliance on visual cues when acoustic cues are drowned out by noise and whether increased call complexity can compensate for the attractiveness differences between unimodal and multimodal signals. Our results demonstrated that female treefrogs prefer longer courtship signals in the absence of noise. Meanwhile, increasing call complexity effectively mitigated the attractiveness difference between acoustic and visual/multimodal signals. However, female treefrogs did not shift their reliance to visual signals when acoustic signals were masked by noise. Noise prolonged the time females required to make a mate choice in most cases and reduced female preferences for attractive signals regardless of whether the mating scene was unimodal or multimodal, which lends further support to the hypothesis of cross-sensory interference. We examined how female treefrogs weigh unimodal and multimodal courtship cues in the absence and presence of noise and offered distinct perspectives on the interplay of multi-sensory sexual displays in noise. This study enhances our comprehension of noise interference in mate choice and establishes a novel, comprehensive scientific foundation for the prevention and control of multimodal sensory pollution.
Keywords: cross-sensory interference mate choice multimodal signal noise interference multimodal sensory pollution WILDLIFE
Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems
19
Authors: Sarah M. Kamel, Mai A. Fadel, Lamiaa Elrefaei, Shimaa I. Hassan. Computer Modeling in Engineering & Sciences, 2025, Issue 4, pp. 373-411 (39 pages)
Visual question answering (VQA) is a multimodal task that involves a deep understanding of the image scene and the question’s meaning, and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; and (2) studying the role of the multimodal bilinear pooling fusion technique in the trade-off between model complexity and overall model performance. Some fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted between eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance for this kind of VQA system. Experiments indicate that these multimodal bilinear pooling fusion techniques improved the VQA model’s performance, reaching a best performance of 89.25%. Further, the experiments show that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique showed the best balance between model complexity and performance for VQA systems designed to answer yes/no questions.
Keywords: Arabic-VQA; deep learning-based VQA; deep multimodal information fusion; multimodal representation learning; VQA of yes/no questions; VQA model complexity; VQA model performance; performance-complexity trade-off