Journal Articles
12,248 articles found
1. A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions
Authors: A-Seong Moon, Haesung Kim, Ye-Chan Park, Jaesung Lee. Computers, Materials & Continua, 2026, No. 5, pp. 1-42.
Multimodal emotion recognition has emerged as a key research area for enabling human-centered artificial intelligence, supported by the rapid progress in vision, audio, language, and physiological modeling. Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms, yet the field remains fragmented due to differences in feature alignment, temporal synchronization, modality reliability, and robustness to noise or missing inputs. This survey provides a comprehensive analysis of MER research from 2021 to 2025, consolidating advances in modality-specific representation learning, cross-modal feature construction, and early, late, and hybrid fusion paradigms. We systematically review visual, acoustic, textual, and sensor-based embeddings, highlighting how pre-trained encoders, self-supervised learning, and large language models have reshaped the representational foundations of MER. We further categorize fusion strategies by interaction depth and architectural design, examining how attention mechanisms, cross-modal transformers, adaptive gating, and multimodal large language models redefine the integration of affective signals. Finally, we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability, generalization, and interpretability. This survey aims to provide a unified perspective on multimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.
Keywords: multimodal emotion recognition; multimodal learning; cross-modal learning; fusion strategies; representation learning
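To make the survey's early/late fusion contrast concrete, here is a minimal PyTorch sketch of the two paradigms. Everything in it is an illustrative assumption (module names, embedding dimensions, the seven-class output); it is not code from any surveyed system.

```python
# Early vs. late fusion, in the sense contrasted by the survey above.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality embeddings first, then classify jointly."""
    def __init__(self, dims=(512, 128, 768), n_classes=7):
        super().__init__()
        self.head = nn.Linear(sum(dims), n_classes)

    def forward(self, vis, aud, txt):
        return self.head(torch.cat([vis, aud, txt], dim=-1))

class LateFusion(nn.Module):
    """Classify each modality separately, then average the logits."""
    def __init__(self, dims=(512, 128, 768), n_classes=7):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d, n_classes) for d in dims)

    def forward(self, vis, aud, txt):
        logits = [h(x) for h, x in zip(self.heads, (vis, aud, txt))]
        return torch.stack(logits).mean(dim=0)

vis, aud, txt = torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 768)
print(EarlyFusion()(vis, aud, txt).shape, LateFusion()(vis, aud, txt).shape)
```

Early fusion exposes cross-modal interactions at the feature level; late fusion keeps per-modality decisions independent and degrades more gracefully when a modality is missing, one of the robustness trade-offs the abstract raises.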
2. Multimodal Signal Processing of ECG Signals with Time-Frequency Representations for Arrhythmia Classification
Authors: Yu Zhou, Jiawei Tian, Kyungtae Kang. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 990-1017.
Arrhythmias are a frequently occurring phenomenon in clinical practice, but accurately distinguishing subtle rhythm abnormalities remains an ongoing difficulty in ECG-based studies. From a review of existing work, two main factors appear to contribute to this problem: the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models. To overcome these limitations, this study proposes a dual-path multimodal framework, termed DM-EHC (Dual-Path Multimodal ECG Heartbeat Classifier), for ECG-based heartbeat classification. The proposed framework links 1D ECG temporal features with 2D time-frequency features. By setting up these dual paths, the model can process more dimensions of feature information. The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments. Experimental results show that the proposed method outperforms single modalities and performs better for certain specific types of arrhythmias. The model achieved mean precision, recall, and F1 score of 95.14%, 92.26%, and 93.65%, respectively. These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
Keywords: electrocardiogram; arrhythmia classification; multimodal; time-frequency representation
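The dual-path idea in this abstract (a 1D temporal branch over the raw beat plus a 2D branch over a time-frequency image, fused before classification) can be sketched in a few lines of PyTorch. The layer sizes, the 360-sample beat length, and the 64x64 time-frequency image are assumptions for illustration, not the DM-EHC architecture.

```python
import torch
import torch.nn as nn

class DualPathECG(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        # 1D temporal branch over the raw heartbeat segment
        self.path1d = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        # 2D branch over a time-frequency image (e.g., a scalogram)
        self.path2d = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Fuse both feature vectors before the classifier head
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, beat_1d, tf_2d):
        return self.head(torch.cat([self.path1d(beat_1d),
                                    self.path2d(tf_2d)], dim=-1))

model = DualPathECG()
print(model(torch.randn(2, 1, 360), torch.randn(2, 1, 64, 64)).shape)
```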
3. Multimodal clinical parameters-based immune status associated with the prognosis in patients with hepatocellular carcinoma
Authors: Yu-Zhou Zhang, Yuan-Ze Tang, Yun-Xuan He, Shu-Tong Pan, Hao-Cheng Dai, Yu Liu, Hai-Feng Zhou. World Journal of Gastrointestinal Oncology, 2026, No. 1, pp. 75-91.
Hepatocellular carcinoma presents with three distinct immune phenotypes (immune-desert, immune-excluded, and immune-inflamed), indicating different treatment responses and prognostic outcomes. Although multi-omics parameters accurately reflect immune status, their clinical application is still restricted by expensive and less accessible assays. A comprehensive evaluation framework based on "easy-to-obtain" multimodal clinical parameters is therefore urgently required: clinical features to establish baseline patient profiles and disease staging; routine blood tests to assess systemic metabolic and functional status; immune cell subsets to quantify subcluster dynamics; imaging features to delineate tumor morphology, spatial configuration, and perilesional anatomical relationships; and immunohistochemical markers to provide qualitative and quantitative detection of tumor antigens at the cellular and molecular levels. This integrated phenomic approach aims to improve prognostic stratification and clinical decision-making in hepatocellular carcinoma management conveniently and practically.
Keywords: hepatocellular carcinoma; immune status; phenotype; multimodal parameters; prognosis
4. Transformation of Verbal Descriptions of Process Flows into Business Process Modelling and Notation Models Using Multimodal Artificial Intelligence: Application in Justice
Authors: Silvia Alayón, Carlos Martín, Jesús Torres, Manuel Bacallado, Rosa Aguilar, Guzmán Savirón. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 870-892.
Business Process Modelling (BPM) is essential for analyzing, improving, and automating the flow of information within organizations, but traditional approaches based on manual interpretation are slow, error-prone, and require a high level of expertise. This article proposes an innovative alternative that overcomes these limitations by automatically generating comprehensive Business Process Modelling and Notation (BPMN) diagrams solely from verbal descriptions of the processes to be modeled, utilizing Large Language Models (LLMs) and multimodal Artificial Intelligence (AI). Experimental results, based on video recordings of process explanations provided by an expert from an organization (in this case, the Commercial Courts of a public justice administration), demonstrate that the proposed methodology successfully enables the automatic generation of complete and accurate BPMN diagrams, leading to significant improvements in the speed, accuracy, and accessibility of process modeling. This research makes a substantial contribution to the field of business process modeling: its methodology is groundbreaking in its use of LLMs and multimodal AI to handle different types of source material (text and video), combining several tools to minimize the number of queries and reduce the complexity of the prompts required for the automatic generation of successful BPMN diagrams.
Keywords: process modelling; verbal description; BPMN; LLM; multimodal AI
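The text-to-BPMN step can be illustrated with a constrained prompt that asks the model for PlantUML only, plus a guard that extracts the @startuml...@enduml block from the reply. The prompt wording below is an assumption, and the actual LLM call is left abstract rather than tied to any specific client library.

```python
# Schematic of a verbal-description -> PlantUML prompting step.
PROMPT_TEMPLATE = """You are a business-process analyst.
Convert the following verbal description into a PlantUML activity diagram.
Return only PlantUML code between @startuml and @enduml.

Description:
{description}
"""

def build_bpmn_prompt(description: str) -> str:
    return PROMPT_TEMPLATE.format(description=description.strip())

def extract_plantuml(reply: str) -> str:
    """Keep only the @startuml ... @enduml block from the model reply."""
    start, end = reply.find("@startuml"), reply.find("@enduml")
    if start == -1 or end == -1:
        raise ValueError("no PlantUML block found in the reply")
    return reply[start:end + len("@enduml")]

print(build_bpmn_prompt("A clerk registers the case, then a judge reviews it."))
```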
5. Multimodal medical imaging AI for breast cancer diagnosis: A comprehensive review
Authors: Ting-Ruen Wei, Yuling Yan. Intelligent Oncology, 2026, No. 1, pp. 40-50.
Traditional artificial intelligence (AI)-based methods for breast cancer diagnosis often rely on a single modality, such as ultrasound images. With the rise of multimodal approaches, multiple data sources, including imaging from diverse medical modalities, structured clinical information, and unstructured medical reports, are increasingly integrated to provide richer and more informative signals for model training. This survey reviews the data modalities employed in AI-based breast cancer research, examines common multimodal combinations and fusion strategies, and discusses their applications across clinical tasks such as diagnosis, treatment planning, and outcome prediction. By consolidating the current literature and identifying critical gaps, this survey aims to guide future research toward the development of reliable, clinically relevant multimodal AI systems for use in breast cancer management.
Keywords: breast cancer; artificial intelligence; machine learning; deep learning; multimodal
6. BCAM-Net: A Bidirectional Cross-Attention Multimodal Network for IoT Spectrum Sensing under Generalized Gaussian Noise
Authors: Yuzhou Han, Zhuoran Li, Ahmad Gendia, Teruji Ide, Osamu Muta. Computers, Materials & Continua, 2026, No. 5, pp. 272-297.
Spectrum sensing is an indispensable core component of cognitive radio dynamic spectrum access (DSA) and a key approach to alleviating spectrum scarcity in the Internet of Things (IoT). The key issue in practical IoT networks is robust sensing under the coexistence of low signal-to-noise ratios (SNRs) and non-Gaussian impulsive noise, where observations may be distorted differently across feature modalities, making conventional fusion unstable and degrading detection reliability. To address this challenge, the generalized Gaussian distribution (GGD) is adopted as the noise model, and a multimodal fusion framework termed BCAM-Net (bidirectional cross-attention multimodal network) is proposed. BCAM-Net adopts a parallel dual-branch architecture: a time-frequency branch that leverages the continuous wavelet transform (CWT) to extract time-frequency representations, and a temporal branch that learns long-range dependencies from raw signals. BCAM-Net utilizes a bidirectional cross-attention mechanism to achieve deep alignment and mutual calibration of temporal and time-frequency features, generating a fused representation that is highly robust to complex noise. Simulation results show that, under GGD noise with shape parameter β = 0.5, BCAM-Net achieves high detection probabilities in the low-SNR regime and outperforms representative baselines. At a false-alarm probability Pf = 0.1 and an SNR of −14 dB, it attains a detection probability of 0.9020, exceeding the CNN-Transformer, WT-ResNet, TFCFN, and conventional CNN benchmarks by 5.75%, 6.98%, 33.3%, and 21.1%, respectively. These results indicate that BCAM-Net can effectively improve spectrum sensing performance in low-SNR impulsive-noise scenarios and provides a lightweight, high-performance solution for practical cognitive radio spectrum sensing.
Keywords: cognitive radio; spectrum sensing; IoT; deep learning; bidirectional cross-attention; multimodal fusion
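A bidirectional cross-attention block of the kind named in this abstract, where each branch queries the other and the mutually calibrated features are fused, can be sketched directly with torch.nn.MultiheadAttention. The dimensions and head count are illustrative assumptions, not BCAM-Net's configuration.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.t2f = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.f2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, temporal, timefreq):
        # Each branch queries the other; residual adds keep the originals.
        t_att, _ = self.t2f(temporal, timefreq, timefreq)
        f_att, _ = self.f2t(timefreq, temporal, temporal)
        return torch.cat([temporal + t_att, timefreq + f_att], dim=-1)

fuse = BidirectionalCrossAttention()
out = fuse(torch.randn(2, 50, 64), torch.randn(2, 50, 64))
print(out.shape)  # fused representation per time step
```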
7. A Multimodal Metaphor Analysis of Chinese Architecture in Ne Zha 2
Authors: QIAO Mengyu, DUAN Rongjuan. Journal of Literature and Art Studies, 2026, No. 1, pp. 39-42.
From the perspective of Multimodal Metaphor Theory, the architectural scenes in Ne Zha 2 embody highly condensed cultural connotations. Through the synergy of vision, soundscape, and dialect, the film constructs a metaphorical chain of "human order - ethnic oppression - theocratic structure" via its three core architectural spaces. As core signifiers, buildings drive the plot, shape characters, and convey values. The study reveals that animation activates traditional architecture's metaphorical potential through cross-modal mapping, endowing historical symbols with contemporary vitality and providing a paradigm for the creative transformation of traditional culture.
Keywords: multimodal metaphor; Ne Zha 2; Chinese architecture
8. A Multimodal Sentiment Analysis Method Based on Multi-Granularity Guided Fusion
Authors: Zilin Zhang, Yan Liu, Jia Liu, Senbao Hou, Yuping Zhang, Chenyuan Wang. Computers, Materials & Continua, 2026, No. 2, pp. 1228-1241.
With the growing demand for more comprehensive and nuanced sentiment understanding, Multimodal Sentiment Analysis (MSA) has gained significant traction in recent years and continues to attract widespread attention in the academic community. Despite notable advances, existing approaches still face critical challenges in both information modeling and modality fusion. On one hand, many current methods rely heavily on encoders to extract global features from each modality, which limits their ability to capture latent fine-grained emotional cues within modalities. On the other hand, prevailing fusion strategies often lack mechanisms to model semantic discrepancies across modalities and to adaptively regulate modality interactions. To address these limitations, we propose a novel framework for MSA, termed Multi-Granularity Guided Fusion (MGGF). The proposed framework consists of three core components: (i) a Multi-Granularity Feature Extraction Module, which simultaneously captures both global and local emotional features within each modality and integrates them to construct richer intra-modal representations; (ii) a Cross-Modal Guidance Learning Module (CMGL), which introduces a cross-modal scoring mechanism to quantify the divergence and complementarity between modalities; these scores are then used as guiding signals that let the fusion strategy adaptively respond to modality agreement or conflict; and (iii) a Cross-Modal Fusion Module (CMF), which learns the semantic dependencies among modalities and facilitates deep-level emotional feature interaction, thereby enhancing sentiment prediction with complementary information. We evaluate MGGF on two benchmark datasets, MVSA-Single and MVSA-Multiple. Experimental results demonstrate that MGGF outperforms the current state-of-the-art model CLMLF on MVSA-Single, achieving a 2.32% improvement in F1 score. On MVSA-Multiple, it surpasses MGNNS with a 0.26% increase in accuracy. These results substantiate the effectiveness of MGGF in addressing two major limitations of existing methods: insufficient intra-modal fine-grained sentiment modeling and inadequate cross-modal semantic fusion.
Keywords: multimodal sentiment analysis; cross-modal fusion; cross-modal guided learning
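The cross-modal scoring idea (quantify how much two modalities agree, then let the score gate the fusion) can be illustrated with a cosine-similarity score. This stand-in score and the gating rule below are assumptions for illustration, not the paper's CMGL mechanism.

```python
import torch
import torch.nn.functional as F

def guided_fuse(text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
    # Agreement score in [0, 1]: high when modalities point the same way.
    score = (F.cosine_similarity(text_feat, image_feat, dim=-1) + 1) / 2
    score = score.unsqueeze(-1)
    # Agreeing modalities are blended; conflicting ones lean on the text.
    return score * (text_feat + image_feat) / 2 + (1 - score) * text_feat

t, v = torch.randn(4, 256), torch.randn(4, 256)
print(guided_fuse(t, v).shape)
```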
9. AVCLNet: Multimodal Multispeaker Tracking Network Using Audio-Visual Contrastive Learning
Authors: Yihan Li, Yidi Li, Zhenhuan Xu, Hao Guo, Mengyuan Liu, Weiwei Wan. CAAI Transactions on Intelligence Technology, 2026, No. 1, pp. 238-255.
Audio-visual speaker tracking aims to determine the locations of multiple speakers in a scene by leveraging signals captured from multisensor platforms. Multimodal fusion methods can improve both the accuracy and robustness of speaker tracking. However, in complex multispeaker tracking scenarios, critical challenges such as cross-modal feature discrepancy, weak sound-source localisation ambiguity, and frequent identity-switch errors remain unresolved; these severely hinder the modelling of speaker identity consistency and consequently lead to degraded tracking accuracy and unstable tracking trajectories. To this end, this paper proposes a multimodal multispeaker tracking network using audio-visual contrastive learning (AVCLNet). It integrates heterogeneous modal representations into a unified space through audio-visual contrastive learning, which facilitates cross-modal feature alignment, mitigates cross-modal feature bias, and enhances identity-consistent representations. In the audio-visual measurement stage, we design a vision-guided weighted enhancement method for weak sound sources, which leverages visual cues to establish cross-modal mappings and employs a spatiotemporal dynamic weighting mechanism to improve the detectability of weak sound sources. Furthermore, in the data-association phase, a dual geometric constraint strategy is introduced by combining 2D and 3D spatial geometric information, reducing frequent identity-switch errors. Experiments on the AV16.3 and CAV3D datasets show that AVCLNet outperforms state-of-the-art methods, demonstrating superior robustness in multispeaker scenarios.
Keywords: computer vision; machine perception; multimodal approaches; pattern recognition; video signal processing
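Audio-visual contrastive learning of the kind described here is commonly implemented as a symmetric InfoNCE objective that pulls paired audio and face embeddings of the same speaker together. The sketch below follows that standard recipe; the temperature and dimensions are assumptions, not AVCLNet's settings.

```python
import torch
import torch.nn.functional as F

def av_contrastive_loss(audio: torch.Tensor, visual: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(audio, dim=-1)
    v = F.normalize(visual, dim=-1)
    logits = a @ v.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(a.size(0))     # i-th audio matches i-th face
    # Symmetric loss: audio->visual and visual->audio retrieval.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(av_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128)))
```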
10. SparseMoE-MFN: A Sparse Attention and Mixture-of-Experts Framework for Multimodal Fake News Detection on Social Media
Authors: Yuechuan Zhang, Mingshu Zhang, Bin Wei, Hongyu Jin, Yaxuan Wang. Computers, Materials & Continua, 2026, No. 5, pp. 1646-1669.
Detecting fake news in multimodal and multilingual social media environments is challenging due to inherent noise, inter-modal imbalance, computational bottlenecks, and semantic ambiguity. To address these issues, we propose SparseMoE-MFN, a novel unified framework that integrates sparse attention with a sparsely activated Mixture-of-Experts (MoE) architecture. This framework aims to enhance the efficiency, inferential depth, and interpretability of multimodal fake news detection. SparseMoE-MFN leverages LLaVA-v1.6-Mistral-7B-HF for efficient visual encoding and Qwen/Qwen2-7B for text processing. The sparse attention module adaptively filters irrelevant tokens and focuses on key regions, reducing computational costs and noise. The sparse MoE module dynamically routes inputs to specialized experts (visual, language, cross-modal alignment) based on content heterogeneity. This expert-specialization design boosts computational efficiency and semantic adaptability, enabling precise processing of complex content and improving performance on ambiguous categories. Evaluated on the large-scale, multilingual MR2 dataset, SparseMoE-MFN achieves state-of-the-art performance. It obtains an accuracy of 86.7% and a macro-averaged F1 score of 0.859, outperforming strong baselines like MiniGPT-4 by 3.4% and 3.2%, respectively. Notably, it shows significant advantages in the "unverified" category. Furthermore, SparseMoE-MFN demonstrates superior computational efficiency, with an average inference latency of 89.1 ms and 95.4 GFLOPs, substantially lower than existing models. Ablation studies and visualization analyses confirm the effectiveness of both sparse attention and sparse MoE components in improving accuracy, generalization, and efficiency.
Keywords: fake news detection; multimodal; sparse attention; mixture-of-experts; interpretability; computational efficiency
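Sparse MoE routing, where a gate activates only the top-k experts per token, is the core mechanism this abstract names. A minimal top-k router looks like the sketch below; the expert count, k, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights = self.gate(x).softmax(dim=-1)   # routing probabilities
        topw, topi = weights.topk(self.k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):               # run only the chosen experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot, None] * expert(x[mask])
        return out

print(SparseMoE()(torch.randn(10, 64)).shape)
```

Because each token touches only k of the experts, compute grows with k rather than with the total expert count, which is the efficiency argument behind sparse activation.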
11. Engineering stimuli-responsive block copolymers for multimodal bioimaging
Authors: Lizhuang Zhong, Ming Liu, Shilong Su, Dongxin Zeng, Jing Hu, Zhiqian Guo. Chinese Chemical Letters, 2026, No. 1, pp. 116-124.
The diagnostic efficacy of contemporary bioimaging technologies remains constrained by inherent limitations of conventional imaging agents, including suboptimal sensitivity, off-target biodistribution, and inherent cytotoxicity. These limitations have catalyzed the development of intelligent stimuli-responsive block copolymer-based bioimaging agents, engineered to dynamically respond to endogenous biochemical cues (e.g., pH gradients, redox potential, enzyme activity, hypoxic environments) or exogenous physical triggers (e.g., photoirradiation, thermal gradients, ultrasound (US)/magnetic stimuli). Through spatiotemporally controlled structural transformations, stimuli-responsive block copolymers enable precise contrast targeting, activatable signal amplification, and theranostic integration, thereby substantially enhancing the signal-to-noise ratio and diagnostic specificity of bioimaging. This mini-review systematically examines molecular engineering principles for designing pH-, redox-, enzyme-, light-, thermo-, and US/magnetic-responsive polymers, with emphasis on the structure-property relationships governing imaging performance. Furthermore, we critically analyze emerging strategies for optical imaging, US synergies, and magnetic resonance imaging (MRI). Multimodal bioimaging is also elaborated, as it can overcome the inherent trade-offs between resolution, penetration depth, and functional specificity in single-modal approaches. By elucidating mechanistic insights and translational challenges, this mini-review aims to establish a design framework for stimuli-responsive block copolymer-based high-fidelity bioimaging agents and to accelerate their clinical translation in precise diagnosis and therapy.
Keywords: stimuli-responsive; block copolymers; molecular engineering; multimodal bioimaging; diagnosis and therapy
12. Automated Machine Learning for Fault Diagnosis Using Multimodal Mel-Spectrogram and Vibration Data
Authors: Zehao Li, Xuting Zhang, Hongqi Lin, Wu Qin, Junyu Qi, Zhuyun Chen, Qiang Liu. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 471-498.
To ensure the safe and stable operation of rotating machinery, intelligent fault diagnosis methods hold significant research value. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong-noise environments, severely affecting the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time-frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration. Finally, diagnostic decisions are made by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
Keywords: automated machine learning; mechanical fault diagnosis; feature engineering; multimodal data
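The Mel time-frequency plus vibration-statistics feature construction described above can be sketched with librosa and NumPy. The signals here are synthetic, and the parameter choices (n_mels, which statistics to keep) are assumptions rather than the paper's configuration.

```python
import numpy as np
import librosa

sr = 16000
acoustic = np.random.randn(sr)        # 1 s of (synthetic) acoustic signal
vibration = np.random.randn(sr)       # matching vibration channel

# Fault-sensitive Mel time-frequency features from the acoustic signal
mel = librosa.feature.melspectrogram(y=acoustic, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Simple statistical features from the vibration signal
stats = np.array([vibration.mean(), vibration.std(),
                  np.abs(vibration).max(),
                  np.sqrt(np.mean(vibration ** 2))])  # RMS

# Complementary representation: flattened Mel image + vibration statistics
fused = np.concatenate([mel_db.ravel(), stats])
print(mel_db.shape, fused.shape)
```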
13. LWCNet: A Physics-Guided Multimodal Few-Shot Learning Framework for Intelligent Fault Diagnosis
Authors: Yong Hu, Weifan Xu, Xiangtong Du. Computers, Materials & Continua, 2026, No. 5, pp. 1564-1587.
Deep learning-based methods have shown great potential in intelligent bearing fault diagnosis. However, most existing approaches suffer from the scarcity of labeled data, which often results in insufficient robustness under complex working conditions and a general lack of interpretability. To address these challenges, we propose a physics-informed multimodal fault diagnosis framework based on few-shot learning, which integrates a 2D time-frequency image encoder and a 1D vibration signal encoder. Specifically, we embed prior knowledge of multi-resolution analysis from signal processing into the model by designing a Laplace Wavelet Convolution (LWC) module, which enhances interpretability since wavelet coefficients naturally correspond to specific frequency and temporal structures. To further balance the guidance of physical priors with the flexibility of learnable representations, we introduce a parametric multi-kernel wavelet that employs channel-wise dynamic attention to adaptively select relevant wavelet bases, thereby improving feature expressiveness. Moreover, we develop a Mahalanobis-Prototype Joint Metric, which constructs more accurate and distribution-consistent decision boundaries under few-shot conditions. Comprehensive experiments on the Case Western Reserve University (CWRU) and Paderborn University (PU) bearing datasets demonstrate the superior effectiveness, robustness, and interpretability of the proposed approach compared with state-of-the-art baselines.
Keywords: few-shot fault diagnosis; multimodal feature fusion; Laplace wavelet convolution; interpretability
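A wavelet-parameterized convolution in the spirit of the LWC module can be built by generating each 1D filter from a learnable frequency and damping ratio (the impulse response of a damped oscillator) instead of free weights. The exact wavelet form and parameter ranges below are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplaceWaveletConv1d(nn.Module):
    def __init__(self, out_channels=8, kernel_size=64):
        super().__init__()
        # Each filter is defined by two physical parameters, not 64 weights.
        self.freq = nn.Parameter(torch.rand(out_channels) * 10 + 1)
        self.damp = nn.Parameter(torch.rand(out_channels) * 0.1 + 0.01)
        self.register_buffer("t", torch.linspace(0, 1, kernel_size))
        self.kernel_size = kernel_size

    def forward(self, x):                          # x: (B, 1, L)
        f = self.freq[:, None]                      # (C, 1)
        z = self.damp.clamp(1e-3, 0.5)[:, None]
        w = 2 * torch.pi * f
        # Damped sinusoid: impulse response of a 1-DOF oscillator.
        kernel = torch.exp(-z / torch.sqrt(1 - z ** 2) * w * self.t) \
                 * torch.sin(w * self.t)
        return F.conv1d(x, kernel[:, None, :], padding=self.kernel_size // 2)

print(LaplaceWaveletConv1d()(torch.randn(2, 1, 1024)).shape)
```

Because every filter corresponds to an interpretable (frequency, damping) pair, responses can be read physically, which is the interpretability argument the abstract makes for wavelet coefficients.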
14. A Novel Unified Framework for Automated Generation and Multimodal Validation of UML Diagrams
Authors: Van-Viet Nguyen, Huu-Khanh Nguyen, Kim-Son Nguyen, Thi Minh-Hue Luong, Duc-Quang Vu, Trung-Nghia Phung, The-Vinh Nguyen. Computer Modeling in Engineering & Sciences, 2026, No. 1, pp. 1023-1050.
Automating the creation and validation of Unified Modeling Language (UML) diagrams remains difficult due to unstructured requirements, limited automated pipelines, and the lack of reliable evaluation methods. This study introduces a cohesive architecture that combines requirement development, UML synthesis, and multimodal validation. First, LLaMA-3.2-1B-Instruct was utilized to generate user-focused requirements. Then, DeepSeek-R1-Distill-Qwen-32B applies its reasoning skills to transform these requirements into PlantUML code. Using this dual-LLM pipeline, we constructed a synthetic dataset of 11,997 UML diagrams spanning six major diagram families. Rendering analysis showed that 89.5% of the generated diagrams compile correctly, while invalid cases were detected automatically. To assess quality, we employed a multimodal scoring method that combines Qwen2.5-VL-3B, LLaMA-3.2-11B-Vision-Instruct, and Aya-Vision-8B, with weights based on MMMU performance. A study with 94 experts revealed strong alignment between automatic and manual evaluations, yielding a Pearson correlation of r = 0.82 and a Fleiss' kappa of 0.78, indicating a high degree of concordance between automated metrics and human judgment. Overall, the results demonstrate that our scoring system is effective and that the proposed generation pipeline produces UML diagrams that are both syntactically correct and semantically coherent. More broadly, the system provides a scalable and reproducible foundation for future work in AI-driven software modeling and multimodal verification.
Keywords: automated dataset generation; vision-language models; multimodal validation; software engineering automation; UML code
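The weighted multimodal scoring step (combine several vision-language judge scores using weights derived from a capability benchmark, then check agreement with experts via Pearson's r) reduces to a few lines. All numbers below are made-up placeholders, not values from the study.

```python
import numpy as np
from scipy.stats import pearsonr

# Per-judge quality scores for 5 diagrams (rows: judges, cols: diagrams)
judge_scores = np.array([[0.9, 0.6, 0.8, 0.4, 0.7],
                         [0.8, 0.5, 0.9, 0.5, 0.6],
                         [0.7, 0.6, 0.7, 0.3, 0.8]])
# Weights proportional to each judge's benchmark performance (assumed)
mmmu = np.array([0.47, 0.41, 0.39])
weights = mmmu / mmmu.sum()

auto_score = weights @ judge_scores          # weighted ensemble score
expert_score = np.array([0.85, 0.55, 0.80, 0.40, 0.70])
r, p = pearsonr(auto_score, expert_score)    # agreement with human raters
print(auto_score.round(3), f"Pearson r = {r:.2f}")
```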
15. Multimodal artificial intelligence predicts PIK3CA mutation in breast cancer from digital pathology and clinical data: a multicenter study
Authors: Jiaxian Miao, Qi Liu, Jianing Zhao, Shishun Fan, Shenwen Wang, Feng Ye, Si Wu, Jinze Li, Huirui Zhang, Meng Zhang, Hong Bu, Xiao Han, Lianghong Teng, Yueping Liu. Cancer Biology & Medicine, 2026, No. 3, pp. 430-450.
Objective: Accurate detection of PIK3CA mutations is essential for guiding PI3K-targeted therapies in breast cancer, yet sequencing is not universally accessible, and single-modality prediction models have limited performance. This study developed a multimodal deep learning framework integrating whole-slide imaging (WSI) and structured clinical data to improve mutation prediction. Methods: A total of 1,047 patients from TCGA and 166 patients from 3 external centers were included. The histopathology model used a transformer-based pretrained encoder (H-optimus-0) and a clustering-constrained attention multiple instance learning (CLAM-SB MIL) classifier to generate WSI-level representations. The clinical model incorporated engineered clinical variables and an extreme gradient boosting (XGBoost) model. A decision-level late fusion strategy (Multimodal PIK3CA Model, MPM) combined the probabilistic outputs of both branches. Performance was evaluated with the area under the curve (AUC) and secondary metrics. Interpretability was assessed via attention heatmaps and Shapley additive explanations (SHAP) analysis. Results: MPM outperformed single-modality models. It achieved an AUC of 0.745 on TCGA and maintained stable performance across external cohorts (0.695, 0.690, and 0.680). SHAP analysis identified molecular subtype as the most influential clinical feature, whereas attention maps highlighted mutation-associated morphological regions. Conclusions: The developed multimodal framework effectively integrates complementary morphological and clinical information and provides a robust, generalizable method for predicting PIK3CA mutation status. Strong multicenter adaptability and biological interpretability support its potential use as a clinical decision-support tool and an accessible alternative to molecular testing.
Keywords: breast cancer; PIK3CA mutation; multimodal artificial intelligence; whole-slide imaging; computational pathology
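Decision-level late fusion as described here (each branch emits a mutation probability, and the two are combined) is the simplest fusion scheme to sketch. The equal weights below are an assumption; the study's actual combination rule may differ.

```python
import numpy as np

def late_fusion(p_wsi: np.ndarray, p_clinical: np.ndarray,
                w_wsi: float = 0.5) -> np.ndarray:
    """Combine per-patient probabilities from the two branches."""
    return w_wsi * p_wsi + (1.0 - w_wsi) * p_clinical

p_wsi = np.array([0.81, 0.32, 0.55])   # WSI-branch probabilities (synthetic)
p_cli = np.array([0.74, 0.20, 0.61])   # clinical-branch probabilities
print(late_fusion(p_wsi, p_cli))       # fused mutation scores
```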
16. Fatigue Detection with Multimodal Physiological Signals via Uncertainty-Aware Deep Transfer Learning
Authors: Kourosh Kakhi, Hamzeh Asgharnezhad, Abbas Khosravi, Roohallah Alizadehsani, U. Rajendra Acharya. Journal of Bionic Engineering, 2026, No. 1, pp. 472-487.
Accurate detection of driver fatigue is essential for improving road safety. This study investigates the effectiveness of using multimodal physiological signals for fatigue detection while incorporating uncertainty quantification to enhance the reliability of predictions. Physiological signals, including electrocardiogram (ECG), galvanic skin response (GSR), and electroencephalogram (EEG), were transformed into image representations and analyzed using pretrained deep neural networks. The extracted features were classified with a feedforward neural network, and prediction reliability was assessed using uncertainty quantification techniques such as Monte Carlo Dropout (MCD), model ensembles, and combined approaches. Evaluation metrics included standard measures (sensitivity, specificity, precision, and accuracy) along with uncertainty-aware metrics such as uncertainty sensitivity and uncertainty precision. Across all evaluations, ECG-based models consistently demonstrated strong performance. The findings indicate that combining multimodal physiological signals, transfer learning (TL), and uncertainty quantification can significantly improve both the accuracy and trustworthiness of fatigue detection systems. This approach supports the development of more reliable driver-assistance technologies aimed at preventing fatigue-related accidents.
Keywords: fatigue detection; multimodal physiological signals; deep transfer learning; uncertainty-aware learning; driver monitoring
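Monte Carlo Dropout, one of the uncertainty-quantification techniques listed above, keeps dropout active at inference and uses the spread over stochastic forward passes as an uncertainty signal. The architecture and pass count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 2))

def mc_dropout_predict(x: torch.Tensor, passes: int = 30):
    model.train()                      # keep dropout stochastic at test time
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])
    return probs.mean(0), probs.std(0) # prediction and uncertainty estimate

mean, std = mc_dropout_predict(torch.randn(4, 128))
print(mean, std)
```

Samples whose standard deviation is high can be flagged as unreliable, which is the basis of uncertainty-aware metrics like the uncertainty sensitivity and precision mentioned in the abstract.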
17. Multimodal Trajectory Generation for Robotic Motion Planning Using Transformer-Based Fusion and Adversarial Learning
Authors: Shtwai Alsubai, Ahmad Almadhor, Abdullah Al Hejaili, Najib Ben Aoun, Tahani Alsubait, Vincent Karovic. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 848-869.
In Human-Robot Interaction (HRI), generating robot trajectories that accurately reflect user intentions while ensuring physical realism remains challenging, especially in unstructured environments. In this study, we develop a multimodal framework that integrates symbolic task reasoning with continuous trajectory generation. The approach employs transformer models and adversarial training to map high-level intent to robotic motion. Information from multiple data sources, such as voice traits, hand and body keypoints, visual observations, and recorded paths, is integrated simultaneously. These signals are mapped into a shared representation that supports interpretable reasoning while enabling smooth and realistic motion generation. Based on this design, two different learning strategies are investigated. In the first, grammar-constrained Linear Temporal Logic (LTL) expressions are created from multimodal human inputs and subsequently decoded into robot trajectories. The second method generates trajectories directly from symbolic intent and linguistic data, bypassing an intermediate logical representation. Transformer encoders combine multiple types of information, and autoregressive transformer decoders generate motion sequences. Adding smoothness and speed limits during training increases the likelihood of physical feasibility. To improve the realism and stability of the generated trajectories, an adversarial discriminator is also included during training to guide them toward the distribution of actual robot motion. Tests on the NATSGLD dataset indicate that the complete system exhibits stable training behaviour and strong performance. In normalised coordinates, the logic-based pipeline has an Average Displacement Error (ADE) of 0.040 and a Final Displacement Error (FDE) of 0.036. The adversarial generator performs substantially better, reducing ADE to 0.021 and FDE to 0.018. Visual examination confirms that the generated trajectories closely align with observed motion patterns while preserving smooth temporal dynamics.
Keywords: multimodal trajectory generation; robotic motion planning; transformer networks; sensor fusion; reinforcement learning; generative adversarial networks
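The ADE and FDE figures quoted above have standard definitions: average point-wise displacement over the whole trajectory, and displacement at the final point. A few lines reproduce the computation on synthetic trajectories.

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: (T, 2) trajectories in normalized coordinates."""
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-timestep displacement
    return dists.mean(), dists[-1]              # ADE, FDE

gt = np.stack([np.linspace(0, 1, 20), np.linspace(0, 1, 20)], axis=-1)
pred = gt + np.random.normal(scale=0.02, size=gt.shape)  # noisy prediction
ade, fde = ade_fde(pred, gt)
print(f"ADE = {ade:.3f}, FDE = {fde:.3f}")
```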
18. MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding
Authors: Ans D. Alghamdi. Computers, Materials & Continua, 2026, No. 2, pp. 1395-1429.
Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose MultiAgent-CoT, a novel multi-agent chain-of-thought (CoT) reasoning framework in which specialized agents for the text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus-voting mechanisms. Our architecture incorporates self-reflection modules, conflict-resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. The framework employs a hierarchical attention mechanism with cross-modal fusion and implements adaptive reasoning depth based on dialogue complexity. Comprehensive evaluations on Situated Interactive Multi-Modal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
Keywords: multi-agent systems; chain-of-thought reasoning; multimodal dialogue; conversational artificial intelligence (AI); cross-modal fusion reasoning; interpretability
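Consensus voting among modality agents, as named in the abstract, can be reduced to a confidence-weighted tally over proposed answers. The agent outputs below are made-up placeholders; the paper's actual protocol also involves message passing and self-reflection, which this sketch omits.

```python
from collections import defaultdict

def consensus_vote(proposals):
    """proposals: list of (agent, answer, confidence) tuples."""
    tally = defaultdict(float)
    for _agent, answer, conf in proposals:
        tally[answer] += conf                  # confidence-weighted vote
    return max(tally.items(), key=lambda kv: kv[1])

proposals = [("text",   "red jacket",  0.9),
             ("vision", "red jacket",  0.7),
             ("speech", "blue jacket", 0.4)]
print(consensus_vote(proposals))   # ('red jacket', ~1.6)
```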
19. LLM-Powered Multimodal Reasoning for Fake News Detection
Authors: Md. Ahsan Habib, Md. Anwar Hussen Wadud, M. F. Mridha, Md. Jakir Hossen. Computers, Materials & Continua, 2026, No. 4, pp. 1821-1864.
The problem of fake news detection (FND) is becoming increasingly important in natural language processing (NLP) because of the rapid dissemination of misleading information on the web. Large language models (LLMs) such as GPT-4.0 excel at natural language understanding tasks but can still struggle to distinguish between fact and fiction, particularly when applied in the wild. A key challenge for existing FND methods is that they consider only unimodal data (e.g., images), while richer multimodal data (e.g., user behaviour, temporal dynamics) is neglected, even though the latter is crucial for full-context understanding. To overcome these limitations, we introduce M3-FND (Multimodal Misinformation Mitigation for False News Detection), a novel methodological framework that integrates LLMs with multimodal data sources to perform context-aware veracity assessments. Our method proposes a hybrid system that combines image-text alignment, user credibility profiling, and temporal pattern recognition, strengthened by a natural feedback loop that provides real-time feedback for correcting downstream errors. We use contextual reinforcement learning to schedule prompt updating and adjust the classifier threshold based on the latest multimodal input, which enables the model to adapt to changing misinformation strategies. M3-FND is tested on three diverse datasets, FakeNewsNet, Twitter15, and Weibo, which contain both textual and visual social media content. Experiments show that M3-FND significantly outperforms conventional and LLM-based baselines in terms of accuracy, F1-score, and AUC on all benchmarks. Our results indicate the importance of employing multimodal cues and adaptive learning for effective and timely detection of fake news.
Keywords: fake news detection; multimodal learning; large language models; prompt engineering; instruction tuning; reinforcement learning; misinformation mitigation