Journal Articles
29 articles found
1. A Comprehensive Review of Multimodal Deep Learning for Enhanced Medical Diagnostics (Cited by: 1)
Authors: Aya M. Al-Zoghby, Ahmed Ismail Ebada, Aya S. Saleh, Mohammed Abdelhay, Wael A. Awad. Computers, Materials & Continua, 2025, Issue 9, pp. 4155-4193 (39 pages).
Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics, advancing precision medicine by enabling integration and learning from diverse data sources. The exponential growth of high-dimensional healthcare data, encompassing genomic, transcriptomic, and other omics profiles, as well as radiological imaging and histopathological slides, makes this approach increasingly important because, when examined separately, these data sources only offer a fragmented picture of intricate disease processes. Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling, more robust disease characterization, and improved treatment decision-making. This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis. We classify and examine important application domains, such as (1) radiology, where automated report generation and lesion detection are facilitated by image-text integration; (2) histopathology, where fusion models improve tumor classification and grading; and (3) multi-omics, where molecular subtypes and latent biomarkers are revealed through cross-modal learning. We provide an overview of representative research, methodological advancements, and clinical consequences for each domain. Additionally, we critically analyze the fundamental issues preventing wider adoption, including computational complexity (particularly in training scalable, multi-branch networks), data heterogeneity (resulting from modality-specific noise, resolution variations, and inconsistent annotations), and the challenge of maintaining significant cross-modal correlations during fusion. These problems impede interpretability, which is crucial for clinical trust and use, in addition to performance and generalizability. Lastly, we outline important areas for future research, including the development of standardized protocols for harmonizing data, the creation of lightweight and interpretable fusion architectures, the integration of real-time clinical decision support systems, and the promotion of cooperation for federated multimodal learning. Our goal is to provide researchers and clinicians with a concise overview of the field's present state, enduring constraints, and exciting directions for further research through this review.
Keywords: multimodal deep learning; medical diagnostics; multimodal healthcare fusion; healthcare data integration
2. Performance vs. Complexity: Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems
Authors: Sarah M. Kamel, Mai A. Fadel, Lamiaa Elrefaei, Shimaa I. Hassan. Computer Modeling in Engineering & Sciences, 2025, Issue 4, pp. 373-411 (39 pages).
Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question's meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; (2) studying the role of the utilized multimodal bilinear pooling fusion technique in the trade-off between the model complexity and the overall model performance. Some fusion techniques could significantly increase the model complexity, which seriously limits their applicability for VQA models. So far, there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions. Hence, a comparative analysis is conducted among eight bilinear pooling fusion techniques, in terms of their ability to reduce the model complexity and improve the model performance in this case of VQA systems. Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model's performance, reaching a best performance of 89.25%. Further, experiments have proven that the number of answers in the developed VQA system is a critical factor that affects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity. The Multimodal Local Perception Bilinear Pooling (MLPB) technique has shown the best balance between the model complexity and its performance, for VQA systems designed to answer yes/no questions.
Keywords: Arabic-VQA; deep learning-based VQA; deep multimodal information fusion; multimodal representation learning; VQA of yes/no questions; VQA model complexity; VQA model performance; performance-complexity trade-off
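The complexity issue this paper studies can be made concrete: full bilinear pooling of an image feature and a question feature needs one weight entry per (image-dim × question-dim × output-dim) triple, while low-rank bilinear techniques project both modalities into a joint space and fuse them with an elementwise product. A minimal NumPy sketch; the feature sizes and projections are illustrative assumptions, not the paper's exact MLPB configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed feature sizes (illustrative, not the paper's exact configuration):
d_img, d_q, d_joint = 2048, 512, 1000

# Full bilinear pooling computes z_k = x^T W_k y and therefore needs a
# d_img x d_q x d_joint weight tensor.
full_params = d_img * d_q * d_joint

# Low-rank bilinear fusion (the idea behind MLB/MFB-style techniques):
# project each modality into the joint space, then take an elementwise
# (Hadamard) product.
U = rng.standard_normal((d_img, d_joint)) * 0.01  # image projection
V = rng.standard_normal((d_q, d_joint)) * 0.01    # question projection
low_rank_params = U.size + V.size

x = rng.standard_normal(d_img)  # e.g. a ResNet-152 image feature
y = rng.standard_normal(d_q)    # e.g. a GRU question feature

z = np.tanh(x @ U) * np.tanh(y @ V)  # fused joint representation

print(z.shape)                         # (1000,)
print(full_params // low_rank_params)  # 409: parameter-count reduction
```

The two projection matrices replace a billion-entry tensor with a few million parameters, which is exactly the performance-vs-complexity trade-off the comparative analysis measures.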
3. Multimodal Gas Detection Using E-Nose and Thermal Images: An Approach Utilizing SRGAN and Sparse Autoencoder
Authors: Pratik Jadhav, Vuppala Adithya Sairam, Niranjan Bhojane, Abhyuday Singh, Shilpa Gite, Biswajeet Pradhan, Mrinal Bachute, Abdullah Alamri. Computers, Materials & Continua, 2025, Issue 5, pp. 3493-3517 (25 pages).
Electronic nose and thermal images are effective ways to diagnose the presence of gases in real time. Multimodal fusion of these modalities can result in the development of highly accurate diagnostic systems. Low-cost thermal imaging software produces low-resolution thermal images in grayscale format, hence necessitating methods for improving the resolution and colorizing the images. The objective of this paper is to develop and train a super-resolution generative adversarial network for improving the resolution of thermal images, followed by a sparse autoencoder for colorization of thermal images and a multimodal convolutional neural network for gas detection using electronic nose measurements and thermal images. The dataset used comprises 6400 thermal images and electronic nose measurements for four classes. A multimodal Convolutional Neural Network (CNN) built on a pre-trained EfficientNetB2 model was developed using both early and late feature fusion. The Super-Resolution Generative Adversarial Network (SRGAN) was trained on low- and high-resolution thermal images, achieving a Structural Similarity Index (SSIM) of 90.28, a Peak Signal-to-Noise Ratio (PSNR) of 68.74, and a Mean Absolute Error (MAE) of 0.066. A sparse autoencoder was trained on the grayscale and colorized thermal images, producing an MAE of 0.035, a Mean Squared Error (MSE) of 0.006, and a Root Mean Squared Error (RMSE) of 0.0705. The multimodal CNN, trained on these images and electronic nose measurements using early and late fusion techniques, achieved accuracies of 97.89% and 98.55%, respectively. Hence, the proposed framework can greatly aid integration with low-cost software to generate high-quality thermal camera images and highly accurate detection of gases in real time.
Keywords: thermal imaging; gas detection; multimodal learning; generative models; autoencoders
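The abstract's two fusion modes differ only in where the modalities meet: early fusion concatenates features before a single classifier, late fusion averages per-modality decisions. A toy sketch; the feature sizes and the linear "heads" are hypothetical stand-ins, not the paper's EfficientNetB2 pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-sample features (sizes are assumptions, not the paper's):
f_nose = rng.standard_normal(8)   # e-nose sensor channels
f_img = rng.standard_normal(32)   # thermal-image embedding

def head(x, n_cls=4, seed=0):
    """Toy linear classifier head returning class probabilities."""
    w = np.random.default_rng(seed).standard_normal((x.size, n_cls)) * 0.1
    logits = x @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Early fusion: concatenate the raw features, then classify once.
p_early = head(np.concatenate([f_nose, f_img]))

# Late fusion: classify each modality separately, then average the decisions.
p_late = (head(f_nose, seed=1) + head(f_img, seed=2)) / 2

print(p_early.shape, p_late.shape)  # (4,) (4,)
```

Both routes end in a four-class probability vector (one per gas class); the paper's reported gap between 97.89% (early) and 98.55% (late) comes from where this mixing happens in a trained network.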
4. An Arrhythmia Intelligent Recognition Method Based on a Multimodal Information and Spatio-Temporal Hybrid Neural Network Model
Authors: Xinchao Han, Aojun Zhang, Runchuan Li, Shengya Shen, Di Zhang, Bo Jin, Longfei Mao, Linqi Yang, Shuqin Zhang. Computers, Materials & Continua, 2025, Issue 2, pp. 3443-3465 (23 pages).
Electrocardiogram (ECG) analysis is critical for detecting arrhythmias, but traditional methods struggle with large-scale ECG data and rare arrhythmia events in imbalanced datasets. These methods fail to perform multi-perspective learning of temporal signals and ECG images, nor can they fully extract the latent information within the data, falling short of the accuracy required by clinicians. Therefore, this paper proposes an innovative hybrid multimodal spatiotemporal neural network to address these challenges. The model employs a multimodal data augmentation framework integrating visual and signal-based features to enhance the classification performance of rare arrhythmias in imbalanced datasets. Additionally, the spatiotemporal fusion module incorporates a spatiotemporal graph convolutional network to jointly model temporal and spatial features, uncovering complex dependencies within the ECG data and improving the model's ability to represent complex patterns. In experiments conducted on the MIT-BIH arrhythmia dataset, the model achieved 99.95% accuracy, 99.80% recall, and a 99.78% F1 score. The model was further validated for generalization using the clinical INCART arrhythmia dataset, and the results demonstrated its effectiveness in terms of both generalization and robustness.
Keywords: multimodal learning; spatio-temporal hybrid; graph convolutional network; data imbalance; ECG classification
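The spatiotemporal fusion idea can be sketched in miniature: treat signal channels (e.g. ECG leads) as graph nodes, propagate features over a normalized adjacency matrix (the spatial step), then mix adjacent time steps (the temporal step). The sizes and the adjacency below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

# Toy spatio-temporal graph step. Nodes stand in for ECG leads; sizes are
# assumed for illustration.
n_leads, T, d = 3, 4, 2

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)  # lead adjacency (fully connected)
A_hat = A + np.eye(n_leads)             # add self-loops
deg = A_hat.sum(axis=1)
A_norm = A_hat / deg[:, None]           # row-normalized propagation matrix

rng = np.random.default_rng(5)
X = rng.standard_normal((T, n_leads, d))  # per-time-step node features
W = rng.standard_normal((d, d)) * 0.5     # learnable weights in a real model

# Spatial step (graph convolution): each lead aggregates its neighbours at
# every time step, then applies the shared weight matrix.
H = np.einsum('ij,tjd->tid', A_norm, X) @ W

# Temporal step: a simple moving average over adjacent time steps stands in
# for the temporal branch of a spatiotemporal network.
H_t = (H[:-1] + H[1:]) / 2

print(H.shape, H_t.shape)  # (4, 3, 2) (3, 3, 2)
```

A trained spatiotemporal GCN stacks many such layers with nonlinearities; this sketch only shows how spatial aggregation and temporal mixing compose.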
5. Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions
Authors: A-Seong Moon, Seungyeon Jeong, Donghee Kim, Mohd Asyraf Zulkifley, Bong-Soo Sohn, Jaesung Lee. Computers, Materials & Continua, 2025, Issue 11, pp. 2851-2872 (22 pages).
Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture complementary and hierarchical information from both sources. A cross-modal attention module enables selective emphasis on relevant signals while suppressing modality-specific noise. Performance is validated on a modified version of the AFEW dataset, in which controlled noise is introduced to emulate realistic conditions. The framework achieves higher classification accuracy than comparative baselines, confirming increased robustness under conditions of cross-modal disruption. This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments. The study also contributes a systematic approach to fusion design and supports further exploration in the direction of resilient multimodal emotion analysis frameworks. The source code is publicly available at https://github.com/asmoon002/AVER (accessed on 18 August 2025).
Keywords: multimodal learning; emotion recognition; cross-modal attention; robust representation learning
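The cross-modal attention module described above can be illustrated with scaled dot-product attention in which one modality queries the other: each audio step pulls in the visual context most relevant to it. A minimal NumPy sketch; the sequence length and feature size are assumptions, not the paper's encoders:

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 5, 16  # 5 aligned time steps, 16-dim features (assumed sizes)

A = rng.standard_normal((T, d))  # audio (mel-spectrogram) frame features
V = rng.standard_normal((T, d))  # video frame features

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-modal attention: audio frames act as queries over the video sequence
# (a symmetric video-to-audio pass would be added in practice).
scores = A @ V.T / np.sqrt(d)      # (T, T) audio-to-video affinities
attn = softmax(scores, axis=-1)    # each audio step's weighting over video
A_enriched = A + attn @ V          # residual add of attended video context

print(A_enriched.shape)                      # (5, 16)
print(np.allclose(attn.sum(axis=-1), 1.0))   # True: rows are distributions
```

Because the attention weights are data-dependent, a noisy video frame that correlates with nothing receives low weight, which is the mechanism behind the "suppressing modality-specific noise" claim.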
6. DMF: A Deep Multimodal Fusion-Based Network Traffic Classification Model
Authors: Xiangbin Wang, Qingjun Yuan, Weina Niu, Qianwei Meng, Yongjuan Wang, Chunxiang Gu. Computers, Materials & Continua, 2025, Issue 5, pp. 2267-2285 (19 pages).
With the rise of encrypted traffic, traditional network analysis methods have become less effective, leading to a shift towards deep learning-based approaches. Among these, multimodal learning-based classification methods have gained attention due to their ability to leverage diverse feature sets from encrypted traffic, improving classification accuracy. However, existing research predominantly relies on late fusion techniques, which hinder the full utilization of deep features within the data. To address this limitation, we propose a novel multimodal encrypted traffic classification model that synchronizes modality fusion with multiscale feature extraction. Specifically, our approach performs real-time fusion of modalities at each stage of feature extraction, enhancing feature representation at each level and preserving inter-level correlations for more effective learning. This continuous fusion strategy improves the model's ability to detect subtle variations in encrypted traffic, while boosting its robustness and adaptability to evolving network conditions. Experimental results on two real-world encrypted traffic datasets demonstrate that our method achieves classification accuracies of 98.23% and 97.63%, outperforming existing multimodal learning-based methods.
Keywords: deep fusion; intrusion detection; multimodal learning; network traffic classification
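The contrast the abstract draws, fusing at every extraction stage rather than once at the end, can be sketched with two toy branches that exchange information after each stage. The stage function, sizes, and averaging rule are illustrative assumptions, not the DMF internals:

```python
import numpy as np

rng = np.random.default_rng(3)

def stage(x, d_out, seed):
    """One toy feature-extraction stage (linear map + ReLU)."""
    w = np.random.default_rng(seed).standard_normal((x.size, d_out)) * 0.1
    return np.maximum(x @ w, 0.0)

# Two hypothetical traffic modalities, e.g. packet-length statistics and a
# raw-byte embedding (sizes are assumptions):
m1 = rng.standard_normal(32)
m2 = rng.standard_normal(32)

# Late fusion would mix the branches only once, after all stages. The
# stage-wise idea instead exchanges information after every stage, so each
# level's representation already reflects both modalities.
for s, d in enumerate([64, 32, 16]):
    m1 = stage(m1, d, seed=10 + s)
    m2 = stage(m2, d, seed=20 + s)
    fused = (m1 + m2) / 2  # simple mixing; real models learn this exchange
    m1, m2 = fused, fused  # both branches carry the fused view onward

print(fused.shape)  # (16,)
```

Because fusion happens inside the loop, inter-level correlations survive into the final representation, which late fusion by construction cannot provide.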
7. Multimodal Machine Learning Guides Low Carbon Aeration Strategies in Urban Wastewater Treatment (Cited by: 2)
Authors: Hong-Cheng Wang, Yu-Qi Wang, Xu Wang, Wan-Xin Yin, Ting-Chao Yu, Chen-Hao Xue, Ai-Jie Wang. Engineering (SCIE/EI/CAS/CSCD), 2024, Issue 5, pp. 51-62 (12 pages).
The potential for reducing greenhouse gas (GHG) emissions and energy consumption in wastewater treatment can be realized through intelligent control, with machine learning (ML) and multimodality emerging as a promising solution. Here, we introduce an ML technique based on multimodal strategies, focusing specifically on intelligent aeration control in wastewater treatment plants (WWTPs). The generalization of the multimodal strategy is demonstrated on eight ML models. The results demonstrate that this multimodal strategy significantly enhances model indicators for ML in environmental science and the efficiency of aeration control, exhibiting exceptional performance and interpretability. Integrating random forest with visual models achieves the highest accuracy in forecasting aeration quantity among multimodal models, with a mean absolute percentage error of 4.4% and a coefficient of determination of 0.948. Practical testing in a full-scale plant reveals that the multimodal model can reduce operation costs by 19.8% compared to traditional fuzzy control methods. The potential application of these strategies in critical water science domains is discussed. To foster accessibility and promote widespread adoption, the multimodal ML models are freely available on GitHub, thereby eliminating technical barriers and encouraging the application of artificial intelligence in urban wastewater treatment.
Keywords: wastewater treatment; multimodal machine learning; deep learning; aeration control; interpretable machine learning
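The two indicators this entry reports, mean absolute percentage error (MAPE, 4.4%) and coefficient of determination (R², 0.948), are standard regression metrics and easy to compute directly. The values below are made-up illustrative numbers, not the study's data:

```python
import numpy as np

# Illustrative measured vs forecast aeration quantities (arbitrary units).
y_true = np.array([100.0, 120.0, 90.0, 110.0])  # measured
y_pred = np.array([98.0, 125.0, 92.0, 108.0])   # model forecast

# MAPE: mean of |error| relative to the true value, as a percentage.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# R^2: one minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(round(mape, 2), round(r2, 3))  # 2.55 0.926
```

MAPE weights each error by the true magnitude, which suits aeration quantities that vary over a wide range; R² instead measures variance explained relative to a constant-mean baseline.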
8. Deep multimodal learning for municipal solid waste sorting (Cited by: 2)
Authors: LU Gang, WANG YuanBin, XU HuXiu, YANG HuaYong, ZOU Jun. Science China (Technological Sciences) (SCIE/EI/CAS/CSCD), 2022, Issue 2, pp. 324-335 (12 pages).
Automated waste sorting can dramatically increase waste sorting efficiency and reduce its regulation cost. Most of the current methods only use a single modality such as image data or acoustic data for waste classification, which makes it difficult to classify mixed and confusable wastes. In these complex situations, using multiple modalities becomes necessary to achieve a high classification accuracy. Traditionally, the fusion of multiple modalities has been limited by fixed handcrafted features. In this study, the deep-learning approach was applied to multimodal fusion at the feature level for municipal solid-waste sorting. More specifically, the pre-trained VGG16 and one-dimensional convolutional neural networks (1D CNNs) were utilized to extract features from visual data and acoustic data, respectively. These deeply learned features were then fused in the fully connected layers for classification. The results of comparative experiments proved that the proposed method was superior to the single-modality methods. Additionally, the feature-based fusion strategy performed better than the decision-based strategy with deeply learned features.
Keywords: deep multimodal learning; municipal waste sorting; multimodal fusion; convolutional neural networks
9. Brain-inspired multimodal learning based on neural networks (Cited by: 1)
Authors: Chang Liu, Fuchun Sun, Bo Zhang. Translational Neuroscience and Clinics, 2018, Issue 1, pp. 61-72 (12 pages).
Modern computational models have leveraged biological advances in human brain research. This study addresses the problem of multimodal learning with the help of brain-inspired models. Specifically, a unified multimodal learning architecture is proposed based on deep neural networks, which are inspired by the biology of the visual cortex of the human brain. This unified framework is validated by two practical multimodal learning tasks: image captioning, involving visual and natural language signals, and visual-haptic fusion, involving haptic and visual signals. Extensive experiments are conducted under the framework, and competitive results are achieved.
Keywords: multimodal learning; brain-inspired learning; deep learning; neural networks
10. Learning Strategies, Motivation and Learners' Perspectives on Online Multimodal Chinese Learning
Author: 張鵬. 《汉语教学方法与技术》 (Chinese Language Teaching Methods and Technology), 2021, Issue 1, pp. 1-26, I0002 (27 pages).
This mixed-method empirical study investigated the role of learning strategies and motivation in predicting L2 Chinese learning outcomes in an online multimodal learning environment. Both quantitative and qualitative approaches also examined the learners' perspectives on online multimodal Chinese learning. The participants in this study were fifteen pre-intermediate adult Chinese learners aged 18-26. They were originally from different countries (Spain, Italy, Argentina, Colombia, and Mexico) and lived in Barcelona. They were multilingual, speaking more than two European languages, without exposure to any other Asian languages apart from Chinese. The study's investigation was composed of the Strategy Inventory for Language Learning (SILL), a motivation questionnaire, a learner perception questionnaire, and a focus group interview. The whole trial period lasted three months; after the experiment, the statistics were analyzed via the Spearman correlation coefficient. The statistical analysis results showed that strategy use was highly correlated with online multimodal Chinese learning outcomes; this indicated that strategy use played a vital role in online multimodal Chinese learning. Motivation was also found to have a significant effect. The perception questionnaire uncovered that the students were overall satisfied with and favorable toward the online multimodal learning experience design. Detailed insights from the participants are exhibited in the transcript analysis of the focus group interviews.
Keywords: Chinese learning; online multimodal learning; individual difference; motivation; strategy
11. Solving Geometry Problems via Feature Learning and Contrastive Learning of Multimodal Data (Cited by: 1)
Authors: Pengpeng Jian, Fucheng Guo, Yanli Wang, Yang Li. Computer Modeling in Engineering & Sciences (SCIE/EI), 2023, Issue 8, pp. 1707-1728 (22 pages).
This paper presents an end-to-end deep learning method to solve geometry problems via feature learning and contrastive learning of multimodal data. A key challenge in solving geometry problems using deep learning is to automatically adapt to the task of understanding single-modal and multimodal problems. Existing methods either focus on single-modal or multimodal problems, and they cannot fit each other. A general geometry problem solver should obviously be able to process various modal problems at the same time. In this paper, a shared feature-learning model of multimodal data is adopted to learn the unified feature representation of text and image, which can solve the heterogeneity issue between multimodal geometry problems. A contrastive learning model of multimodal data enhances the semantic relevance between multimodal features and maps them into a unified semantic space, which can effectively adapt to both single-modal and multimodal downstream tasks. Based on the feature extraction and fusion of multimodal data, the proposed geometry problem solver uses relation extraction, theorem reasoning, and problem solving to present solutions in a readable way. Experimental results show the effectiveness of the method.
Keywords: geometry problems; multimodal feature learning; multimodal contrastive learning; automatic solver
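Mapping text and diagram features into a unified semantic space via contrastive learning is typically done with an InfoNCE-style objective: matched text-image pairs should score higher than all mismatched pairs in the batch. A toy NumPy sketch, with made-up embedding sizes and synthetic "matched" pairs rather than the paper's encoders:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 4, 8  # 4 text-diagram pairs, 8-dim embeddings (toy sizes)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

text = normalize(rng.standard_normal((n, d)))
# Synthetic matched diagrams: the i-th diagram embedding is a noisy copy of
# the i-th text embedding, so matched pairs are the most similar.
img = normalize(text + 0.1 * rng.standard_normal((n, d)))

# InfoNCE-style contrastive objective: each text should score its own
# diagram higher than every other diagram in the batch.
tau = 0.1                     # temperature
logits = text @ img.T / tau   # (n, n) cosine-similarity matrix, scaled
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))  # cross-entropy on the diagonal

print(logits.shape)  # (4, 4)
print((np.argmax(logits, axis=1) == np.arange(n)).all())  # pair retrieval
```

Minimizing this loss pulls matched pairs together and pushes mismatched pairs apart, producing exactly the shared semantic space the abstract describes for single-modal and multimodal downstream tasks.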
12. Multimodality Prediction of Chaotic Time Series with Sparse Hard-Cut EM Learning of the Gaussian Process Mixture Model (Cited by: 1)
Authors: 周亚同, 樊煜, 陈子一, 孙建成. Chinese Physics Letters (SCIE/CAS/CSCD), 2017, Issue 5, pp. 22-26 (5 pages).
The contribution of this work is twofold: (1) a multimodality prediction method of chaotic time series with the Gaussian process mixture (GPM) model is proposed, which employs a divide-and-conquer strategy. It automatically divides the chaotic time series into multiple modalities with different extrinsic patterns and intrinsic characteristics, and thus can more precisely fit the chaotic time series. (2) An effective sparse hard-cut expectation maximization (SHC-EM) learning algorithm for the GPM model is proposed to improve the prediction performance. SHC-EM replaces a large learning sample set with fewer pseudo inputs, accelerating model learning based on these pseudo inputs. Experiments on Lorenz and Chua time series demonstrate that the proposed method yields not only accurate multimodality prediction, but also the prediction confidence interval. SHC-EM outperforms traditional variational learning in terms of both prediction accuracy and speed. In addition, SHC-EM is more robust and insusceptible to noise than variational learning.
Keywords: GPM; multimodality prediction of chaotic time series; sparse hard-cut EM learning; Gaussian process mixture model; EM; SHC
13. Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
Authors: Emran Al-Buraihy, Dan Wang. Computers, Materials & Continua (SCIE/EI), 2024, Issue 6, pp. 3913-3938 (26 pages).
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: cross-language image description; multimodal deep learning; semantic matching; reward mechanisms
14. Large Models for Machine Monitoring and Fault Diagnostics: Opportunities, Challenges, and Future Direction
Authors: Xuefeng Chen, Yaguo Lei, Yan-Fu Li, Simon Parkinson, Xiang Li, Jinxin Liu, Fan Lu, Huan Wang, Zisheng Wang, Bin Yang, Shilong Ye, Zhibin Zhao. Journal of Dynamics, Monitoring and Diagnostics, 2025, Issue 2, pp. 76-90 (15 pages).
As a critical technology for industrial system reliability and safety, machine monitoring and fault diagnostics have advanced transformatively with large language models (LLMs). This paper reviews LLM-based monitoring and diagnostics methodologies, categorizing them into in-context learning, fine-tuning, retrieval-augmented generation, multimodal learning, and time series approaches, and analyzes advances in diagnostics and decision support. It identifies bottlenecks such as limited industrial data and edge deployment issues, proposing a three-stage roadmap to highlight LLMs' potential in shaping adaptive, interpretable prognostics and health management (PHM) frameworks.
Keywords: in-context learning; fault diagnostics; LLMs; multimodal learning
15. A Review on Vision-Language-Based Approaches: Challenges and Applications
Authors: Huu-Tuong Ho, Luong Vuong Nguyen, Minh-Tien Pham, Quang-Huy Pham, Quang-Duong Tran, Duong Nguyen Minh Huy, Tri-Hai Nguyen. Computers, Materials & Continua, 2025, Issue 2, pp. 1733-1756 (24 pages).
In multimodal learning, Vision-Language Models (VLMs) have become a critical research focus, enabling the integration of textual and visual data. These models have shown significant promise across various natural language processing tasks, such as visual question answering and computer vision applications, including image captioning and image-text retrieval, highlighting their adaptability for complex, multimodal datasets. In this work, we review the landscape of Bootstrapping Language-Image Pre-training (BLIP) and other VLM techniques. A comparative analysis is conducted to assess VLMs' strengths, limitations, and applicability across tasks while examining challenges such as scalability, data quality, and fine-tuning complexities. The work concludes by outlining potential future directions in VLM research, focusing on enhancing model interpretability, addressing ethical implications, and advancing multimodal integration in real-world applications.
Keywords: Bootstrapping Language-Image Pre-training (BLIP); multimodal learning; vision-language model (VLM); vision-language pre-training (VLP)
16. Challenges in AI-driven Biomedical Multimodal Data Fusion and Analysis (Cited by: 3)
Authors: Junwei Liu, Xiaoping Cen, Chenxin Yi, Feng-ao Wang, Junxiang Ding, Jinyu Cheng, Qinhua Wu, Baowen Gai, Yiwen Zhou, Ruikun He, Feng Gao, Yixue Li. Genomics, Proteomics & Bioinformatics, 2025, Issue 1, pp. 1-19 (19 pages).
The rapid development of biological and medical examination methods has vastly expanded personal biomedical information, including molecular, cellular, image, and electronic health record datasets. Integrating this wealth of information enables precise disease diagnosis, biomarker identification, and treatment design in clinical settings. Artificial intelligence (AI) techniques, particularly deep learning models, have been extensively employed in biomedical applications, demonstrating increased precision, efficiency, and generalization. The success of large language and vision models further significantly extends their biomedical applications. However, challenges remain in learning from these multimodal biomedical datasets, such as data privacy, fusion, and model interpretation. In this review, we provide a comprehensive overview of various biomedical data modalities, multimodal representation learning methods, and the applications of AI in biomedical data integrative analysis. Additionally, we discuss the challenges in applying these deep learning methods and how to better integrate them into biomedical scenarios. We then propose future directions for adapting deep learning methods with model pretraining and knowledge integration to advance biomedical research and benefit their clinical applications.
Keywords: multimodal learning; biomedical analysis; large language model; model interpretation; meta-learning
17. Classifying Chinese Medicine Constitution Using Multimodal Deep-Learning Model (Cited by: 7)
Authors: GU Tian-yu, YAN Zhuang-zhi, JIANG Jie-hui. Chinese Journal of Integrative Medicine (SCIE/CAS/CSCD), 2024, Issue 2, pp. 163-170 (8 pages).
Objective: To develop a multimodal deep-learning model for classifying Chinese medicine constitution, i.e., the balanced and unbalanced constitutions, based on inspection of tongue and face images, pulse waves from palpation, and health information. Methods: The study data consisted of tongue and face images, pulse waves obtained by palpation, and health information, including personal information, life habits, medical history, and current symptoms, from 540 subjects (202 males and 338 females). Convolutional neural networks, recurrent neural networks, and fully connected neural networks were used to extract deep features from the data. Feature fusion and decision fusion models were constructed for the multimodal data. Results: The optimal models for tongue and face images, pulse waves, and health information were ResNet18, Gated Recurrent Unit, and entity embedding, respectively. Feature fusion was superior to decision fusion. The multimodal analysis revealed that multimodal data compensated for the loss of information from a single mode, resulting in improved classification performance. Conclusions: Multimodal data fusion can supplement single-modality information and improve classification performance. Our research underscores the effectiveness of multimodal deep learning technology to identify body constitution for modernizing and improving the intelligent application of Chinese medicine.
Keywords: Chinese medicine constitution classification; multimodal deep learning; tongue image; face image; pulse wave; health information
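The contrast between the two fusion strategies in this abstract can be sketched with toy NumPy code. This is a minimal illustration, not the paper's implementation: the feature dimensions, random weights, and two-class (balanced vs. unbalanced) linear heads are all assumptions chosen only to show where the modalities are combined — before the classifier (feature fusion) or after per-modality classifiers (decision fusion).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deep features per modality (dimensions are illustrative):
# image features (e.g., from a ResNet18-style backbone), pulse-wave features
# (e.g., from a GRU-style encoder), and entity-embedded health information.
img_feat = rng.standard_normal(512)
pulse_feat = rng.standard_normal(128)
health_feat = rng.standard_normal(32)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Feature fusion: concatenate modality features, then one shared classifier.
fused = np.concatenate([img_feat, pulse_feat, health_feat])   # shape (672,)
W = rng.standard_normal((2, fused.size)) * 0.01               # 2 classes
p_feature_fusion = softmax(W @ fused)

# Decision fusion: one classifier per modality, then average the class scores.
modalities = (img_feat, pulse_feat, health_feat)
heads = [rng.standard_normal((2, f.size)) * 0.01 for f in modalities]
p_decision_fusion = np.mean(
    [softmax(h @ f) for h, f in zip(heads, modalities)], axis=0
)
```

Feature fusion lets the classifier model cross-modal interactions directly, which is consistent with the paper's finding that it outperformed decision fusion.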
18. Intelligent Recognition Using Ultralight Multifunctional Nano-Layered Carbon Aerogel Sensors with Human-Like Tactile Perception (cited: 4)
Authors: Huiqi Zhao, Yizheng Zhang, Lei Han, Weiqi Qian, Jiabin Wang, Heting Wu, Jingchen Li, Yuan Dai, Zhengyou Zhang, Chris R. Bowen, Ya Yang. Nano-Micro Letters (SCIE, EI, CAS, CSCD), 2024, No. 1, pp. 172-186 (15 pages).
Humans can perceive our complex world through multi-sensory fusion. Under limited visual conditions, people can sense a variety of tactile signals to identify objects accurately and rapidly. However, replicating this unique capability in robots remains a significant challenge. Here, we present a new form of ultralight multifunctional tactile nano-layered carbon aerogel sensor that provides pressure, temperature, material recognition, and 3D location capabilities, combined with multimodal supervised learning algorithms for object recognition. The sensor exhibits human-like pressure (0.04-100 kPa) and temperature (21.5-66.2 °C) detection, millisecond response times (11 ms), a pressure sensitivity of 92.22 kPa^(−1), and triboelectric durability of over 6000 cycles. The devised algorithm has universality and can accommodate a range of application scenarios. The tactile system can identify common foods in a kitchen scene with 94.63% accuracy and explore the topographic and geomorphic features of a Mars scene with 100% accuracy. This sensing approach empowers robots with versatile tactile perception to advance future society toward heightened sensing, recognition, and intelligence.
Keywords: multifunctional sensor; tactile perception; multimodal machine learning algorithms; universal tactile system; intelligent object recognition
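The idea of object recognition from fused tactile channels can be illustrated with a toy nearest-centroid classifier. Everything here is hypothetical: the object classes, the readings (pressure, temperature, triboelectric output), and the per-channel normalization are invented for illustration and are not the paper's dataset or algorithm, which uses supervised deep learning on real sensor signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tactile readings per object class:
# [pressure (kPa), temperature (deg C), triboelectric output (a.u.)].
train = {
    "apple":  rng.normal([12.0, 23.0, 0.8], 0.5, size=(20, 3)),
    "mug":    rng.normal([30.0, 45.0, 0.2], 0.5, size=(20, 3)),
    "sponge": rng.normal([2.0, 22.0, 0.5], 0.5, size=(20, 3)),
}

# Normalize each channel so no single modality dominates the distance metric.
all_x = np.vstack(list(train.values()))
mu, sigma = all_x.mean(axis=0), all_x.std(axis=0)
centroids = {k: ((v - mu) / sigma).mean(axis=0) for k, v in train.items()}

def classify(reading):
    """Assign a reading to the nearest class prototype in normalized space."""
    z = (np.asarray(reading) - mu) / sigma
    return min(centroids, key=lambda k: np.linalg.norm(z - centroids[k]))

label = classify([11.5, 23.4, 0.75])   # a reading near the "apple" prototype
```

The point of the sketch is only that combining several tactile channels separates objects that any single channel would confuse, which is the multimodal premise of the sensor system.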
19. Enhancing 3D Reconstruction Accuracy of FIB Tomography Data Using Multi-voltage Images and Multimodal Machine Learning
Authors: Trushal Sardhara, Alexander Shkurmanov, Yong Li, Lukas Riedel, Shan Shi, Christian J. Cyron, Roland C. Aydin, Martin Ritter. Nanomanufacturing and Metrology (EI), 2024, No. 1, pp. 48-60 (13 pages).
FIB-SEM tomography is a powerful technique that integrates a focused ion beam (FIB) and a scanning electron microscope (SEM) to capture high-resolution imaging data of nanostructures. This approach involves collecting in-plane SEM images and using the FIB to remove material layers for imaging subsequent planes, thereby producing image stacks. However, these image stacks in FIB-SEM tomography are subject to the shine-through effect, which makes structures visible from the posterior regions of the current plane. This artifact introduces an ambiguity between image intensity and structures in the current plane, making conventional segmentation methods such as thresholding or the k-means algorithm insufficient. In this study, we propose a multimodal machine learning approach that combines intensity information obtained at different electron beam accelerating voltages to improve the three-dimensional (3D) reconstruction of nanostructures. By treating the increased shine-through effect at higher accelerating voltages as a form of additional information, the proposed method significantly improves segmentation accuracy and leads to more precise 3D reconstructions for real FIB tomography data.
Keywords: multimodal machine learning; multi-voltage images; FIB-SEM; overdeterministic systems; 3D reconstruction; FIB tomography
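The core data-layout idea of the abstract — treating intensities of the same slice acquired at several accelerating voltages as channels of a per-voxel feature vector — can be sketched as follows. The array sizes, threshold, and channel weights are illustrative assumptions; the paper trains a machine learning model rather than using fixed weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 8x8 slice imaged at three accelerating voltages
# (random intensities stand in for real SEM data).
low_kv, mid_kv, high_kv = (rng.random((8, 8)) for _ in range(3))

# Multimodal layout: each voxel becomes a 3-channel feature vector,
# one intensity value per accelerating voltage.
X = np.stack([low_kv, mid_kv, high_kv], axis=-1).reshape(-1, 3)

# Conventional thresholding sees only one channel and cannot resolve
# the intensity ambiguity caused by shine-through:
seg_threshold = low_kv.reshape(-1) > 0.5

# A multimodal classifier can weight the channels, e.g. discounting the
# stronger shine-through at higher voltages (weights are illustrative,
# standing in for a learned model):
w = np.array([1.0, 0.6, 0.3])
seg_multimodal = (X @ w) / w.sum() > 0.5
```

In the paper's framing, the higher-voltage channels are not noise to suppress but extra evidence about subsurface structure, which is why feeding all voltages to one classifier beats per-image thresholding.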
20. Multimodal Archives, Monophonic Futures: A Transformer-Based Paradigm Shift in Kyrgyz Musical Documents
Authors: Tong Cui, Ting Li, Muratova Ainura Muratovna. New Horizon of Education, 2025, No. 1, pp. 41-47 (7 pages).
The digitisation of musical manuscripts has transformed them from static heritage assets into dynamic data capital. This study explores how digitisation enhances the cultural value of musical manuscripts in low-resource contexts, focusing on Kyrgyz instrumental traditions (küü). Grounded in the SCP-R (Structure, Culture, Performance, and Resources) model, we analyse digitisation's impact through structural, cultural, performance, and resource dimensions. We propose a three-stage "embed-reconstruct-transform" framework, leveraging 12,400 folios and 2,300 hours of audio from the Kyrgyz National Conservatory. A Kyrgyz-tuned Transformer (MusicKG-T) trained with nomadic-path contrastive learning (CMCL-Kyrgyz) demonstrates that digitisation improves accessibility and usability, significantly increasing cultural and economic value. Findings offer a reproducible workflow for Silk-Road archives and highlight implications for music education and cultural policy. Future research should validate applicability to vocal traditions and other regions.
Keywords: Kyrgyzstan; musical archives; Transformer; multimodal learning; cultural economics; music education
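Cross-modal contrastive training of the kind named in this abstract (CMCL-Kyrgyz) typically builds on an InfoNCE-style objective that pulls matched audio/manuscript embedding pairs together and pushes mismatched pairs apart. The sketch below is a generic InfoNCE implementation for paired embeddings; the actual CMCL-Kyrgyz objective is not described in the abstract and may differ, and the temperature value is an assumption.

```python
import numpy as np

def info_nce(audio_emb, score_emb, temperature=0.1):
    """InfoNCE-style contrastive loss over N matched (audio, score) pairs.

    audio_emb, score_emb: arrays of shape (N, d); row i of each is a pair.
    Matched pairs sit on the diagonal of the similarity matrix; the loss
    maximizes their log-probability against the N-1 mismatched rows.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    s = score_emb / np.linalg.norm(score_emb, axis=1, keepdims=True)
    logits = (a @ s.T) / temperature                     # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With identical (fully uninformative) embeddings the loss reduces to log N, the chance-level baseline; training drives it toward zero as matched pairs become distinguishable.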
在线阅读 下载PDF