Journal Articles
29 articles found
Construction of Human Digital Twin Model Based on Multimodal Data and Its Application in Locomotion Mode Identification (Cited by: 2)
1
Authors: Ruirui Zhong, Bingtao Hu, Yixiong Feng, Hao Zheng, Zhaoxi Hong, Shanhe Lou, Jianrong Tan. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2023, Issue 5, pp. 7-19 (13 pages)
With the increasing attention to the state and role of people in intelligent manufacturing, there is a strong demand for human-cyber-physical systems (HCPS) that focus on human-robot interaction. Existing intelligent manufacturing systems cannot support efficient human-robot collaborative work. Moreover, unlike machines equipped with sensors, human characteristic information is difficult to perceive and digitize instantly. In view of the high complexity and uncertainty of the human body, this paper proposes a framework for building a human digital twin (HDT) model based on multimodal data and expounds on the key technologies. A data acquisition system is built to dynamically acquire and update the body state and physiological data of the human body and to realize the digital expression of multi-source heterogeneous human body information. A bidirectional long short-term memory and convolutional neural network (BiLSTM-CNN) based network is devised to fuse multimodal human data and extract spatiotemporal features, with human locomotion mode identification taken as an application case. A series of optimization experiments is carried out to improve the performance of the proposed BiLSTM-CNN-based network model, and the model is compared with traditional locomotion mode identification models. The experimental results demonstrate the superiority of the HDT framework for human locomotion mode identification.
Keywords: Human digital twin; Human-cyber-physical system; Bidirectional long short-term memory; Convolutional neural network; Multimodal data
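As a rough illustration of the fusion pipeline this abstract describes (early concatenation of body-state and physiological channels, a 1-D convolution for spatial features, and forward plus backward recurrent passes standing in for a BiLSTM), here is a minimal NumPy sketch. All shapes, weights, and the five-class output are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1-D convolution along time: x is (T, C_in), w is (K, C_in, C_out)."""
    K, _, C_out = w.shape
    T = x.shape[0] - K + 1
    out = np.empty((T, C_out))
    for t in range(T):
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU

def rnn_pass(x, wx, wh):
    """Simple tanh recurrence; returns the final hidden state."""
    h = np.zeros(wh.shape[0])
    for xt in x:
        h = np.tanh(xt @ wx + h @ wh)
    return h

# hypothetical data: 50 time steps, 6 IMU channels + 4 physiological channels
imu = rng.normal(size=(50, 6))
phys = rng.normal(size=(50, 4))
x = np.concatenate([imu, phys], axis=1)          # early fusion of the two modalities

feat = conv1d(x, rng.normal(scale=0.1, size=(5, 10, 16)))       # "spatial" features
h_fwd = rnn_pass(feat, rng.normal(scale=0.1, size=(16, 8)),
                 rng.normal(scale=0.1, size=(8, 8)))
h_bwd = rnn_pass(feat[::-1], rng.normal(scale=0.1, size=(16, 8)),
                 rng.normal(scale=0.1, size=(8, 8)))
h = np.concatenate([h_fwd, h_bwd])               # bidirectional temporal summary

logits = h @ rng.normal(scale=0.1, size=(16, 5))  # 5 hypothetical locomotion modes
probs = np.exp(logits) / np.exp(logits).sum()     # softmax over modes
print(probs.shape)
```

In a trained model the convolution and recurrence weights would of course be learned jointly; the sketch only shows how the two modalities flow through one spatiotemporal feature extractor.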
Analysis of Emotions Using Multimodal Data: A Case Study
2
Authors: Toshiya Akiyama, Kyoko Osaka, Hirokazu Ito, Ryuichi Tanioka, Allan Paulo Blaquera, Leah Anne Christine Bollos, Tetsuya Tanioka. Journal of Biosciences and Medicines, 2023, Issue 12, pp. 54-68 (15 pages)
In this case study, we hypothesized that sympathetic nerve activity would be higher during conversation with the PALRO robot, and that conversation would result in an increase in cerebral blood flow near Broca's area. The facial expressions of a human subject were recorded, and cerebral blood flow and heart rate variability were measured during interactions with the humanoid robot. These multimodal data were time-synchronized to quantitatively verify the change from the resting baseline through facial expression analysis, cerebral blood flow, and heart rate variability. The data for this subject indicated that sympathetic nervous activity was dominant, suggesting that the subject may have enjoyed and been excited while talking to the robot (normalized high frequency < normalized low frequency: 0.22 ± 0.16 < 0.78 ± 0.16). Cerebral blood flow values were higher during conversation and in the resting state after the experiment than in the resting state before the experiment; talking increased cerebral blood flow in the frontal region. As the subject was left-handed, it was confirmed that the right side of the brain, where Broca's area is located, was particularly activated (left < right: 0.15 ± 0.21 < 1.25 ± 0.17). In the sections where a "happy" facial emotion was recognized, the examiner-judged "happy" faces and the MTCNN "happy" results were generally consistent.
Keywords: Humanoid robots; Multimodal data; Emotion analysis
Deep learning-based multimodal data fusion in bone tumor management: Advances in clinical decision support
3
Authors: Tongtong Huo, Wei Wu, Xiaoliang Chen, Mingdi Xue, Pengran Liu, Jiayao Zhang, Yi Xie, Honglin Wang, Hong Zhou, Zineng Yan, Songxiang Liu, Lin Lu, Jiaming Yang, Jin Liu, Zhewei Ye. Intelligent Oncology, 2025, Issue 3, pp. 204-215 (12 pages)
Bone tumors (BTs), including osteosarcoma, Ewing sarcoma, and chondrosarcoma, are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location, histological subtype, and molecular alterations. Recent advances in artificial intelligence (AI), particularly deep learning, have enabled the integration of diverse clinical data modalities to support diagnosis, treatment planning, and prognostication in bone oncology. This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging, digital pathology, multi-omics profiling, and electronic health records. We conducted a structured review of peer-reviewed literature published between 2015 and early 2025, focusing on the development, validation, and clinical applicability of AI models for BT diagnosis, subtyping, treatment response prediction, and recurrence monitoring. Although multimodal models have demonstrated advantages over unimodal approaches, especially in handling missing data and improving generalizability, most remain constrained by single-center study designs, small sample sizes, and limited prospective or external validation. Persistent technical and translational challenges include semantic misalignment across modalities, incomplete datasets, limited model interpretability, and regulatory and infrastructural barriers to clinical integration. To address these limitations, we highlight emerging directions such as contrastive representation learning, generative data augmentation, transformer-based fusion architectures, and privacy-preserving federated learning. We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability. In summary, multimodal AI represents a promising paradigm for advancing precision care in BTs. Realizing its full clinical potential will require methodologically rigorous, biologically informed, and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.
Keywords: Bone tumors; Multimodal data fusion; Artificial intelligence; Clinical decision support systems; Deep learning
Optimizing Multimodal Data Queries in Data Lakes
4
Authors: Runqun Xiong, Shiyuan Zhao, Ciyuan Chen, Zhuqing Xu. Tsinghua Science and Technology, 2025, Issue 6, pp. 2625-2637 (13 pages)
This paper addresses the challenge of efficiently querying multimodal related data in data lakes: large-scale storage and management systems that support heterogeneous data formats, including structured, semi-structured, and unstructured data. Multimodal data queries are crucial because they enable seamless retrieval of related data across modalities, such as tables, images, and text, with applications in fields like e-commerce, healthcare, and education. However, existing methods primarily focus on single-modality queries, such as joinable or unionable table discovery, and struggle to handle the heterogeneity and lack of metadata in data lakes while balancing accuracy and efficiency. To tackle these challenges, we propose a Multimodal data Query mechanism for Data Lakes (MQDL), which employs a modality-adaptive indexing mechanism and contrastive-learning-based embeddings to unify representations across modalities. Additionally, we introduce product quantization to optimize candidate verification during queries, reducing computational overhead while maintaining precision. We evaluate MQDL on a table-image dataset across multiple business scenarios, measuring metrics such as precision, recall, and F1-score. Results show that MQDL achieves an accuracy rate of approximately 90% while demonstrating strong scalability and reduced query response time compared with traditional methods. These findings highlight MQDL's potential to enhance multimodal data retrieval in complex data lake environments.
Keywords: Multimodal data query; Data lake; Contrastive learning; Related data query
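Product quantization, which the MQDL abstract uses to cheapen candidate verification, compresses embedding vectors into short codes and scores queries against them via per-subspace lookup tables. The NumPy sketch below shows the standard technique on synthetic stand-ins for cross-modal embeddings; the sub-space count, codebook size, and data are all illustrative, not MQDL's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def pq_train(X, m, k):
    """Split d-dim vectors into m sub-spaces and run a few Lloyd (k-means) steps each."""
    n, d = X.shape
    ds = d // m
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        C = sub[rng.choice(n, k, replace=False)].copy()
        for _ in range(10):
            assign = np.argmin(((sub[:, None] - C[None]) ** 2).sum(-1), axis=1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    C[c] = pts.mean(axis=0)
        codebooks.append(C)
    return codebooks

def pq_encode(X, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    m = len(codebooks)
    ds = X.shape[1] // m
    codes = np.empty((X.shape[0], m), dtype=np.int32)
    for j, C in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        codes[:, j] = np.argmin(((sub[:, None] - C[None]) ** 2).sum(-1), axis=1)
    return codes

def pq_adc(q, codes, codebooks):
    """Asymmetric distance: exact query vs. quantized database, via lookup tables."""
    ds = len(q) // len(codebooks)
    tables = [((q[j * ds:(j + 1) * ds] - C) ** 2).sum(-1)
              for j, C in enumerate(codebooks)]
    return sum(tables[j][codes[:, j]] for j in range(len(codebooks)))

X = rng.normal(size=(200, 32))        # stand-in for unified cross-modal embeddings
cb = pq_train(X, m=4, k=16)
codes = pq_encode(X, cb)              # 32 floats -> 4 bytes per vector
dists = pq_adc(X[0], codes, cb)
print(int(np.argmin(dists)))
```

The payoff is that each query costs one small table build plus integer lookups per candidate, instead of a full-dimensional distance per candidate.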
Seismic vulnerability and risk assessment using multimodal data and machine learning: a case study of the central urban area of Jinan City, China
5
Authors: Yaohui LIU, Xinyu ZHANG, Jie ZHOU, Xu HAN, Hao ZHENG. Frontiers of Earth Science, 2025, Issue 3, pp. 452-467 (16 pages)
Seismic hazards pose a major threat to life safety, social development, and the economy. Traditional seismic vulnerability and risk assessments, such as field survey methods, may not be suitable for densely built-up urban areas due to the limited availability of comprehensive data and potential subjectivity in judgment. To overcome these limitations, an integrated method for seismic vulnerability and risk assessment based on multimodal remote sensing data, support vector machine (SVM), and GIScience methods was proposed and applied to the central urban area of Jinan City, Shandong Province, China. First, an area with representative buildings was selected for a field survey, and an attribute information base was established. Then, the SVM method was used to establish the susceptibility proxies, which were applied to the whole study area after accuracy evaluation. Finally, the spatial distribution of seismic vulnerability and risk under different seismic intensity scenarios (from VI to X) was analyzed in GIScience. The results show that the average building vulnerability index in the central urban area of Jinan City is 0.53, indicating that the overall seismic performance of buildings is at a moderate level. Under a seismic intensity scenario of VIII, buildings in the Starting area and New urban district of Jinan would mostly suffer 'Moderate' damage, while Old urban areas, with more seismic-resistant buildings, would experience only 'Slight' damage. This study aims to offer an efficient and accurate method for assessing seismic vulnerability in mid- to large-sized cities characterized by concentrated population densities and rapid urbanization, and to provide a valuable reference for urban renewal, seismic mitigation, and land planning, particularly in cities and regions of developing countries. Additionally, it contributes to the realization of Sustainable Development Goal 11, which seeks to make cities and human settlements inclusive, safe, resilient, and sustainable.
Keywords: Seismic vulnerability assessment; GIScience; EMS-98; SVM; RISK-UE; Multimodal remote sensing data
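The core modeling step in this abstract, training an SVM on surveyed building attributes and extrapolating the learned proxy city-wide, can be sketched with a hand-rolled linear SVM (sub-gradient descent on the regularized hinge loss). The attributes, labels, and weights below are synthetic and purely illustrative; the paper's actual feature set and kernel are not specified here.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical standardized building attributes: storeys, age, structural-type score
X = rng.normal(size=(300, 3))
true_w = np.array([1.5, 1.0, -2.0])        # illustrative ground-truth rule
y = np.where(X @ true_w > 0, 1.0, -1.0)    # synthetic vulnerable / not-vulnerable labels

# linear SVM trained by sub-gradient descent on the regularized hinge loss
w, b, lam, lr = np.zeros(3), 0.0, 1e-3, 0.1
for _ in range(200):
    margins = y * (X @ w + b)
    viol = margins < 1                     # margin violators drive the update
    grad_w, grad_b = lam * w, 0.0
    if viol.any():
        grad_w = grad_w - (y[viol, None] * X[viol]).mean(axis=0)
        grad_b = -y[viol].mean()
    w -= lr * grad_w
    b -= lr * grad_b

# accuracy on the "surveyed" sample; in the paper's workflow the fitted proxy
# would then be applied to unsurveyed buildings across the study area
acc = float(np.mean(np.sign(X @ w + b) == y))
print(round(acc, 3))
```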
Low-Rank Adapter Layers and Bidirectional Gated Feature Fusion for Multimodal Hateful Memes Classification
6
Authors: Youwei Huang, Han Zhong, Cheng Cheng, Yijie Peng. Computers, Materials & Continua, 2025, Issue 7, pp. 1863-1882 (20 pages)
A hateful meme is a multimodal medium that combines images and text. The potential hate content of hateful memes has caused serious problems for social media security. The current hateful memes classification task faces significant data scarcity challenges, and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting. In addition, understanding the underlying relationship between the text and image in a hateful meme is itself a challenge. To address these issues, we propose a multimodal hateful memes classification model named LABF, based on low-rank adapter layers and bidirectional gated feature fusion. First, low-rank adapter layers are adopted to learn the feature representation of the new dataset. This is achieved by introducing a small number of additional parameters while retaining the prior knowledge of the CLIP model, which effectively alleviates overfitting. Second, a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion. Experimental results show that the method significantly outperforms existing methods on two public datasets, verifying its effectiveness and robustness.
Keywords: Hateful meme; Multimodal fusion; Multimodal data; Deep learning
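A bidirectional gated fusion of the kind this abstract names can be sketched in a few lines: each modality's contribution is scaled by a sigmoid gate computed from both modalities jointly. Dimensions, weights, and the exact gating form below are illustrative assumptions, not LABF's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8
t_feat = rng.normal(size=d)    # stand-in for a CLIP text embedding
v_feat = rng.normal(size=d)    # stand-in for a CLIP image embedding

# gate parameters (learned in practice); each gate conditions on both modalities
W_tv = rng.normal(scale=0.1, size=(2 * d, d))   # controls the image pathway
W_vt = rng.normal(scale=0.1, size=(2 * d, d))   # controls the text pathway

joint = np.concatenate([t_feat, v_feat])
g_v = sigmoid(joint @ W_tv)    # how much image signal to let through
g_t = sigmoid(joint @ W_vt)    # how much text signal to let through

# bidirectional gating: each stream is modulated before concatenation
fused = np.concatenate([g_t * t_feat, g_v * v_feat])
print(fused.shape)
```

Because the gates are functions of both embeddings, a misleading caption can down-weight the text stream and vice versa, which is the intuition behind "dynamically adjusting interaction weights".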
Early warning of emerging infectious diseases based on multimodal data
7
Authors: Haotian Ren, Yunchao Ling, Ruifang Cao, Zhen Wang, Yixue Li, Tao Huang. Biosafety and Health (CAS, CSCD), 2023, Issue 4, pp. 193-203 (11 pages)
The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased awareness of emerging infectious diseases. The advancement of multi-omics analysis technology has resulted in the development of several databases containing virus information. Several groups have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarizes the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It covers multi-dimensional information integration and database construction for emerging infectious viruses, virus mutation spectrum construction and variant forecast models, analysis of the affinity between mutation antigens and receptors, propagation models of virus dynamic evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focused on the research results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively surveys the latest virus research and provides a reference for future virus prevention and control research.
Keywords: Emerging infectious disease; SARS-CoV-2; Multimodal data; Early warning
Revolutionizing gastroenterology and hepatology with artificial intelligence: From precision diagnosis to equitable healthcare through interdisciplinary practice (Cited by: 1)
8
Authors: Zhi-Li Chen, Chao Wang, Fang Wang. World Journal of Gastroenterology, 2025, Issue 24, pp. 25-49 (25 pages)
Artificial intelligence (AI) is driving a paradigm shift in gastroenterology and hepatology by delivering cutting-edge tools for disease screening, diagnosis, treatment, and prognostic management. Through deep learning, radiomics, and multimodal data integration, AI has achieved diagnostic parity with expert clinicians in endoscopic image analysis (e.g., early gastric cancer detection, colorectal polyp identification) and non-invasive assessment of liver pathologies (e.g., fibrosis staging, fatty liver typing), while demonstrating utility in personalized care scenarios such as predicting hepatocellular carcinoma recurrence and optimizing inflammatory bowel disease treatment responses. Despite these advancements, challenges persist, including limited model generalization due to fragmented datasets, algorithmic limitations in rare conditions (e.g., pediatric liver diseases) caused by insufficient training data, and unresolved ethical issues related to bias, accountability, and patient privacy. Mitigation strategies involve constructing standardized multicenter databases, validating AI tools through prospective trials, leveraging federated learning to address data scarcity, and developing interpretable systems (e.g., attention heatmap visualization) to enhance clinical trust. Integrating generative AI and digital twin technologies and establishing unified ethical and regulatory frameworks will accelerate AI adoption in primary care and foster equitable healthcare access, while interdisciplinary collaboration and evidence-based implementation remain critical for realizing AI's potential to redefine precision care for digestive disorders, improve global health outcomes, and reshape healthcare equity.
Keywords: Artificial intelligence; Precision medicine; Gastroenterology; Hepatology; Multimodal data integration; Deep learning; Microbiome
Uncovering differences in the spatial structure of intercity interactive networks described by multi-source migration flow: From the multi-hierarchical perspective
9
Authors: WEI Shimei, PAN Jinghu. Journal of Geographical Sciences, 2025, Issue 5, pp. 1049-1079 (31 pages)
Population migration data derived from location-based services have often been used to delineate population flows between cities or to construct intercity relationship networks that reveal the complex interaction patterns underlying human activities. Nevertheless, the inherent heterogeneity in multimodal migration big data has been ignored. This study conducts an in-depth comparison and quantitative analysis through the comprehensive lens of spatial association. Initially, intercity interactive networks in China were constructed using migration data from Baidu and AutoNavi collected during the same time period. Subsequently, the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall (network) and local (node) perspectives. Furthermore, the precision of these networks at the local scale was corroborated by constructing an intercity network from mobile phone (MP) data. Results indicate that the intercity interactive networks in China, as delineated by Baidu and AutoNavi migration flows, exhibit a high degree of structural equivalence; the correlation coefficient between the two networks is 0.874. Both networks exhibit a pronounced spatial polarization trend and hierarchical structure, evident in their distinct core and peripheral structures as well as in the varying importance and influence of different nodes. Nevertheless, there are notable differences worthy of attention. The Baidu intercity interactive network exhibits pronounced cross-regional effects, and its high-level interactions are characterized by a "rich-club" phenomenon, whereas the AutoNavi network presents a more significant distance attenuation effect, with high-level interactions displaying a gradient distribution pattern. Notably, there is a substantial correlation between the AutoNavi and MP networks at the local scale, evidenced by a high correlation coefficient of 0.954. Furthermore, a "spatial dislocations" phenomenon was observed within the spatial structures at different levels extracted from the Baidu and AutoNavi intercity networks. However, measurements of network spatial structure similarity along three dimensions, namely node location, node size, and local structure, indicate relatively high similarity and consistency between the two networks.
Keywords: Network differences; Interactive network; Intercity migration; Multimodal data; China
Fusion of color and hallucinated depth features for enhanced multimodal deep learning-based damage segmentation
10
Authors: Tarutal Ghosh Mondal, Mohammad Reza Jahanshahi. Earthquake Engineering and Engineering Vibration (SCIE, EI, CSCD), 2023, Issue 1, pp. 55-68 (14 pages)
Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, alongside these advantages, depth sensing also presents many practical challenges. For instance, depth sensors impose an additional payload burden on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime. In this context, this study investigates the feasibility of abolishing depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed based on recent advancements in vision-based multimodal sensing, such as modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At the time of deployment, depth data become expendable, as they can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. The study explores two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. It was observed that the surrogate techniques can increase segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
Keywords: Multimodal data fusion; Depth sensing; Vision-based inspection; UAV-assisted inspection; Damage segmentation; Post-disaster reconnaissance; Modality hallucination; Monocular depth estimation
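The "train with depth, deploy without it" idea behind modality hallucination can be boiled down to learning a mapping from RGB features to depth features during training, then substituting the predicted depth features at test time. The toy sketch below uses a least-squares linear map on synthetic paired features; the real systems use deep hallucination networks, so everything here (dimensions, the linear map, the noise level) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

# training data: paired RGB and depth feature vectors (synthetic stand-ins)
n, d_rgb, d_depth = 500, 16, 8
F_rgb = rng.normal(size=(n, d_rgb))
M_true = rng.normal(size=(d_rgb, d_depth))
F_depth = F_rgb @ M_true + 0.05 * rng.normal(size=(n, d_depth))

# "hallucination" branch: learn to predict depth features from RGB features
M_hat, *_ = np.linalg.lstsq(F_rgb, F_depth, rcond=None)

# at deployment there is no depth sensor: hallucinate the depth features,
# then late-fuse them with the RGB features exactly as a real sensor would be
f_rgb_test = rng.normal(size=d_rgb)
f_depth_hal = f_rgb_test @ M_hat
fused = np.concatenate([f_rgb_test, f_depth_hal])

err = float(np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true))
print(fused.shape, round(err, 4))
```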
TRANSHEALTH: A Transformer-BDI Hybrid Framework for Real-Time Psychological Distress Detection in Ambient Healthcare
11
Authors: Parul Dubey, Pushkar Dubey, Mohammed Zakariah, Abdulaziz S. Almazyad, Deema Mohammed Alsekait. Computers, Materials & Continua, 2025, Issue 11, pp. 3897-3919 (23 pages)
Psychological distress detection plays a critical role in modern healthcare, especially in ambient environments where continuous monitoring is essential for timely intervention. Advances in sensor technology and artificial intelligence (AI) have enabled the development of systems capable of mental health monitoring using multimodal data. However, existing models often struggle with contextual adaptation and real-time decision-making in dynamic settings. This paper addresses these challenges by proposing TRANS-HEALTH, a hybrid framework that integrates transformer-based inference with Belief-Desire-Intention (BDI) reasoning for real-time psychological distress detection. The framework utilizes a multimodal dataset containing EEG, GSR, heart rate, and activity data to predict distress while adapting to individual contexts. The methodology combines deep learning for robust pattern recognition with symbolic BDI reasoning to enable adaptive decision-making. The novelty of the approach lies in its seamless integration of transformer models with BDI reasoning, providing both high accuracy and contextual relevance in real time. Performance metrics such as accuracy, precision, recall, and F1-score are employed to evaluate the system. The results show that TRANS-HEALTH outperforms existing models, achieving 96.1% accuracy with 4.78 ms latency and significantly fewer false alerts, along with an enhanced ability to engage users, making it suitable for deployment in wearable and remote healthcare environments.
Keywords: Psychological distress detection; Transformer architecture; BDI (Belief-Desire-Intention) reasoning; Real-time; Ambient healthcare; Multimodal sensor data
Deep Multimodal Learning and Fusion Based Intelligent Fault Diagnosis Approach (Cited by: 2)
12
Authors: Huifang Li, Jianghang Huang, Jingwei Huang, Senchun Chai, Leilei Zhao, Yuanqing Xia. Journal of Beijing Institute of Technology (EI, CAS), 2021, Issue 2, pp. 172-185 (14 pages)
The Industrial Internet of Things (IoT), connecting society and industrial systems, represents a tremendous and promising paradigm shift. With IoT, multimodal and heterogeneous data from industrial devices can be easily collected and further analyzed to discover the latent knowledge about device maintenance and health hidden within them. IoT data-based fault diagnosis for industrial devices is very helpful to the sustainability and applicability of an IoT ecosystem. However, how to efficiently use and fuse this multimodal heterogeneous data to realize intelligent fault diagnosis remains a challenge. In this paper, a novel Deep Multimodal Learning and Fusion (DMLF) based fault diagnosis method is proposed for handling heterogeneous data from IoT environments in which industrial devices coexist. First, a DMLF model is designed by combining a Convolutional Neural Network (CNN) and a Stacked Denoising Autoencoder (SDAE) to capture more comprehensive fault knowledge and extract features from different modal data. Second, these multimodal features are seamlessly integrated at a fusion layer, and the resulting fused features are used to train a classifier for recognizing potential faults. Third, a two-stage training algorithm combining supervised pre-training and fine-tuning is proposed to simplify the training process for deep structure models. A series of experiments is conducted on multimodal heterogeneous data from a gear device to verify the proposed fault diagnosis method. The experimental results show that the method outperforms the benchmarks in fault diagnosis accuracy.
Keywords: Fault diagnosis; Deep learning; Multimodal heterogeneous data; Multimodal fused features
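The SDAE half of the DMLF model learns robust features by reconstructing clean signals from corrupted inputs. A single denoising-autoencoder layer, trained with plain gradient descent on synthetic sensor-like data, is sketched below; the data, layer sizes, and noise level are illustrative assumptions, and a stacked version would repeat this layer-wise.

```python
import numpy as np

rng = np.random.default_rng(7)

# synthetic stand-in for one sensor modality (e.g., vibration features)
X = rng.normal(size=(256, 20))

d_in, d_h, lr = 20, 8, 0.05
W1 = rng.normal(scale=0.1, size=(d_in, d_h)); b1 = np.zeros(d_h)
W2 = rng.normal(scale=0.1, size=(d_h, d_in)); b2 = np.zeros(d_in)

def mse(a, b):
    return float(((a - b) ** 2).mean())

loss0 = None
for step in range(300):
    noisy = X + 0.1 * rng.normal(size=X.shape)   # corrupt the input...
    H = np.tanh(noisy @ W1 + b1)                 # ...encode it...
    Xh = H @ W2 + b2                             # ...and reconstruct the CLEAN X
    err = Xh - X
    loss = mse(Xh, X)
    if loss0 is None:
        loss0 = loss                             # remember the initial loss
    # backpropagation through the two layers
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = noisy.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(round(loss0, 3), round(loss, 3))
```

In DMLF the hidden code `H` from such layers would be concatenated with CNN features at the fusion layer before classification.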
Brain-inspired multimodal approach for effluent quality prediction using wastewater surface images and water quality data (Cited by: 1)
13
Authors: Junchen Li, Sijie Lin, Liang Zhang, Yuheng Liu, Yongzhen Peng, Qing Hu. Frontiers of Environmental Science & Engineering (SCIE, EI, CSCD), 2024, Issue 3, pp. 69-82 (14 pages)
Efficiently predicting effluent quality through data-driven analysis presents a significant advancement for consistent wastewater treatment operations. In this study, we aimed to develop an integrated method for predicting effluent COD and NH3 levels. We employed a 200 L pilot-scale sequencing batch reactor (SBR) to gather multimodal data from urban sewage over 40 days, collecting data on critical parameters such as COD, DO, pH, NH3, EC, ORP, SS, and water temperature, alongside wastewater surface images, resulting in a dataset of approximately 40,246 points. We then proposed a brain-inspired image and temporal fusion model integrated with a CNN-LSTM network (BITF-CL) using these data. This model synergizes sewage imagery with water quality data, enhancing prediction accuracy. As a result, the BITF-CL model reduced prediction error by over 23% compared with traditional methods and performed comparably to conventional techniques even without DO and SS sensor data. This research therefore presents a cost-effective and precise prediction system for sewage treatment, demonstrating the potential of brain-inspired models.
Keywords: Wastewater treatment system; Water quality prediction; Data-driven analysis; Brain-inspired model; Multimodal data; Attention mechanism
GeoPredict-LLM: Intelligent tunnel advanced geological prediction by reprogramming large language models (Cited by: 6)
14
Authors: Zhenhao Xu, Zhaoyang Wang, Shucai Li, Xiao Zhang, Peng Lin. Intelligent Geoengineering, 2024, Issue 1, pp. 49-57 (9 pages)
With the improvement of multisource information sensing and data acquisition capabilities inside tunnels, the availability of multimodal data in tunnel engineering has significantly increased. However, due to structural differences in multimodal data, traditional intelligent advanced geological prediction models have limited capacity for data fusion. Furthermore, the lack of pre-trained models makes it difficult for neural networks trained from scratch to deeply explore the features of multimodal data. To address these challenges, we exploit the fusion capability of knowledge graphs for multimodal data and the pre-trained knowledge of large language models (LLMs) to establish an intelligent advanced geological prediction model (GeoPredict-LLM). First, we develop an advanced geological prediction ontology model, forming a knowledge graph database. Using knowledge graph embeddings, multisource and multimodal data are transformed into low-dimensional vectors with a unified structure. Second, pre-trained LLMs, through reprogramming, reconstruct these low-dimensional vectors, imparting linguistic characteristics to the data. This transformation effectively reframes the complex task of advanced geological prediction as a "language-based" problem, enabling the model to approach the task from a linguistic perspective. Moreover, we propose a prompt-as-prefix method that enables output generation while freezing the core of the LLM, significantly reducing the number of training parameters. Finally, evaluations show that, compared with neural network models without pre-training, GeoPredict-LLM significantly improves prediction accuracy. Notably, as long as a knowledge graph database can be established, GeoPredict-LLM can be adapted to multimodal data mining tasks with minimal modifications.
Keywords: Advanced geological prediction; Large language model; Data diffusion; Multisource data; Multimodal data; Knowledge graph
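The step of turning heterogeneous graph facts into "low-dimensional vectors with a unified structure" is what knowledge graph embedding methods do; a classic example is TransE-style scoring, where a triple (head, relation, tail) is plausible when head + relation lands near tail. The entities, relation, and random vectors below are hypothetical illustrations only, and the paper does not state that it uses TransE specifically.

```python
import numpy as np

rng = np.random.default_rng(6)

d = 16
# hypothetical nodes from a tunnel-geology knowledge graph
entities = {e: rng.normal(size=d) for e in ["fault_zone", "seepage", "grade_IV_rock"]}
relations = {r: rng.normal(size=d) for r in ["indicates"]}

def transe_score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means the triple is more likely."""
    return float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

# after training, a low score would suggest "seepage indicates fault_zone"
s = transe_score("seepage", "indicates", "fault_zone")
print(round(s, 3))
```

Whatever the embedding method, the key property used downstream is that every modality's facts end up as fixed-length vectors the LLM reprogramming layer can consume.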
Adjusted Reasoning Module for Deep Visual Question Answering Using Vision Transformer
15
Authors: Christine Dewi, Hanna Prillysca Chernovita, Stephen Abednego Philemon, Christian Adi Ananta, Abbott Po Shun Chen. Computers, Materials & Continua (SCIE, EI), 2024, Issue 12, pp. 4195-4216 (22 pages)
Visual Question Answering (VQA) is an interdisciplinary artificial intelligence (AI) task that integrates computer vision and natural language processing. Its purpose is to empower machines to respond to questions by utilizing visual information. A VQA system typically takes an image and a natural language query as input and produces a textual answer as output. One major obstacle in VQA is identifying a successful method to extract and merge textual and visual data. We examine "fusion" models that use information from both the text encoder and the image encoder to efficiently perform the visual question-answering task. For the text side, we utilize the transformer models BERT and RoBERTa; the image encoders are ViT (Vision Transformer), DeiT (Data-efficient Image Transformer), and BEiT. The reasoning module of the VQA system was updated and layer normalization was incorporated to improve performance. Compared with previous research, the proposed method yields a substantial enhancement in efficacy: our experiments obtained 60.4% accuracy on the PathVQA dataset and 69.2% accuracy on the VizWiz dataset.
Keywords: VQA; Vision transformer; Multimodal data; Deep learning
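The fusion step the abstract outlines can be sketched as a late fusion of the two encoders' pooled features followed by layer normalization. This is a minimal illustration with tiny hand-made vectors standing in for BERT/RoBERTa and ViT/DeiT/BEiT outputs; the actual reasoning-module architecture is not reproduced here.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def fuse(text_feat, image_feat):
    # Late fusion: concatenate the text-encoder and image-encoder outputs,
    # then layer-normalize before the (omitted) answer classifier.
    return layer_norm(text_feat + image_feat)

# Illustrative stand-ins for pooled encoder outputs.
text_feat = [0.2, -1.3, 0.7, 0.0]
image_feat = [1.1, 0.4, -0.5, 0.9]
fused = fuse(text_feat, image_feat)
```

Normalizing after concatenation keeps the two modalities on a comparable scale, which is one plausible reading of why adding layer normalization to the reasoning module helps.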
Traffic demand prediction using a social multiplex networks representation on a multimodal and multisource dataset
16
Authors: Panagiotis Fafoutellis, Eleni I. Vlahogianni. International Journal of Transportation Science and Technology, 2024, Issue 2, pp. 171-185.
In this paper, a meaningful representation of the road network using multiplex networks and a novel feature selection framework that enhances the predictability of future traffic conditions of an entire network are proposed. Using data on traffic volumes and ticket validations from the transportation network of Athens, we were able to develop prediction models that not only achieve very good performance but are also trained efficiently, do not introduce high complexity, and are thus suitable for real-time operation. More specifically, the network's nodes (loop detectors and subway/metro stations) are organized as a multilayer graph, with each layer representing an hour of the day. Nodes with similar structural properties are then classified into communities and exploited as features to predict the future demand values of nodes belonging to the same community. The results reveal the potential of the proposed method to provide reliable and accurate predictions.
Keywords: Multiplex networks; Community detection; Multi-layer graphs; Traffic prediction; Multimodal data
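The multilayer construction described above can be sketched as follows: one graph layer per hour of day, a per-node structural signature taken across layers, and nodes grouped by signature. Node IDs, the two-layer setup, and the degree-profile signature are all invented stand-ins; the paper's actual community-detection method is not specified here.

```python
from collections import defaultdict

# Two hourly layers of a toy multiplex network; "d*" are hypothetical
# loop detectors and "m*" hypothetical metro stations.
layers = {
    8:  {("d1", "d2"), ("d2", "m1")},   # morning-peak connectivity
    14: {("d1", "d2")},                 # off-peak connectivity
}

def degree_profile(node):
    """Structural signature: the node's degree in every hourly layer."""
    return tuple(sum(node in edge for edge in edges)
                 for _, edges in sorted(layers.items()))

# Nodes with identical structural signatures fall into one 'community';
# their historical demand series would then be pooled as features when
# predicting any member node's future demand.
communities = defaultdict(list)
for node in ("d1", "d2", "m1"):
    communities[degree_profile(node)].append(node)
```

Grouping by structural similarity is what keeps the feature set small: each node's model only consumes features from its own community rather than from the entire network.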
Multimodal behavior recognition for dairy cow digital twin construction under incomplete modalities:A modality mapping completion network approach
17
Authors: Yi Zhang, Yu Zhang, Meng Gao, Xinjie Wang, Baisheng Dai, Weizheng Shen. Artificial Intelligence in Agriculture, 2025, Issue 3, pp. 459-469.
The recognition of dairy cow behavior is essential for enhancing health management, reproductive efficiency, production performance, and animal welfare. This paper addresses the challenge of modality loss in multimodal dairy cow behavior recognition algorithms, which can be caused by sensor or video signal disturbances arising from interference, harsh environmental conditions, extreme weather, network fluctuations, and other complexities inherent in farm environments. This study introduces a modality mapping completion network that maps incomplete sensor and video data to improve multimodal dairy cow behavior recognition under conditions of modality loss. After mapping incomplete sensor or video data, the method applies a multimodal behavior recognition algorithm to identify five specific behaviors: drinking, feeding, lying, standing, and walking. The results indicate that, under various comprehensive missing coefficients (λ), the method achieves an average accuracy of 97.87% ± 0.15%, an average precision of 95.19% ± 0.4%, and an average F1 score of 94.685% ± 0.375%, with an overall accuracy of 94.67% ± 0.37%. This approach enhances the robustness and applicability of multimodal cow behavior recognition in situations of modality loss, resolving practical issues in the development of digital twins for cow behavior and providing comprehensive support for the intelligent and precise management of farms.
Keywords: Multimodal data; Modality loss; Behavior recognition; Dairy cow; Digital twin
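The completion idea, reduced to its simplest form: when one modality is lost, estimate its features from the surviving modality through a learned mapping, then run the usual multimodal recognizer on the completed sample. The linear mapping and its weights below are purely illustrative; the paper's network is not reproduced here.

```python
# Toy 'mapping completion' step: estimate missing video features from the
# available sensor features. These weights are illustrative; in the
# paper's setting a network would learn them from complete-modality data.
W = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]  # maps 2-d sensor features to 3-d pseudo-video features

def map_sensor_to_video(sensor_feat):
    """Apply the (here fixed) linear mapping row by row."""
    return [sum(w * s for w, s in zip(row, sensor_feat)) for row in W]

def complete_sample(sample):
    """Fill in the video modality when only the sensor stream survived."""
    if sample.get("video") is None:
        sample["video"] = map_sensor_to_video(sample["sensor"])
    return sample

# A sample whose video feed dropped out, e.g. due to network fluctuation.
sample = complete_sample({"sensor": [1.0, 2.0], "video": None})
```

The downstream behavior classifier then always receives both modalities, which is what makes the recognizer robust to the missing-modality cases the abstract enumerates.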
Application of multimodal deep learning in the auxiliary diagnosis and treatment of dermatological diseases
18
Authors: Ting Li, Bowei Li, Yuying Jia, Lian Duan, Ping Sun, Xiaozhen Li, Xiaodong Yang, Hong Cai. Intelligent Medicine, 2025, Issue 2, pp. 132-140.
Skin diseases are important factors affecting health and quality of life, especially in rural areas where medical resources are limited. Early and accurate diagnosis can reduce unnecessary health and economic losses. However, traditional visual diagnosis places high demands on both doctors' experience and the examination equipment, and there is a risk of missed diagnosis and misdiagnosis. Recently, advances in artificial intelligence technology, particularly deep learning, have led to the use of unimodal computer-aided diagnosis and treatment technologies based on skin images in dermatology. However, because unimodal data carries limited information, this technology cannot fully demonstrate the advantages of multimodal data in real-world medical environments. Multimodal data fusion can fully integrate various types of data to help doctors make more accurate clinical decisions. This review provides a comprehensive overview of multimodal data and deep learning methods that could help dermatologists diagnose and treat skin diseases.
Keywords: Skin diseases; Multimodal data; Artificial intelligence; Deep learning; Computer-aided diagnosis and treatment
Multimodal Learning-based Prediction for Nonalcoholic Fatty Liver Disease
19
Authors: Yaran Chen, Xueyu Chen, Yu Han, Haoran Li, Dongbin Zhao, Jingzhong Li, Xu Wang, Yong Zhou. Machine Intelligence Research, 2025, Issue 5, pp. 871-887.
Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease, and if it is accurately predicted, severe fibrosis and cirrhosis can be prevented. While liver biopsy, the gold standard for NAFLD diagnosis, is invasive, expensive, and prone to sampling errors, noninvasive approaches are extremely promising but still in their infancy due to a dearth of comprehensive study data and sophisticated multimodal data methodologies. This paper proposes a novel approach for diagnosing NAFLD by integrating a comprehensive clinical dataset with a multimodal learning-based prediction method. The dataset comprises physical examinations, laboratory and imaging studies, detailed questionnaires, and facial photographs from more than 6000 participants, a collection of significant value for clinical studies. The dataset is subjected to quantitative analysis to identify which clinical data, such as metadata and facial images, have the greatest impact on the prediction of NAFLD. Furthermore, a multimodal learning-based prediction method (DeepFLD) is proposed that incorporates several modalities and demonstrates superior performance compared to methods that rely only on metadata. Satisfactory performance is also confirmed through verification on other unseen data. Encouragingly, DeepFLD achieves competitive results using only facial images as input rather than relying on metadata, paving the way for a more robust and simpler noninvasive NAFLD diagnosis.
Keywords: Nonalcoholic fatty liver disease (NAFLD) detection; Disease diagnosis; Convolutional neural networks; Multimodal data; Multimodal learning-based prediction
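One way to picture a multimodal predictor that can also run image-only, as the abstract reports for DeepFLD: combine per-modality risk scores when both branches are available, and fall back to the image branch otherwise. The function name, scores, and equal weighting are illustrative and not taken from the paper.

```python
def predict_nafld_risk(image_score, metadata_score=None, w_image=0.5):
    """Combine per-modality risk scores in [0, 1]; fall back to the
    image-only branch when metadata is unavailable. The weighting here
    is a hypothetical placeholder, not the paper's fusion method."""
    if metadata_score is None:
        return image_score
    return w_image * image_score + (1.0 - w_image) * metadata_score

image_only = predict_nafld_risk(0.8)        # facial image branch alone
combined = predict_nafld_risk(0.8, 0.6)     # both modalities available
```

A design like this degrades gracefully: screening can proceed from a facial photograph alone, with metadata refining the estimate when it exists.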
Multimodal artificial intelligence technology in the precision diagnosis and treatment of gastroenterology and hepatology: Innovative applications and challenges
20
Authors: Yi-Mao Wu, Fei-Yang Tang, Zi-Xin Qi. World Journal of Gastroenterology, 2025, Issue 38, pp. 26-43.
With the rapid development of artificial intelligence (AI) technology, multimodal data integration has become an important means to improve the accuracy of diagnosis and treatment in gastroenterology and hepatology. This article systematically reviews the latest progress of multimodal AI technology in diagnosis, treatment, and decision-making for gastrointestinal tumors, functional gastrointestinal diseases, and liver diseases, focusing on the innovative applications of endoscopic image AI, pathological section AI, multi-omics data fusion models, and wearable devices combined with natural language processing. By integrating imaging, pathological, molecular, and clinical phenotypic data, multimodal AI can significantly improve the accuracy of early diagnosis and the efficiency of individualized treatment planning. However, current AI technologies still face challenges such as insufficient data standardization, limited model generalization, and ethical compliance. This paper proposes solutions such as establishing cross-center data sharing platforms, developing federated learning frameworks, and formulating ethical norms, and discusses the application prospects of multimodal large models across the disease management process. This review provides a theoretical basis and practical guidance for promoting the clinical translation of AI technology in gastroenterology and hepatology.
Keywords: Artificial intelligence; Multimodal data; Gastroenterology; Hepatology; Precision medicine; Challenges and countermeasures