Journal articles: 358,987 results found
1. Tomato Growth Height Prediction Method by Phenotypic Feature Extraction Using Multi-modal Data
Authors: GONG Yu, WANG Ling, ZHAO Rongqiang, YOU Haibo, ZHOU Mo, LIU Jie. 《智慧农业(中英文)》 (Smart Agriculture), 2025, No. 1, pp. 97-110.
[Objective] Accurate prediction of tomato growth height is crucial for optimizing production environments in smart farming. However, current prediction methods predominantly rely on empirical, mechanistic, or learning-based models that utilize either image data or environmental data. These methods fail to fully leverage multi-modal data to capture the diverse aspects of plant growth comprehensively. [Methods] To address this limitation, a two-stage phenotypic feature extraction (PFE) model based on the deep learning algorithms recurrent neural network (RNN) and long short-term memory (LSTM) was developed. The model integrated environmental and plant information to provide a holistic understanding of the growth process, employed phenotypic and temporal feature extractors to comprehensively capture both types of features, and enabled a deeper understanding of the interaction between tomato plants and their environment, ultimately leading to highly accurate predictions of growth height. [Results and Discussions] The experimental results showed the model's effectiveness: when predicting the next two days based on the past five days, the PFE-based RNN and LSTM models achieved mean absolute percentage errors (MAPE) of 0.81% and 0.40%, respectively, significantly lower than the 8.00% MAPE of the large language model (LLM) and the 6.72% MAPE of the Transformer-based model. In longer-term predictions, the 10-day prediction for 4 days ahead and the 30-day prediction for 12 days ahead, the PFE-RNN model continued to outperform the two baseline models, with MAPEs of 2.66% and 14.05%, respectively. [Conclusions] The proposed method, which leverages phenotypic-temporal collaboration, shows great potential for intelligent, data-driven management of tomato cultivation, making it a promising approach for enhancing the efficiency and precision of smart tomato planting management.
Keywords: tomato growth prediction; deep learning; phenotypic feature extraction; multi-modal data; recurrent neural network; long short-term memory; large language model
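To make the two-stage design described above concrete, the following is a minimal PyTorch sketch of a phenotypic-plus-temporal predictor with a MAPE metric. All dimensions, layer sizes, and the `PhenotypicTemporalLSTM` name are illustrative assumptions, not the authors' published implementation.

```python
# A minimal sketch of a two-stage phenotypic-temporal predictor in the spirit
# of the PFE-LSTM described above. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PhenotypicTemporalLSTM(nn.Module):
    def __init__(self, env_dim=8, plant_dim=4, feat_dim=32, hidden_dim=64, horizon=2):
        super().__init__()
        # Stage 1: phenotypic feature extractor fuses per-day environment
        # readings (e.g., temperature, humidity) with plant measurements.
        self.phenotypic = nn.Sequential(
            nn.Linear(env_dim + plant_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Stage 2: temporal feature extractor models day-to-day dynamics.
        self.temporal = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Predict growth height for the next `horizon` days.
        self.head = nn.Linear(hidden_dim, horizon)

    def forward(self, env_seq, plant_seq):
        # env_seq: (batch, days, env_dim); plant_seq: (batch, days, plant_dim)
        x = self.phenotypic(torch.cat([env_seq, plant_seq], dim=-1))
        _, (h_n, _) = self.temporal(x)
        return self.head(h_n[-1])  # (batch, horizon) predicted heights

def mape(pred, target):
    # Mean absolute percentage error, the metric reported in the abstract.
    return (100.0 * (pred - target).abs() / target.abs().clamp(min=1e-8)).mean()

model = PhenotypicTemporalLSTM()
env = torch.randn(16, 5, 8)    # past five days of environment data
plant = torch.randn(16, 5, 4)  # past five days of plant measurements
heights = model(env, plant)    # two-day-ahead height predictions
```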
2. Multi-Modal Data Analysis Based Game Player Experience Modeling Using LSTM-DNN (Cited by 1)
Authors: Sehar Shahzad Farooq, Mustansar Fiaz, Irfan Mehmood, Ali Kashif Bashir, Raheel Nawaz, KyungJoong Kim, Soon Ki Jung. 《Computers, Materials & Continua》 (SCIE, EI), 2021, No. 9, pp. 4087-4108.
Game player modeling is a paradigm of computational models that exploit players' behavior and experience using game and player analytics. Player modeling refers to descriptions of players based on frameworks of data derived from a player's behavioral interaction within the game as well as the player's experience with the game. Player behavior focuses on dynamic and static information gathered at the time of gameplay. Player experience concerns the engagement of the human player during gameplay, based on cognitive and affective physiological measurements collected from sensors mounted on the player's body or placed in the player's surroundings. In this paper, player experience modeling is studied for the board puzzle game "Candy Crush Saga" using cognitive data of players acquired through physiological and peripheral devices. A Long Short-Term Memory-based Deep Neural Network (LSTM-DNN) is used to predict players' affective states in terms of valence, arousal, dominance, and liking by employing the concept of transfer learning. Transfer learning focuses on gaining knowledge while solving one problem and applying the same knowledge to different but related problems. The homogeneous transfer learning approach has not been implemented in the game domain before, and this novel study opens a new research area for the game industry, where the main challenge is predicting the significance of innovative games for entertainment and player engagement. Relevant not only from a player's point of view, it is also a benchmark study for game developers who have been facing "cold start" problems for innovative games, strengthening the game industry's economy.
Keywords: game player modeling; experience modeling; player analytics; deep learning; LSTM; gameplay data; Candy Crush Saga
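The abstract pairs an LSTM front end with a DNN regression head and adapts it across domains via transfer learning. Below is a hedged sketch of that pattern: pre-train on a source affect dataset, then freeze the LSTM and fine-tune only the head on gameplay data. Signal dimensions and layer sizes are assumptions.

```python
# A minimal sketch of an LSTM-DNN affect predictor with homogeneous transfer
# learning as outlined above. Sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class LSTMDNN(nn.Module):
    def __init__(self, sensor_dim=32, hidden_dim=128, n_targets=4):
        super().__init__()
        # LSTM encodes windows of physiological signal frames.
        self.lstm = nn.LSTM(sensor_dim, hidden_dim, num_layers=2, batch_first=True)
        # DNN head regresses valence, arousal, dominance, and liking.
        self.dnn = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, n_targets),
        )

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.dnn(h_n[-1])

model = LSTMDNN()
# ... pre-train `model` on a source-domain affect dataset here ...

# Transfer step: reuse the learned temporal representation, adapt the head.
for p in model.lstm.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.dnn.parameters(), lr=1e-4)

window = torch.randn(8, 200, 32)  # 8 gameplay windows of 200 sensor frames
scores = model(window)            # (8, 4): valence, arousal, dominance, liking
```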
3. Construction and evaluation of a predictive model for the degree of coronary artery occlusion based on adaptive weighted multi-modal fusion of traditional Chinese and western medicine data (Cited by 2)
Authors: Jiyu ZHANG, Jiatuo XU, Liping TU, Hongyuan FU. 《Digital Chinese Medicine》, 2025, No. 2, pp. 163-173.
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
Keywords: coronary artery disease; deep learning; multi-modal; clinical prediction; traditional Chinese medicine diagnosis
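The core mechanism here is decision-level fusion with learned, per-sample modality weights rather than fixed ones. A minimal sketch of such an adaptive weighting scheme follows; branch architectures, feature dimensions, and the gating design are illustrative assumptions rather than the published AWMDF model.

```python
# A minimal sketch of adaptive decision-level weighting: each modality branch
# produces logits, and a gating network learns per-sample fusion weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    def __init__(self, dims, n_classes=2):
        super().__init__()
        # One branch per modality: clinical indicators, echocardiography,
        # tongue-image features, face-image features (assumed dimensions).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_classes))
            for d in dims
        )
        # Gating network maps concatenated inputs to one weight per modality.
        self.gate = nn.Linear(sum(dims), len(dims))

    def forward(self, modalities):
        logits = torch.stack([b(m) for b, m in zip(self.branches, modalities)], dim=1)
        w = F.softmax(self.gate(torch.cat(modalities, dim=-1)), dim=-1)
        return (w.unsqueeze(-1) * logits).sum(dim=1)  # adaptively fused logits

model = AdaptiveWeightedFusion(dims=[20, 15, 512, 512])
batch = [torch.randn(4, d) for d in (20, 15, 512, 512)]
out = model(batch)  # (4, n_classes) stenosis-severity logits
```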
4. ViE-Take: A Vision-Driven Multi-Modal Dataset for Exploring the Emotional Landscape in Takeover Safety of Autonomous Driving
Authors: Yantong Wang, Yu Gu, Tong Quan, Jiaoyun Yang, Mianxiong Dong, Ning An, Fuji Ren. 《Research》, 2025, No. 4, pp. 907-925.
Takeover safety draws increasing attention in intelligent transportation as new energy vehicles with cutting-edge autopilot capabilities proliferate on the road. Despite recent studies highlighting the importance of drivers' emotions in takeover safety, the lack of emotion-aware takeover datasets hinders further investigation, thereby constraining potential applications in this field. To this end, we introduce ViE-Take, the first Vision-driven dataset for exploring the Emotional landscape in Takeovers of autonomous driving (vision is used because it constitutes the most cost-effective and user-friendly solution for commercial driver monitoring systems). ViE-Take enables a comprehensive exploration of the impact of emotions on drivers' takeover performance through three key attributes: multi-source emotion elicitation, multi-modal driver data collection, and multi-dimensional emotion annotations. To aid the use of ViE-Take, we provide four deep models (corresponding to four prevalent learning strategies) for predicting three different aspects of drivers' takeover performance (readiness, reaction time, and quality). These models offer benefits for various downstream tasks, such as driver emotion recognition and regulation for automobile manufacturers. Initial analysis and experiments conducted on ViE-Take indicate that (a) emotions have diverse impacts on takeover performance, some of which are counterintuitive; (b) highly expressive social media clips, despite their brevity, prove effective in eliciting emotions (a foundation for emotion regulation); and (c) predicting takeover performance solely through deep learning on vision data is not only feasible but also holds great potential.
Keywords: multi-modal data; emotion annotations; takeover safety; vision-driven; new energy vehicles; intelligent transportation; autonomous driving; emotion-aware dataset
5. M3SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration (Cited by 5)
Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee. 《China Communications》 (SCIE, CSCD), 2023, No. 11, pp. 13-29.
The sixth generation (6G) of mobile communication systems is witnessing a new paradigm shift, i.e., the integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and further presents the generation framework of the M3SC dataset. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we utilize AirSim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, multiple frequency bands, and different times of the day. Currently, the M3SC dataset contains 1,500 snapshots, each including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing results present the multi-modal sensory information and the statistical properties of the communication channel. Finally, the MMM sensing-communication applications that can be supported by the M3SC dataset are discussed.
Keywords: multi-modal sensing; ray-tracing; sensing-communication integration; simulation dataset
6. TCM network pharmacology: new perspective integrating network target with artificial intelligence and multi-modal multi-omics technologies (Cited by 1)
Authors: Ziyi Wang, Tingyu Zhang, Boyang Wang, Shao Li. 《Chinese Journal of Natural Medicines》, 2025, No. 11, pp. 1425-1434.
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. Recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Keywords: network pharmacology; traditional Chinese medicine; network target; artificial intelligence; multi-modal; multi-omics
7. Multi-modal intelligent situation awareness in real-time air traffic control: Control intent understanding and flight trajectory prediction (Cited by 1)
Authors: Dongyue GUO, Jianwei ZHANG, Bo YANG, Yi LIN. 《Chinese Journal of Aeronautics》, 2025, No. 6, pp. 41-57.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks utilizing multi-modal data in real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of ATCOs, the ISA task is refined to the processing of two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, ISA is further formulated into the Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct benchmarks on a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Keywords: air traffic control; automatic speech recognition and understanding; flight trajectory prediction; multi-modal; situation awareness
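For the FTP task, a common single-model formulation of multi-horizon prediction is an encoder-decoder that rolls the last observed state forward. The sketch below illustrates that generic pattern only; the 4D state layout, horizon, and sizes are assumptions, not the paper's benchmarked models.

```python
# A minimal sketch of multi-horizon flight trajectory prediction: an LSTM
# encoder over observed track points, decoded autoregressively.
import torch
import torch.nn as nn

class MultiHorizonFTP(nn.Module):
    def __init__(self, state_dim=4, hidden_dim=128, horizon=15):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTMCell(state_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, state_dim)

    def forward(self, track):
        # track: (batch, obs_steps, 4), e.g., longitude, latitude, altitude, speed
        _, (h, c) = self.encoder(track)
        h, c = h[-1], c[-1]
        step = track[:, -1]          # start from the last observed point
        preds = []
        for _ in range(self.horizon):
            h, c = self.decoder(step, (h, c))
            step = self.proj(h)      # predicted next state, fed back in
            preds.append(step)
        return torch.stack(preds, dim=1)  # (batch, horizon, 4)

model = MultiHorizonFTP()
future = model(torch.randn(2, 30, 4))  # 15-step-ahead trajectories
```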
8. Adaptive Multi-modal Fusion Instance Segmentation for CAEVs in Complex Conditions: Dataset, Framework and Verifications (Cited by 3)
Authors: Pai Peng, Keke Geng, Guodong Yin, Yanbo Lu, Weichao Zhuang, Shuaipeng Liu. 《Chinese Journal of Mechanical Engineering》 (SCIE, EI, CAS, CSCD), 2021, No. 5, pp. 96-106.
Current works on environmental perception for connected autonomous electrified vehicles (CAEVs) mainly focus on the object detection task under good weather and illumination conditions; they often perform poorly in adverse scenarios and have weak scene parsing ability. This paper aims to develop an end-to-end sharpening mixture of experts (SMoE) fusion framework to improve the robustness and accuracy of perception systems for CAEVs in complex illumination and weather conditions. Three original contributions make this work distinctive from the existing relevant literature. First, the Complex KITTI dataset is introduced, consisting of 7,481 pairs of modified KITTI RGB images and generated LiDAR dense depth maps, finely annotated at the instance level with the proposed semi-automatic annotation method. Second, the SMoE fusion approach is devised to adaptively learn robust kernels from complementary modalities. Third, comprehensive comparative experiments are implemented, and the results show that the proposed SMoE framework yields significant improvements over other fusion techniques in adverse environmental conditions. This research proposes a SMoE fusion framework that improves the scene parsing ability of perception systems for CAEVs in adverse conditions.
Keywords: connected autonomous electrified vehicles; multi-modal fusion; semi-automatic annotation; sharpening mixture of experts; comparative experiments
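A mixture-of-experts fusion block of the kind named above can be sketched as per-pixel gating over modality-specific feature maps, so the network can lean on LiDAR depth when the RGB image degrades (fog, low light). The channel counts and single-layer gating below are illustrative assumptions, not the published SMoE architecture.

```python
# A minimal sketch of per-pixel mixture-of-experts fusion over RGB and depth.
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.rgb_expert = nn.Conv2d(3, channels, 3, padding=1)
        self.depth_expert = nn.Conv2d(1, channels, 3, padding=1)
        # Gating network produces a two-channel spatial weight map.
        self.gate = nn.Conv2d(4, 2, 3, padding=1)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_expert(rgb)
        f_depth = self.depth_expert(depth)
        w = torch.softmax(self.gate(torch.cat([rgb, depth], dim=1)), dim=1)
        # Per-pixel convex combination of the two expert feature maps.
        return w[:, :1] * f_rgb + w[:, 1:] * f_depth

fusion = MoEFusion()
fused = fusion(torch.randn(1, 3, 128, 256), torch.randn(1, 1, 128, 256))
```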
9. Personal Style Guided Outfit Recommendation with Multi-Modal Fashion Compatibility Modeling (Cited by 1)
Authors: WANG Kexin, ZHANG Jie, ZHANG Peng, SUN Kexin, ZHAN Jiamei, WEI Meng. 《Journal of Donghua University (English Edition)》, 2025, No. 2, pp. 156-167.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommenders do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual or athletic, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation model with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former serves as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is effective for personalized outfit recommendation.
Keywords: personalized outfit recommendation; fashion compatibility modeling; style preference; multi-modal representation; Bayesian personalized ranking (BPR); style classifier
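Bayesian personalized ranking, which the abstract uses to unify style and compatibility, optimizes a pairwise objective: an outfit the user chose should score higher than one they did not. A minimal sketch follows, with a placeholder dot-product scorer and assumed shapes.

```python
# A minimal sketch of the BPR loss over style-guided embeddings.
import torch
import torch.nn.functional as F

def bpr_loss(user_style, pos_outfit, neg_outfit):
    # user_style: (batch, d) personal style features
    # pos_outfit / neg_outfit: (batch, d) embeddings of chosen / unchosen outfits
    pos_score = (user_style * pos_outfit).sum(dim=-1)
    neg_score = (user_style * neg_outfit).sum(dim=-1)
    # Maximize P(pos > neg), i.e., minimize -log sigmoid(score difference).
    return -F.logsigmoid(pos_score - neg_score).mean()

loss = bpr_loss(torch.randn(32, 128), torch.randn(32, 128), torch.randn(32, 128))
```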
10. A multi-modal clustering method for traditional Chinese medicine clinical data via media convergence (Cited by 2)
Authors: Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang. 《CAAI Transactions on Intelligence Technology》 (SCIE, EI), 2023, No. 2, pp. 390-400.
Media convergence is a media change led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion. Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditional Chinese medicine (TCM) clinical data. It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain media feature representations learnt from multiple modalities. MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels. Experiments are conducted on real-world multi-modal TCM clinical data, including information such as images and text. MCGEC improves clustering results compared with generic single-modal clustering methods and more advanced multi-modal clustering methods. Integrating multimedia features into clustering algorithms offers significant benefits over single-modal clustering approaches that simply concatenate features from different modalities, and provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features.
Keywords: graph convolutional encoder; media convergence; multi-modal clustering; traditional Chinese medicine
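A minimal sketch of the graph-convolutional encoding idea follows: per-modality features are propagated over a shared similarity graph and fused into one embedding on which a clustering algorithm such as k-means can run. The dense row-normalized adjacency and all sizes are simplifying assumptions, not the MCGEC implementation.

```python
# A minimal sketch of a multi-modal graph convolutional encoder for clustering.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # adj_norm: row-normalized adjacency (dense here for simplicity).
        return torch.relu(self.lin(adj_norm @ x))

class MultiModalGraphEncoder(nn.Module):
    def __init__(self, dims, out_dim=32):
        super().__init__()
        self.encoders = nn.ModuleList(GCNLayer(d, out_dim) for d in dims)

    def forward(self, feats, adj_norm):
        # Fuse modalities (e.g., image and text features) by averaging.
        return torch.stack([e(f, adj_norm) for e, f in zip(self.encoders, feats)]).mean(0)

n = 100
adj = torch.rand(n, n).round()  # toy patient-similarity graph
adj_norm = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)
enc = MultiModalGraphEncoder(dims=[512, 768])
z = enc([torch.randn(n, 512), torch.randn(n, 768)], adj_norm)  # (n, 32), ready for k-means
```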
11. Deep learning-based multi-modal data integration enhancing breast cancer disease-free survival prediction (Cited by 1)
Authors: Zehua Wang, Ruichong Lin, Yanchun Li, Jin Zeng, Yongjian Chen, Wenhao Ouyang, Han Li, Xueyan Jia, Zijia Lai, Yunfang Yu, Herui Yao, Weifeng Su. 《Precision Clinical Medicine》, 2024, No. 2, pp. 132-145.
Background: The prognosis of breast cancer is often unfavorable, emphasizing the need for early metastasis risk detection and accurate treatment predictions. This study aimed to develop a novel multi-modal deep learning model using preoperative data to predict disease-free survival (DFS). Methods: We retrospectively collected pathology imaging, molecular, and clinical data from The Cancer Genome Atlas and one independent institution in China. We developed a novel Deep Learning Clinical Medicine Based Pathological Gene Multi-modal (DeepClinMed-PGM) model for DFS prediction, integrating clinicopathological data with molecular insights. Patients were divided into a training cohort (n = 741), an internal validation cohort (n = 184), and an external testing cohort (n = 95). Results: Integrating multi-modal data into the DeepClinMed-PGM model significantly improved area under the receiver operating characteristic curve (AUC) values. In the training cohort, AUC values for 1-, 3-, and 5-year DFS predictions increased to 0.979, 0.957, and 0.871, while in the external testing cohort the values reached 0.851, 0.878, and 0.938 for 1-, 2-, and 3-year DFS predictions, respectively. The DeepClinMed-PGM model's robust discriminative capability was consistently evident across cohorts, including the training cohort [hazard ratio (HR) 0.027, 95% confidence interval (CI) 0.0016-0.046, P < 0.0001], the internal validation cohort (HR 0.117, 95% CI 0.041-0.334, P < 0.0001), and the external cohort (HR 0.061, 95% CI 0.017-0.218, P < 0.0001). Additionally, the DeepClinMed-PGM model demonstrated C-index values of 0.925, 0.823, and 0.864 in the three cohorts, respectively. Conclusion: This study introduces an approach to breast cancer prognosis that integrates imaging, molecular, and clinical data for enhanced predictive accuracy, offering promise for personalized treatment strategies.
Keywords: breast cancer; multi-modality; deep learning; pathological; disease-free survival
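The C-index reported above can be computed with Harrell's pairwise definition: among comparable patient pairs, the fraction where the higher predicted risk belongs to the patient who relapses earlier. A small self-contained sketch with toy inputs; in real use, `risk` would be the model's output scores.

```python
# A minimal sketch of Harrell's concordance index (C-index) for survival models.
import numpy as np

def c_index(time, event, risk):
    # time: follow-up time; event: 1 if relapse observed, 0 if censored;
    # risk: predicted risk score (higher = worse prognosis).
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if patient i relapsed before j's follow-up end.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

t = np.array([5.0, 8.0, 12.0, 3.0])
e = np.array([1, 0, 1, 1])
r = np.array([0.9, 0.2, 0.4, 0.95])
print(c_index(t, e, r))  # 1.0 when risk ordering matches relapse ordering
```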
12. Multi-modal deep learning based on multi-dimensional and multi-level temporal data can enhance the prognostic prediction for multi-drug resistant pulmonary tuberculosis patients (Cited by 4)
Authors: Zhen-Hui Lu, Ming Yang, Chen-Hui Pan, Pei-Yong Zheng, Shun-Xian Zhang. 《Science in One Health》, 2022, No. 1, pp. 6-8.
Despite the advent of new diagnostics, drugs, and regimens, multi-drug resistant pulmonary tuberculosis (MDR-PTB) remains a global health threat. It has a long treatment cycle, low cure rate, and heavy disease burden. Factors such as demographics, disease characteristics, lung imaging, biomarkers, therapeutic schedule, and adherence to medications are associated with MDR-PTB prognosis. However, thus far, the majority of existing studies have focused on predicting treatment outcomes through static single-scale or low-dimensional information. Hence, multi-modal deep learning based on dynamic, multi-dimensional data can provide a deeper understanding of personalized treatment plans to aid in the clinical management of patients.
Keywords: MDR-PTB; multi-modal; deep learning; prognosis
13. Multi-Modal Named Entity Recognition with Auxiliary Visual Knowledge and Word-Level Fusion
Authors: Huansha Wang, Ruiyang Huang, Qinrang Liu, Xinghao Wang. 《Computers, Materials & Continua》, 2025, No. 6, pp. 5747-5760.
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which leverages the Multi-modal Large Language Model (MLLM) as an implicit knowledge base and extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding the redundant recognition and cognitive confusion caused by simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals state-of-the-art methods on two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Keywords: multi-modal named entity recognition; large language model; multi-modal fusion
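Word-level fusion of the kind described can be sketched as cross-attention in which every token embedding queries the encoded knowledge snippets, with a residual connection back to the text. Dimensions and the single-layer design below are assumptions, not the MMAVK architecture.

```python
# A minimal sketch of word-level fusion of token embeddings with encoded
# vision-based auxiliary knowledge via cross-attention.
import torch
import torch.nn as nn

class WordLevelFusion(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, words, knowledge):
        # words: (batch, seq, dim) token embeddings from a transformer encoder
        # knowledge: (batch, k, dim) encoded knowledge snippets from an MLLM
        ctx, _ = self.attn(query=words, key=knowledge, value=knowledge)
        return self.norm(words + ctx)  # residual word-level fusion

fuse = WordLevelFusion()
fused = fuse(torch.randn(2, 24, 768), torch.randn(2, 4, 768))  # per-word fused features
```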
14. MMCSD: Multi-Modal Knowledge Graph Completion Based on Super-Resolution and Detailed Description Generation
Authors: Huansha Wang, Ruiyang Huang, Qinrang Liu, Shaomei Li, Jianpeng Zhang. 《Computers, Materials & Continua》, 2025, No. 4, pp. 761-783.
Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within knowledge graphs is generally of low quality, and some entities suffer from missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on facilitating modality interaction and fusion while neglecting the problems of low modality quality and modality missingness. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results show that MMCSD effectively performs MMKGC and achieves state-of-the-art performance.
Keywords: multi-modal knowledge graph; knowledge graph completion; multi-modal fusion
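One concrete way to realize the missing-visual-feature idea is a small variational auto-encoder that generates a visual embedding conditioned on the textual one, so entities without images still receive a usable visual feature. The sketch below shows that generic mechanism under assumed dimensions; it is not the authors' variational multi-modal auto-encoder.

```python
# A minimal sketch of imputing a missing visual embedding from a text embedding
# with a small conditional VAE.
import torch
import torch.nn as nn

class VisualImputationVAE(nn.Module):
    def __init__(self, text_dim=768, vis_dim=512, latent=64):
        super().__init__()
        self.enc = nn.Linear(text_dim, 2 * latent)  # -> mean and log-variance
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, vis_dim))

    def forward(self, text_emb):
        mu, logvar = self.enc(text_emb).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

vae = VisualImputationVAE()
recon, mu, logvar = vae(torch.randn(8, 768))
# Train with a reconstruction loss on entities that do have images, plus the
# KL term; at inference, `recon` stands in for the missing visual feature.
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
```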
15. Transformers for Multi-Modal Image Analysis in Healthcare
Authors: Sameera V Mohd Sagheer, Meghana K H, P M Ameer, Muneer Parayangat, Mohamed Abbas. 《Computers, Materials & Continua》, 2025, No. 9, pp. 4259-4297.
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of a patient's health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, amalgamating data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities and lack the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, hardware-aware acceleration strategies, including TensorRT and ONNX-based model compression, have been implemented to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
Keywords: multi-modal image analysis; medical imaging; deep learning; image segmentation; disease detection; multi-modal fusion; Vision Transformers (ViTs); precision medicine; clinical decision support
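Of the optimizations listed, post-training dynamic quantization is the most directly reproducible with stock PyTorch: linear layers, the bulk of a transformer's parameters, are converted to int8 for smaller memory footprint and faster CPU inference. The toy encoder below stands in for a real multi-modal imaging model.

```python
# A minimal sketch of post-training dynamic quantization with PyTorch's API.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=4)
model.eval()

# Quantize all nn.Linear modules to int8 for CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

tokens = torch.randn(1, 196, 256)  # e.g., 196 patch embeddings from one scan
with torch.no_grad():
    out = quantized(tokens)
```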
16. Multi-Modal Pre-Synergistic Fusion Entity Alignment Based on Mutual Information Strategy Optimization
Authors: Huayu Li, Xinxin Chen, Lizhuang Tan, Konstantin I. Kostromitin, Athanasios V. Vasilakos, Peiying Zhang. 《Computers, Materials & Continua》, 2025, No. 11, pp. 4133-4153.
To address the challenge of missing modal information in entity alignment and to mitigate the information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multi-level feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared with existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
Keywords: knowledge graph; multi-modal entity alignment; feature fusion; pre-synergistic fusion
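Mutual-information objectives for cross-modal alignment are commonly approximated with a contrastive InfoNCE bound that pulls matched text/image embeddings of the same entity together and pushes mismatched pairs apart. The sketch below shows that generic formulation only; the paper's actual strategy optimization is reinforcement-learning based, not this loss.

```python
# A minimal sketch of the InfoNCE contrastive bound on cross-modal mutual
# information, a common stand-in for MI maximization between modalities.
import torch
import torch.nn.functional as F

def info_nce(text_emb, img_emb, temperature=0.07):
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(img_emb, dim=-1)
    logits = t @ v.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(len(t))   # matched pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```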
17. Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
Authors: WANG Chao, CAI Sijia, SHI Beixiang, CHONG Zhihong. 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 2023, No. 6, pp. 1223-1236.
The scarcity of bilingual parallel corpora imposes limitations on exploiting state-of-the-art supervised translation technology. One research direction is employing relations among multi-modal data to enhance performance. However, the reliance on manually annotated multi-modal datasets results in a high cost of data labeling. In this paper, the topic semantics of images is proposed to alleviate the above problem. First, topic-related images can be automatically collected from the Internet by search engines. Second, topic semantics is sufficient to encode the relations between multi-modal data such as texts and images. Specifically, we propose a visual topic semantic enhanced translation (VTSE) model that utilizes topic-related images to construct a cross-lingual and cross-modal semantic space, allowing the VTSE model to simultaneously integrate syntactic structure and semantic features. In the above process, topically similar texts and images are wrapped into groups so that the model can extract more robust topic semantics from a set of similar images and then further optimize the feature integration. The results show that our model outperforms competitive baselines by a large margin on the Multi30k and Ambiguous COCO datasets. Our model can use external images to bring gains to translation, improving data efficiency.
Keywords: multi-modal machine translation; visual topic semantics; data efficiency
18. Research Progress on Multi-Modal Fusion Object Detection Algorithms for Autonomous Driving: A Review
Authors: Peicheng Shi, Li Yang, Xinlong Dong, Heng Qi, Aixi Yang. 《Computers, Materials & Continua》, 2025, No. 6, pp. 3877-3917.
As the number and complexity of sensors in autonomous vehicles continue to rise, multi-modal fusion-based object detection algorithms are increasingly being used to detect 3D environmental information, significantly advancing the development of perception technology in autonomous driving. To further promote the development of fusion algorithms and improve detection performance, this paper discusses the advantages and recent advancements of multi-modal fusion-based object detection algorithms. Starting from single-modal sensor detection, the paper provides a detailed overview of typical sensors used in autonomous driving and introduces object detection methods based on images and point clouds. Image-based detection methods are categorized into monocular and binocular detection according to input type. Point cloud-based detection methods are classified into projection-based, voxel-based, point cluster-based, pillar-based, and graph structure-based approaches according to the technical pathway for processing point cloud features. Additionally, multi-modal fusion algorithms are divided into Camera-LiDAR fusion, Camera-Radar fusion, Camera-LiDAR-Radar fusion, and other sensor fusion methods based on the types of sensors involved. Furthermore, the paper identifies five key future research directions in this field, aiming to provide insights for researchers engaged in multi-modal fusion-based object detection and to encourage broader attention to its research and application.
Keywords: multi-modal fusion; 3D object detection; deep learning; autonomous driving
19. A multi-modal hierarchical approach for Chinese spelling correction using multi-head attention and residual connections
Authors: SHAO Qing, DU Yiwei. 《High Technology Letters》, 2025, No. 3, pp. 309-320.
The primary objective of Chinese spelling correction (CSC) is to detect and correct erroneous characters in Chinese text, which can result from various factors such as inaccuracies in pinyin representation, character resemblance, and semantic discrepancies. However, existing methods often struggle to fully address these types of errors, impacting overall correction accuracy. This paper introduces a multi-modal feature encoder designed to efficiently extract features from three distinct modalities: pinyin, semantics, and character morphology. Unlike previous methods that rely on direct fusion or fixed-weight summation to integrate multi-modal information, our approach employs a multi-head attention mechanism to focus more on relevant modal information while disregarding less pertinent data. To prevent issues such as gradient explosion or vanishing, the model incorporates a residual connection to the original text vector for fine-tuning. This approach ensures robust model performance by maintaining essential linguistic details throughout the correction process. Experimental evaluations on the SIGHAN benchmark dataset demonstrate that the proposed model outperforms baseline approaches across various metrics and datasets, confirming its effectiveness and feasibility.
Keywords: Chinese spelling correction; multi-head attention; multi-modal fusion; residual connection; pinyin encoder
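The fusion pattern described, multi-head attention over pinyin, semantic, and glyph (character-morphology) encodings with a residual connection to the original text vector, can be sketched as follows. The stubbed encoders and all sizes are assumptions, not the paper's model.

```python
# A minimal sketch of tri-modal fusion for Chinese spelling correction:
# text positions attend over pinyin and glyph encodings, with a residual
# connection back to the original text vector.
import torch
import torch.nn as nn

class TriModalCSCFusion(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, pinyin, glyph):
        # All inputs: (batch, seq, dim), one vector per character.
        modal = torch.cat([pinyin, glyph], dim=1)
        fused, _ = self.attn(query=text, key=modal, value=modal)
        # Residual connection to the original text vector, as in the abstract,
        # guards against vanishing or exploding gradients during fine-tuning.
        return self.norm(text + fused)

fusion = TriModalCSCFusion()
out = fusion(*(torch.randn(2, 32, 768) for _ in range(3)))  # (2, 32, 768) fused features
```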