Journal articles
366,041 articles found
Multi-modal data analysis for autism spectrum disorder in children: State of the art and trends
1
Authors: Lukai Pang, Xiaoke Zhao, Lulu Zhao, Jianqing Li, Fengyi Kuo, Hongxing Wang, Chengyu Liu. EngMedicine, 2026, Issue 1, pp. 47-56 (10 pages).
Autism spectrum disorder (ASD) is a highly heterogeneous neurodevelopmental disorder. Early diagnosis and intervention are crucial for improving outcomes. Traditional single-modality diagnostic methods are subjective and limited, and they struggle to reveal the underlying pathological mechanisms. In contrast, multimodal data analysis integrates behavioral, physiological, and neuroimaging information with advanced machine-learning and deep-learning algorithms to overcome these limitations. In this review, we surveyed the recent pediatric ASD literature, highlighting artificial intelligence-driven diagnostic techniques, multimodal data fusion strategies, and emerging trends in ASD assessment. Focusing on studies that integrated two or more modalities, we summarized their fusion levels, learning paradigms, tasks, datasets, and metrics. Multimodal approaches outperform single-modality baselines in classification, severity estimation, and subtyping by leveraging complementary information and reducing modality-specific biases. These approaches significantly enhance diagnostic accuracy and comprehensiveness, enabling early screening of ASD, symptom subtyping, severity assessment, and personalized interventions. Advances in multimodal fusion techniques have promoted progress in precision medicine for the treatment of ASD.
Keywords: autism spectrum disorder; multi-modal data; machine learning; early screening; symptom subtyping
Tomato Growth Height Prediction Method by Phenotypic Feature Extraction Using Multi-modal Data
2
Authors: GONG Yu, WANG Ling, ZHAO Rongqiang, YOU Haibo, ZHOU Mo, LIU Jie. Smart Agriculture (智慧农业), 2025, Issue 1, pp. 97-110 (14 pages).
[Objective] Accurate prediction of tomato growth height is crucial for optimizing production environments in smart farming. However, current prediction methods predominantly rely on empirical, mechanistic, or learning-based models that use either image data or environmental data. These methods fail to fully leverage multi-modal data to comprehensively capture the diverse aspects of plant growth. [Methods] To address this limitation, a two-stage phenotypic feature extraction (PFE) model based on recurrent neural network (RNN) and long short-term memory (LSTM) deep learning algorithms was developed. The model integrated environmental and plant information to provide a holistic understanding of the growth process and employed phenotypic and temporal feature extractors to comprehensively capture both types of features. This enabled a deeper understanding of the interaction between tomato plants and their environment, ultimately leading to highly accurate predictions of growth height. [Results and Discussions] The experimental results showed the model's effectiveness: when predicting the next two days based on the past five days, the PFE-based RNN and LSTM models achieved mean absolute percentage errors (MAPE) of 0.81% and 0.40%, respectively, significantly lower than the 8.00% MAPE of the large language model (LLM) and the 6.72% MAPE of the Transformer-based model. In longer-term predictions (a 10-day window predicting 4 days ahead and a 30-day window predicting 12 days ahead), the PFE-RNN model continued to outperform the other two baseline models, with MAPEs of 2.66% and 14.05%, respectively. [Conclusions] The proposed method, which leverages phenotypic-temporal collaboration, shows great potential for intelligent, data-driven management of tomato cultivation, making it a promising approach for enhancing the efficiency and precision of smart tomato planting management.
Keywords: tomato growth prediction; deep learning; phenotypic feature extraction; multi-modal data; recurrent neural network; long short-term memory; large language model
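The MAPE figures quoted above follow the usual definition, MAPE = (100/n) · Σ |(y_i − ŷ_i) / y_i|. A minimal Python sketch, assuming plain lists of observed and predicted heights; the function name and sample values are illustrative and not taken from the paper:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes equal-length, non-empty sequences with no zero values in `actual`.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical plant heights in cm: observed vs. forecast for two days ahead.
observed = [100.0, 200.0]
forecast = [101.0, 198.0]
print(round(mape(observed, forecast), 2))  # 1.0
```

Because each error is divided by the true value, MAPE is scale-free, which is what lets it compare models across plants of different heights; it is undefined when any true value is zero.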
Multi-Modal Data Analysis Based Game Player Experience Modeling Using LSTM-DNN (Cited by: 1)
3
Authors: Sehar Shahzad Farooq, Mustansar Fiaz, Irfan Mehmood, Ali Kashif Bashir, Raheel Nawaz, KyungJoong Kim, Soon Ki Jung. Computers, Materials & Continua (SCIE, EI), 2021, Issue 9, pp. 4087-4108 (22 pages).
Game player modeling is a paradigm of computational models that exploit players' behavior and experience using game and player analytics. Player modeling refers to descriptions of players based on frameworks of data derived from the interaction of a player's behavior within the game as well as the player's experience with the game. Player behavior focuses on dynamic and static information gathered at the time of gameplay. Player experience concerns the association of the human player during gameplay, which is based on cognitive and affective physiological measurements collected from sensors mounted on the player's body or in the player's surroundings. In this paper, player experience modeling is studied on the board puzzle game "Candy Crush Saga" using cognitive data of players acquired by physiological and peripheral devices. A Long Short-Term Memory-based Deep Neural Network (LSTM-DNN) is used to predict players' affective states in terms of valence, arousal, dominance, and liking by employing the concept of transfer learning. Transfer learning focuses on gaining knowledge while solving one problem and using the same knowledge to solve different but related problems. The homogeneous transfer learning approach has not been implemented in the game domain before, and this novel study opens a new research area for the game industry, where the main challenge is predicting the significance of innovative games for entertainment and players' engagement. The study is relevant not only from a player's point of view; it is also a benchmark study for game developers who have been facing the "cold start" problem for innovative games that strengthen the game industry economy.
Keywords: game player modeling; experience modeling; player analytics; deep learning; LSTM; game play data; Candy Crush Saga
Genetic Algorithm-Optimized Stacking Ensemble Learning for Multi-modal Data Analysis
4
Authors: Yingjing Wang, Di Wang, Haoqi Xu, Ziyang Du, Xianyu Zhang, Yongqi Tan, Wenjie Pan, Chaoying Jiang, Weichang Gao. Journal of Intelligent Science and Engineering Technology, 2026, Issue 1, pp. 17-27 (11 pages).
Tobacco leaf grade classification plays an important role in the tobacco industry. The traditional tobacco grade evaluation method mainly relies on manual sensory analysis, which has some limitations. With the progress of science and technology, tobacco grade classification methods based on machine learning have been widely used. However, existing methods still need further improvement in feature selection, classification accuracy, and model generalization ability. In this study, a model that integrates stacking ensemble learning and a genetic algorithm is proposed to discriminate the grade of tobacco leaves. The experimental results show that the proposed method can effectively extract key features from complex multi-modal data and performs excellently in tobacco grade classification. The accuracy reaches up to 93.7% in the identification task, which demonstrates that the model is well suited to multimodal data.
Keywords: genetic algorithm; stacking ensemble; multi-modal; machine learning; geological sensor
Railway Track Defect Detection Based on Dynamic Multi-Modal Fusion and Challenging Object Enhanced Perception
5
Authors: Yaguan Wang, Linlin Kou, Yang Gao, Qiang Sun, Yong Qin, Genwang Peng. Structural Durability & Health Monitoring, 2026, Issue 2, pp. 195-212 (18 pages).
The fasteners used in railway tracks are susceptible to defects arising from their intricate composition, and foreign objects are frequently observed on the track bed in the open environment. These two types of defects pose potential threats to high-speed trains, necessitating timely and accurate track inspection. Most existing automatic inspection methods rely on single visible-light data, and their performance is affected by complex environments. Furthermore, because only a single information dimension is available, detection accuracy for similar, occluded, and small objects is low. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and challenging object enhanced perception. First, in light of the differences in the representation dimensions of multimodal information, a dynamic weighted multi-modal feature fusion module is proposed. The fused multi-modal features are assigned weights and then multiplied with the extracted single-modal features at multiple levels, adaptively adjusting the response degree of the fused features. Second, a novel stepwise multi-scale convolution feature aggregation module is proposed for challenging objects. It employs depthwise separable convolution and cross-scale aggregation over different receptive fields to enhance feature extraction and reuse, thereby reducing the progressive loss of effective information. Extensive experiments on the constructed RGBD dataset demonstrate the efficacy of the proposed method compared with eight established single-modal and multi-modal methods.
Keywords: railway safety; track defect detection; multi-modal data; object detection
GaitMAFF: Adaptive Multi-Modal Fusion of Skeleton Maps and Silhouettes for Robust Gait Recognition in Complex Scenarios
6
Authors: Zhongbin Luo, Zhaoyang Guan, Wenxing You, Yunteng Wang, Yanqiu Bi. Computers, Materials & Continua, 2026, Issue 5, pp. 540-558 (19 pages).
Gait recognition is a key biometric for long-distance identification, yet its performance is severely degraded by real-world challenges such as varying clothing, carrying conditions, and changing viewpoints. While combining silhouette and skeleton data is a promising direction, effectively fusing these heterogeneous modalities and adaptively weighting their contributions in response to diverse conditions remains a central problem. This paper introduces GaitMAFF, a novel Multi-modal Adaptive Feature Fusion Network, to address this challenge. Our approach first transforms discrete skeleton joints into a dense Skeleton Map representation aligned with silhouettes, then employs an attention-based module to dynamically learn the fusion weights between the two modalities. The fused features are processed by a powerful spatio-temporal backbone with Weighted Global-Local Feature Fusion Modules (WFFM) to learn a discriminative representation. Extensive experiments on the challenging CCPG and Gait3D datasets show that GaitMAFF achieves state-of-the-art performance, with an average Rank-1 accuracy of 84.6% on CCPG and 58.7% on Gait3D. These results demonstrate that our adaptive fusion strategy effectively integrates complementary multimodal information, significantly enhancing gait recognition robustness and accuracy in complex scenes and providing a practical solution for real-world applications.
Keywords: gait recognition; multi-modal fusion; adaptive feature fusion; skeleton map; silhouette
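The abstract says only that an attention-based module learns fusion weights between the silhouette and Skeleton Map branches. One common way to do this is to score each modality, normalize the scores with a softmax, and take the weighted sum. The sketch below assumes one pooled feature vector per modality and a single hypothetical learned `query` vector; it is an illustration of softmax attention fusion, not GaitMAFF's actual module:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors with softmax attention weights.

    features: list of equal-length vectors, one per modality.
    query: hypothetical learned vector; its dot product with each
           modality's features gives that modality's score.
    Returns (fused vector, modality weights summing to 1).
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


silhouette = [1.0, 0.0]
skeleton_map = [0.0, 1.0]
query = [0.0, 0.0]  # equal scores, so both modalities get equal weight
fused, weights = attention_fuse([silhouette, skeleton_map], query)
print(weights)  # [0.5, 0.5]
print(fused)    # [0.5, 0.5]
```

Because the weights are computed from the features themselves, a condition that degrades one branch (for example, a coat changing the silhouette) can shift weight toward the other branch at inference time.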
MDGET-MER: Multi-Level Dynamic Gating and Emotion Transfer for Multi-Modal Emotion Recognition
7
Authors: Musheng Chen, Qiang Wen, Xiaohong Qiu, Junhua Wu, Wenqing Fu. Computers, Materials & Continua, 2026, Issue 3, pp. 872-893 (22 pages).
In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a contrastive learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
Keywords: multi-modal emotion recognition; dynamic gating; emotion transfer module; cross-modal dynamic alignment; noise robustness
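The idea of "calibrating the complementary contribution" of audio or visual features to dominant text features can be illustrated with an element-wise sigmoid gate: each gate value lies in (0, 1) and decides how much of the auxiliary signal is added to the text signal. This is a minimal sketch of generic gated fusion under that assumption; the parameter names (`w_text`, `w_aux`, `bias`) are hypothetical and the paper's exact gating formula is not given in the abstract:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_complement(text, aux, w_text, w_aux, bias):
    """Add an auxiliary modality to dominant text features through a gate.

    For each dimension d:
        g_d = sigmoid(w_text[d] * text[d] + w_aux[d] * aux[d] + bias[d])
        fused_d = text[d] + g_d * aux[d]
    All parameters are hypothetical; in a real model they are learned.
    """
    fused = []
    for t, a, wt, wa, b in zip(text, aux, w_text, w_aux, bias):
        g = sigmoid(wt * t + wa * a + b)
        fused.append(t + g * a)
    return fused


text = [1.0, 2.0]
audio = [4.0, 4.0]
# A large negative bias closes the gate in dim 0 (noisy audio is suppressed);
# a zero pre-activation half-opens the gate in dim 1.
print(gated_complement(text, audio, [0.0, 0.0], [0.0, 0.0], [-50.0, 0.0]))  # [1.0, 4.0]
```

Keeping the text features as an ungated residual path matches the abstract's framing of text as the dominant modality: a closed gate falls back to text alone rather than to zero.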
Construction and evaluation of a predictive model for the degree of coronary artery occlusion based on adaptive weighted multi-modal fusion of traditional Chinese and western medicine data (Cited by: 2)
8
Authors: Jiyu ZHANG, Jiatuo XU, Liping TU, Hongyuan FU. Digital Chinese Medicine, 2025, Issue 2, pp. 163-173 (11 pages).
Objective: To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods: Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results: A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, F1 score = 0.933). Compared with the model ablation and data ablation variants and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion: The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
Keywords: coronary artery disease; deep learning; multi-modal; clinical prediction; traditional Chinese medicine diagnosis
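As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported precision (0.937) and recall (0.929) reproduces the reported F1 of 0.933 to three decimals. The abstract does not say whether its metrics are macro-averaged over stenosis classes, so treat this as an arithmetic check only:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the AWMDF model.
print(round(f1_score(0.937, 0.929), 3))  # 0.933
```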
Adaptive Reinforcement Learning with Multi-Modal Perception for Autonomous Formation Control and Exploration in Large-Scale Multi-UAV Swarms
9
Authors: Ziyuan Ma, Huajun Gong, Xinhua Wang. Journal of Beijing Institute of Technology, 2026, Issue 1, pp. 63-83 (21 pages).
To address the challenge of achieving decentralized, scalable, and adaptive control for large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we proposed a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, a 61% reduction in stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. Innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
Keywords: multiple unmanned aerial vehicle (multi-UAV) swarm; autonomous control; reinforcement learning (RL); multi-modal perception; pigeon flock optimization (PFO)
ViE-Take: A Vision-Driven Multi-Modal Dataset for Exploring the Emotional Landscape in Takeover Safety of Autonomous Driving
10
Authors: Yantong Wang, Yu Gu, Tong Quan, Jiaoyun Yang, Mianxiong Dong, Ning An, Fuji Ren. Research, 2025, Issue 4, pp. 907-925 (19 pages).
Takeover safety is drawing increasing attention in intelligent transportation as new energy vehicles with cutting-edge autopilot capabilities proliferate on the road. Although recent studies highlight the importance of drivers' emotions in takeover safety, the lack of emotion-aware takeover datasets hinders further investigation, thereby constraining potential applications in this field. To this end, we introduce ViE-Take, the first Vision-driven dataset for exploring the Emotional landscape in Takeovers of autonomous driving (vision is used because it is the most cost-effective and user-friendly solution for commercial driver monitoring systems). ViE-Take enables a comprehensive exploration of the impact of emotions on drivers' takeover performance through 3 key attributes: multi-source emotion elicitation, multi-modal driver data collection, and multi-dimensional emotion annotations. To aid the use of ViE-Take, we provide 4 deep models (corresponding to 4 prevalent learning strategies) for predicting 3 different aspects of drivers' takeover performance (readiness, reaction time, and quality). These models offer benefits for various downstream tasks, such as driver emotion recognition and regulation for automobile manufacturers. Initial analysis and experiments conducted on ViE-Take indicate that (a) emotions have diverse impacts on takeover performance, some of which are counterintuitive; (b) highly expressive social media clips, despite their brevity, prove effective in eliciting emotions (a foundation for emotion regulation); and (c) predicting takeover performance solely through deep learning on vision data is not only feasible but also holds great potential.
Keywords: multi-modal data; emotion annotations; takeover safety; vision driven; new energy vehicles; intelligent transportation; autonomous driving; emotion-aware dataset
M3SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration (Cited by: 6)
11
Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee. China Communications (SCIE, CSCD), 2023, Issue 11, pp. 13-29 (17 pages).
The sixth generation (6G) of mobile communication systems is witnessing a new paradigm shift, i.e., the integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and presents the generation framework of the M3SC dataset. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we use AirSim and WaveFarer to collect multi-modal sensory data and Wireless InSite to collect communication data. Furthermore, in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, multiple frequency bands, and different times of day. Currently, the M3SC dataset contains 1500 snapshots, each including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices, for a total of 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing results present the multi-modal sensory information and the statistical properties of the communication channel. Finally, MMM sensing-communication applications that can be supported by the M3SC dataset are discussed.
Keywords: multi-modal sensing; ray tracing; sensing-communication integration; simulation dataset
TCM network pharmacology: new perspective integrating network target with artificial intelligence and multi-modal multi-omics technologies (Cited by: 1)
12
Authors: Ziyi Wang, Tingyu Zhang, Boyang Wang, Shao Li. Chinese Journal of Natural Medicines, 2025, Issue 11, pp. 1425-1434 (10 pages).
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges because of TCM's holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. The recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. Advances in computational and experimental techniques provide complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Keywords: network pharmacology; traditional Chinese medicine; network target; artificial intelligence; multi-modal; multi-omics
MMGC-Net: Deep neural network for classification of mineral grains using multi-modal polarization images (Cited by: 1)
13
Authors: Jun Shu, Xiaohai He, Qizhi Teng, Pengcheng Yan, Haibo He, Honggang Chen. Journal of Rock Mechanics and Geotechnical Engineering, 2025, Issue 6, pp. 3894-3909 (16 pages).
The multi-modal characteristics of mineral particles play a pivotal role in enhancing classification accuracy, which is critical for obtaining a profound understanding of the Earth's composition and ensuring the effective exploitation and utilization of its resources. However, existing methods for classifying mineral particles do not fully use these multi-modal features, thereby limiting classification accuracy. Furthermore, when conventional multi-modal image classification methods are applied to plane-polarized and cross-polarized sequence images of mineral particles, they encounter issues such as information loss, misaligned features, and difficulty in extracting spatiotemporal features. To address these challenges, we propose a multi-modal mineral particle polarization image classification network (MMGC-Net) for precise mineral particle classification. First, MMGC-Net employs a two-dimensional (2D) backbone network with shared parameters to extract features from both types of polarized images, ensuring feature alignment. Next, a cross-polarized intra-modal feature fusion module is designed to refine spatiotemporal features from the features extracted from the cross-polarized sequence images. Finally, an inter-modal feature fusion module integrates the two types of modal features to enhance classification precision. Quantitative and qualitative experimental results indicate that, compared with current state-of-the-art multi-modal image classification methods, MMGC-Net demonstrates marked superiority in multi-modal feature learning for mineral particles and on four classification evaluation metrics. It also shows better stability than existing models.
Keywords: mineral particles; multi-modal image classification; shared parameters; feature fusion; spatiotemporal feature
Multi-modal intelligent situation awareness in real-time air traffic control: Control intent understanding and flight trajectory prediction (Cited by: 1)
14
Authors: Dongyue GUO, Jianwei ZHANG, Bo YANG, Yi LIN. Chinese Journal of Aeronautics, 2025, Issue 6, pp. 41-57 (17 pages).
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks that use multi-modal data in real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of ATCOs, the ISA task is refined into the processing of two primary elements, i.e., spoken instructions and flight trajectories. ISA is then further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct benchmarks on a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Keywords: air traffic control; automatic speech recognition and understanding; flight trajectory prediction; multi-modal; situation awareness
A multi-modal clustering method for traditional Chinese medicine clinical data via media convergence (Cited by: 3)
15
Authors: Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, Issue 2, pp. 390-400 (11 pages).
Media convergence is a media change led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion. Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditional Chinese medicine (TCM) clinical data. It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain a media feature representation learnt from multiple modalities. MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels. The experiment is conducted on real-world multimodal TCM clinical data, including images and text. MCGEC improves clustering results compared with generic single-modal clustering methods and current, more advanced multi-modal clustering methods, achieving better results on TCM clinical datasets. Integrating multimedia features into clustering algorithms offers significant benefits over single-modal clustering approaches that simply concatenate features from different modalities. The method provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features.
Keywords: graph convolutional encoder; media convergence; multi-modal clustering; traditional Chinese medicine
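The abstract describes MCGEC's encoder only as a multi-modal graph convolution encoder. A widely used building block for such encoders, assumed here rather than confirmed by the abstract, is the graph convolution layer of Kipf and Welling, in which each node's features are averaged with its neighbors' (with symmetric degree normalization) before a learned linear map:

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here A is the adjacency matrix over N nodes (for clinical data, plausibly one node per patient record), I_N adds self-loops, H^{(0)} holds the per-node input features (for example, fused image and text features), W^{(l)} is a learned weight matrix, and σ is a nonlinearity such as ReLU. How MCGEC builds A from its media information is not specified in the abstract.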
Adaptive Multi-modal Fusion Instance Segmentation for CAEVs in Complex Conditions: Dataset, Framework and Verifications (Cited by: 3)
16
作者 Pai Peng Keke Geng +3 位作者 Guodong Yin Yanbo Lu Weichao Zhuang Shuaipeng Liu 《Chinese Journal of Mechanical Engineering》 SCIE EI CAS CSCD 2021年第5期96-106,共11页
Current work on environmental perception for connected autonomous electrified vehicles (CAEVs) mainly focuses on object detection in good weather and illumination conditions. Such methods often perform poorly in adverse scenarios and have a vague scene-parsing ability. This paper aims to develop an end-to-end sharpening mixture of experts (SMoE) fusion framework to improve the robustness and accuracy of CAEV perception systems in complex illumination and weather conditions. Three original contributions distinguish this work from the existing literature. First, the Complex KITTI dataset is introduced. It consists of 7481 pairs of modified KITTI RGB images and generated LiDAR dense depth maps, finely annotated at the instance level with the proposed semi-automatic annotation method. Second, the SMoE fusion approach is devised to adaptively learn robust kernels from complementary modalities. Third, comprehensive comparative experiments are conducted, and the results show that the proposed SMoE framework yields significant improvements over other fusion techniques in adverse environmental conditions. This research proposes an SMoE fusion framework to improve the scene-parsing ability of CAEV perception systems in adverse conditions.
Keywords: connected autonomous electrified vehicles; multi-modal fusion; semi-automatic annotation; sharpening mixture of experts; comparative experiments
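The SMoE approach weights complementary modalities (RGB and LiDAR depth) adaptively. One generic way to picture mixture-of-experts fusion is a softmax gate over per-modality "expert" features. The sketch below rests on three assumptions:

- It reads "sharpening" as a gate temperature below 1, which concentrates weight on the most trusted expert.
- The gating network is reduced to one hypothetical learned vector per expert.
- Features are flat vectors rather than convolutional kernels.

The paper's actual SMoE mechanism may differ substantially.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; a temperature below 1 sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(expert_features: List[Vector], gate_vectors: List[Vector]) -> Vector:
    """Hypothetical gating network: score each expert's features with its own learned vector."""
    return [sum(g * f for g, f in zip(gv, feat))
            for gv, feat in zip(gate_vectors, expert_features)]


def moe_fusion(expert_features: List[Vector], gate_vectors: List[Vector],
               temperature: float = 1.0) -> Vector:
    """Weighted sum of per-modality expert features using softmax gate weights."""
    weights = softmax(gate_logits(expert_features, gate_vectors), temperature)
    dim = len(expert_features[0])
    return [sum(w * feat[k] for w, feat in zip(weights, expert_features))
            for k in range(dim)]
```

With equal gate scores the two modalities contribute equally. Lowering the temperature pushes almost all weight onto the higher-scoring expert, for example the depth branch at night when the RGB features are unreliable.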
Personal Style Guided Outfit Recommendation with Multi-Modal Fashion Compatibility Modeling (cited 1 time)
17
Authors: WANG Kexin, ZHANG Jie, ZHANG Peng, SUN Kexin, ZHAN Jiamei, WEI Meng. Journal of Donghua University (English Edition), 2025, No. 2, pp. 156-167 (12 pages)
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendation methods do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual or athletic, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation method with multi-modal fashion compatibility modeling, termed PSGNet. First, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Second, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility. The personal style serves as a set of features that guides the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is efficient for personalized outfit recommendation.
Keywords: personalized outfit recommendation; fashion compatibility modeling; style preference; multi-modal representation; Bayesian personalized ranking (BPR); style classifier
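PSGNet is trained through Bayesian personalized ranking (BPR). The standard BPR objective for one (user, preferred item, non-preferred item) triple is -ln sigma(x_ui - x_uj). It pushes the score of the outfit the user chose above the score of one the user did not choose.

The minimal sketch below assumes plain dot-product scores. PSGNet's real scores combine personal style and compatibility terms, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def score(user: Sequence[float], item: Sequence[float]) -> float:
    """Preference score x_ui as a dot product (an assumption for this sketch)."""
    return sum(u * i for u, i in zip(user, item))


def softplus(z: float) -> float:
    """Stable log(1 + exp(z)) that does not overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bpr_loss(user: Sequence[float], pos_item: Sequence[float],
             neg_item: Sequence[float]) -> float:
    """BPR loss -ln sigmoid(x_ui - x_uj) for one (user, positive, negative) triple."""
    x_uij = score(user, pos_item) - score(user, neg_item)
    # -ln(sigmoid(x)) == ln(1 + exp(-x)) == softplus(-x)
    return softplus(-x_uij)


def bpr_batch_loss(triples: Sequence[Tuple[Sequence[float], Sequence[float],
                                           Sequence[float]]]) -> float:
    """Mean BPR loss over a batch of sampled triples."""
    return sum(bpr_loss(u, p, n) for u, p, n in triples) / len(triples)
```

When both items score equally the loss is ln 2. It falls toward 0 as the preferred outfit's margin grows. The softplus form keeps the loss finite even for extreme margins.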
Deep learning-based multi-modal data integration enhancing breast cancer disease-free survival prediction (cited 1 time)
18
Authors: Zehua Wang, Ruichong Lin, Yanchun Li, Jin Zeng, Yongjian Chen, Wenhao Ouyang, Han Li, Xueyan Jia, Zijia Lai, Yunfang Yu, Herui Yao, Weifeng Su. Precision Clinical Medicine, 2024, No. 2, pp. 132-145 (14 pages)
Background: The prognosis of breast cancer is often unfavorable, emphasizing the need for early detection of metastasis risk and accurate treatment predictions. This study aimed to develop a novel multi-modal deep learning model using preoperative data to predict disease-free survival (DFS). Methods: We retrospectively collected pathology imaging, molecular, and clinical data from The Cancer Genome Atlas and one independent institution in China. We developed a novel Deep Learning Clinical Medicine Based Pathological Gene Multi-modal (DeepClinMed-PGM) model for DFS prediction, integrating clinicopathological data with molecular insights. The patients comprised a training cohort (n=741), an internal validation cohort (n=184), and an external testing cohort (n=95). Results: Integrating multi-modal data into the DeepClinMed-PGM model significantly improved area under the receiver operating characteristic curve (AUC) values. In the training cohort, AUC values for 1-, 3-, and 5-year DFS predictions increased to 0.979, 0.957, and 0.871. In the external testing cohort, the values reached 0.851, 0.878, and 0.938 for 1-, 2-, and 3-year DFS predictions, respectively. The model's discriminative capability was consistently robust across cohorts:
- training cohort: hazard ratio (HR) 0.027, 95% confidence interval (CI) 0.0016-0.046, P<0.0001
- internal validation cohort: HR 0.117, 95% CI 0.041-0.334, P<0.0001
- external cohort: HR 0.061, 95% CI 0.017-0.218, P<0.0001

Additionally, the DeepClinMed-PGM model achieved C-index values of 0.925, 0.823, and 0.864 in the three cohorts, respectively. Conclusion: This study introduces an approach to breast cancer prognosis that integrates imaging, molecular, and clinical data for enhanced predictive accuracy, offering promise for personalized treatment strategies.
Keywords: breast cancer; multi-modality; deep learning; pathological; disease-free survival
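The C-index values reported above measure how often a model ranks patients' risks in the same order as their observed outcomes. The sketch below is a simplified version of Harrell's concordance index and is not the authors' evaluation code. It follows these rules:

- A pair of patients is comparable when the one with the shorter follow-up actually had an event (recurrence or death).
- A comparable pair is concordant when that patient also received the higher predicted risk.
- Tied risk scores count as half a concordant pair.
- Tied event times are simply skipped, which is an assumption.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Simplified Harrell's C-index: share of comparable pairs ranked correctly.

    Pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before times[j]. It is concordant when
    risks[i] > risks[j]; tied risks count as 0.5. Tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ranking, and 0.0 means the ranking is fully reversed.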
Multi-modal deep learning based on multi-dimensional and multi-level temporal data can enhance the prognostic prediction for multi-drug resistant pulmonary tuberculosis patients (cited 4 times)
19
Authors: Zhen-Hui Lu, Ming Yang, Chen-Hui Pan, Pei-Yong Zheng, Shun-Xian Zhang. Science in One Health, 2022, No. 1, pp. 6-8 (3 pages)
Despite the advent of new diagnostics, drugs, and regimens, multi-drug resistant pulmonary tuberculosis (MDR-PTB) remains a global health threat. It has a long treatment cycle, a low cure rate, and a heavy disease burden. Factors such as demographics, disease characteristics, lung imaging, biomarkers, therapeutic schedule, and adherence to medication are associated with MDR-PTB prognosis. However, most existing studies have so far focused on predicting treatment outcomes from static, single-scale, or low-dimensional information. Multi-modal deep learning on dynamic, multi-dimensional data can therefore provide a deeper understanding of personalized treatment plans and aid the clinical management of patients.
Keywords: MDR-PTB; multi-modal; deep learning; prognosis
Multi-Modal Named Entity Recognition with Auxiliary Visual Knowledge and Word-Level Fusion
20
Authors: Huansha Wang, Ruiyang Huang, Qinrang Liu, Xinghao Wang. Computers, Materials & Continua, 2025, No. 6, pp. 5747-5760 (14 pages)
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between the visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which leverages a Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which uses target-specific prompts to guide the MLLM to extract external knowledge derived exclusively from images to aid entity recognition. This avoids the redundant recognition and cognitive confusion caused by processing image-text pairs simultaneously. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or matches state-of-the-art methods on two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Keywords: multi-modal named entity recognition; large language model; multi-modal fusion
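The abstract says MMAVK fuses external knowledge with each word embedding, but it does not give the exact mechanism. One common way to realize word-level fusion is for each word to attend over the knowledge vectors with scaled dot-product attention and add the attended summary back as a residual. The sketch below rests on these assumptions:

- Word and knowledge embeddings share the same dimension.
- The attention has no learned projections.
- The fusion is a residual sum.

All names are hypothetical, and MMAVK's actual fusion may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def word_level_fusion(word_embs: List[Vector], knowledge_embs: List[Vector]) -> List[Vector]:
    """For each word, attend over knowledge vectors and add the attended summary.

    Assumes every word and knowledge vector has the same dimension d.
    """
    d = len(word_embs[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for w in word_embs:
        # scaled dot-product score between this word and each knowledge vector
        scores = [scale * sum(a * b for a, b in zip(w, k)) for k in knowledge_embs]
        attn = softmax(scores)
        # attention-weighted summary of the knowledge vectors
        summary = [sum(a * k[j] for a, k in zip(attn, knowledge_embs)) for j in range(d)]
        # residual fusion keeps the original word information
        fused.append([wj + sj for wj, sj in zip(w, summary)])
    return fused
```

Because the fusion runs per word, each token can draw on a different part of the image-derived knowledge. A sequence-level fusion would instead give every token one shared vector.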