Bone tumors (BTs), including osteosarcoma, Ewing sarcoma, and chondrosarcoma, are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location, histological subtype, and molecular alterations. Recent advances in artificial intelligence (AI), particularly deep learning, have enabled the integration of diverse clinical data modalities to support diagnosis, treatment planning, and prognostication in bone oncology. This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging, digital pathology, multi-omics profiling, and electronic health records. We conducted a structured review of peer-reviewed literature published between 2015 and early 2025, focusing on the development, validation, and clinical applicability of AI models for BT diagnosis, subtyping, treatment response prediction, and recurrence monitoring. Although multimodal models have demonstrated advantages over unimodal approaches, especially in handling missing data and improving generalizability, most remain constrained by single-center study designs, small sample sizes, and limited prospective or external validation. Persistent technical and translational challenges include semantic misalignment across modalities, incomplete datasets, limited model interpretability, and regulatory and infrastructural barriers to clinical integration. To address these limitations, we highlight emerging directions such as contrastive representation learning, generative data augmentation, transformer-based fusion architectures, and privacy-preserving federated learning. We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability. In summary, multimodal AI represents a promising paradigm for advancing precision care in BTs. Realizing its full clinical potential will require methodologically rigorous, biologically informed, and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.
In recent years, microservice architecture has gained increasing popularity. However, due to the complex and dynamically changing nature of microservice systems, failure detection has become more challenging. Traditional root cause analysis methods mostly rely on a single modality of data, which is insufficient to cover all failure information. Existing multimodal methods require collecting high-quality labeled samples and often struggle to classify unknown failure categories. To address these challenges, this paper proposes a root cause analysis framework based on a masked graph autoencoder (GAE). The main process involves feature extraction, GAE-based feature dimensionality reduction, and online clustering combined with expert input. The method is experimentally evaluated on two public datasets and compared with two baseline methods, demonstrating significant advantages even with only 16% labeled samples.
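The abstract's pipeline (masked feature reconstruction on a service graph, then clustering of the learned embeddings) can be sketched in a few lines. The 6-service graph, feature sizes, and single-layer encoder below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy service-dependency graph: 6 microservices, symmetric adjacency.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 1, 0, 0],
              [1, 1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(6, 8))            # per-service metric features

# Mask some node features (the "masked" part of a masked GAE).
mask = np.zeros(6, dtype=bool)
mask[[1, 4]] = True
X_masked = X.copy()
X_masked[mask] = 0.0

# One GCN encoder layer with symmetric normalization: Z = A_norm X W.
A_self = A + np.eye(6)
d = A_self.sum(axis=1)
A_norm = A_self / np.sqrt(np.outer(d, d))
W_enc = rng.normal(size=(8, 4)) * 0.1
Z = np.tanh(A_norm @ X_masked @ W_enc)   # node embeddings for clustering

# Decoder tries to reconstruct the masked features; the reconstruction
# error on masked nodes is the self-supervised training signal.
W_dec = rng.normal(size=(4, 8)) * 0.1
X_rec = Z @ W_dec
loss = float(np.mean((X_rec[mask] - X[mask]) ** 2))
print(Z.shape, loss >= 0.0)
```

In the paper's framework, embeddings like `Z` would then feed the online clustering step, with expert input labeling cluster-level failure categories.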
With increasing attention to the state and role of people in intelligent manufacturing, there is strong demand for human-cyber-physical systems (HCPS) that focus on human-robot interaction. Existing intelligent manufacturing systems cannot support efficient human-robot collaborative work. Unlike machines equipped with sensors, however, human characteristic information is difficult to perceive and digitize instantly. In view of the high complexity and uncertainty of the human body, this paper proposes a framework for building a human digital twin (HDT) model based on multimodal data and expounds on the key technologies. A data acquisition system is built to dynamically acquire and update body-state and physiological data and realize the digital expression of multi-source heterogeneous human body information. A bidirectional long short-term memory and convolutional neural network (BiLSTM-CNN) based network is devised to fuse multimodal human data and extract spatiotemporal features, with human locomotion mode identification taken as an application case. A series of optimization experiments is carried out to improve the performance of the proposed BiLSTM-CNN-based network model, which is then compared with traditional locomotion mode identification models. The experimental results confirm the superiority of the HDT framework for human locomotion mode identification.
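The fusion step above presupposes that the multimodal streams are time-synchronized and segmented before being fed to the BiLSTM-CNN. A minimal sketch of that windowing stage, with invented stream names, channel counts, and window sizes (the abstract does not give the paper's preprocessing details):

```python
import numpy as np

def windowed_features(streams, win, step):
    """Segment synchronized multimodal streams (dict name -> (T, C) array)
    into fixed windows and concatenate simple per-channel statistics."""
    T = min(s.shape[0] for s in streams.values())
    feats = []
    for start in range(0, T - win + 1, step):
        stats = []
        for s in streams.values():
            seg = s[start:start + win]
            stats.append(seg.mean(axis=0))   # mean per channel
            stats.append(seg.std(axis=0))    # variability per channel
        feats.append(np.concatenate(stats))
    return np.stack(feats)

rng = np.random.default_rng(1)
streams = {"imu": rng.normal(size=(500, 6)),   # hypothetical body-state channels
           "emg": rng.normal(size=(500, 4))}   # hypothetical physiological channels
F = windowed_features(streams, win=100, step=50)
print(F.shape)   # (n_windows, 2 * (6 + 4))
```

A real HDT pipeline would pass the raw windows (not just summary statistics) to the BiLSTM-CNN so the network can learn spatiotemporal features itself; the statistics here only illustrate the segmentation and alignment.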
In this case study, we hypothesized that sympathetic nerve activity would be higher during conversation with the PALRO robot, and that conversation would result in an increase in cerebral blood flow near Broca’s area. The facial expressions of a human subject were recorded, and cerebral blood flow and heart rate variability were measured during interactions with the humanoid robot. These multimodal data were time-synchronized to quantitatively verify changes from the resting baseline in facial expression, cerebral blood flow, and heart rate variability. In conclusion, the measurements indicated that sympathetic nervous activity was dominant, suggesting that the subject may have enjoyed and been excited while talking to the robot (normalized High Frequency < normalized Low Frequency: 0.22 ± 0.16 < 0.78 ± 0.16). Cerebral blood flow values were higher during conversation and in the resting state after the experiment than in the resting state before the experiment; talking increased cerebral blood flow in the frontal region. As the subject was left-handed, it was confirmed that the right side of the brain, where Broca’s area is located in this subject, was particularly activated (left < right: 0.15 ± 0.21 < 1.25 ± 0.17). In the sections where a “happy” facial emotion was recognized, the examiner-judged “happy” faces and the MTCNN “happy” results were generally consistent.
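The normalized LF/HF figures quoted above come from heart rate variability analysis. A rough numpy sketch of how such normalized band powers are typically computed from RR intervals is shown below; the resampling rate, band edges, and synthetic tachogram are standard textbook choices, not the study's exact protocol:

```python
import numpy as np

def lf_hf(rr_ms, fs=4.0):
    """Normalized LF/HF powers from RR intervals (ms) via FFT band power.
    The irregular beat series is resampled to an even fs-Hz grid first."""
    t = np.cumsum(rr_ms) / 1000.0                   # beat times (s)
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    x = np.interp(grid, t, rr_ms)                   # evenly sampled tachogram
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    lf = psd[(f >= 0.04) & (f < 0.15)].sum()        # sympathetic + vagal band
    hf = psd[(f >= 0.15) & (f < 0.40)].sum()        # mainly vagal band
    total = lf + hf
    return lf / total, hf / total

# Synthetic RR series: ~800 ms beats with a slow LF-band oscillation.
rng = np.random.default_rng(2)
rr = 800 + 50 * np.sin(2 * np.pi * 0.1 * np.arange(300)) + rng.normal(0, 5, 300)
nlf, nhf = lf_hf(rr)
print(round(nlf, 2), round(nhf, 2))
```

By construction the normalized powers sum to one, mirroring the reported nLF/nHF pair; a dominant nLF is the marker the study reads as sympathetic predominance.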
To ensure the safe and stable operation of rotating machinery, intelligent fault diagnosis methods hold significant research value. However, existing diagnostic approaches largely rely on manual feature extraction and expert experience, which limits their adaptability under variable operating conditions and strong noise environments, severely affecting the generalization capability of diagnostic models. To address this issue, this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning (AutoML). The framework first extracts fault-sensitive Mel time-frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations. On this basis, automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and acquisition of optimal model configurations. Finally, diagnostic decisions are made by automatically integrating the predictions of multiple high-performance base models. Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference, validating its efficiency, scalability, and practical value for rotating machinery fault diagnosis.
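The Mel time-frequency features mentioned above are produced by projecting a linear magnitude spectrum through a triangular mel filterbank. A self-contained numpy sketch (filter count and FFT size are illustrative, not the paper's settings):

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping a linear spectrum to mel bands."""
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

fb = mel_filterbank(n_mels=16, n_fft=512, sr=16000)
spectrum = np.abs(np.random.default_rng(3).normal(size=257))  # one |FFT| frame
mel_frame = np.log(fb @ spectrum + 1e-8)        # one log-mel column
print(fb.shape, mel_frame.shape)
```

In the paper's framework, log-mel frames like this would be stacked into a Mel-spectrogram and fused with statistical features of the vibration channel before AutoML model selection and ensembling.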
1 Introduction. The development of information technology has promoted the application of multimodal, long-temporal, and multiscale data in healthcare [1]. However, the effective utilization of multimodal data still faces challenges related to feature redundancy, both within the long-temporal data of a single modality and across modalities [2].
Background: Retinal vein occlusion (RVO) is a leading cause of visual impairment on a global scale. Its pathological mechanisms involve a complex interplay of vascular obstruction, ischemia, and secondary inflammatory responses. Recent interdisciplinary advances, underpinned by the integration of multimodal data, have established a new paradigm for unraveling the pathophysiological mechanisms of RVO, enabling early diagnosis and personalized treatment strategies. Main text: This review critically synthesizes recent progress at the intersection of machine learning, bioinformatics, and clinical medicine, focusing on developing predictive models and deep analysis, exploring molecular mechanisms, and identifying markers associated with RVO. By bridging technological innovation with clinical needs, this review underscores the potential of data-driven strategies to advance RVO research and optimize patient care. Conclusions: Machine learning-bioinformatics integration has revolutionised RVO research through predictive modelling and mechanistic insights, particularly via deep learning-enhanced retinal imaging and multi-omics networks. Despite this progress, clinical translation requires resolving data standardisation inconsistencies and model generalizability limitations. Establishing multicentre validation frameworks and interpretable AI tools, coupled with patient-focused data platforms built through cross-disciplinary collaboration, could enable precision interventions to optimally preserve vision.
This paper addresses the challenge of efficiently querying multimodal related data in data lakes, large-scale storage and management systems that support heterogeneous data formats, including structured, semi-structured, and unstructured data. Multimodal data queries are crucial because they enable seamless retrieval of related data across modalities, such as tables, images, and text, with applications in fields like e-commerce, healthcare, and education. However, existing methods primarily focus on single-modality queries, such as joinable or unionable table discovery, and struggle to handle the heterogeneity and lack of metadata in data lakes while balancing accuracy and efficiency. To tackle these challenges, we propose a Multimodal data Query mechanism for Data Lakes (MQDL), which employs a modality-adaptive indexing mechanism and contrastive learning-based embeddings to unify representations across modalities. Additionally, we introduce product quantization to optimize candidate verification during queries, reducing computational overhead while maintaining precision. We evaluate MQDL on a table-image dataset across multiple business scenarios, measuring metrics such as precision, recall, and F1-score. Results show that MQDL achieves an accuracy rate of approximately 90% while demonstrating strong scalability and reduced query response time compared with traditional methods. These findings highlight MQDL's potential to enhance multimodal data retrieval in complex data lake environments.
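Product quantization, which MQDL uses to cheapen candidate verification, compresses each embedding into a few byte-sized codes and scores queries against them with small lookup tables. A compact numpy sketch under assumed sizes (32-dim vectors, 4 subspaces, 16 centroids each); MQDL's actual parameters are not stated in the abstract:

```python
import numpy as np

rng = np.random.default_rng(4)
D, M, K = 32, 4, 16            # vector dim, subspaces, centroids per subspace
X = rng.normal(size=(1000, D)).astype(np.float32)   # stored embeddings
sub = D // M

# Train per-subspace codebooks with a few Lloyd (k-means) iterations.
codebooks = []
for m in range(M):
    Xm = X[:, m * sub:(m + 1) * sub]
    C = Xm[rng.choice(len(Xm), K, replace=False)].copy()
    for _ in range(5):
        assign = np.argmin(((Xm[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if (assign == k).any():
                C[k] = Xm[assign == k].mean(axis=0)
    codebooks.append(C)

# Encode: each vector becomes M one-byte codes.
codes = np.stack(
    [np.argmin(((X[:, m * sub:(m + 1) * sub][:, None]
                 - codebooks[m][None]) ** 2).sum(-1), axis=1)
     for m in range(M)], axis=1).astype(np.uint8)

# Asymmetric distance: query vs. codes via per-subspace lookup tables.
q = rng.normal(size=D).astype(np.float32)
tables = [((q[m * sub:(m + 1) * sub] - codebooks[m]) ** 2).sum(-1)
          for m in range(M)]
adc = sum(tables[m][codes[:, m]] for m in range(M))
print(codes.shape, adc.shape)
```

Each stored vector shrinks from 32 floats to 4 bytes, and a query needs only M table builds plus cheap lookups, which is the overhead reduction the abstract claims for verification.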
Effective survival analysis is essential for identifying optimal preventive treatments within smart healthcare systems and leveraging digital health advancements; however, existing prediction models face limitations, primarily relying on ensemble classification techniques with suboptimal performance in both target detection and predictive accuracy. To address these gaps, this paper proposes a multimodal framework that integrates enhanced facial feature detection and temporal predictive modeling. For facial feature extraction, this study developed a lightweight face-region convolutional neural network (FRegNet) specialized in detecting key facial components, such as the eyes and lips, in clinical patients. FRegNet incorporates a residual backbone (Rstem) to enhance feature representation and a facial path-aggregated feature pyramid network for multi-resolution feature fusion. Comparative experiments reveal that FRegNet outperforms state-of-the-art target detection algorithms, achieving an average precision (AP) of 0.922, average recall of 0.933, mean average precision (mAP) of 0.987, and precision of 0.98, significantly surpassing other mask region-based convolutional neural network (RCNN) variants, such as Mask RCNN-ResNeXt with an AP of 0.789 and mAP of 0.957. Based on the extracted facial features and clinical physiological indicators, this study further proposes an enhanced temporal encoding-decoding (ETED) model that integrates an adaptive attention mechanism and a gated weighting mechanism to improve predictive performance. Comparative results demonstrate that the ETED variant incorporating facial features (ETEncoding-Decoding-Face) outperforms traditional models, achieving an accuracy of 0.916, precision of 0.850, recall of 0.895, F1 of 0.884, and area under the curve (AUC) of 0.947, outperforming gradient boosting (which reaches an accuracy of 0.922 but an AUC of only 0.669) and other classifiers on comprehensive metrics. The results confirm that the multimodal dataset (facial features plus physiological indicators) significantly enhances the prediction accuracy of patients' seven-day survival conditions. Correlation analysis reveals that chronic health evaluation and mean arterial pressure are positively correlated with survival, while temperature, Glasgow Coma Scale, and fibrinogen are negatively correlated.
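The gated weighting mechanism mentioned for ETED can be illustrated with a per-feature sigmoid gate that arbitrates between the two modalities. All shapes and the single gate matrix below are illustrative assumptions, not the published architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
face = rng.normal(size=(8, 32))     # FRegNet-style facial embeddings (batch, d)
physio = rng.normal(size=(8, 32))   # aligned physiological-indicator embeddings

# Gated weighting: a learned gate decides, per feature dimension, how much
# each modality contributes to the fused representation.
Wg = rng.normal(size=(64, 32)) * 0.1
g = sigmoid(np.concatenate([face, physio], axis=1) @ Wg)
fused = g * face + (1.0 - g) * physio
print(fused.shape)
```

Because the gate is conditioned on both modalities, a patient whose facial features are uninformative can lean on physiological indicators, and vice versa, which is the intuition behind the multimodal gain reported above.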
Seismic hazards pose a major threat to life safety, social development, and the economy. Traditional seismic vulnerability and risk assessments, such as field survey methods, may not be suitable for densely built-up urban areas due to the limited availability of comprehensive data and potential subjectivity in judgment. To overcome these limitations, an integrated method for seismic vulnerability and risk assessment based on multimodal remote sensing data, support vector machine (SVM), and GIScience methods was proposed and applied to the central urban area of Jinan City, Shandong Province, China. First, an area with representative buildings was selected for field survey research, and an attribute information base was established. Then, the SVM method was used to establish the susceptibility proxies, which were applied to the whole study area after accuracy evaluation. Finally, the spatial distribution of seismic vulnerability and risk under different seismic intensity scenarios (from VI to X) was analyzed in GIScience. The results show that the average building vulnerability index in the central urban area of Jinan City is 0.53, indicating that the overall seismic performance of buildings is at a moderate level. Under the seismic intensity scenario of VIII, buildings in the Starting Area and New Urban District of Jinan would mostly suffer ‘Moderate’ damage, while the Old Urban Area, with more seismic-resistant buildings, would experience only ‘Slight’ damage. This study aims to offer an efficient and accurate method for assessing seismic vulnerability in mid- to large-sized cities characterized by concentrated population densities and rapid urbanization, and to provide a valuable reference for urban renewal, seismic mitigation, and land planning efforts, particularly in cities and regions of developing countries. Additionally, it contributes to the realization of Sustainable Development Goal 11, which seeks to make cities and human settlements inclusive, safe, resilient, and sustainable.
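The scenario analysis above maps a building's vulnerability index and a macroseismic intensity to an expected damage state. A toy lookup in the same spirit, with invented thresholds and a made-up score formula (the study's fragility relations are not given in the abstract):

```python
# Toy damage-state lookup: map a vulnerability index in [0, 1] and a
# macroseismic intensity scenario (VI..X, i.e. 6..10) to a damage grade.
# The score formula and thresholds are illustrative placeholders only.
GRADES = ["None", "Slight", "Moderate", "Extensive", "Complete"]

def damage_grade(vuln_index, intensity):
    """vuln_index in [0, 1]; intensity as an integer 6..10 (VI..X)."""
    score = vuln_index * (intensity - 5)      # grows with both inputs
    thresholds = [0.5, 1.5, 2.5, 3.5]         # assumed grade boundaries
    grade = sum(score >= t for t in thresholds)
    return GRADES[grade]

# Average index 0.53 from the study, under scenario VIII (intensity 8):
print(damage_grade(0.53, 8))   # → Moderate
```

With these (hypothetical) thresholds the study's average index of 0.53 lands on 'Moderate' at intensity VIII, consistent with the reported outcome for the Starting Area and New Urban District; real assessments would replace the score with calibrated fragility curves.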
The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased awareness of emerging infectious diseases. The advancement of multiomics analysis technology has resulted in the development of several databases containing virus information. Several scientists have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarizes the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It covers multi-dimensional information integration and database construction for emerging infectious viruses, construction of virus mutation spectra and variant forecast models, analysis of the affinity between mutated antigens and receptors, propagation models of dynamic virus evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focus on research results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively surveys the latest virus research and provides a reference for future virus prevention and control research.
With the rapid development of artificial intelligence (AI) technology, multimodal data integration has become an important means to improve the accuracy of diagnosis and treatment in gastroenterology and hepatology. This article systematically reviews the latest progress of multimodal AI technology in diagnosis, treatment, and decision-making for gastrointestinal tumors, functional gastrointestinal diseases, and liver diseases, focusing on innovative applications of endoscopic image AI, pathological section AI, multi-omics data fusion models, and wearable devices combined with natural language processing. Multimodal AI can significantly improve the accuracy of early diagnosis and the efficiency of individualized treatment planning by integrating imaging, pathological, molecular, and clinical phenotypic data. However, current AI technologies still face challenges such as insufficient data standardization, limited model generalization, and ethical compliance. This paper proposes solutions, such as establishing cross-center data-sharing platforms, developing federated learning frameworks, and formulating ethical norms, and discusses the application prospects of multimodal large-scale models in disease management. This review provides a theoretical basis and practical guidance for promoting the clinical translation of AI technology in gastroenterology and hepatology.
Alzheimer’s Disease (AD), a prevalent neurodegenerative disorder characterized by memory loss and cognitive decline, poses significant challenges for individuals and society. Multimodal data fusion has emerged as a promising approach for AD diagnosis, with Graph Convolutional Networks (GCNs) effectively capturing irregular brain information. However, traditional GCN methods face limitations in representing and integrating multimodal data, often resulting in feature mismatch. In this study, we propose a novel Kolmogorov-Arnold Graph Attention Network (KAGAN) model to address this issue through semantic-level alignment. KAGAN incorporates a Multimodal Feature Construction method (MuStaF) to extract structural and functional features from T1- and T2-weighted images, and a Multimodal Graph Adjacency Matrix Construction method (MuGAC) to integrate clinical information, modeling intricate relationships across modalities. Experiments conducted on the ADNI dataset demonstrate the superiority of KAGAN in AD/CN/MCI classification, achieving an accuracy of 98.29 ± 1.21%. This highlights KAGAN's potential for early AD diagnosis by enabling interactive learning and fusion of multimodal features at the semantic level. The source code of our proposed model and the related datasets are available at https://github.com/sheeprra/KAGAN.
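The graph-attention idea behind such models can be illustrated with a single-head, GAT-style layer in numpy: pairwise scores are computed only over graph neighbours and softmax-normalized per node. This is a generic sketch, not KAGAN's Kolmogorov-Arnold formulation, and all sizes are illustrative:

```python
import numpy as np

def graph_attention(X, A, W, a):
    """Single-head GAT-style layer.
    X: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, F2) projection; a: (2*F2,) attention vector."""
    H = X @ W
    N = H.shape[0]
    # e_ij = LeakyReLU(a^T [h_i || h_j]), kept only for connected pairs.
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            z = a @ np.concatenate([H[i], H[j]])
            e[i, j] = z if z > 0 else 0.2 * z
    e = np.where(A > 0, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over neighbours
    return alpha, np.tanh(alpha @ H)

rng = np.random.default_rng(6)
A = np.eye(5) + (rng.random((5, 5)) > 0.5)   # self-loops guarantee a neighbour
X = rng.normal(size=(5, 8))
alpha, H2 = graph_attention(X, A, rng.normal(size=(8, 4)) * 0.1,
                            rng.normal(size=(8,)) * 0.1)
print(H2.shape)
```

In a brain-graph setting, nodes would be regions of interest and the adjacency would come from a construction like MuGAC; the learned attention weights then indicate which regional interactions drive the classification.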
Purpose: This study integrates large language models (LLMs) with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud, addressing the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multi-source data fusion, thereby providing regulatory agencies with intelligent auditing tools. Design/methodology/approach: Analyzing 5,304 Chinese listed firms' annual reports (2015-2020) from the CSMAD database, this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors, developing textual semantic features. It integrates 19 financial indicators, 11 governance metrics, and linguistic characteristics (tone, readability) into fraud prediction models optimized through a group of Gradient Boosted Decision Tree (GBDT) algorithms. SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial, governance, and textual features on fraud likelihood. Findings: The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional logistic regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language. Research limitations: This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation against other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency. Practical implications: The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies' information disclosure quality and enables practical implementation through its derivative real-time monitoring system. This advancement significantly strengthens capital market risk early-warning capabilities, offering actionable insights for securities regulation. Originality/value: This study presents three key innovations: 1) a novel “chunking-summarization-embedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) demonstration of LLMs' superior performance in financial text analysis, outperforming traditional methods by 19.3%; and 3) a novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.
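The “chunking-summarization-embedding” pipeline starts by splitting a ~30,000-word report into context-window-sized pieces. A minimal character-based chunker (chunk size, overlap, and character granularity are assumptions; the study does not specify its chunking scheme):

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split a long report into overlapping character chunks so each fits
    an LLM context window; overlap preserves sentences cut at boundaries."""
    assert 0 <= overlap < chunk_size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

report = "x" * 30_000                    # stand-in for a 30,000-char report
chunks = chunk_text(report)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be summarized by the LLM, the summaries embedded into 256-dimensional vectors, and those vectors concatenated with the financial and governance features before GBDT training.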
Hateful memes are a multimodal medium combining images and text. Their potential hate content has caused serious problems for social media security. The current hateful memes classification task faces significant data scarcity, and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting. In addition, understanding the underlying relationship between text and images in hateful memes remains a challenge. To address these issues, we propose a multimodal hateful memes classification model named LABF, based on low-rank adapter layers and bidirectional gated feature fusion. First, low-rank adapter layers are adopted to learn the feature representation of the new dataset. This is achieved by introducing a small number of additional parameters while retaining the prior knowledge of the CLIP model, which effectively alleviates overfitting. Second, a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion. Experimental results show that the method significantly outperforms existing methods on two public datasets, verifying its effectiveness and robustness.
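Low-rank adapter layers in the LoRA style add a trainable bottleneck beside a frozen pretrained weight, so only a small fraction of parameters is updated on the new dataset. A numpy sketch with illustrative sizes (the paper's ranks and scaling are not given in the abstract):

```python
import numpy as np

rng = np.random.default_rng(7)
d_in, d_out, r, alpha = 512, 512, 8, 16       # illustrative sizes

W = rng.normal(size=(d_in, d_out)) * 0.02     # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.02         # trainable down-projection
B = np.zeros((r, d_out))                      # zero-init: no change at start

def adapted_forward(x):
    # Base path stays frozen; only A and B would receive gradients.
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(4, d_in))
full, lora = d_in * d_out, r * (d_in + d_out)
print(np.allclose(adapted_forward(x), x @ W), lora / full)
```

Zero-initializing `B` means the adapted layer starts out identical to the pretrained one (preserving CLIP's prior knowledge), while the trainable parameter count here is about 3% of the full matrix, which is what curbs overfitting on scarce meme data.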
By 2025, research on Traditional Chinese Medicine (TCM) meridians has generated 12-15 macro-level theories and over 20 specific hypotheses, manifesting a highly fragmented research landscape. Objective: This paper proposes the “Holistic Hierarchical Predictive-Integration Hypothesis” (HHPIT) to construct a unified theoretical framework that integrates the rational components of existing meridian hypotheses. Methods: To formulate HHPIT, this study systematically reviews current meridian theories, employs interdisciplinary methodologies, integrates artificial intelligence technology, and establishes a three-tier architecture encompassing structural, functional, and systemic layers. Results: HHPIT successfully integrates diverse meridian theories, proposes a computable algorithmic pipeline, and provides specific application protocols for chronic disease treatment, anti-aging, and enhancement of Zang-fu organ functions. Conclusion: HHPIT offers a novel, computable, and verifiable research paradigm for meridian studies, promoting the modernization and internationalization of TCM theory.
Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, alongside these advantages, depth sensing also presents many practical challenges. For instance, depth sensors impose an additional payload burden on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime. In this context, this study investigates the feasibility of abolishing depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed based on recent advancements in vision-based multimodal sensing, such as modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At the time of deployment, depth data becomes expendable, as it can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
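Modality hallucination trains an RGB-driven branch to mimic the depth branch's features, so the depth input can be dropped at deployment. The toy objective below conveys only that idea; the actual losses and feature extractors in the study may differ:

```python
import numpy as np

def hallucination_loss(f_halluc, f_depth):
    """Training-time objective for a modality-hallucination branch: make
    the RGB-driven 'hallucinated' features match real depth features."""
    return float(np.mean((f_halluc - f_depth) ** 2))

rng = np.random.default_rng(8)
f_depth = rng.normal(size=(2, 64))                 # depth-encoder features
f_halluc = f_depth + rng.normal(0, 0.1, (2, 64))   # RGB-branch approximation
loss = hallucination_loss(f_halluc, f_depth)
# At test time only the RGB branch runs; the depth sensor is not needed.
print(round(loss, 3))
```

Once this loss is small, the segmentation head fuses the hallucinated features exactly as it would fuse real depth features, which is why depth fusion benefits survive without a depth sensor on the platform.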
The Industrial Internet of Things (IoT), connecting society and industrial systems, represents a tremendous and promising paradigm shift. With IoT, multimodal and heterogeneous data from industrial devices can be easily collected and further analyzed to discover potential knowledge related to device maintenance and health. IoT data-based fault diagnosis for industrial devices is very helpful to the sustainability and applicability of an IoT ecosystem, but how to efficiently use and fuse this multimodal heterogeneous data to realize intelligent fault diagnosis remains a challenge. In this paper, a novel Deep Multimodal Learning and Fusion (DMLF) based fault diagnosis method is proposed to address heterogeneous data from IoT environments where industrial devices coexist. First, a DMLF model is designed by combining a Convolutional Neural Network (CNN) and a Stacked Denoising Autoencoder (SDAE) to capture more comprehensive fault knowledge and extract features from different modal data. Second, these multimodal features are seamlessly integrated at a fusion layer, and the resulting fused features are used to train a classifier for recognizing potential faults. Third, a two-stage training algorithm combining supervised pre-training and fine-tuning is proposed to simplify the training process for deep structure models. A series of experiments is conducted on multimodal heterogeneous data from a gear device to verify the proposed fault diagnosis method. The experimental results show that our method outperforms the benchmark methods in fault diagnosis accuracy.
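The SDAE component is pre-trained without labels: inputs are corrupted and the network learns to reconstruct the clean signal. A one-layer numpy sketch of that denoising pre-training loop (layer sizes, noise rate, and learning rate are illustrative, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(64, 20))                  # one modality's feature vectors

# Untied one-layer denoising autoencoder trained by plain gradient descent.
W1 = rng.normal(size=(20, 10)) * 0.1; b1 = np.zeros(10)
W2 = rng.normal(size=(10, 20)) * 0.1; b2 = np.zeros(20)

def step(lr=0.1, noise=0.3):
    """One pre-training step: corrupt, encode, decode, descend on MSE."""
    global W1, b1, W2, b2
    Xn = X * (rng.random(X.shape) > noise)     # masking corruption
    H = np.tanh(Xn @ W1 + b1)                  # encoder
    R = H @ W2 + b2                            # decoder
    err = (R - X) / len(X)                     # reconstruct the CLEAN input
    gW2, gb2 = H.T @ err, err.sum(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)
    gW1, gb1 = Xn.T @ dH, dH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return float(((R - X) ** 2).mean())

losses = [step() for _ in range(300)]
# After pre-training, the encoder output H would feed the fusion layer
# alongside CNN features, followed by supervised fine-tuning end to end.
final_loss = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2).mean())
print(round(losses[0], 3), round(final_loss, 3))
```

The reconstruction loss falls over training, which is the stage-one signal; stage two in the paper then fine-tunes the whole fused stack with fault labels.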
Artificial intelligence (AI) is driving a paradigm shift in gastroenterology and hepatology by delivering cutting-edge tools for disease screening, diagnosis, treatment, and prognostic management. Through deep learning, radiomics, and multimodal data integration, AI has achieved diagnostic parity with expert clinicians in endoscopic image analysis (e.g., early gastric cancer detection, colorectal polyp identification) and non-invasive assessment of liver pathologies (e.g., fibrosis staging, fatty liver typing), while demonstrating utility in personalized care scenarios such as predicting hepatocellular carcinoma recurrence and optimizing inflammatory bowel disease treatment responses. Despite these advancements, challenges persist, including limited model generalization due to fragmented datasets, algorithmic limitations in rare conditions (e.g., pediatric liver diseases) caused by insufficient training data, and unresolved ethical issues related to bias, accountability, and patient privacy. Mitigation strategies involve constructing standardized multicenter databases, validating AI tools through prospective trials, leveraging federated learning to address data scarcity, and developing interpretable systems (e.g., attention heatmap visualization) to enhance clinical trust. Integrating generative AI and digital twin technologies, and establishing unified ethical and regulatory frameworks, will accelerate AI adoption in primary care and foster equitable healthcare access, while interdisciplinary collaboration and evidence-based implementation remain critical for realizing AI's potential to redefine precision care for digestive disorders, improve global health outcomes, and reshape healthcare equity.
Population migration data derived from location-based services have often been used to delineate population flows between cities or to construct intercity relationship networks that reveal the complex interaction patterns underlying human activities. Nevertheless, the inherent heterogeneity in multimodal migration big data has been ignored. This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association. Initially, the intercity interactive networks in China were constructed using migration data from Baidu and AutoNavi collected during the same time period. Subsequently, the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall (network) and local (node) perspectives. Furthermore, the precision of these networks at the local scale was corroborated by constructing an intercity network from mobile phone (MP) data. Results indicate that the intercity interactive networks in China, as delineated by Baidu and AutoNavi migration flows, exhibit a high degree of structural equivalence; the correlation coefficient between the two networks is 0.874. Both networks exhibit a pronounced spatial polarization trend and hierarchical structure, evident in their distinct core and peripheral structures as well as in the varying importance and influence of different nodes within the networks. Nevertheless, there are notable differences worthy of attention. The Baidu intercity interactive network exhibits pronounced cross-regional effects, and its high-level interactions are characterized by a “rich-club” phenomenon, whereas the AutoNavi intercity interactive network presents a more significant distance attenuation effect, with its high-level interactions displaying a gradient distribution pattern. Notably, there exists a substantial correlation between the AutoNavi and MP networks at the local scale, evidenced by a high correlation coefficient of 0.954. Furthermore, a “spatial dislocations” phenomenon was observed within the spatial structures at different levels extracted from the Baidu and AutoNavi intercity networks. However, the measured results of network spatial structure similarity along three dimensions, namely node location, node size, and local structure, indicate a relatively high similarity and consistency between the two networks.
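Network-level similarity figures such as the 0.874 Baidu-AutoNavi coefficient above are typically Pearson correlations over corresponding intercity flows. A minimal sketch of that computation on toy flow matrices (the three-city topology and flow values are illustrative, not the study's data):

```python
import numpy as np

def network_correlation(a, b):
    """Pearson correlation between two intercity flow matrices,
    computed over the off-diagonal (intercity) entries only."""
    mask = ~np.eye(a.shape[0], dtype=bool)
    return np.corrcoef(a[mask], b[mask])[0, 1]

# Toy 3-city flow matrices (hypothetical trip counts).
baidu = np.array([[0, 120, 30], [110, 0, 45], [25, 50, 0]], dtype=float)
autonavi = np.array([[0, 100, 35], [95, 0, 40], [30, 55, 0]], dtype=float)

r = network_correlation(baidu, autonavi)
```

Comparing a network with itself yields a coefficient of exactly 1, which gives a quick sanity check on the implementation.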
基金supported by the National Natural Science Foundation of China[Grant No.:82172524]the Natural Science Foundation of Hubei Province[Grant No.:2025AFB240].
文摘Bone tumors (BTs), including osteosarcoma, Ewing sarcoma, and chondrosarcoma, are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location, histological subtype, and molecular alterations. Recent advances in artificial intelligence (AI), particularly deep learning, have enabled the integration of diverse clinical data modalities to support diagnosis, treatment planning, and prognostication in bone oncology. This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging, digital pathology, multi-omics profiling, and electronic health records. We conducted a structured review of peer-reviewed literature published between 2015 and early 2025, focusing on the development, validation, and clinical applicability of AI models for BT diagnosis, subtyping, treatment response prediction, and recurrence monitoring. Although multimodal models have demonstrated advantages over unimodal approaches, especially in handling missing data and improving generalizability, most remain constrained by single-center study designs, small sample sizes, and limited prospective or external validation. Persistent technical and translational challenges include semantic misalignment across modalities, incomplete datasets, limited model interpretability, and regulatory and infrastructural barriers to clinical integration. To address these limitations, we highlight emerging directions such as contrastive representation learning, generative data augmentation, transformer-based fusion architectures, and privacy-preserving federated learning. We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability. In summary, multimodal AI represents a promising paradigm for advancing precision care in BTs. Realizing its full clinical potential will require methodologically rigorous, biologically informed, and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.
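Among the emerging directions listed above, contrastive representation learning aligns embeddings of paired modalities (for example, a radiology scan and its matching pathology slide) by pulling matched pairs together and pushing mismatched pairs apart. A minimal NumPy sketch of an InfoNCE-style objective; the batch size, embedding width, and temperature are illustrative, not drawn from any cited model:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric-batch InfoNCE-style contrastive loss between two sets of
    embeddings; row i of z_a and row i of z_b are the matched (positive) pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature
    # Cross-entropy with the diagonal (matched pairs) as targets.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(1)
imaging = rng.normal(size=(4, 8))                      # e.g., imaging embeddings
pathology = imaging + 0.01 * rng.normal(size=(4, 8))   # near-aligned partner modality

loss_aligned = info_nce(imaging, pathology)
loss_random = info_nce(imaging, rng.normal(size=(4, 8)))
```

Well-aligned pairs should yield a lower loss than random pairings, which is the training signal that drives the two modality encoders toward a shared representation space.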
基金supported by ZTE Industry-University-Institute Cooperation Funds under Grant No. HC-CN-20221123003.
文摘In recent years, microservice architecture has gained increasing popularity. However, due to the complex and dynamically changing nature of microservice systems, failure detection has become more challenging. Traditional root cause analysis methods mostly rely on a single modality of data, which is insufficient to cover all failure information. Existing multimodal methods require collecting high-quality labeled samples and often face challenges in classifying unknown failure categories. To address these challenges, this paper proposes a root cause analysis framework based on a masked graph autoencoder (GAE). The main process involves feature extraction, feature dimensionality reduction based on the GAE, and online clustering combined with expert input. The method is experimentally evaluated on two public datasets and compared with two baseline methods, demonstrating significant advantages even with 16% labeled samples.
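As a rough illustration of the GAE-based dimensionality-reduction step, a one-layer graph-convolutional encoder with an inner-product decoder can be sketched as follows. The service topology, feature width, single-layer depth, and omission of the masking scheme are all simplifying assumptions, not the paper's architecture:

```python
import numpy as np

def gae_embed(adj, features, w):
    """One-layer GCN encoder: symmetric normalisation of the adjacency
    (with self-loops added), then a linear projection and ReLU."""
    a = adj + np.eye(adj.shape[0])
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return np.maximum(d @ a @ d @ features @ w, 0.0)

rng = np.random.default_rng(0)
# Toy dependency graph of 4 microservices (hypothetical topology).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 6))                      # per-service metric features
z = gae_embed(adj, feats, rng.normal(size=(6, 2)))   # reduce 6 dims -> 2 dims

# Inner-product decoder: edge reconstruction scores a GAE is trained against.
recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))
```

The low-dimensional embeddings `z` are what a downstream online-clustering stage would consume.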
基金Supported by National Natural Science Foundation of China(Grant Nos.52205288,52130501,52075479)Zhejiang Provincial Key Research&Development Program(Grant No.2021C01110).
文摘With the increasing attention to the state and role of people in intelligent manufacturing, there is a strong demand for human-cyber-physical systems (HCPS) that focus on human-robot interaction. The existing intelligent manufacturing system cannot satisfy efficient human-robot collaborative work. However, unlike machines equipped with sensors, human characteristic information is difficult to perceive and digitize instantly. In view of the high complexity and uncertainty of the human body, this paper proposes a framework for building a human digital twin (HDT) model based on multimodal data and expounds on the key technologies. A data acquisition system is built to dynamically acquire and update the body state data and physiological data of the human body and realize the digital expression of multi-source heterogeneous human body information. A bidirectional long short-term memory and convolutional neural network (BiLSTM-CNN) based network is devised to fuse multimodal human data and extract the spatiotemporal features, and human locomotion mode identification is taken as an application case. A series of optimization experiments are carried out to improve the performance of the proposed BiLSTM-CNN-based network model. The proposed model is compared with traditional locomotion mode identification models. The experimental results proved the superiority of the HDT framework for human locomotion mode identification.
文摘In this case study, we hypothesized that sympathetic nerve activity would be higher during conversation with the PALRO robot, and that conversation would result in an increase in cerebral blood flow near Broca's area. The facial expressions of a human subject were recorded, and cerebral blood flow and heart rate variability were measured during interactions with the humanoid robot. These multimodal data were time-synchronized to quantitatively verify the change from the resting baseline across facial expression analysis, cerebral blood flow, and heart rate variability. In conclusion, the data from this subject indicated that sympathetic nervous activity was dominant, suggesting that the subject may have enjoyed and been excited while talking to the robot (normalized High Frequency < normalized Low Frequency: 0.22 ± 0.16 < 0.78 ± 0.16). Cerebral blood flow values were higher during conversation and in the resting state after the experiment than in the resting state before the experiment. Talking increased cerebral blood flow in the frontal region. As the subject was left-handed, it was confirmed that the right side of the brain, where Broca's area is located, was particularly activated (left < right: 0.15 ± 0.21 < 1.25 ± 0.17). In the sections where a "happy" facial emotion was recognized, the examiner-judged "happy" faces and the MTCNN "happy" results were also generally consistent.
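The normalized LF/HF figures quoted above follow the standard convention in heart-rate-variability analysis: each band power is divided by the LF+HF total, so the two normalized units sum to one. A minimal sketch with illustrative band powers chosen to reproduce the reported sympathetic-dominant pattern:

```python
def normalized_lf_hf(lf_power, hf_power):
    """Normalised low/high-frequency HRV power: each band divided by their
    sum, so nLF + nHF == 1 (very-low-frequency power is excluded)."""
    total = lf_power + hf_power
    return lf_power / total, hf_power / total

# Illustrative band powers (ms^2), not the study's raw spectra.
nlf, nhf = normalized_lf_hf(lf_power=780.0, hf_power=220.0)
```

With these inputs, nLF = 0.78 and nHF = 0.22, matching the nHF < nLF relationship reported for the subject.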
基金supported in part by the National Natural Science Foundation of China under Grants 52475102 and 52205101, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515240021, in part by the Young Talent Support Project of Guangzhou Association for Science and Technology (QT-2024-28), and in part by the Youth Development Initiative of Guangdong Association for Science and Technology (SKXRC2025254).
文摘To ensure the safe and stable operation of rotating machinery,intelligent fault diagnosis methods hold significant research value.However,existing diagnostic approaches largely rely on manual feature extraction and expert experience,which limits their adaptability under variable operating conditions and strong noise environments,severely affecting the generalization capability of diagnostic models.To address this issue,this study proposes a multimodal fusion fault diagnosis framework based on Mel-spectrograms and automated machine learning(AutoML).The framework first extracts fault-sensitive Mel time–frequency features from acoustic signals and fuses them with statistical features of vibration signals to construct complementary fault representations.On this basis,automated machine learning techniques are introduced to enable end-to-end diagnostic workflow construction and optimal model configuration acquisition.Finally,diagnostic decisions are achieved by automatically integrating the predictions of multiple high-performance base models.Experimental results on a centrifugal pump vibration and acoustic dataset demonstrate that the proposed framework achieves high diagnostic accuracy under noise-free conditions and maintains strong robustness under noisy interference,validating its efficiency,scalability,and practical value for rotating machinery fault diagnosis.
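The Mel time-frequency features above rest on the standard Hz-to-Mel mapping, mel = 2595·log10(1 + f/700), which compresses high frequencies the way human hearing does before triangular filters are placed at equal Mel spacing. A small sketch of the conversion and of placing filterbank centres; the band count and frequency range are illustrative, not the paper's settings:

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the Mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse conversion, used to place triangular filter edges in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Centre frequencies of an 8-band Mel filterbank between 0 Hz and 8 kHz:
# edges are equally spaced on the Mel axis, then mapped back to Hz.
n_bands = 8
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_mel = [lo + i * (hi - lo) / (n_bands + 1) for i in range(n_bands + 2)]
centers_hz = [mel_to_hz(m) for m in edges_mel[1:-1]]
```

Because the spacing is uniform in Mel, the resulting centres in Hz grow progressively wider apart, concentrating resolution at low frequencies where fault-sensitive acoustic content tends to sit.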
基金supported by the National Key Research and Development Program of China(2024YFF1207002)the National Natural Science Foundation of China(Grant Nos.82325027,82030047,82441011)the Research Funds of Center for Big Data and Population Health of IHM(JKS2022002,JKS2022001).
文摘1 Introduction. The development of information technology has promoted the application of multimodal, long-temporal, and multiscale data in healthcare [1]. However, the effective utilization of multimodal data still faces challenges related to feature redundancy within the modality of long temporal data and across multimodal data [2].
基金supported by the National Natural Science Foundation of China(82271094 to J.Z.).
文摘Background: Retinal vein occlusion (RVO) is a leading cause of visual impairment on a global scale. Its pathological mechanisms involve a complex interplay of vascular obstruction, ischemia, and secondary inflammatory responses. Recent interdisciplinary advances, underpinned by the integration of multimodal data, have established a new paradigm for unraveling the pathophysiological mechanisms of RVO, enabling early diagnosis and personalized treatment strategies. Main text: This review critically synthesizes recent progress at the intersection of machine learning, bioinformatics, and clinical medicine, focusing on developing predictive models and deep analysis, exploring molecular mechanisms, and identifying markers associated with RVO. By bridging technological innovation with clinical needs, this review underscores the potential of data-driven strategies to advance RVO research and optimize patient care. Conclusions: Machine learning-bioinformatics integration has revolutionised RVO research through predictive modelling and mechanistic insights, particularly via deep learning-enhanced retinal imaging and multi-omics networks. Despite progress, clinical translation requires resolving data standardisation inconsistencies and model generalizability limitations. Establishing multicentre validation frameworks and interpretable AI tools, coupled with patient-focused data platforms through cross-disciplinary collaboration, could enable precision interventions to optimally preserve vision.
文摘This paper addresses the challenge of efficiently querying multimodal related data in data lakes, large-scale storage and management systems that support heterogeneous data formats, including structured, semi-structured, and unstructured data. Multimodal data queries are crucial because they enable seamless retrieval of related data across modalities, such as tables, images, and text, which has applications in fields like e-commerce, healthcare, and education. However, existing methods primarily focus on single-modality queries, such as joinable or unionable table discovery, and struggle to handle the heterogeneity and lack of metadata in data lakes while balancing accuracy and efficiency. To tackle these challenges, we propose a Multimodal data Query mechanism for Data Lakes (MQDL), which employs a modality-adaptive indexing mechanism and contrastive-learning-based embeddings to unify representations across modalities. Additionally, we introduce product quantization to optimize candidate verification during queries, reducing computational overhead while maintaining precision. We evaluate MQDL using a table-image dataset across multiple business scenarios, measuring metrics such as precision, recall, and F1-score. Results show that MQDL achieves an accuracy rate of approximately 90%, while demonstrating strong scalability and reduced query response time compared to traditional methods. These findings highlight MQDL's potential to enhance multimodal data retrieval in complex data lake environments.
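Product quantization, used above to speed up candidate verification, encodes each embedding as a few small codebook indices: the vector is split into sub-vectors and each sub-vector is replaced by its nearest centroid. A toy NumPy sketch; the sub-space count, codebook size, and random codebooks are illustrative (real systems learn the codebooks with k-means):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantisation: split the vector into sub-vectors and store,
    per sub-vector, the index of the nearest codebook centroid."""
    subs = np.split(x, len(codebooks))
    return [int(np.argmin(((cb - s) ** 2).sum(axis=1)))
            for s, cb in zip(subs, codebooks)]

def pq_decode(codes, codebooks):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([cb[c] for c, cb in zip(codes, codebooks)])

rng = np.random.default_rng(0)
# 8-dim embeddings split into 2 sub-spaces with 16 centroids each (toy sizes).
codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]
x = rng.normal(size=8)
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)
```

Storing two 4-bit indices instead of eight floats is what makes large-scale candidate verification cheap; a vector that coincides with stored centroids reconstructs exactly.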
基金supported by the National Key Research and Development Program, No. 2022YFB4703500, the Shenzhen High-tech Zone Development Special Plan Innovation Platform Construction Project, the Proof of Concept Center for High Precision and High Resolution 4D Imaging, Beijing Natural Science Foundation, No. L243005, and the National Natural Science Foundation of China, No. 82372218.
文摘Effective survival analysis is essential for identifying optimal preventive treatments within smart healthcare systems and leveraging digital health advancements; however, existing prediction models face limitations, primarily relying on ensemble classification techniques with suboptimal performance in both target detection and predictive accuracy. To address these gaps, this paper proposes a multimodal framework that integrates enhanced facial feature detection and temporal predictive modeling. For facial feature extraction, this study developed a lightweight face-region convolutional neural network (FRegNet) specialized in detecting key facial components, such as the eyes and lips of clinical patients, which incorporates a residual backbone (Rstem) to enhance feature representation and a facial path-aggregated feature pyramid network for multi-resolution feature fusion. Comparative experiments reveal that FRegNet outperforms state-of-the-art target detection algorithms, achieving an average precision (AP) of 0.922, average recall of 0.933, mean average precision (mAP) of 0.987, and precision of 0.98, significantly surpassing other mask region-based convolutional neural network (RCNN) variants such as Mask RCNN-ResNeXt (AP of 0.789, mAP of 0.957). Based on the extracted facial features and clinical physiological indicators, this study proposes an enhanced temporal encoding-decoding (ETED) model that integrates an adaptive attention mechanism and a gated weighting mechanism to improve predictive performance. Comparative results demonstrate that the ETED variant incorporating facial features (ETEncoding-Decoding-Face) outperforms traditional models, achieving an accuracy of 0.916, precision of 0.850, recall of 0.895, F1 of 0.884, and area under the curve (AUC) of 0.947, surpassing gradient boosting (which attains a higher accuracy of 0.922 but an AUC of only 0.669) and other classifiers on comprehensive metrics. The results confirm that the multimodal dataset (facial features + physiological indicators) significantly enhances the prediction accuracy of the seven-day survival conditions of patients. Correlation analysis reveals that chronic health evaluation and mean arterial pressure are positively correlated with survival, while temperature, Glasgow Coma Scale, and fibrinogen are negatively correlated.
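The AUC values reported above can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (the Mann-Whitney interpretation), which also explains how a model can have lower accuracy yet far higher AUC. A small sketch with hypothetical survival scores:

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    fraction of positive/negative pairs the model ranks correctly
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy seven-day survival predictions (hypothetical scores, not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = auc_score(labels, scores)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9, independent of any particular decision threshold.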
基金supported in part by the National Natural Science Foundation of China (Grant No. 42201077), the Natural Science Foundation of Shandong Province (No. ZR2021QD074), the China Postdoctoral Science Foundation (No. 2023M732105), the Lhasa National Geophysical Observation and Research Station (No. NORSLS22-05), and the Youth Innovation Team Project of Higher School in Shandong Province, China (No. 2024KJH087).
文摘Seismic hazards pose a major threat to life safety,social development,and the economy.Traditional seismic vulnerability and risk assessments,such as field survey methods,may not be suitable for densely built-up urban areas due to the limited availability of comprehensive data and potential subjectivity in judgment.To overcome these limitations,an integrated method for seismic vulnerability and risk assessment based on multimodal remote sensing data,support vector machine(SVM)and GIScience methods was proposed and applied to the central urban area of Jinan City,Shandong Province,China.First,an area with representative buildings was selected for field survey research,and an attribute information base established.Then,the SVM method was used to establish the susceptibility proxies,which were applied to the whole study area after accuracy evaluation.Finally,the spatial distribution of seismic vulnerability and risk under different seismic intensity scenarios(from VI to X)was analyzed in GIScience.The results show that the average building vulnerability index in the central urban area of Jinan City is 0.53,indicating that the overall seismic performance of buildings is at a moderate level.Under the seismic intensity scenario of VIII,the buildings in the Starting area and New urban district of Jinan would mostly suffer‘Moderate’damage,while Old urban areas,with more seismic-resistant buildings,would experience only‘Slight’damage.This study aims to offer an efficient and accurate method for assessing seismic vulnerability in mid to large-sized cities characterized by concentrated population densities and rapid urbanization,as well as provide a valuable reference for efforts in urban renewal,seismic mitigation,and land planning,particularly in cities and regions of developing countries.Additionally,it contributes to the realization of Sustainable Development Goal 11,which seeks to make cities and human settlements inclusive,safe,resilient,and sustainable.
基金supported by the National Key R&D Program of China(2022YFF1203202,2018YFC2000205)the Strategic Priority Research Program of the Chinese Academy of Sciences(XDB38050200,XDA26040304)the Self-supporting Program of Guangzhou Laboratory(SRPG22-007).
文摘The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased the awareness of emerging infectious diseases. The advancement of multiomics analysis technology has resulted in the development of several databases containing virus information. Several scientists have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarized the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It focuses on the multi-dimensional information integration and database construction of emerging infectious viruses, virus mutation spectrum construction and variant forecast model, analysis of the affinity between mutation antigen and the receptor, propagation model of virus dynamic evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focused on the research results of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively viewed the latest virus research and provided a reference for future virus prevention and control research.
文摘With the rapid development of artificial intelligence(AI)technology,multimodal data integration has become an important means to improve the accuracy of diagnosis and treatment in gastroenterology and hepatology.This article systematically reviews the latest progress of multimodal AI technology in the diagnosis,treatment,and decision-making for gastrointestinal tumors,functional gastrointestinal diseases,and liver diseases,focusing on the innovative applications of endoscopic image AI,pathological section AI,multi-omics data fusion models,and wearable devices combined with natural language processing.Multimodal AI can significantly improve the accuracy of early diagnosis and the efficiency of individualized treatment planning by integrating imaging,pathological data,molecular,and clinical phenotypic data.However,current AI technologies still face challenges such as insufficient data standardization,limited generalization of models,and ethical compliance.This paper proposes solutions,such as the establishment of cross-center data sharing platform,the development of federated learning framework,and the formulation of ethical norms,and looks forward to the application prospect of multimodal large-scale models in the disease management process.This review provides theoretical basis and practical guidance for promoting the clinical translation of AI technology in the field of gastroenterology and hepatology.
基金supported by the National Natural Science Foundation of China (62276092, 62303167), the Postdoctoral Fellowship Program (Grade C) of the China Postdoctoral Science Foundation (GZC20230707), the Young Elite Scientists Sponsorship Program by Henan Association for Science and Technology (2025HYTP061), the Key Science and Technology Program of Henan Province, China (242102211051, 212102310084), Key Scientific Research Projects of Colleges and Universities in Henan Province, China (25A520009), the China Postdoctoral Science Foundation (2024M760808), and the Henan Province Medical Science and Technology Research Plan Joint Construction Project (LHGJ2024069).
文摘Alzheimer’s Disease(AD),a prevalent neurodegenerative disorder characterized by memory loss and cognitive decline,poses significant challenges for individuals and society.Multimodal data fusion has emerged as a promising approach for AD diagnosis,with Graph Convolutional Networks(GCNs)effectively capturing irregular brain information.However,traditional GCN methods face limitations in representing and integrating multimodal data,often resulting in feature mismatch.In this study,we propose a novel Kolmogorov-Arnold Graph Attention Network(KAGAN)model to address this issue through semantic-level alignment.KAGAN incorporates a Multimodal Feature Construction method(MuStaF)to extract structural and functional features from T1-and T2-weighted images,and a Multimodal Graph Adjacency Matrix Construction method(MuGAC)to integrate clinical information,modeling intricate relationships across modalities.Experiments conducted on the ADNI dataset demonstrate the superiority of KAGAN in AD/CN/MCI classification,achieving an accuracy of 98.29±1.21%.This highlights KAGAN’s potential for early AD diagnosis by enabling interactive learning and fusion of multimodal features at the semantic level.The source code of our proposed model and the related datasets are available at https://github.com/sheeprra/KAGAN.
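The attention component of a graph attention network such as KAGAN builds on per-edge coefficients that are softmax-normalised over each node's neighbourhood. A single-head NumPy sketch on a toy population graph; the topology, feature width, and scoring vectors are illustrative stand-ins, not KAGAN's actual design:

```python
import numpy as np

def gat_attention(h, adj, a_src, a_dst, slope=0.2):
    """GAT-style attention coefficients: e_ij = LeakyReLU(a_src.h_i + a_dst.h_j),
    masked to the graph (self-loops added), then row-softmaxed."""
    e = (h @ a_src)[:, None] + (h @ a_dst)[None, :]
    e = np.where(e > 0, e, slope * e)                 # LeakyReLU
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)                    # restrict to neighbours
    e = e - e.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(e)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Toy population graph of 4 subjects; edges encode clinical similarity.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 5))                           # fused multimodal node features
alpha = gat_attention(h, adj, rng.normal(size=5), rng.normal(size=5))
```

Each row of `alpha` sums to one and is zero outside the node's neighbourhood, so message passing weights neighbours by learned relevance rather than uniformly.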
基金supported by the 2021 Guangdong Province(China)Science and Technology Plan Project“Research and Application of Key Technologies for Multi-level Knowledge Retrieval Based on Big Data Intelligence”(Project No.2021B0101420004)the 2022 commissioned project“Cross-border E-commerce Taxation and Related Research”from the State Taxation Administration Guangdong Provincial Taxation Bureau,China.
文摘Purpose: This study aims to integrate large language models (LLMs) with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud, addressing the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multisource data fusion, thereby providing regulatory agencies with intelligent auditing tools. Design/methodology/approach: Analyzing 5,304 Chinese listed firms' annual reports (2015-2020) from the CSMAD database, this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors, developing textual semantic features. It integrates 19 financial indicators, 11 governance metrics, and linguistic characteristics (tone, readability) with fraud prediction models optimized through a group of Gradient Boosted Decision Tree (GBDT) algorithms. SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial, governance, and textual features on fraud likelihood. Findings: The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional logistic regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language. Research limitations: This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation with other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency. Practical implications: The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies' information disclosure quality and enables practical implementation through its derivative real-time monitoring system. This advancement significantly strengthens capital market risk early-warning capabilities, offering actionable insights for securities regulation. Originality/value: This study presents three key innovations: 1) a novel “chunking-summarization-embedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) demonstration of LLMs' superior performance in financial text analysis, outperforming traditional methods by 19.3%; and 3) a novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.
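The chunking stage of a "chunking-summarization-embedding" pipeline can be sketched as overlapping fixed-size windows over the report text, so sentence context is not lost at chunk boundaries before each chunk is summarised and embedded. The chunk size and overlap below are illustrative, not the paper's settings:

```python
def chunk_text(text, chunk_size=2000, overlap=200):
    """Split a long report into overlapping character windows; consecutive
    chunks share `overlap` characters of context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# Stand-in for a ~30,000-character annual report (digits make alignment visible).
report = "".join(str(i % 10) for i in range(30_000))
chunks = chunk_text(report)
```

Each chunk would then be summarised by the LLM independently; the overlap means boundary sentences appear in two chunks, trading a little extra token cost for continuity.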
基金supported by the Funding for Research on the Evolution of Cyberbullying Incidents and Intervention Strategies(24BSH033)Discipline Innovation and Talent Introduction Bases in Higher Education Institutions(B20087).
文摘Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces significant data scarcity challenges,and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting issues.In addition,it is a challenge to understand the underlying relationship between text and images in the hateful memes.To address these issues,we propose a multimodal hateful memes classification model named LABF,which is based on low-rank adapter layers and bidirectional gated feature fusion.Firstly,low-rank adapter layers are adopted to learn the feature representation of the new dataset.This is achieved by introducing a small number of additional parameters while retaining prior knowledge of the CLIP model,which effectively alleviates the overfitting phenomenon.Secondly,a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion.Experimental results show that the method significantly outperforms existing methods on two public datasets,verifying its effectiveness and robustness.
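The low-rank adapter idea above freezes the pre-trained weight W and learns only a small rank-r update A·B, which is why so few new parameters are introduced and overfitting is tamed. A NumPy sketch showing the parameter saving; the layer dimensions, rank, and zero-initialisation convention are illustrative, not the paper's configuration:

```python
import numpy as np

def adapter_forward(x, w_frozen, a, b, scale=1.0):
    """Low-rank adapter: the frozen weight is augmented with a trainable
    rank-r update A @ B, so only A and B are learned on the new dataset."""
    return x @ (w_frozen + scale * (a @ b))

d_in, d_out, rank = 512, 512, 8
rng = np.random.default_rng(0)
w = rng.normal(size=(d_in, d_out))   # frozen pre-trained weight (CLIP-like layer)
a = np.zeros((d_in, rank))           # adapter factor initialised to zero, so the
b = rng.normal(size=(rank, d_out))   # adapted model starts identical to the original

x = rng.normal(size=(1, d_in))
y0 = adapter_forward(x, w, a, b)

# Trainable parameters shrink from d_in*d_out to rank*(d_in + d_out).
full_params = d_in * d_out
adapter_params = rank * (d_in + d_out)
```

With one factor initialised to zero the adapted layer reproduces the frozen layer exactly at the start of fine-tuning, and here the trainable parameter count drops from 262,144 to 8,192.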
文摘By 2025,research on Traditional Chinese Medicine(TCM)meridians has generated 12-15 macro-level theories and over 20 specific hypotheses,manifesting a highly fragmented research landscape.Objective:This paper proposes the“Holistic Hierarchical Predictive-Integration Hypothesis”(HHPIT)to construct a unified theoretical framework that integrates the rational components of existing meridian hypotheses.Methods:The HHPIT hypothesis systematically reviews current meridian theories,employs interdisciplinary methodologies,integrates artificial intelligence technology,and establishes a three-tier architecture encompassing structural,functional,and systemic layers.Results:HHPIT successfully integrates diverse meridian theories,proposes a computable algorithmic pipeline,and provides specific application protocols for chronic disease treatment,anti-aging,and enhancement of Zang-fu organ functions.Conclusion:HHPIT offers a novel,computable,and verifiable research paradigm for meridian studies,promoting the modernization and internationalization of TCM theory.
基金supported in part by a fund from Bentley Systems,Inc.
文摘Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside the advantages,depth-sensing also presents many practical challenges.For instance,the depth sensors impose an additional payload burden on the robotic inspection platforms limiting the operation time and increasing the inspection cost.Additionally,some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime.In this context,this study investigates the feasibility of abolishing depth-sensing at test time without compromising the segmentation performance.An autonomous damage segmentation framework is developed,based on recent advancements in vision-based multi-modal sensing such as modality hallucination(MH)and monocular depth estimation(MDE),which require depth data only during the model training.At the time of deployment,depth data becomes expendable as it can be simulated from the corresponding RGB frames.This makes it possible to reap the benefits of depth fusion without any depth perception per se.This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model.The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage.It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1%with a negligible increase in the computation cost.Overall,this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
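The segmentation IoU metric behind the up-to-20.1% gain cited above divides the overlap of predicted and ground-truth masks by their union. A minimal sketch on toy 4x4 damage masks (the masks are illustrative):

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-union between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, target).sum() / union

# Toy 4x4 damage masks (1 = damaged pixel).
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
score = iou(pred, target)
```

Here the masks share 3 damaged pixels out of a 4-pixel union, giving an IoU of 0.75; reporting IoU rather than pixel accuracy avoids rewarding the overwhelmingly undamaged background.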
基金supported in part by the National Key Research and Development Program of China(No.2018YFB1003700)in part by the National Natural Science Foundation of China(No.61836001)。
文摘Industrial Internet of Things(IoT)connecting society and industrial systems represents a tremendous and promising paradigm shift.With IoT,multimodal and heterogeneous data from industrial devices can be easily collected,and further analyzed to discover device maintenance and health related potential knowledge behind.IoT data-based fault diagnosis for industrial devices is very helpful to the sustainability and applicability of an IoT ecosystem.But how to efficiently use and fuse this multimodal heterogeneous data to realize intelligent fault diagnosis is still a challenge.In this paper,a novel Deep Multimodal Learning and Fusion(DMLF)based fault diagnosis method is proposed for addressing heterogeneous data from IoT environments where industrial devices coexist.First,a DMLF model is designed by combining a Convolution Neural Network(CNN)and Stacked Denoising Autoencoder(SDAE)together to capture more comprehensive fault knowledge and extract features from different modal data.Second,these multimodal features are seamlessly integrated at a fusion layer and the resulting fused features are further used to train a classifier for recognizing potential faults.Third,a two-stage training algorithm is proposed by combining supervised pre-training and fine-tuning to simplify the training process for deep structure models.A series of experiments are conducted over multimodal heterogeneous data from a gear device to verify our proposed fault diagnosis method.The experimental results show that our method outperforms the benchmarking ones in fault diagnosis accuracy.
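The fusion-layer design described above, in which modality-specific encoders feed a shared classifier, can be sketched framework-agnostically. The encoder form, dimensions, and random weights below are illustrative stand-ins for the paper's CNN and SDAE:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Stand-in for a modality-specific encoder (CNN or SDAE in the paper):
    a single affine layer with a ReLU non-linearity."""
    return np.maximum(x @ w, 0.0)

# Two heterogeneous modalities, e.g. a vibration signal (dim 8)
# and auxiliary sensor readings (dim 4); dimensions are hypothetical.
x_vib, x_aux = rng.normal(size=8), rng.normal(size=4)
w_vib, w_aux = rng.normal(size=(8, 16)), rng.normal(size=(4, 16))

# Fusion layer: concatenate the per-modality feature vectors.
fused = np.concatenate([encode(x_vib, w_vib), encode(x_aux, w_aux)])

# Softmax classifier over (hypothetical) fault classes.
w_cls = rng.normal(size=(32, 3))
logits = fused @ w_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

In the paper's two-stage scheme, the encoders would first be pre-trained per modality and then the whole stack, fusion layer included, fine-tuned end to end on labeled fault data.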
Funding: Supported by the Natural Science Foundation of Jilin Province, No. YDZJ202401182ZYTS; Jilin Provincial Key Laboratory of Precision Infectious Diseases, No. 20200601011JC; and Jilin Provincial Engineering Laboratory of Precision Prevention and Control for Common Diseases, Jilin Province Development and Reform Commission, No. 2022C036.
Abstract: Artificial intelligence (AI) is driving a paradigm shift in gastroenterology and hepatology by delivering cutting-edge tools for disease screening, diagnosis, treatment, and prognostic management. Through deep learning, radiomics, and multimodal data integration, AI has achieved diagnostic parity with expert clinicians in endoscopic image analysis (e.g., early gastric cancer detection, colorectal polyp identification) and non-invasive assessment of liver pathologies (e.g., fibrosis staging, fatty liver typing), while demonstrating utility in personalized care scenarios such as predicting hepatocellular carcinoma recurrence and optimizing inflammatory bowel disease treatment responses. Despite these advancements, challenges persist, including limited model generalization due to fragmented datasets, algorithmic limitations in rare conditions (e.g., pediatric liver diseases) caused by insufficient training data, and unresolved ethical issues related to bias, accountability, and patient privacy. Mitigation strategies involve constructing standardized multicenter databases, validating AI tools through prospective trials, leveraging federated learning to address data scarcity, and developing interpretable systems (e.g., attention heatmap visualization) to enhance clinical trust. Integrating generative AI and digital twin technologies and establishing unified ethical and regulatory frameworks will accelerate AI adoption in primary care and foster equitable healthcare access, while interdisciplinary collaboration and evidence-based implementation remain critical for realizing AI's potential to redefine precision care for digestive disorders, improve global health outcomes, and reshape healthcare equity.
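The federated learning strategy mentioned among the mitigation approaches reduces, in its simplest form, to a FedAvg aggregation step. The sketch below is a generic toy example, not a clinical system: each "site" contributes only its locally trained parameter vector, and the server averages them weighted by local sample count, so raw patient records never leave the site.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: three hospitals each hold a locally trained 5-parameter model.
# Only the parameters are shared with the aggregation server.
local_weights = [rng.normal(size=5) for _ in range(3)]
n_samples = np.array([120, 300, 80])  # local dataset sizes (invented)

# FedAvg aggregation: sample-count-weighted mean of client parameters.
global_w = sum(n * w for n, w in zip(n_samples, local_weights)) / n_samples.sum()
```

In practice this aggregation is repeated over many communication rounds, with each site retraining from the broadcast global model between rounds.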
Funding: National Natural Science Foundation of China, No. 42361040.
Abstract: Population migration data derived from location-based services have often been used to delineate population flows between cities or to construct intercity relationship networks that reveal the complex interaction patterns underlying human activities. Nevertheless, the inherent heterogeneity in multimodal migration big data has been ignored. This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association. Initially, intercity interactive networks in China were constructed using migration data from Baidu and AutoNavi collected during the same time period. Subsequently, the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall (network) and local (node) perspectives. Furthermore, the precision of these networks at the local scale is corroborated by constructing an intercity network from mobile phone (MP) data. Results indicate that the intercity interactive networks in China, as delineated by Baidu and AutoNavi migration flows, exhibit a high degree of structural equivalence; the correlation coefficient between the two networks is 0.874. Both networks exhibit a pronounced spatial polarization trend and hierarchical structure, evident in their distinct core and peripheral structures as well as in the varying importance and influence of different nodes within the networks. Nevertheless, there are notable differences worthy of attention. The Baidu intercity interactive network exhibits pronounced cross-regional effects, and its high-level interactions are characterized by a "rich-club" phenomenon. The AutoNavi intercity interactive network presents a more significant distance attenuation effect, and its high-level interactions display a gradient distribution pattern. Notably, there is a substantial correlation between the AutoNavi and MP networks at the local scale, evidenced by a high correlation coefficient of 0.954. Furthermore, a "spatial dislocation" phenomenon was observed within the spatial structures at different levels extracted from the Baidu and AutoNavi intercity networks. However, the measured results of network spatial structure similarity along three dimensions, namely node location, node size, and local structure, indicate a relatively high similarity and consistency between the two networks.
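The network-level similarity reported above amounts to correlating corresponding intercity flows from two data sources. A minimal NumPy sketch with synthetic flow matrices (the city count, flow magnitudes, and noise are invented, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 6  # toy number of cities
flows_a = rng.random((n, n)) * 1000                  # source A flows (e.g., Baidu-style)
flows_b = 0.9 * flows_a + rng.random((n, n)) * 100   # a correlated second source
np.fill_diagonal(flows_a, 0.0)                       # no self-flows
np.fill_diagonal(flows_b, 0.0)

# Correlate only the off-diagonal (intercity) entries of the two weighted networks.
mask = ~np.eye(n, dtype=bool)
r = np.corrcoef(flows_a[mask], flows_b[mask])[0, 1]
```

The study's local-scale comparisons follow the same logic restricted to each node's row and column of flows.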