Funding: Supported by the National Natural Science Foundation of China (Grant No. 82172524) and the Natural Science Foundation of Hubei Province (Grant No. 2025AFB240).
Abstract: Bone tumors (BTs), including osteosarcoma, Ewing sarcoma, and chondrosarcoma, are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location, histological subtype, and molecular alterations. Recent advances in artificial intelligence (AI), particularly deep learning, have enabled the integration of diverse clinical data modalities to support diagnosis, treatment planning, and prognostication in bone oncology. This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging, digital pathology, multi-omics profiling, and electronic health records. We conducted a structured review of peer-reviewed literature published between 2015 and early 2025, focusing on the development, validation, and clinical applicability of AI models for BT diagnosis, subtyping, treatment response prediction, and recurrence monitoring. Although multimodal models have demonstrated advantages over unimodal approaches, especially in handling missing data and improving generalizability, most remain constrained by single-center study designs, small sample sizes, and limited prospective or external validation. Persistent technical and translational challenges include semantic misalignment across modalities, incomplete datasets, limited model interpretability, and regulatory and infrastructural barriers to clinical integration. To address these limitations, we highlight emerging directions such as contrastive representation learning, generative data augmentation, transformer-based fusion architectures, and privacy-preserving federated learning. We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability. In summary, multimodal AI represents a promising paradigm for advancing precision care in BTs. Realizing its full clinical potential will require methodologically rigorous, biologically informed, and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.
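The advantage in handling missing data mentioned in the abstract above can be illustrated with a minimal late-fusion sketch. This is not a method taken from the review: the function name `late_fusion`, the modality keys, and the probabilities are all hypothetical, and plain averaging is only one of several possible fusion rules.

```python
from typing import Dict, Optional


def late_fusion(probs: Dict[str, Optional[float]]) -> float:
    """Average per-modality probabilities, skipping modalities that are missing (None)."""
    available = [p for p in probs.values() if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    return sum(available) / len(available)


# Hypothetical per-modality outputs for one patient; the pathology slide is missing.
patient = {"radiology": 0.5, "pathology": None, "omics": 0.75, "ehr": 0.25}
fused = late_fusion(patient)
print(fused)
```

A unimodal pathology model could not score this patient at all, whereas the fused score is still defined from the three modalities that are present.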
Funding: Supported in part by a fund from Bentley Systems, Inc.
Abstract: Recent advances in computer vision and deep learning have shown that fusing depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. However, depth sensing also presents many practical challenges. For instance, depth sensors impose an additional payload burden on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some lidar-based depth sensors perform poorly outdoors because of sunlight contamination during the daytime. In this context, this study investigates the feasibility of eliminating depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed, based on recent advancements in vision-based multi-modal sensing such as modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At deployment time, depth data becomes expendable because it can be simulated from the corresponding RGB frames. This makes it possible to reap the benefits of depth fusion without any depth perception per se. The study explored two depth encoding techniques and three fusion strategies in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. The surrogate techniques were observed to increase the segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
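For readers unfamiliar with the metric reported above, intersection over union (IoU) for a binary segmentation mask can be computed as in the following sketch. The masks are made-up toy data, and the function name `iou` is an illustrative choice, not code from the study.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 1 marks a "damage" pixel, 0 marks background.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = iou(pred, target)
print(score)
```

Here two pixels are marked in both masks and four pixels are marked in at least one, so the score is 2/4.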
Funding: Supported by the National Natural Science Foundation of China (62276092, 62303167); the Postdoctoral Fellowship Program (Grade C) of China Postdoctoral Science Foundation (GZC20230707); the Young Elite Scientists Sponsorship Program by Henan Association for Science and Technology (2025HYTP061); the Key Science and Technology Program of Henan Province, China (242102211051, 212102310084); Key Scientific Research Projects of Colleges and Universities in Henan Province, China (25A520009); the China Postdoctoral Science Foundation (2024M760808); and the Henan Province medical science and technology research plan joint construction project (LHGJ2024069).
Abstract: Alzheimer’s Disease (AD), a prevalent neurodegenerative disorder characterized by memory loss and cognitive decline, poses significant challenges for individuals and society. Multimodal data fusion has emerged as a promising approach for AD diagnosis, with Graph Convolutional Networks (GCNs) effectively capturing irregular brain information. However, traditional GCN methods face limitations in representing and integrating multimodal data, often resulting in feature mismatch. In this study, we propose a novel Kolmogorov-Arnold Graph Attention Network (KAGAN) model to address this issue through semantic-level alignment. KAGAN incorporates a Multimodal Feature Construction method (MuStaF) to extract structural and functional features from T1- and T2-weighted images, and a Multimodal Graph Adjacency Matrix Construction method (MuGAC) to integrate clinical information, modeling intricate relationships across modalities. Experiments conducted on the ADNI dataset demonstrate the superiority of KAGAN in AD/CN/MCI classification, achieving an accuracy of 98.29 ± 1.21%. This highlights KAGAN’s potential for early AD diagnosis by enabling interactive learning and fusion of multimodal features at the semantic level. The source code of our proposed model and the related datasets are available at https://github.com/sheeprra/KAGAN.
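The abstract above does not spell out how MuGAC builds its adjacency matrix. As a rough illustration of the general idea of folding clinical information into a subject graph, the sketch below weights each edge by the number of clinical attributes two subjects share. This is a generic population-graph construction assumed for illustration, not the MuGAC algorithm; the function `clinical_adjacency` and the attribute names are hypothetical.

```python
from typing import Dict, List


def clinical_adjacency(subjects: List[Dict[str, object]], keys: List[str]) -> List[List[int]]:
    """Symmetric adjacency matrix: edge weight = number of matching clinical attributes."""
    n = len(subjects)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = sum(1 for k in keys if subjects[i][k] == subjects[j][k])
            adj[i][j] = adj[j][i] = weight  # undirected graph, no self-loops
    return adj


# Made-up subjects with two hypothetical clinical attributes.
subjects = [
    {"sex": "F", "age_group": "70s"},
    {"sex": "F", "age_group": "80s"},
    {"sex": "M", "age_group": "80s"},
]
adj = clinical_adjacency(subjects, ["sex", "age_group"])
for row in adj:
    print(row)
```

In a graph network, imaging features would sit on the nodes while a matrix of this kind decides which subjects exchange information.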
Funding: Supported by the 2021 Guangdong Province (China) Science and Technology Plan Project “Research and Application of Key Technologies for Multi-level Knowledge Retrieval Based on Big Data Intelligence” (Project No. 2021B0101420004) and the 2022 commissioned project “Cross-border E-commerce Taxation and Related Research” from the State Taxation Administration Guangdong Provincial Taxation Bureau, China.
Abstract:
Purpose: This study aims to integrate large language models (LLMs) with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud, addressing the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multisource data fusion, thereby providing regulatory agencies with intelligent auditing tools.
Design/methodology/approach: Analyzing 5,304 Chinese listed firms’ annual reports (2015-2020) from the CSMAD database, this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors, developing textual semantic features. It integrates 19 financial indicators, 11 governance metrics, and linguistic characteristics (tone, readability) with fraud prediction models optimized through a group of Gradient Boosted Decision Tree (GBDT) algorithms. SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial, governance, and textual features on fraud likelihood.
Findings: The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional Logistic Regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language.
Research limitations: This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation with other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency.
Practical implications: The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies’ information disclosure quality and enables practical implementation through its derivative real-time monitoring system. This advancement significantly strengthens capital market risk early warning capabilities, offering actionable insights for securities regulation.
Originality/value: This study presents three key innovations: 1) a novel “chunking-summarization-embedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) demonstration of LLMs’ superior performance in financial text analysis, outperforming traditional methods by 19.3%; 3) a novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.
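The “chunking-summarization-embedding” idea described above can be sketched as a three-step pipeline: split a long report into chunks, summarize each chunk, embed each summary, then combine the vectors. The abstract does not say how chunk vectors are combined, so the mean pooling below is an assumption. The summarizer and embedder are toy stand-ins (the study uses Doubao LLMs and 256-dimensional vectors), and every function name here is hypothetical.

```python
from typing import Callable, List


def chunk_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed_report(
    text: str,
    size: int,
    summarize: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Chunk, summarize, embed, then mean-pool (assumed) into one report vector."""
    vectors = [embed(summarize(chunk)) for chunk in chunk_words(text, size)]
    if not vectors:
        raise ValueError("report text is empty")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy stand-ins for the LLM calls: NOT real summarization or embedding.
def toy_summarize(chunk: str) -> str:
    return chunk.split()[0]  # keep only the first word of the chunk


def toy_embed(summary: str) -> List[float]:
    return [float(len(summary)), float(summary.count("a"))]  # 2-dimensional toy vector


report = "revenue grew while cash flow fell sharply"
vector = embed_report(report, 3, toy_summarize, toy_embed)
print(vector)
```

In a real pipeline the two callables would wrap calls to the chosen LLM service, and the pooled vector would be concatenated with the financial and governance features before model training.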
Abstract: The virtual test platform is a vital tool for ship simulation and testing. However, the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types, such as relational data, files, text, images, and animations. The analysis, evaluation, and decision-making processes depend heavily on data, which continue to increase in size and complexity. As a result, there is a growing need for a distributed database system to manage these data. In this paper, we propose a Key-Value database based on a distributed system that can operate on any type of data, regardless of its size or type. This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization. Moreover, it is specifically designed to handle the storage and access of large files. Additionally, we propose a multimodal data fusion mechanism that can connect various descriptions of the same substance, enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis. Our approach focuses on indexing and storage, and we compare our solution with Redis, MongoDB, and MySQL through experiments. We demonstrate the performance, scalability, and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions. In conclusion, our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.
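One way to picture a fusion mechanism that connects several descriptions of the same object in a key-value store is a secondary index from an entity identifier to all keys that describe it. The sketch below is an in-memory toy under that assumption. It is not the paper's architecture, and the class `MultimodalStore`, its methods, and the sample keys are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class MultimodalStore:
    """Toy key-value store with an entity index linking records of different modalities."""

    def __init__(self) -> None:
        self._values: Dict[str, Tuple[str, Any]] = {}                 # key -> (modality, value)
        self._by_entity: Dict[str, List[str]] = defaultdict(list)     # entity -> keys

    def put(self, key: str, entity: str, modality: str, value: Any) -> None:
        if key not in self._values:
            self._by_entity[entity].append(key)  # index each key once
        self._values[key] = (modality, value)

    def get(self, key: str) -> Tuple[str, Any]:
        return self._values[key]

    def fuse(self, entity: str) -> Dict[str, List[Any]]:
        """Collect every stored description of one entity, grouped by modality."""
        fused: Dict[str, List[Any]] = defaultdict(list)
        for key in self._by_entity.get(entity, []):
            modality, value = self._values[key]
            fused[modality].append(value)
        return dict(fused)


store = MultimodalStore()
store.put("hull-01/notes", "hull-01", "text", "resistance test at 12 knots")
store.put("hull-01/row", "hull-01", "relational", {"length_m": 120})
store.put("hull-02/notes", "hull-02", "text", "seakeeping test")
fused = store.fuse("hull-01")
print(fused)
```

A distributed implementation would shard both maps across nodes; the sketch only shows how an index turns separate records into a single retrievable view of one entity.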
Abstract: This paper develops a conceptual foundation-model framework for predicting SME growth in the digital economy. It argues that traditional, linear and data-scarce approaches cannot capture the non-linear, networked and multimodal nature of contemporary SME trajectories. Building on organisational life-cycle theory and the resource-based view, the study proposes an architecture that fuses structured financial statements, unstructured textual disclosures and graph-based relational data into shared latent representations. Transformer-based models, graph neural networks and contrastive multimodal learning are combined to generate time-varying, firm-level assessments of growth and default risk, accompanied by natural-language diagnostic reports. The paper further discusses how transfer learning, synthetic data and retrieval-augmented prediction can mitigate cold-start problems, enhance causal interpretability and improve responsiveness to macro shocks. At the macro level, the framework has implications for easing bank-firm information asymmetries, expanding financial inclusion and rethinking the role of financial intermediaries in an AI-driven credit ecosystem.
Abstract: This project explores the integration of image and point cloud data for 3D object detection using the F-PointNet model, aiming to enhance accuracy and reliability in autonomous driving applications. F-PointNet leverages multimodal data from RGB cameras and LiDAR to improve environmental perception and object localisation under varied operational conditions. Employing a rigorous methodology, the model incorporates preprocessing and network components such as frustum rotation and T-net adjustments to refine the detection process. Experiments were conducted on the KITTI dataset, which included applying both random and designated perturbations and assessing their impact on the model’s performance. Results show that random perturbations generally outperform designated ones, especially in complex scenarios, by enhancing the model’s adaptability and capability for generalisation. This study highlights the critical role of methodological innovations and data perturbation strategies in advancing 3D object detection technologies, suggesting that further research is needed to optimise these approaches for broader applications. Furthermore, this research contributes to the development of autonomous systems, emphasising the importance of robust and accurate 3D object detection in enhancing the safety and reliability of autonomous vehicles.
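Frustum rotation, mentioned above as a preprocessing step, normalises each frustum so that its central axis points straight ahead of the camera. A minimal sketch follows, assuming a camera frame with x to the right, y down, and z forward (the usual KITTI camera convention). The function name and sample points are illustrative and are not taken from the project's code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_to_frustum_axis(points: List[Point], center_ray: Point) -> List[Point]:
    """Rotate points about the camera y-axis so that center_ray aligns with +z."""
    cx, _, cz = center_ray
    theta = math.atan2(cx, cz)  # heading of the frustum axis in the x-z plane
    c, s = math.cos(theta), math.sin(theta)
    # y is unchanged; x and z are rotated so the axis lands on x = 0.
    return [(x * c - z * s, y, x * s + z * c) for x, y, z in points]


# Two made-up LiDAR points lying on a frustum axis 45 degrees to the right.
points = [(1.0, 0.0, 1.0), (2.0, 0.5, 2.0)]
rotated = rotate_to_frustum_axis(points, center_ray=(1.0, 0.0, 1.0))
print(rotated)
```

After rotation both points have x close to zero, so the downstream network sees objects in a canonical orientation regardless of where the 2D detection sat in the image.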
Funding: Supported by the Major Research Plan of NSFC project (No. 92359203, China), Youth Beijing Scholar 2024 (No. 113, China), Youth Talent Support Program A002863 (China), and Science Foundation of Peking University Cancer Hospital BJCH2025BJ02.
Abstract: Artificial intelligence (AI)-driven data-centric approaches are catalyzing a paradigm shift in radiopharmaceutical development and molecular imaging, two pivotal technologies that underpin precision nuclear medicine. This review focuses on the cutting-edge applications of AI in radiopharmaceutical discovery and molecular image analytics, and systematically investigates the technical principles and typical cases of Deep Learning algorithms (e.g., Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), and Transformer Models) in target identification, ligand design, pharmacokinetic optimization, and image reconstruction and enhancement. By integrating multi-omics data and 3D structural information, AI can significantly improve the accuracy of target affinity prediction for radiopharmaceuticals and accelerate the design of novel ligands. In the field of molecular imaging, AI-driven low-dose single-photon emission computed tomography (SPECT) and positron emission tomography (PET) image reconstruction, tumor segmentation, and quantitative analysis techniques have significantly improved diagnostic efficiency and accuracy, providing a reliable basis for individualized treatment. In addition, the paper discusses data privacy, model generalization, and ethical challenges faced by AI in clinical translation, and looks ahead to multidisciplinary integration (e.g., combining AI with radiochemistry and nuclear medicine) and technological innovations that will help precision medicine move from theory to practice.