Bone tumors(BTs)-including osteosarcoma,Ewing sarcoma,and chondrosarcoma-are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location,histological subtype,and molecul...Bone tumors(BTs)-including osteosarcoma,Ewing sarcoma,and chondrosarcoma-are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location,histological subtype,and molecular alterations.Recent advances in artificial intelligence(AI),particularly deep learning,have enabled the integration of diverse clinical data modalities to support diagnosis,treatment planning,and prognostication in bone oncology.This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging,digital pathology,multi-omics profiling,and electronic health records.We conducted a structured review of peer-reviewed literature published between 2015 and early 2025,focusing on the development,validation,and clinical applicability of AI models for BT diagnosis,subtyping,treatment response prediction,and recurrence monitoring.Although multimodal models have demonstrated advantages over unimodal approaches,especially in handling missing data and improving generalizability,most remain constrained by single-center study designs,small sample sizes,and limited prospective or external validation.Persistent technical and translational challenges include semantic misalignment across modalities,incomplete datasets,limited model interpretability,and regulatory and infrastructural barriers to clinical integration.To address these limitations,we highlight emerging directions such as contrastive representation learning,generative data augmentation,transformer-based fusion architectures,and privacy-preserving federated learning.We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability.In summary,multimodal AI represents a promising paradigm for advancing precision care in BTs.Realizing its full clinical potential will require methodologically rigorous,biologically informed,and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.展开更多
Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside th...Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside the advantages,depth-sensing also presents many practical challenges.For instance,the depth sensors impose an additional payload burden on the robotic inspection platforms limiting the operation time and increasing the inspection cost.Additionally,some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime.In this context,this study investigates the feasibility of abolishing depth-sensing at test time without compromising the segmentation performance.An autonomous damage segmentation framework is developed,based on recent advancements in vision-based multi-modal sensing such as modality hallucination(MH)and monocular depth estimation(MDE),which require depth data only during the model training.At the time of deployment,depth data becomes expendable as it can be simulated from the corresponding RGB frames.This makes it possible to reap the benefits of depth fusion without any depth perception per se.This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model.The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage.It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1%with a negligible increase in the computation cost.Overall,this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.展开更多
Purpose:This study aims to integrate large language models(LLMs)with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud,addressing the limitat...Purpose:This study aims to integrate large language models(LLMs)with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud,addressing the limitations of traditional approaches in long-text semantic parsing,model interpretability,and multisource data fusion,thereby providing regulatory agencies with intelligent auditing tools.Design/methodology/approach:Analyzing 5,304 Chinese listed firms’annual reports(2015-2020)from the CSMAD database,this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors,developing textual semantic features.It integrates 19 financial indicators,11 governance metrics,and linguistic characteristics(tone,readability)with fraud prediction models optimized through a group of Gradient Boosted Decision Tree(GBDT)algorithms.SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial,governance,and textual features on fraud likelihood.Findings:The study found that LLMs effectively distill lengthy annual reports into semantic summaries,while GBDT algorithms(AUC>0.850)outperform the traditional Logistic Regression model in fraud detection.Multimodal fusion improved performance by 7.4%,with financial,governance,and textual features providing complementary signals.SHAP analysis revealed financial distress,governance conflicts,and narrative patterns(e.g.,tone anchoring,semantic thresholds)as key fraud indicators,highlighting managerial intent in report language.Research limitations:This study identifies three key limitations:1)lack of interpretability for semantic features,2)absence of granular fraud-type differentiation,and 3)unexplored comparative validation with other deep learning methods.Future research will address these gaps to enhance fraud detection precision and model transparency.Practical implications:The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies’information disclosure quality and enables practical implementation through its derivative real-time monitoring system.This advancement significantly strengthens capital market risk early warning capabilities,offering actionable insights for securities regulation.Originality/value:This study presents three key innovations:1)A novel“chunking-summarizationembedding”framework for efficient semantic compression of lengthy annual reports(30,000 words);2)Demonstration of LLMs’superior performance in financial text analysis,outperforming traditional methods by 19.3%;3)A novel“language-psychology-behavior”triad model for analyzing managerial fraud motives.展开更多
The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational d...The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational data,files,text,images,and animations.The analysis,evaluation,and decision-making processes heavily depend on data,which continue to increase in size and complexity.As a result,there is an increasing need for a distributed database system to manage these data.In this paper,we propose a Key-Value database based on a distributed system that can operate on any type of data,regardless of its size or type.This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization.Moreover,it is specif-ically designed to handle the storage and access of largefiles.Additionally,we propose a multimodal data fusion mechanism that can connect various descrip-tions of the same substance,enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis.Our approach focuses on indexing and storage,and we compare our solution with Redis,MongoDB,and MySQL through experiments.We demonstrate the performance,scalability,and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions.In conclu-sion,our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.展开更多
This project explores the integration of image and point cloud data for 3D object detection using the F-PointNet model,aiming to enhance accuracy and reliability in autonomous driving applications.F-PointNet leverages...This project explores the integration of image and point cloud data for 3D object detection using the F-PointNet model,aiming to enhance accuracy and reliability in autonomous driving applications.F-PointNet leverages multimodal data from RGB cameras and LiDAR to improve environmental perception and object localisation under varied operational conditions.Employing a rigorous methodology,the model incorporates preprocessing and network components such as frustum rotation and T-net adjustments to refine the detection process.Experiments were conducted on the KITTI dataset,which included applying both random and designated perturbations,and assessing their impact on the model’s performance.Results show that random perturbations generally outperform designated ones,especially in complex scenarios,by enhancing the model’s adaptability and capability for generalisation.This study highlights the critical role of methodological innovations and data perturbation strategies in advancing 3D object detection technologies,suggesting that further research is needed to optimise these approaches for broader applications.Furthermore,this research contributes to the development of autonomous systems,emphasising the importance of robust and accurate 3D object detection in enhancing the safety and reliability of autonomous vehicles.展开更多
This paper develops a conceptual foundation-model framework for predicting SME growth in the digital economy.It argues that traditional,linear and data-scarce approaches cannot capture the non-linear,networked and mul...This paper develops a conceptual foundation-model framework for predicting SME growth in the digital economy.It argues that traditional,linear and data-scarce approaches cannot capture the non-linear,networked and multimodal nature of contemporary SME trajectories.Building on organisational life-cycle theory and the resource-based view,the study proposes an architecture that fuses structured fnancial statements,unstructured textual disclosures and graph-based relational data into shared latent representations.Transformer-based models,graph neural networks and contrastive multimodal learning are combined to generate time-varying,frm-level assessments of growth and default risk,accompanied by natural-language diagnostic reports.The paper further discusses how transfer learning,synthetic data and retrieval-augmented prediction can mitigate cold-start problems,enhance causal interpretability and improve responsiveness to macro shocks.At the macro level,the framework has implications for easing bank frm information asymmetries,expanding fnancial inclusion and rethinking the role of fnancial intermediaries in an AI-driven credit ecosystem.展开更多
基金supported by the National Natural Science Foundation of China[Grant No.:82172524]the Natural Science Foundation of Hubei Province[Grant No.:2025AFB240].
文摘Bone tumors(BTs)-including osteosarcoma,Ewing sarcoma,and chondrosarcoma-are rare but biologically complex malignancies characterized by pronounced heterogeneity in anatomical location,histological subtype,and molecular alterations.Recent advances in artificial intelligence(AI),particularly deep learning,have enabled the integration of diverse clinical data modalities to support diagnosis,treatment planning,and prognostication in bone oncology.This review provides a comprehensive synthesis of AI-driven multimodal fusion strategies that incorporate radiological imaging,digital pathology,multi-omics profiling,and electronic health records.We conducted a structured review of peer-reviewed literature published between 2015 and early 2025,focusing on the development,validation,and clinical applicability of AI models for BT diagnosis,subtyping,treatment response prediction,and recurrence monitoring.Although multimodal models have demonstrated advantages over unimodal approaches,especially in handling missing data and improving generalizability,most remain constrained by single-center study designs,small sample sizes,and limited prospective or external validation.Persistent technical and translational challenges include semantic misalignment across modalities,incomplete datasets,limited model interpretability,and regulatory and infrastructural barriers to clinical integration.To address these limitations,we highlight emerging directions such as contrastive representation learning,generative data augmentation,transformer-based fusion architectures,and privacy-preserving federated learning.We also discuss the evolving role of foundation models and workflow-integrated AI agents in enhancing scalability and clinical usability.In summary,multimodal AI represents a promising paradigm for advancing precision care in BTs.Realizing its full clinical potential will require methodologically rigorous,biologically informed,and system-level approaches that bridge algorithmic innovation with real-world healthcare delivery.
基金supported in part by a fund from Bentley Systems,Inc.
文摘Recent advances in computer vision and deep learning have shown that the fusion of depth information can significantly enhance the performance of RGB-based damage detection and segmentation models.However,alongside the advantages,depth-sensing also presents many practical challenges.For instance,the depth sensors impose an additional payload burden on the robotic inspection platforms limiting the operation time and increasing the inspection cost.Additionally,some lidar-based depth sensors have poor outdoor performance due to sunlight contamination during the daytime.In this context,this study investigates the feasibility of abolishing depth-sensing at test time without compromising the segmentation performance.An autonomous damage segmentation framework is developed,based on recent advancements in vision-based multi-modal sensing such as modality hallucination(MH)and monocular depth estimation(MDE),which require depth data only during the model training.At the time of deployment,depth data becomes expendable as it can be simulated from the corresponding RGB frames.This makes it possible to reap the benefits of depth fusion without any depth perception per se.This study explored two different depth encoding techniques and three different fusion strategies in addition to a baseline RGB-based model.The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage.It was observed that the surrogate techniques can increase the segmentation IoU by up to 20.1%with a negligible increase in the computation cost.Overall,this study is believed to make a positive contribution to enhancing the resilience of critical civil infrastructure.
基金supported by the 2021 Guangdong Province(China)Science and Technology Plan Project“Research and Application of Key Technologies for Multi-level Knowledge Retrieval Based on Big Data Intelligence”(Project No.2021B0101420004)the 2022 commissioned project“Cross-border E-commerce Taxation and Related Research”from the State Taxation Administration Guangdong Provincial Taxation Bureau,China.
文摘Purpose:This study aims to integrate large language models(LLMs)with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud,addressing the limitations of traditional approaches in long-text semantic parsing,model interpretability,and multisource data fusion,thereby providing regulatory agencies with intelligent auditing tools.Design/methodology/approach:Analyzing 5,304 Chinese listed firms’annual reports(2015-2020)from the CSMAD database,this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors,developing textual semantic features.It integrates 19 financial indicators,11 governance metrics,and linguistic characteristics(tone,readability)with fraud prediction models optimized through a group of Gradient Boosted Decision Tree(GBDT)algorithms.SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial,governance,and textual features on fraud likelihood.Findings:The study found that LLMs effectively distill lengthy annual reports into semantic summaries,while GBDT algorithms(AUC>0.850)outperform the traditional Logistic Regression model in fraud detection.Multimodal fusion improved performance by 7.4%,with financial,governance,and textual features providing complementary signals.SHAP analysis revealed financial distress,governance conflicts,and narrative patterns(e.g.,tone anchoring,semantic thresholds)as key fraud indicators,highlighting managerial intent in report language.Research limitations:This study identifies three key limitations:1)lack of interpretability for semantic features,2)absence of granular fraud-type differentiation,and 3)unexplored comparative validation with other deep learning methods.Future research will address these gaps to enhance fraud detection precision and model transparency.Practical implications:The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies’information disclosure quality and enables practical implementation through its derivative real-time monitoring system.This advancement significantly strengthens capital market risk early warning capabilities,offering actionable insights for securities regulation.Originality/value:This study presents three key innovations:1)A novel“chunking-summarizationembedding”framework for efficient semantic compression of lengthy annual reports(30,000 words);2)Demonstration of LLMs’superior performance in financial text analysis,outperforming traditional methods by 19.3%;3)A novel“language-psychology-behavior”triad model for analyzing managerial fraud motives.
文摘The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational data,files,text,images,and animations.The analysis,evaluation,and decision-making processes heavily depend on data,which continue to increase in size and complexity.As a result,there is an increasing need for a distributed database system to manage these data.In this paper,we propose a Key-Value database based on a distributed system that can operate on any type of data,regardless of its size or type.This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization.Moreover,it is specif-ically designed to handle the storage and access of largefiles.Additionally,we propose a multimodal data fusion mechanism that can connect various descrip-tions of the same substance,enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis.Our approach focuses on indexing and storage,and we compare our solution with Redis,MongoDB,and MySQL through experiments.We demonstrate the performance,scalability,and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions.In conclu-sion,our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships.
文摘This project explores the integration of image and point cloud data for 3D object detection using the F-PointNet model,aiming to enhance accuracy and reliability in autonomous driving applications.F-PointNet leverages multimodal data from RGB cameras and LiDAR to improve environmental perception and object localisation under varied operational conditions.Employing a rigorous methodology,the model incorporates preprocessing and network components such as frustum rotation and T-net adjustments to refine the detection process.Experiments were conducted on the KITTI dataset,which included applying both random and designated perturbations,and assessing their impact on the model’s performance.Results show that random perturbations generally outperform designated ones,especially in complex scenarios,by enhancing the model’s adaptability and capability for generalisation.This study highlights the critical role of methodological innovations and data perturbation strategies in advancing 3D object detection technologies,suggesting that further research is needed to optimise these approaches for broader applications.Furthermore,this research contributes to the development of autonomous systems,emphasising the importance of robust and accurate 3D object detection in enhancing the safety and reliability of autonomous vehicles.
文摘This paper develops a conceptual foundation-model framework for predicting SME growth in the digital economy.It argues that traditional,linear and data-scarce approaches cannot capture the non-linear,networked and multimodal nature of contemporary SME trajectories.Building on organisational life-cycle theory and the resource-based view,the study proposes an architecture that fuses structured fnancial statements,unstructured textual disclosures and graph-based relational data into shared latent representations.Transformer-based models,graph neural networks and contrastive multimodal learning are combined to generate time-varying,frm-level assessments of growth and default risk,accompanied by natural-language diagnostic reports.The paper further discusses how transfer learning,synthetic data and retrieval-augmented prediction can mitigate cold-start problems,enhance causal interpretability and improve responsiveness to macro shocks.At the macro level,the framework has implications for easing bank frm information asymmetries,expanding fnancial inclusion and rethinking the role of fnancial intermediaries in an AI-driven credit ecosystem.