Journal Articles
9 articles found
DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment
1
Authors: Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li. World Journal of Gastroenterology, 2025, No. 31, pp. 92-100 (9 pages)
BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may carry clinical or legal risks. AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases. METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians. RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and longer diagnostic times. CONCLUSION: The framework successfully integrates multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration and opening new horizons for the clinical application of artificial intelligence-assisted technology.
Keywords: Gastrointestinal diseases; Artificial intelligence-assisted diagnosis and treatment; Multimodal large language model; Multiple large language model collaboration; DeepGut
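The four-tiered structure described in the abstract can be pictured as a sequential chain of model calls, each tier consuming the previous tier's output. The sketch below is our own illustration under that assumption; `call_model` is a hypothetical stub standing in for real LLM API calls, and the role names are not from the paper.

```python
# Hypothetical sketch of a four-tiered multimodal LLM collaboration in the
# spirit of DeepGut: extraction -> logic-chain construction -> suggestion
# generation -> risk analysis. All names and the stub call are assumptions.

def call_model(role: str, prompt: str) -> str:
    """Stand-in for an LLM API call; returns a role-tagged string for demo purposes."""
    return f"[{role}] {prompt[:40]}..."

def deepgut_pipeline(history: str, labs: str, imaging: str) -> dict:
    # Tier 1: multimodal information extraction from heterogeneous inputs
    extracted = call_model("extractor", f"history={history}; labs={labs}; imaging={imaging}")
    # Tier 2: diagnostic logic-"chain" construction over the extracted facts
    chain = call_model("chain-builder", extracted)
    # Tier 3: diagnostic and treatment suggestion generation
    suggestion = call_model("clinician", chain)
    # Tier 4: risk analysis of the generated plan
    risk = call_model("risk-analyst", suggestion)
    return {"extracted": extracted, "chain": chain, "suggestion": suggestion, "risk": risk}

result = deepgut_pipeline("epigastric pain 2w", "Hb 10.1 g/dL", "gastric ulcer on endoscopy")
print(sorted(result.keys()))
```

Chaining separate specialist models this way matches the abstract's noted trade-off: each tier adds input and output tokens, so latency and cost grow with the number of tiers.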
Medical multimodal large language models:A systematic review
2
Authors: Yuan Hu, Chenhan Xu, Bo Lin, Weibin Yang, Yuan Yan Tang. Intelligent Oncology, 2025, No. 4, pp. 308-325 (18 pages)
The rapid advancement of artificial intelligence (AI) has ushered in a new era of medical multimodal large language models (MLLMs), which integrate diverse data modalities such as text, imaging, physiological signals, and genomics to enhance clinical decision-making. This systematic review explores the core methodologies and applied research frontiers of medical MLLMs, focusing on their architecture, training methods, evaluation techniques, and applications. We highlight the transformative potential of MLLMs in achieving cross-modal semantic alignment, medical knowledge integration, and robust clinical reasoning. Despite their promise, challenges such as data heterogeneity, hallucination, and computational efficiency persist. By reviewing state-of-the-art solutions and future directions, this paper provides a comprehensive technical guide for developing reliable and interpretable medical MLLMs, ultimately aiming to bridge the gap between AI and clinical practice.
Keywords: Multimodal large language model; Hallucination; Medical multimodal dataset; Clinical evaluation
Foundation models: Insights and implications for gastrointestinal cancer
3
Authors: Lei Shi, Rui Huang, Li-Ling Zhao, An-Jie Guo. World Journal of Gastroenterology, 2025, No. 47, pp. 7-34 (28 pages)
Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaboration, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
Keywords: Foundation models; Gastrointestinal cancers; Large language models; Vision foundation models; Multimodal large language models
CAPGen: An MLLM-Based Framework Integrated with Iterative Optimization Mechanism for Cultural Artifacts Poster Generation
4
Authors: Qianqian Hu, Chuhan Li, Mohan Zhang, Fang Liu. Computers, Materials & Continua, 2026, No. 1, pp. 494-510 (17 pages)
Driven by the digital transformation of cultural institutions and the substantial influence of social media platforms, the demand for visual communication in promoting traditional cultural artifacts online keeps increasing. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language Model (MLLM) with integrated iterative optimization. During our research, we collaborated with designers to define principles of graphic design for cultural artifact posters, which guide the MLLM in generating layout parameters. These parameters are then rendered into posters, which are refined by an MLLM integrated with a multi-round iterative optimization mechanism. Qualitative results show that CAPGen consistently outperforms baseline methods in both visual quality and aesthetic performance. Furthermore, ablation studies indicate that the prompt, iterative optimization mechanism, and design principles each significantly enhance the effectiveness of poster generation.
Keywords: Aesthetic poster generation; Prompt engineering; Multimodal large language models; Iterative optimization; Design principles
Current Trends and Future Prospects of Large-Scale Foundation Model in K-12 Education (cited by 1)
5
Authors: Qiannan Zhu, Mei Wang, Ting Zhang, Hua Huang. Frontiers of Digital Education, 2025, No. 2, pp. 9-31 (23 pages)
The rapid advancement of artificial intelligence has significantly impacted education, with large-scale foundation models (LFMs) emerging as transformative tools. While LFMs have demonstrated exceptional performance across diverse domains, their integration into K-12 education remains in its early stages, requiring alignment with pedagogical principles, cognitive development, and curriculum standards. This paper provides a comprehensive technological review of LFM applications in K-12 education, examining current workflows, challenges, and future opportunities. We explore how LFMs facilitate personalized learning, teacher-student collaboration, and automated assessment, while highlighting critical issues such as motivation, engagement, and age-appropriate instructional strategies. By analyzing global developments, this study offers valuable insights for educators seeking to optimize AI-driven teaching methods and for students leveraging AI for self-directed learning. Our findings aim to inform future research and drive innovation in educational AI, ensuring the effective and ethical integration of LFMs into the evolving K-12 educational landscape.
Keywords: Large-scale foundation models (LFMs); Multimodal large language model; Large language model; K-12 education
Large investment model
6
Authors: Jian GUO, Heung-Yeung SHUM. Frontiers of Information Technology & Electronic Engineering, 2025, No. 10, pp. 1771-1792 (22 pages)
Traditional quantitative investment research is encountering diminishing returns alongside rising labor and time costs. To overcome these challenges, we introduce the large investment model (LIM), a novel research paradigm designed to enhance both performance and efficiency at scale. LIM employs end-to-end learning and universal modeling to create an upstream foundation model capable of autonomously learning comprehensive signal patterns from diverse financial data spanning multiple exchanges, instruments, and frequencies. These "global patterns" are subsequently transferred to downstream strategy modeling, optimizing performance for specific tasks. We detail the system architecture design of LIM, address the technical challenges inherent in this approach, and outline potential directions for future research.
Keywords: Artificial general intelligence; End-to-end; Large investment model; Quantitative investment; Foundation model; Multimodal large language model
Federated Services: A Smart Service Ecology With Federated Security for Aligned Data Supply and Scenario-Oriented Demands
7
Authors: Xiaofeng Jia, Juanjuan Li, Shouwen Wang, Hongwei Qi, Fei-Yue Wang, Rui Qin, Min Zhang, Xiaolong Liang. IEEE/CAA Journal of Automatica Sinica, 2025, No. 5, pp. 925-936 (12 pages)
This paper introduces federated services as a smart service ecology with federated security, aligning distributed data supply with diversified service demands spanning digital and societal contexts. It presents comprehensive research on the theoretical foundation and technical system of federated services, aiming to advance our understanding and implementation of this novel service paradigm. First, a thorough examination of the characteristics of federated security within federated services is conducted. Then, a five-layer technical framework is formulated under a decentralized intelligent architecture, ensuring secure, agile, and adaptable service provision. On this basis, the operational mechanisms underlying data federation and service confederation are analyzed, with emphasis on the smart supply-demand matching model. Furthermore, a scenario-oriented taxonomy of federated services, accompanied by illustrative examples, is proposed. Our work offers actionable insights and a roadmap for realizing and advancing federated services, contributing to the refinement and wider adoption of this transformative service paradigm in the digital era.
Keywords: Decentralized autonomous organizations and operations; Decentralized physical infrastructure networks; Federated security; Federated services; Multimodal large language models; Smart contracts
CLIP4Video-Sampling: Global Semantics-Guided Multi-Granularity Frame Sampling for Video-Text Retrieval
8
Authors: Tao Zhang, Yu Zhang. Journal of Computer and Communications, 2024, No. 11, pp. 26-36 (11 pages)
Video-text retrieval (VTR) is an essential task in multimodal learning, aiming to bridge the semantic gap between visual and textual data. Effective video frame sampling plays a crucial role in improving retrieval performance, as it determines the quality of the visual content representation. Traditional sampling methods, such as uniform sampling and optical flow-based techniques, often fail to capture the full semantic range of videos, leading to redundancy and inefficiency. In this work, we propose CLIP4Video-Sampling, a global semantics-guided multi-granularity frame sampling strategy designed to optimize both computational efficiency and retrieval accuracy. By integrating multi-scale global and local temporal sampling and leveraging the powerful feature extraction capabilities of the CLIP (Contrastive Language-Image Pre-training) model, our method significantly outperforms existing approaches in both zero-shot and fine-tuned video-text retrieval tasks on popular datasets. CLIP4Video-Sampling reduces redundancy, ensures keyframe coverage, and serves as an adaptable pre-processing module for multimodal models.
Keywords: Video sampling; Multimodal large language model; Text-video retrieval; CLIP model
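To make the idea of global semantics-guided sampling concrete: given per-frame embeddings (e.g., from a CLIP image encoder), one can greedily pick frames that score high against the video's global (mean) embedding while penalizing similarity to frames already chosen, trading relevance against redundancy. The sketch below is our own toy illustration of that selection principle, not the paper's algorithm; the 2-D embeddings and the 0.5 redundancy weight are invented for demonstration.

```python
# Toy greedy frame selection guided by a global semantic vector: relevance to
# the mean embedding minus a redundancy penalty against already-chosen frames.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_frames(frame_embs, k):
    n, dim = len(frame_embs), len(frame_embs[0])
    # Global semantic vector: mean of all frame embeddings.
    global_emb = [sum(f[d] for f in frame_embs) / n for d in range(dim)]
    chosen = []
    for _ in range(k):
        best, best_score = None, float("-inf")
        for i in range(n):
            if i in chosen:
                continue
            relevance = dot(frame_embs[i], global_emb)
            redundancy = max((dot(frame_embs[i], frame_embs[j]) for j in chosen),
                             default=0.0)
            score = relevance - 0.5 * redundancy   # assumed trade-off weight
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return sorted(chosen)

# Toy 2-D embeddings: frames 0-1 share one semantic, frames 2-3 another.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sample_frames(frames, 2))  # -> [0, 2]: one frame per semantic cluster
```

The redundancy penalty is what keeps the sample from collapsing onto near-duplicate frames, which is the failure mode of uniform sampling the abstract points out.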
TimeJudge: Empowering video-LLMs as zero-shot judges for temporal consistency in video captions
9
Authors: Yangliu HU, Zikai SONG, Junqing YU, Yiping Phoebe CHEN, Wei YANG. Frontiers of Information Technology & Electronic Engineering, 2025, No. 11, pp. 2204-2214 (11 pages)
Video large language models (video-LLMs) have demonstrated impressive capabilities in multimodal understanding, but their potential as zero-shot evaluators of temporal consistency in video captions remains underexplored. Existing methods notably underperform in detecting critical temporal errors, such as missing, hallucinated, or misordered actions. To address this gap, we introduce two key contributions. (1) TimeJudge: a novel zero-shot framework that recasts temporal error detection as answering calibrated binary question pairs. It incorporates modality-sensitive confidence calibration and uses consistency-weighted voting for robust prediction aggregation. (2) TEDBench: a rigorously constructed benchmark featuring videos across four distinct complexity levels, specifically designed with fine-grained temporal error annotations to evaluate video-LLM performance on this task. Through a comprehensive evaluation of multiple state-of-the-art video-LLMs on TEDBench, we demonstrate that TimeJudge consistently yields substantial gains in recall and F1-score without requiring any task-specific fine-tuning. Our approach provides a generalizable, scalable, and training-free solution for enhancing the temporal error detection capabilities of video-LLMs.
Keywords: Video large language model (video-LLM); Multimodal large language model (MLLM); MLLM-as-a-Judge; Video caption; Benchmark
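Consistency-weighted voting over binary question pairs can be illustrated with a small aggregator: each temporal claim is asked in a positive and a negated phrasing, and a pair only votes at full weight when its two answers disagree as expected (a model that answers "yes" to both a claim and its negation is contradicting itself). This sketch is our own reading of the abstract; the confidence averaging, the 0.25 down-weight, and the decision rule are illustrative assumptions, not the paper's exact procedure.

```python
# Toy consistency-weighted voting over calibrated binary question pairs:
# each pair = ((answer_pos, conf_pos), (answer_neg, conf_neg)), answers in
# {"yes", "no"}; "yes" to the positive phrasing votes for a temporal error.

def judge(pairs):
    votes = 0.0
    for (ans_pos, conf_pos), (ans_neg, conf_neg) in pairs:
        consistent = ans_pos != ans_neg          # negated question should flip the answer
        weight = (conf_pos + conf_neg) / 2       # average calibrated confidence
        if not consistent:
            weight *= 0.25                       # down-weight self-contradictory pairs
        direction = 1.0 if ans_pos == "yes" else -1.0
        votes += direction * weight
    return "error" if votes > 0 else "consistent"

# Two confident, consistent pairs flag an error; one contradictory pair barely counts.
pairs = [(("yes", 0.9), ("no", 0.8)),
         (("yes", 0.7), ("no", 0.9)),
         (("no", 0.6), ("no", 0.5))]
print(judge(pairs))  # -> "error"
```

The aggregation is training-free, which is consistent with the abstract's claim that the gains come without task-specific fine-tuning.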