Journal Articles
5 articles found
1. DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment
Authors: Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li. World Journal of Gastroenterology, 2025, No. 31, pp. 92-100 (9 pages)
BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks.
AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases.
METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians.
RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times.
CONCLUSION: The framework achieves successful integration of multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration, which opens new horizons for the clinical application of artificial intelligence-assisted technology.
Keywords: Gastrointestinal diseases; Artificial intelligence-assisted diagnosis and treatment; Multimodal large language model; Multiple large language model collaboration; DeepGut
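To make the four-tiered collaboration concrete, here is a minimal Python sketch of how such a pipeline could be wired together. All model wrappers, role names, and data fields are hypothetical placeholders invented for illustration; the paper's actual models and prompts are not reproduced here.

```python
# A minimal sketch of a four-tiered collaborative pipeline in the spirit of
# DeepGut, as described in the abstract. Everything here is a stub.

from dataclasses import dataclass

@dataclass
class PatientCase:
    history: str          # free-text medical history
    lab_results: str      # laboratory findings
    imaging_report: str   # radiology/endoscopy description

def call_llm(role: str, prompt: str) -> str:
    """Placeholder for a call to one of the four specialized large models."""
    return f"[{role} output for: {prompt[:40]}...]"

def deepgut_pipeline(case: PatientCase) -> dict:
    # Tier 1: multimodal information extraction
    extracted = call_llm("extractor",
                         f"Summarize findings: {case.history} | {case.lab_results} | {case.imaging_report}")
    # Tier 2: logical "chain" construction linking findings to candidate diagnoses
    chain = call_llm("reasoner", f"Build a diagnostic reasoning chain from: {extracted}")
    # Tier 3: diagnostic and treatment suggestion generation
    plan = call_llm("planner", f"Propose diagnosis and treatment given: {chain}")
    # Tier 4: risk analysis of the proposed plan
    risks = call_llm("auditor", f"List clinical and legal risks of: {plan}")
    return {"findings": extracted, "chain": chain, "plan": plan, "risks": risks}

if __name__ == "__main__":
    demo = PatientCase("Epigastric pain, 2 weeks", "Elevated CRP", "Gastric mucosal erythema")
    for stage, output in deepgut_pipeline(demo).items():
        print(stage, "->", output)
```

Sequencing the four roles, so that each tier consumes the previous tier's output, is what distinguishes this collaborative structure from a single-model prompt; it also explains the abstract's note about increased token counts and diagnostic time.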
2. Current Trends and Future Prospects of Large-Scale Foundation Model in K-12 Education (Cited: 1)
Authors: Qiannan Zhu, Mei Wang, Ting Zhang, Hua Huang. Frontiers of Digital Education, 2025, No. 2, pp. 9-31 (23 pages)
The rapid advancement of artificial intelligence has significantly impacted education, with large-scale foundation models (LFMs) emerging as transformative tools. While LFMs have demonstrated exceptional performance across diverse domains, their integration into K-12 education remains in its early stages, requiring alignment with pedagogical principles, cognitive development, and curriculum standards. This paper provides a comprehensive technological review of LFM applications in K-12 education, examining current workflows, challenges, and future opportunities. We explore how LFMs facilitate personalized learning, teacher-student collaboration, and automated assessment, while highlighting critical issues such as motivation, engagement, and age-appropriate instructional strategies. By analyzing global developments, this study offers valuable insights for educators seeking to optimize AI-driven teaching methods and for students leveraging AI for self-directed learning. Our findings aim to inform future research and drive innovation in educational AI, ensuring the effective and ethical integration of LFMs into the evolving K-12 educational landscape.
Keywords: Large-scale foundation models (LFMs); Multimodal large language model; Large language model; K-12 education
3. Large investment model
Authors: Jian Guo, Heung-Yeung Shum. Frontiers of Information Technology & Electronic Engineering, 2025, No. 10, pp. 1771-1792 (22 pages)
Traditional quantitative investment research is encountering diminishing returns alongside rising labor and time costs. To overcome these challenges, we introduce the large investment model (LIM), a novel research paradigm designed to enhance both performance and efficiency at scale. LIM employs end-to-end learning and universal modeling to create an upstream foundation model, which is capable of autonomously learning comprehensive signal patterns from diverse financial data spanning multiple exchanges, instruments, and frequencies. These “global patterns” are subsequently transferred to downstream strategy modeling, optimizing performance for specific tasks. We detail the system architecture design of LIM, address the technical challenges inherent in this approach, and outline potential directions for future research.
Keywords: Artificial general intelligence; End-to-end; Large investment model; Quantitative investment; Foundation model; Multimodal large language model
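The upstream/downstream split the abstract describes can be illustrated with a small transfer-learning sketch. The PyTorch code below uses synthetic data and an assumed architecture; it shows only the general pattern (pretrain a shared encoder on pooled data, then reuse it frozen for a task-specific head), not LIM's actual design.

```python
# A minimal sketch, assuming synthetic data, of the upstream foundation
# model / downstream strategy split described in the LIM abstract.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Upstream: pooled features from many exchanges/instruments/frequencies.
X_pool = torch.randn(512, 16)   # 512 samples, 16 market features
y_pool = torch.randn(512, 1)    # generic target, e.g. next-step return

encoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
pretrain_head = nn.Linear(32, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(pretrain_head.parameters()), lr=1e-3)
for _ in range(200):            # end-to-end pretraining of the shared encoder
    opt.zero_grad()
    loss = nn.functional.mse_loss(pretrain_head(encoder(X_pool)), y_pool)
    loss.backward()
    opt.step()

# Downstream: one specific strategy reuses the frozen encoder's representations.
X_task = torch.randn(128, 16)   # data for a single target market/strategy
y_task = torch.randn(128, 1)
for p in encoder.parameters():  # transfer the learned "global patterns"
    p.requires_grad_(False)

strategy_head = nn.Linear(32, 1)
opt2 = torch.optim.Adam(strategy_head.parameters(), lr=1e-3)
for _ in range(200):
    opt2.zero_grad()
    loss = nn.functional.mse_loss(strategy_head(encoder(X_task)), y_task)
    loss.backward()
    opt2.step()
print("downstream loss:", loss.item())
```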
4. CLIP4Video-Sampling: Global Semantics-Guided Multi-Granularity Frame Sampling for Video-Text Retrieval
Authors: Tao Zhang, Yu Zhang. Journal of Computer and Communications, 2024, No. 11, pp. 26-36 (11 pages)
Video-text retrieval (VTR) is an essential task in multimodal learning, aiming to bridge the semantic gap between visual and textual data. Effective video frame sampling plays a crucial role in improving retrieval performance, as it determines the quality of the visual content representation. Traditional sampling methods, such as uniform sampling and optical flow-based techniques, often fail to capture the full semantic range of videos, leading to redundancy and inefficiencies. In this work, we propose CLIP4Video-Sampling, a global semantics-guided multi-granularity frame sampling strategy designed to optimize both computational efficiency and retrieval accuracy. By integrating multi-scale global and local temporal sampling and leveraging the powerful feature extraction capabilities of the CLIP (Contrastive Language-Image Pre-training) model, our method significantly outperforms existing approaches in both zero-shot and fine-tuned video-text retrieval tasks on popular datasets. CLIP4Video-Sampling reduces redundancy, ensures keyframe coverage, and serves as an adaptable pre-processing module for multimodal models.
Keywords: Video sampling; Multimodal large language model; Text-video retrieval; CLIP model
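The core idea, scoring frames against a global semantic summary of the video while penalizing redundancy, can be sketched in a few lines. The NumPy code below substitutes random vectors for real CLIP frame embeddings, and the greedy relevance-diversity heuristic is an illustrative assumption rather than the paper's exact sampling algorithm.

```python
# A minimal sketch of global-semantics-guided frame sampling, with random
# vectors standing in for CLIP frame embeddings.
import numpy as np

rng = np.random.default_rng(0)
frame_embs = rng.normal(size=(120, 512))   # 120 frames x 512-d "CLIP" features
frame_embs /= np.linalg.norm(frame_embs, axis=1, keepdims=True)

global_emb = frame_embs.mean(axis=0)       # global semantic summary of the video
global_emb /= np.linalg.norm(global_emb)

def sample_frames(embs, g, k=8, diversity=0.5):
    """Greedily pick k frames scoring high on global relevance and mutual diversity."""
    chosen = []
    relevance = embs @ g                   # cosine similarity to global semantics
    for _ in range(k):
        if chosen:
            # redundancy: similarity to the closest already-chosen frame
            redundancy = np.max(embs @ embs[chosen].T, axis=1)
        else:
            redundancy = np.zeros(len(embs))
        score = relevance - diversity * redundancy
        score[chosen] = -np.inf            # never re-pick a frame
        chosen.append(int(np.argmax(score)))
    return sorted(chosen)

print("sampled frame indices:", sample_frames(frame_embs, global_emb))
```

Selected frames cover the video's dominant semantics without near-duplicates, which is the redundancy reduction and keyframe coverage the abstract claims.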
5. CAPGen: An MLLM-Based Framework Integrated with Iterative Optimization Mechanism for Cultural Artifacts Poster Generation
Authors: Qianqian Hu, Chuhan Li, Mohan Zhang, Fang Liu. Computers, Materials & Continua, 2026, No. 1, pp. 494-510 (17 pages)
Given the trend toward digital transformation among cultural institutions and the substantial influence of social media platforms, demand for visual communication to promote traditional cultural artifacts online keeps increasing. As an effective medium, posters attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language Model (MLLM) with integrated iterative optimization. During our research, we collaborated with designers to define graphic design principles for cultural artifact posters, which guide the MLLM in generating layout parameters. These parameters are then rendered into posters, which are refined by an MLLM through a multi-round iterative optimization mechanism. Qualitative results show that CAPGen consistently outperforms baseline methods in both visual quality and aesthetic performance. Furthermore, ablation studies indicate that the prompt, the iterative optimization mechanism, and the design principles each significantly enhance the effectiveness of poster generation.
Keywords: Aesthetic poster generation; Prompt engineering; Multimodal large language models; Iterative optimization; Design principles
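The generate-render-critique loop described above can be outlined as follows. The MLLM calls, layout schema, and renderer in this Python sketch are stubs invented for illustration; CAPGen's actual prompts, design principles, and optimization mechanism are not reproduced.

```python
# A minimal sketch of an MLLM-driven layout loop with multi-round refinement,
# in the spirit of the CAPGen abstract. All components are placeholders.
import json
import random

DESIGN_PRINCIPLES = "Leave margins around the artifact; title above, caption below."

def mllm(prompt: str) -> str:
    """Placeholder for a multimodal LLM call returning layout parameters as JSON."""
    layout = {"title_y": round(random.uniform(0.05, 0.20), 2),
              "image_box": [0.1, 0.25, 0.9, 0.75],
              "caption_y": round(random.uniform(0.80, 0.95), 2)}
    return json.dumps(layout)

def render_poster(layout: dict) -> str:
    """Placeholder renderer: turns layout parameters into a poster artifact."""
    return f"<poster with layout {layout}>"

def capgen(artifact_desc: str, rounds: int = 3) -> str:
    # Step 1: design principles guide the MLLM's initial layout proposal.
    layout = json.loads(mllm(f"{DESIGN_PRINCIPLES}\nPropose a layout for: {artifact_desc}"))
    poster = render_poster(layout)
    # Steps 2-3: render, then refine over multiple critique rounds.
    for i in range(rounds):
        critique = f"round {i} critique of {poster}"   # stub: MLLM judges the render
        layout = json.loads(mllm(f"Refine layout given critique: {critique}"))
        poster = render_poster(layout)
    return poster

print(capgen("Tang-dynasty bronze mirror"))
```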