Funding: Science and Technology Innovation Plan of Shanghai Science and Technology Commission, Grant/Award Number: 21Y21900600; Shanghai Zhou Liang Fu Medical Development Foundation, Grant/Award Number: XM00050-2024-3-8; National Natural Science Foundation of China, Grant/Award Numbers: 82127801, 82227806, 82272063; Science and Technology Innovation 2030-Major Project, Grant/Award Number: 2023ZD0511800.
Abstract: Background: The integration of 7 Tesla (7T) magnetic resonance imaging (MRI) with advanced multimodal artificial intelligence (AI) models is a promising frontier in neuroimaging. The superior spatial resolution of 7T MRI provides detailed visualizations of brain structure, which are crucial for understanding complex central nervous system diseases and tumors. Concurrently, applying multimodal AI to medical images enables interactive, imaging-based diagnostic conversation. Methods: In this paper, we systematically investigate the capacity and feasibility of applying an existing advanced multimodal AI model, ChatGPT-4V, to 7T MRI in the context of brain tumors. First, we test whether ChatGPT-4V has knowledge about 7T MRI and whether it can differentiate 7T MRI from 3T MRI. We then explore whether ChatGPT-4V can recognize different 7T MRI modalities and whether it can correctly diagnose tumors based on single-modality or multiple-modality 7T MRI. Results: ChatGPT-4V achieved an accuracy of 84.4% in 3T-vs-7T differentiation and 78.9% in 7T modality recognition. In a human evaluation with three clinical experts, ChatGPT-4V obtained average scores of 9.27/20 in single-modality diagnosis and 21.25/25 in multiple-modality diagnosis. Our study indicates that single-modality diagnosis and the interpretability of diagnostic decisions should be improved when ChatGPT-4V is applied to 7T data in clinical practice. Conclusions: Overall, our analysis suggests that such integration holds promise as a tool for improving diagnostic workflows in neurology, with a potentially transformative impact on medical image analysis and patient management.
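The two diagnostic scores use different maximums (20 and 25), so they are hard to compare directly. The arithmetic below puts each reported average on a common percentage scale. It uses only the figures stated in the abstract and assumes the denominators are the maximum attainable scores. It is an illustrative normalization, not a metric reported by the authors.

```latex
\[
\frac{9.27}{20} = 0.4635 \approx 46.4\%,
\qquad
\frac{21.25}{25} = 0.85 = 85.0\%,
\qquad
85.0\% - 46.35\% \approx 38.7 \text{ percentage points}
\]
```

On this normalized view, multiple-modality diagnosis scores well above single-modality diagnosis. This fits the authors' conclusion that single-modality diagnosis needs improvement.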
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 62072365 and 62472348; the Aviation Science Foundation of China under Grant No. 2023M071070002; the Key Research and Development Program of Shaanxi Province of China under Grant Nos. 2022GY-332, 2023-YBGY-230, and 2024GX-YBXM-533; the Innovation Capability Support Plan of Shaanxi Province of China under Grant No. 2022PT-33; and the Xi'an Science and Technology Plan Key Industrial Chain Technology Research Project under Grant No. 23ZDCYJSGG0007.
Abstract: In recent years, multimodal agent AI (MAA) has emerged as a pivotal area of research, holding promise for transforming human-machine interaction. Agent AI systems, which perceive and respond to inputs from multiple modalities (e.g., language, vision, and audio), have made remarkable progress in understanding complex environments and executing intricate tasks. This survey comprehensively reviews state-of-the-art developments in MAA, examining its fundamental concepts, key techniques, and applications across diverse domains. We first introduce the basics of agent AI and its multimodal interaction capabilities. We then delve into the core technologies that enable agents to perform task planning, decision-making, and multi-sensory fusion. Furthermore, we explore applications of MAA in robotics, healthcare, gaming, and beyond. Our main focus, however, is on analyzing the challenges and limitations of current systems. We also propose promising directions for future improvement, including human-AI collaboration and better online learning methods. By reviewing existing work and highlighting open questions, this survey aims to provide a comprehensive roadmap for researchers and practitioners in the field of MAA.
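The abstract names multi-sensory fusion and decision-making as core agent capabilities but does not specify a method. As one concrete illustration, the sketch below uses late fusion. This is a common, simple technique and not necessarily one the survey emphasizes. In late fusion, each modality produces its own probability vector over candidate actions. The vectors are then combined by a weighted average, and the agent greedily picks the highest-scoring action. The function names `late_fusion` and `choose_action`, the modality weights, and the example probabilities are all hypothetical.

```python
from typing import Dict, List


def late_fusion(
    modality_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (late fusion).

    Hypothetical helper: modality_probs maps a modality name (e.g. "vision")
    to a probability vector over a shared set of candidate actions, and
    weights gives each modality's relative trust (missing names count as 0).
    """
    if not modality_probs:
        raise ValueError("need at least one modality")
    n_classes = len(next(iter(modality_probs.values())))
    fused = [0.0] * n_classes
    total_weight = 0.0
    for name, probs in modality_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"{name}: expected {n_classes} classes")
        w = weights.get(name, 0.0)
        total_weight += w
        for i, p in enumerate(probs):
            fused[i] += w * p
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Divide by the total weight so the fused vector is again a distribution.
    return [f / total_weight for f in fused]


def choose_action(fused: List[float], actions: List[str]) -> str:
    """Greedy decision step: return the action with the highest fused score."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return actions[best]


if __name__ == "__main__":
    # Hypothetical per-modality beliefs over three candidate actions.
    probs = {
        "language": [0.6, 0.3, 0.1],
        "vision": [0.2, 0.7, 0.1],
        "audio": [0.3, 0.3, 0.4],
    }
    weights = {"language": 1.0, "vision": 2.0, "audio": 1.0}
    fused = late_fusion(probs, weights)
    print([round(p, 3) for p in fused])
    print(choose_action(fused, ["grasp", "move", "wait"]))
```

Late fusion keeps each modality's model independent, which makes it easy to add or drop a sensor. The trade-off is that cross-modal interactions are captured only at the decision level. Early or intermediate fusion, by contrast, combines features before a decision is made.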