Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead to unweighted patient trust in these systems. Therefore, it is necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT’s potential to provide MI regarding questions commonly addressed by patients with IBD to their gastroenterologists. From the review of the outputs provided by ChatGPT, this tool showed some attractive potential while having significant limitations in updating and detailing information and providing inaccurate information in some cases. Further studies and refinement of ChatGPT, possibly aligning the outputs with the leading medical evidence provided by reliable databases, are needed.
This letter evaluates the article by Gravina et al on ChatGPT’s potential in providing medical information for inflammatory bowel disease patients. While promising, it highlights the need for advanced techniques like reasoning+action and retrieval-augmented generation to improve accuracy and reliability. Emphasizing that simple question and answer testing is insufficient, it calls for more nuanced evaluation methods to truly gauge large language models’ capabilities in clinical applications.
Background: The integration of artificial intelligence (AI) in radiology has opened new possibilities for diagnostic accuracy, with large language models (LLMs) showing potential for supporting clinical decision-making. While proprietary models like ChatGPT have gained attention, open-source alternatives such as Meta LLaMa 3.1 remain underexplored. This study aims to evaluate the diagnostic accuracy of LLaMa 3.1 in thoracic imaging and to discuss broader implications of open-source versus proprietary AI models in healthcare. Methods: Meta LLaMa 3.1 (8B parameter version) was tested on 126 multiple-choice thoracic imaging questions selected from Thoracic Imaging: A Core Review by Hobbs et al. These questions required no image interpretation. The model’s answers were validated by two board-certified diagnostic radiologists. Accuracy was assessed overall and across subgroups, including intensive care, pathology, and anatomy. Additionally, a narrative review introduces three widely used AI platforms in thoracic imaging: DeepLesion, ChexNet, and 3D Slicer. Results: LLaMa 3.1 achieved an overall accuracy of 61.1%. It performed well in intensive care (90.0%) and terms and signs (83.3%) but showed variability across subgroups, with lower accuracy in normal anatomy and basic imaging (40.0%). Subgroup analysis revealed strengths in infectious pneumonia and pleural disease, but notable weaknesses in lung cancer and vascular pathology. Conclusion: LLaMa 3.1 demonstrates promise as an open-source NLP tool in thoracic diagnostics, though its performance variability highlights the need for refinement and domain-specific training. Open-source models offer transparency and accessibility, while proprietary models deliver consistency. Both hold value, depending on clinical context and resource availability.
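The scoring described in the Methods of the abstract above (accuracy overall and per subgroup on multiple-choice questions) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `accuracy_by_subgroup`, the record layout, and the toy data are all assumptions, and the sample answers are invented purely to exercise the function. As a sanity check on the reported figure, an overall accuracy of 61.1% on 126 questions is consistent with 77 correct answers.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """Score multiple-choice answers overall and per subgroup.

    records: iterable of (subgroup, model_answer, correct_answer) tuples.
    This layout is hypothetical; the study does not describe its data format.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subgroup, predicted, expected in records:
        totals[subgroup] += 1
        # Compare answer letters case-insensitively, ignoring stray spaces.
        if predicted.strip().upper() == expected.strip().upper():
            hits[subgroup] += 1
    per_group = {group: hits[group] / totals[group] for group in totals}
    overall = sum(hits.values()) / sum(totals.values()) if totals else 0.0
    return overall, per_group


# Hypothetical toy data, NOT the study's questions or results.
sample = [
    ("intensive care", "A", "A"),
    ("intensive care", "b", "B"),
    ("normal anatomy", "C", "D"),
    ("normal anatomy", "A", "A"),
]

overall, per_group = accuracy_by_subgroup(sample)
print(f"overall: {overall:.1%}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.1%}")
```

Reporting per-subgroup accuracy next to the overall figure matters here because, as the abstract notes, a single average can hide large differences between categories.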