Funding: This work was funded by the National Natural Science Foundation of China (Grant No. 62171297), the Beijing Natural Science Foundation (Grant No. L242024), the Beijing Friendship Hospital, Capital Medical University (Grant No. YYZZ202334), the Open Projects of Sichuan Province Clinical Medical Research Center for Imaging Medicine (Grant No. YXYX2409), and the Beijing Municipal Natural Science Foundation (Grant No. 7254539).
Abstract: This editorial presents an optimistic yet cautious perspective on the development, deployment, and regulation of large language models (LLMs) in the field of medicine. It is essential to strike a balance between embracing the benefits of artificial intelligence-driven solutions and preserving the human touch that is vital for providing compassionate care. The exponential growth of medical data has paved the way for the integration of LLMs into healthcare, offering unprecedented opportunities to enhance clinical decision-making and alleviate physicians' workloads. Recently, LLMs have exhibited remarkable potential across various clinical scenarios, including streamlining diagnostic processes, optimizing radiology reports, and providing personalized treatment recommendations. However, the implementation of LLMs in healthcare is not without its challenges. The scarcity of high-quality annotated data, privacy concerns, and the risk of generating misleading or overconfident information are significant hurdles that must be addressed. Moreover, while LLMs can take over certain basic tasks traditionally performed by humans, it is crucial to recognize that senior clinicians play an irreplaceable role in complex decision-making and in providing emotional support to patients. By harnessing the power of LLMs to augment human capabilities while maintaining essential human elements within healthcare, we might shape a future where artificial intelligence and human intelligence coexist harmoniously. Prioritizing the ethical development and deployment of artificial intelligence, empowering healthcare professionals, and safeguarding patient privacy will be key to realizing the full potential of LLMs in revolutionizing healthcare delivery. Through ongoing research, collaboration, and adaptation, the responsible integration of LLMs holds promise for elevating both the quality and the accessibility of healthcare globally, ultimately creating a more efficient, personalized, and patient-centric healthcare system.
Funding: National Natural Science Foundation of China (Grant Nos. 62171297 and 61931013).
Abstract: Objective: Appropriate medical imaging is important for value-based care. We aim to evaluate the performance of generative pretrained transformer 4 (GPT-4), an innovative natural language processing model, in automatically recommending appropriate medical imaging in different clinical scenarios. Methods: Institutional Review Board (IRB) approval was not required because only nonidentifiable data were used. As prompts, we used 112 questions from the American College of Radiology (ACR) Radiology-TEACHES Program, an open-source question-and-answer program that guides appropriate medical imaging. We included 69 free-text case vignettes and 43 simplified cases. To evaluate the performance of GPT-4 and GPT-3.5, we took the recommendations of the ACR guidelines as the gold standard, and three radiologists then analyzed the consistency of the GPT models' responses with those of the ACR. Consistency was rated on a five-point scoring criterion. A paired t-test was applied to assess the statistical significance of the findings. Results: In free-text case vignettes, the accuracy of GPT-4 was 92.9%, whereas the accuracy of GPT-3.5 was just 78.3%. GPT-4 provided more appropriate suggestions for reducing the overutilization of medical imaging than GPT-3.5 (t=3.429, P=0.001). In simplified scenarios, the accuracy of GPT-4 and GPT-3.5 was 66.5% and 60.0%, respectively; the difference was not statistically significant (t=1.858, P=0.070). Compared with GPT-3.5, GPT-4 had longer reaction times (27.1 s on average) and longer responses (137.1 words on average). Conclusion: As an advanced tool for improving value-based healthcare in clinics, GPT-4 may guide appropriate medical imaging accurately and efficiently.
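For readers unfamiliar with the paired t-test named in the Methods, the following is a minimal sketch of how such a statistic is computed when two models are rated on the same cases. It is not the authors' analysis code: the function name `paired_t_statistic` and the six ratings per model are hypothetical and chosen only for illustration, and the study's actual per-case scores are not given in the abstract.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Both lists must hold ratings for the same cases in the same order,
    e.g. five-point consistency scores for two models on identical prompts.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings (NOT data from the study), one entry per case.
model_a_scores = [5, 4, 5, 3, 5, 4]
model_b_scores = [4, 4, 3, 3, 4, 2]

t_stat, dof = paired_t_statistic(model_a_scores, model_b_scores)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The P value is then read from a t distribution with the returned degrees of freedom. Pairing is appropriate here because each case is answered by both models, so the test works on per-case differences and not on two independent groups.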