Funding: Funded by the National Natural Science Foundation of China (Grant No. 62171297), the Beijing Natural Science Foundation (Grant No. L242024), the Beijing Friendship Hospital, Capital Medical University (Grant No. YYZZ202334), the Open Projects of Sichuan Province Clinical Medical Research Center for Imaging Medicine (Grant No. YXYX2409), and the Beijing Municipal Natural Science Foundation (Grant No. 7254539).
Abstract: This editorial presents an optimistic yet cautious perspective on the development, deployment, and regulation of large language models (LLMs) in medicine. It is essential to strike a balance between embracing the benefits of artificial intelligence-driven solutions and preserving the human touch that is vital for compassionate care. The exponential growth of medical data has paved the way for the integration of LLMs into healthcare, offering unprecedented opportunities to enhance clinical decision-making and alleviate physicians' workloads. Recently, LLMs have exhibited remarkable potential across various clinical scenarios, including streamlining diagnostic processes, optimizing radiology reports, and providing personalized treatment recommendations. However, the implementation of LLMs in healthcare is not without its challenges. The scarcity of high-quality annotated data, privacy concerns, and the risk of generating misleading or overconfident information are significant hurdles that must be addressed. Moreover, while LLMs can take over certain basic tasks traditionally performed by humans, senior clinicians play an irreplaceable role in complex decision-making and in providing emotional support to patients. By harnessing LLMs to augment human capabilities while maintaining the essential human elements of healthcare, we might shape a future in which artificial intelligence and human intelligence coexist harmoniously. Prioritizing the ethical development and deployment of artificial intelligence, empowering healthcare professionals, and safeguarding patient privacy will be key to realizing the full potential of LLMs in revolutionizing healthcare delivery. Through ongoing research, collaboration, and adaptation, the responsible integration of LLMs holds promise for elevating both the quality and accessibility of care globally, ultimately creating a more efficient, personalized, and patient-centric healthcare system.
Abstract: Large language models (LLMs), trained on vast amounts of textual data, have demonstrated strong capabilities in natural language understanding and generation. In the medical field, LLMs are increasingly applied across domains such as disease screening, diagnostic assistance, and health management, playing a key role in advancing intelligent healthcare. In recent years, China has actively promoted the integration of artificial intelligence (AI) with healthcare through a series of policies that support enterprises in making breakthroughs in key technologies such as medical LLMs and multimodal data integration. Concurrently, efforts have accelerated the deployment of AI in applications such as health management and precision medicine, with the aim of gradually establishing a full-cycle intelligent healthcare system encompassing prevention, diagnosis, treatment, and rehabilitation. However, the rapid deployment of LLMs in healthcare has highlighted the lack of standardized evaluation criteria and consistent methodologies. To address this, this expert consensus focuses on establishing a retrospective evaluation framework tailored to medical applications. By integrating scientific evaluation metrics, standards, and procedures, the framework provides clear and actionable guidance for model evaluators, developers, and end users. It aims to unify assessment practices, enhance the scientific rigor and comparability of evaluations, and ensure the safe and effective use of LLMs in healthcare, ultimately supporting the high-quality development of AI-powered medical services.
Funding: National Natural Science Foundation of China (Grant Nos. 62171297 and 61931013).
Abstract: Objective: Appropriate medical imaging is important for value-based care. We aim to evaluate the performance of generative pretrained transformer 4 (GPT-4), an innovative natural language processing model, in automatically recommending appropriate medical imaging in different clinical scenarios. Methods: Institutional Review Board (IRB) approval was not required because nonidentifiable data were used. We used 112 questions from the American College of Radiology (ACR) Radiology-TEACHES Program, an open-source question-and-answer program for guiding appropriate medical imaging, as prompts. We included 69 free-text case vignettes and 43 simplified cases. To evaluate the performance of GPT-4 and GPT-3.5, we took the recommendations of the ACR guidelines as the gold standard, and three radiologists then analyzed the consistency of the GPT models' responses with those of the ACR. We set a five-score criterion for evaluating consistency. A paired t-test was applied to assess the statistical significance of the findings. Results: In free-text case vignettes, the accuracy of GPT-4 was 92.9%, whereas the accuracy of GPT-3.5 was just 78.3%. GPT-4 provided more appropriate suggestions for reducing the overutilization of medical imaging than GPT-3.5 (t=3.429, P=0.001). In simplified scenarios, the accuracy of GPT-4 and GPT-3.5 was 66.5% and 60.0%, respectively; the difference was not statistically significant (t=1.858, P=0.070). GPT-4 was characterized by longer reaction times (27.1 s on average) and more extensive responses (137.1 words on average) than GPT-3.5. Conclusion: As an advanced tool for improving value-based healthcare in clinics, GPT-4 may guide appropriate medical imaging accurately and efficiently.
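
The paired t-test mentioned in the Methods compares two sets of scores given to the same questions. The abstract does not report per-question scores, so the sketch below uses hypothetical five-point consistency scores purely for illustration; the function name `paired_t_statistic` and both score lists are assumptions, not data or code from the study. It computes only the t statistic and the degrees of freedom, using the Python standard library.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two matched samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-item differences, since each question is scored twice.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical five-point consistency scores for eight questions (not study data).
model_a = [5, 4, 5, 3, 5, 4, 5, 5]
model_b = [4, 4, 3, 3, 4, 2, 5, 4]

t_stat, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_stat:.3f}, df = {dof}")
```

Turning the t statistic into a P value requires the t distribution with `dof` degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_rel`) returns both values directly.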