Preliminary exploration of the applications of five large language models in the field of oral auxiliary diagnosis, treatment and health consultation

Cited by: 1

Abstract

Objective: To evaluate the accuracy of the healthcare information provided by different large language models (LLMs), and to explore the feasibility and limitations of their application in oral auxiliary diagnosis, treatment and health consultation.

Methods: Eight items comprising 47 questions on the diagnosis and treatment of oral diseases [to assess LLM performance as an artificial intelligence (AI) medical assistant] and five items comprising 35 questions on oral health consultation (to assess LLM performance as a simulated doctor) were designed and answered separately by five LLMs (ERNIE Bot, HuatuoGPT, Tongyi Qianwen, iFlytek Spark, ChatGPT). Two attending physicians, each with more than 5 years of experience, independently scored the answers using the 3C criteria (correctness, clarity, conciseness). Inter-rater consistency was assessed with the Spearman rank correlation coefficient, and differences between models were assessed with the Kruskal-Wallis test and the Dunn post hoc test. In addition, 600 questions from the 2023 dental licensing examination were used to evaluate each model's answering time, score and accuracy.

Results: As AI medical assistants, LLMs could assist doctors in diagnosis and treatment decision-making, with an inter-rater Spearman coefficient of 0.505 (P<0.01). As simulated doctors, LLMs could provide patient education, with an inter-rater Spearman coefficient of 0.533 (P<0.01). The 3C scores of each model as an AI medical assistant and as a simulated doctor, respectively, were 2.00 (1.00, 3.00) and 2.00 (2.00, 3.00) points for ERNIE Bot, 1.00 (1.00, 2.00) and 2.00 (1.00, 2.00) points for HuatuoGPT, 2.00 (1.00, 2.00) and 2.00 (1.00, 3.00) points for Tongyi Qianwen, 2.00 (1.00, 2.00) and 2.00 (1.75, 2.25) points for iFlytek Spark, and 3.00 (2.00, 3.00) and 3.00 (2.00, 3.00) points for ChatGPT (full score: 4 points). The Kruskal-Wallis test showed statistically significant differences in 3C scores among the five LLMs, both as AI medical assistants and as simulated doctors (all P<0.001). On the dental licensing examination, the five LLMs averaged a score of 370.2, an accuracy of 61.7% (370.2/600) and an answering time of 94.6 min. Specifically, ERNIE Bot took 115 min and scored 363 points, an accuracy of 60.5% (363/600); HuatuoGPT took 224 min and scored 305 points, an accuracy of 50.8% (305/600); Tongyi Qianwen took 43 min and scored 438 points, an accuracy of 73.0% (438/600); iFlytek Spark took 32 min and scored 364 points, an accuracy of 60.7% (364/600); and ChatGPT took 59 min and scored 381 points, an accuracy of 63.5% (381/600).

Conclusions: In the evaluation of the LLMs' dual roles as AI medical assistant and simulated doctor, ChatGPT performed best, giving answers that were basically correct, clear and concise; ERNIE Bot, Tongyi Qianwen and iFlytek Spark followed, and HuatuoGPT lagged significantly behind. On the dental licensing examination, all LLMs except HuatuoGPT reached the passing level, and all five models answered in markedly less time than the 8 h stipulated for the examination. LLMs are feasible for use in oral auxiliary diagnosis, treatment and health consultation and can help both doctors and patients obtain medical information quickly. However, their outputs carry a risk of error (the 3C scores did not reach full marks), so they should be used with prudent judgment.
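The examination averages quoted in the abstract follow directly from the five per-model results. As a minimal sketch (not the authors' analysis code), the arithmetic can be rechecked as below. The function names `accuracy_pct` and `summarize` are hypothetical, and the assumption that one point corresponds to one question is inferred from the "score/600" accuracy figures in the abstract.

```python
from statistics import mean

# Per-model examination results as reported in the abstract:
# model name -> (answering time in minutes, score out of 600).
RESULTS = {
    "ERNIE Bot": (115, 363),
    "HuatuoGPT": (224, 305),
    "Tongyi Qianwen": (43, 438),
    "iFlytek Spark": (32, 364),
    "ChatGPT": (59, 381),
}

TOTAL_POINTS = 600       # 600 questions; assumed one point per question
EXAM_MINUTES = 8 * 60    # the 8 h stipulated for the examination


def accuracy_pct(score, total=TOTAL_POINTS):
    """Accuracy as a percentage of the total, rounded to one decimal."""
    return round(100 * score / total, 1)


def summarize(results):
    """Mean score, mean accuracy and mean answering time across models."""
    mean_score = mean(score for _, score in results.values())
    mean_minutes = mean(minutes for minutes, _ in results.values())
    return {
        "mean_score": mean_score,
        "mean_accuracy_pct": accuracy_pct(mean_score),
        "mean_minutes": mean_minutes,
    }


if __name__ == "__main__":
    for name, (minutes, score) in RESULTS.items():
        # Share of the stipulated 8 h that each model actually used.
        share = round(100 * minutes / EXAM_MINUTES, 1)
        print(f"{name}: accuracy {accuracy_pct(score)}%, "
              f"{minutes} min ({share}% of the allotted time)")
    print(summarize(RESULTS))
```

The sketch only reproduces the descriptive figures; the 3C ratings themselves are not given per question in this record, so the Spearman and Kruskal-Wallis results cannot be recomputed from it.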

Authors: Han Cailing; Bai Shizhu; Zhang Tingmin; Liu Chen; Liu Yuchen; Hu Xiangxiang; Zhao Yimin (Digital Center, School of Stomatology, The Fourth Military Medical University; State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration; National Clinical Research Center for Oral Diseases; Shaanxi Key Laboratory of Stomatology, Xi'an 710032, China)

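Affiliation: Digital Center, Stomatological Hospital of Air Force Medical University; State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration; National Clinical Research Center for Oral Diseases; Shaanxi Key Laboratory of Stomatology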

Source: Chinese Journal of Stomatology (中华口腔医学杂志), Peking University Core Journal, 2025, Issue 8, pp. 871-878 (8 pages)

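Funding: National Natural Science Foundation of China (82471035); Shaanxi Province Innovation Capability Support Program (2023-CX-PT-27); Independent Research Project of the State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration (2021ZA02).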

Keywords: Artificial intelligence; Oral medicine; Counseling; Oral health; Large language models; Digital dentistry

Classification: R78 [Medicine and Health: Stomatology]
