
Benchmark comparison of two AI language models, ChatGPT-4.0 and DeepSeek-V3, in answering questions about myopia
Abstract Objective: To compare the performance of two artificial intelligence (AI) chatbots, ChatGPT-4.0 and DeepSeek-V3, in answering questions about myopia, and to provide a reference for the application of AI chatbots. Methods: From October 2024 to March 2025, the responses of two large language model (LLM) chatbots, ChatGPT-4.0 and DeepSeek-V3, to questions about myopia were tested at the National University Hospital of Singapore (NUHS) and Beijing Jingmei Group General Hospital, China, and specialists evaluated and compared their accuracy and comprehensiveness. The question set consisted of the 30 myopia-related questions most commonly encountered in ophthalmic clinical practice, covering six themes: pathogenesis, clinical manifestations, diagnosis, treatment, prevention, and prognosis. Each chatbot was scored on accuracy and comprehensiveness. Results: For accuracy, 11 answers (36.7%) from ChatGPT-4.0 were rated "good", compared with 23 answers (76.7%) from DeepSeek-V3; the difference in proportions was statistically significant (χ² = 9.791, P < 0.05). For comprehensiveness, assessed on the answers rated "good" for accuracy, ChatGPT-4.0 scored (2.44 ± 0.33) points and DeepSeek-V3 scored (2.63 ± 0.17) points; the difference was not statistically significant (P > 0.05). Conclusion: AI chatbots can provide effective help for users seeking advice on myopia, and DeepSeek-V3 answered questions about myopia more accurately than ChatGPT-4.0.
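The accuracy comparison in the Results can be roughly checked from the two reported counts alone. The sketch below assumes an uncorrected Pearson chi-square test on a 2x2 table of "good" versus "not good" answers; the abstract does not say which test variant or software the authors used, and the function name `pearson_chi2_2x2` is hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square statistic and p-value (1 df)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts from the abstract: answers rated "good" out of 30 questions per chatbot.
chatgpt_good, deepseek_good, total = 11, 23, 30

chi2, p = pearson_chi2_2x2(
    chatgpt_good, total - chatgpt_good,    # ChatGPT-4.0: good, not good
    deepseek_good, total - deepseek_good,  # DeepSeek-V3: good, not good
)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Under this assumption the statistic comes out at about 9.77, close to the reported 9.791 and consistent with P < 0.05. The small gap may reflect a different test variant or rounding in the original analysis, which the abstract does not specify.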
Authors Yao Jinglei; Li Luxi; Jiang Huijun; Sun Chen-Hsin; Ren Xiaofang; Xiao Lin (Department of Ophthalmology, Beijing Jingmei Group General Hospital, Beijing 102300, China; Department of Ophthalmology, National University Hospital of Singapore, Singapore 119074, Singapore; Department of Ophthalmology, Affiliated Children's Hospital of Capital Institute of Pediatrics, Beijing 100020, China; Department of Ophthalmology, Beijing Shijitan Hospital, Capital Medical University, Beijing 100038, China)
Source China Medical Equipment (《中国医学装备》), 2026, Issue 3, pp. 86-89 (4 pages)
Funding Hospital-level research funding project of Beijing Jingmei Group General Hospital (ZZ2024-46).
Keywords Myopia; ChatGPT-4.0 chatbot; DeepSeek-V3 chatbot; Large language model (LLM)