Funding: Supported by the Medical Health Science and Technology Project of the Zhejiang Provincial Health Commission (2024KY896); the Public Health Talent Development Support Program of the National Disease Control and Prevention Administration (2025); and the Zhejiang Provincial Key Laboratory of Vaccine and Preventive Research on Infectious Diseases, Department of Science and Technology of Zhejiang Province (2024).
Abstract
Introduction: Large language models (LLMs) have demonstrated potential applications across diverse fields, yet their effectiveness in supporting field epidemiology investigations remains uncertain.
Methods: We assessed six prominent LLMs (ChatGPT-o4-mini-high, ChatGPT-4o, DeepSeek-R1, DeepSeek-V3, Qwen3-235B-A22B, and Qwen2.5-max) using multiple-choice and case-based questions from the 2025 Zhejiang Field Epidemiology Training Program entrance examination. Model responses were evaluated against standard answers and benchmarked against performance scores from junior epidemiologists.
Results: For multiple-choice questions, only DeepSeek-V3 (75%) exceeded the 75th percentile performance level of junior epidemiologists (67.5%). In case-based assessments, most LLMs achieved or surpassed the 75th percentile of junior epidemiologists, demonstrating particular strength in data analysis tasks.
Conclusion: Although LLMs demonstrate promise as supportive tools in field epidemiology investigations, they cannot yet replace human expertise. Significant challenges persist regarding the accuracy and timeliness of model outputs, alongside critical concerns about data security and privacy protection that must be addressed before widespread implementation.
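To illustrate the benchmarking logic described in the Methods (comparing each model's score against the 75th percentile of junior-epidemiologist performance), the following is a minimal sketch. The junior-epidemiologist scores and all model scores other than DeepSeek-V3's 75% are hypothetical placeholders, chosen only so that the 75th percentile falls near the 67.5% reported in the abstract; the actual scoring and grading procedure used in the study may differ.

```python
import numpy as np

# Hypothetical multiple-choice scores for junior epidemiologists
# (placeholder values; only the ~67.5% percentile threshold comes from the abstract).
human_scores = np.array([52.5, 55.0, 60.0, 62.5, 65.0, 67.5, 70.0, 72.5])

# Multiple-choice accuracy per model. Only DeepSeek-V3's 75% is reported in the
# abstract; the other entries are placeholders for this sketch.
model_scores = {
    "DeepSeek-V3": 75.0,
    "DeepSeek-R1": None,   # not reported in the abstract
    "ChatGPT-4o": None,    # not reported in the abstract
}

# Benchmark threshold: 75th percentile of the junior-epidemiologist distribution.
threshold = np.percentile(human_scores, 75)

for model, score in model_scores.items():
    if score is None:
        print(f"{model}: score not reported")
        continue
    verdict = "exceeds" if score > threshold else "does not exceed"
    print(f"{model}: {score:.1f}% {verdict} the 75th percentile ({threshold:.1f}%)")
```

Under this comparison, a model is counted as outperforming junior epidemiologists on a question set only when its score is strictly above the cohort's 75th-percentile score, which mirrors how the abstract frames the DeepSeek-V3 result.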