Abstract: With the rapid development of artificial intelligence, exchange between national cultures has grown ever closer, demand for AI translation has become ever more pressing, and AI translation tools are used ever more frequently. However, the quality of AI-generated translations depends heavily on the input unit (e.g., word, sentence, or paragraph). Taking the poem She Walks In Beauty as its material and the 文心一言 (ERNIE Bot) 3.5 large model as its research platform, this article compares and analyzes, from the perspective of translation aesthetics, how the input unit affects AI translation quality. The study finds that different input units have a significant effect on the output. Word-level input suffers severely from polysemy and is the least accurate; sentence-level input is more accurate and better conveys semantic information and aesthetic value, but cannot reason contextually across the text; paragraph-level input, though less accurate than sentence-level input, can draw on the preceding and following sentences for more comprehensive reasoning. These findings suggest that users of 文心一言 should take the influence of the input unit into account in order to obtain higher-quality translations more efficiently.
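The three input granularities compared in the study can be illustrated with a minimal Python sketch. The segmentation rules below are my own illustrative approximations (the abstract does not specify how the poem was segmented), and the translation call itself is omitted; only the unit boundaries that would be sent to the model are shown:

```python
import re

# Opening stanza of Byron's "She Walks In Beauty" (illustrative excerpt).
POEM = (
    "She walks in beauty, like the night\n"
    "Of cloudless climes and starry skies;\n"
    "And all that's best of dark and bright\n"
    "Meet in her aspect and her eyes:"
)

def word_units(text):
    # Word-level input: each token is submitted in isolation,
    # so polysemous words lose their disambiguating context.
    return re.findall(r"[A-Za-z']+", text)

def sentence_units(text):
    # Sentence-level input: split after sentence-final punctuation.
    return [s.strip() for s in re.split(r"(?<=[.;:!?])\s+", text) if s.strip()]

def paragraph_units(text):
    # Paragraph-level input: the whole stanza as one prompt, letting
    # the model use the surrounding lines as context.
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

The study's finding maps directly onto these functions: the shorter the unit a function returns, the less context each model call receives.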
Funding: Supported by the National Natural Science Foundation of China, No. 82260133; the Key Laboratory Project of Digestive Diseases in Jiangxi Province, No. 2024SSY06101; and the Jiangxi Clinical Research Center for Gastroenterology, No. 20223BCG74011.
Abstract: BACKGROUND: Patients with hepatitis B virus (HBV) infection require chronic and personalized care to improve outcomes. Large language models (LLMs) can potentially provide medical information for patients. AIM: To examine the performance of three LLMs, ChatGPT-3.5, ChatGPT-4.0, and Google Gemini, in answering HBV-related questions. METHODS: The LLMs' responses to HBV-related questions were independently graded by two medical professionals using a four-point accuracy scale, and disagreements were resolved by a third reviewer. Each question was run three times on each of the three LLMs. Readability was assessed via the Gunning Fog index and the Flesch-Kincaid grade level. RESULTS: Overall, all three LLM chatbots achieved high average accuracy scores on subjective questions (ChatGPT-3.5: 3.50; ChatGPT-4.0: 3.69; Google Gemini: 3.53, out of a maximum score of 4). On objective questions, ChatGPT-4.0 achieved an 80.8% accuracy rate, compared with 62.9% for ChatGPT-3.5 and 73.1% for Google Gemini. Across the six domains, ChatGPT-4.0 performed best in diagnosis, whereas Google Gemini excelled in clinical manifestations. Notably, in the readability analysis, the mean Gunning Fog index and Flesch-Kincaid grade level scores of all three LLM chatbots were significantly above the standard eighth-grade level, far exceeding the reading level of the general population. CONCLUSION: Our results highlight the potential of LLMs, especially ChatGPT-4.0, for answering HBV-related questions. LLMs may serve as an adjunctive informational tool for patients and physicians to improve outcomes. Nevertheless, current LLMs should not replace personalized treatment recommendations from physicians in the management of HBV infection.
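The two readability metrics named in METHODS are standard published formulas: Gunning Fog = 0.4 * (words per sentence + 100 * complex-word ratio), where a complex word has three or more syllables, and Flesch-Kincaid grade = 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59. A minimal sketch, using a rough vowel-group syllable heuristic of my own rather than a dictionary (real readability tools count syllables more carefully):

```python
import re

def syllables(word):
    # Rough heuristic: count runs of consecutive vowels, treating a
    # trailing silent "e" as non-syllabic. Approximate by design.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def readability(text):
    # Returns (Gunning Fog index, Flesch-Kincaid grade level).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    total_syllables = sum(syllables(w) for w in words)
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    words_per_sentence = len(words) / len(sentences)
    fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    fk = 0.39 * words_per_sentence + 11.8 * total_syllables / len(words) - 15.59
    return fog, fk
```

Both formulas estimate the years of schooling a reader needs, which is why scores well above 8 indicate text too difficult for a general patient audience.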
Funding: Supported by the National Natural Science Foundation of China (No. 82160195, No. 82460203) and the Degree and Postgraduate Education Teaching Reform Project of Jiangxi Province (No. JXYJG-2020-026).
Abstract: AIM: To assess the feasibility of using large language models (LLMs) for ocular surface diseases by selecting five LLMs, ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova, and testing their accuracy on specialized questions about ocular surface diseases. METHODS: A group of experienced ophthalmology professors was asked to develop a 100-question single-choice examination on ocular surface diseases, designed to assess the performance of LLMs and human participants on ophthalmology specialty exam questions. The exam covers the following topics: keratitis (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis (20 questions); trachoma, pterygium, and conjunctival tumor diseases (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and their mean scores, mean correlations, variances, and confidence levels were compared. RESULTS: GPT-4 exhibited the highest performance among the LLMs. Comparing the average scores of the LLM group with the four human groups (chief physicians, attending physicians, regular trainees, and graduate students), all LLMs except ChatGPT-4 scored lower than the graduate student group, which had the lowest score among the human groups. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, leaving very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave a wrong answer 28% of the time. CONCLUSION: The GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in answer accuracy during the exam. In answer confidence, PaLM2 is second only to GPT-4 and surpasses Claude 2, SenseNova, and GPT-3.5. Even though ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting that its potential for application in this field is considerable and that it may become a valuable resource for medical students and clinicians in the future.
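The exam statistics reported here (total score, mean, variance) reduce to simple aggregation over per-question outcomes. A minimal sketch with made-up answer data, assuming 1 point per question and the five 20-question topic blocks described in METHODS:

```python
from statistics import mean, pvariance

# The five 20-question topic blocks from the exam description.
DOMAINS = ["keratitis", "other corneal", "conjunctivitis",
           "trachoma/pterygium/tumor", "dry eye"]

# Illustrative answer sheet for one model: 1 = correct, 0 = incorrect.
# These values are invented for the sketch, not the study's data.
sheet = ([1] * 15 + [0] * 5 +    # keratitis
         [1] * 12 + [0] * 8 +    # other corneal
         [1] * 16 + [0] * 4 +    # conjunctivitis
         [1] * 10 + [0] * 10 +   # trachoma/pterygium/tumor
         [1] * 14 + [0] * 6)     # dry eye

def domain_scores(sheet, per_domain=20):
    # Sum correct answers within each consecutive 20-question block.
    return {d: sum(sheet[i * per_domain:(i + 1) * per_domain])
            for i, d in enumerate(DOMAINS)}

total = sum(sheet)           # exam total, out of 100
accuracy = mean(sheet)       # fraction of questions answered correctly
variance = pvariance(sheet)  # spread of per-question outcomes
per_domain = domain_scores(sheet)
```

For binary-scored questions the per-question variance is simply p(1 - p), so a model near 50% accuracy shows the widest outcome spread, which is one way to read the confidence comparison above.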