Abstract
This case study explored the feasibility and effectiveness of using reasoning-capable large language models (rcLLMs) to assess medical students' reflective writing assignments. After writing samples were selected and an assessment method was established, two mainstream domestic rcLLMs were used to evaluate the assignments. A mixed-methods approach combining quantitative and qualitative analyses was used to compare the rcLLM evaluations with instructor evaluations and with the grading rubric. Agreement among the three raters (the two rcLLMs and the instructors) was poor, as were the pairwise correlations between their scores; each rcLLM's own scoring was also unstable. The comments generated by the rcLLMs were structurally complete, but the themes they focused on were not fully consistent with the grading rubric. These findings indicate that current rcLLMs cannot yet serve as effective tools for assessing medical students' reflective writing assignments. The study discusses the factors that may contribute to this result and underscores the importance of developing human-AI collaboration strategies and of critically reflecting on the legitimacy of AI-based assessment.
Authors
柴桦
闫昱江
卿平
Chai Hua; Yan Yujiang; Qing Ping (Zhulang Career Development Center, West China Second University Hospital, Sichuan University, Chengdu 610041, China; National Research Center for Educational Materials, Chengdu 610041, China; Department of Medical Education, West China Medical Center, Sichuan University, Chengdu 610041, China)
Source
《中华医学教育探索杂志》
2026, Issue 1, pp. 1-11 (11 pages)
Chinese Journal of Medical Education Research
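Funding
Sichuan University 2025 Medical Textbook Development Research Project (General Program) (SCUYJ0126)
Sichuan University Higher Education Teaching Reform Project (Phase 11) research projects (SCU11404, SCU11411)
Western Maternal and Child Medicine Research and Development Center 2025 Research Project (XBFY-YJXM2025003)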
Keywords
Medical education
Generative artificial intelligence
Reasoning-capable large language model
Assessment
Reflective writing