Comparative Study of Automatic Scoring and Human Scoring of Radiotelephony Communication Oral Test
Abstract To assess the validity of the automatic scoring in the ICE (Intelligent Communication Environment) software, an intelligent self-training platform for radiotelephony communication, eight raters were invited to score 27,227 radiotelephony oral test responses from 736 students in the 2021-2023 academic years. The automatic and human scores were compared using the Pearson correlation coefficient, the human-machine scoring consistency coefficient, and the grade agreement rate. The results show that automatic scores are slightly lower than human scores, especially on Chinese-to-English translation items, and that the correlation coefficient, consistency coefficient, and agreement rate between human and automatic scoring are all fairly good. A Many-Facet Rasch model analysis shows that the human raters have good internal consistency and stability, although their severity still differs significantly.
Authors Zhao Qi; Wang Wanle; Song Xiangbo; Zhao Debin; Yang Yue; Li Xueming (College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China)
Source Journal of Civil Aviation Flight University of China, 2025, No. 6, pp. 70-75 (6 pages)
Funding Civil Aviation University of China Education and Teaching Research Project (CAUC-2021-C2-028).
Keywords Radiotelephony communication oral test; ICE software; Automatic scoring; Human scoring; Validity
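
Two of the measures named in the abstract, the Pearson correlation coefficient and the grade agreement rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the grade cutoffs (60/70/80/90), and the sample scores are all hypothetical and chosen only for illustration, and the abstract does not define the consistency coefficient, so it is left out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def to_grade(score, cutoffs=(60, 70, 80, 90)):
    """Map a score to a grade band (0 = lowest).

    The cutoffs are an assumption for this sketch; the source does not
    state how grade levels are defined.
    """
    return sum(score >= c for c in cutoffs)


def grade_agreement_rate(human, machine, cutoffs=(60, 70, 80, 90)):
    """Share of items where human and machine scores fall in the same band."""
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty lists of equal length")
    same = sum(
        to_grade(h, cutoffs) == to_grade(m, cutoffs)
        for h, m in zip(human, machine)
    )
    return same / len(human)


# Made-up scores for five responses, NOT data from the study.
human_scores = [85, 72, 90, 64, 78]
machine_scores = [82, 70, 88, 58, 77]

print("Pearson r:", pearson_r(human_scores, machine_scores))
print("Grade agreement rate:", grade_agreement_rate(human_scores, machine_scores))
```

Exact band matching is the strictest reading of an agreement rate; a study could equally count adjacent bands as agreeing, which would raise the figure.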