Abstract
In subjective tests, differences among raters are an important factor affecting reliability, validity, and fairness. This paper uses the many-facet Rasch model (MFRM) to examine how eight raters scored 60 narrative essays and 60 argumentative essays. The results show that raters scored the two genres inconsistently. At the rater level, rater severity was largely unaffected by genre, but rater reliability and internal consistency were better for argumentative essays than for narrative essays. At the rating-scale level, raters were stricter with argumentative essays than with narrative essays on the language and content criteria, but more lenient with argumentative essays on the organization criterion, and they awarded high scores more frequently to argumentative essays than to narrative essays. The paper also discusses possible reasons for these inconsistencies, with a view to providing a reference for reducing rating bias.
Source
《考试研究》
2018, No. 1, pp. 80-89 (10 pages)
Examinations Research
Funding
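Supported by the Major Project of the National Social Science Fund of China, "Research on Standards and an Assessment System for Chinese Communicative Competence" (Project No. 15ZDB101),
and by the Beijing Social Science Planning Project "Research on the Cross-Cultural Adaptation of International Students in the Capital" (Project No. 13WYB014).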