
Formulation of Grade Response Multilevel DRIFT Model (等级反应多水平评分者漂移模型的构建)
Abstract: For tests whose scoring takes a long time, the influence of time on rating accuracy cannot be ignored. This study proposes a grade response multilevel rater drift model for detecting rater drift and evaluates its performance through simulation. The results show that the model estimates item and ability parameters accurately, and that the random effect model detects rater drift more effectively than the fixed effect model, with better validity and stability. Its applicability in real settings remains to be verified, and it can be developed further into a full model that includes predictors.

Recently, both national and international educational assessment programs have highlighted the usefulness of constructed-response (CR) items. Accordingly, the rater effect deserves attention. The grade response multilevel facets model (GR-MLFM), proposed by Kang, Sun and Zeng (2016), is used to detect the rater effect, and simulations demonstrate that GR-MLFM can not only estimate the item and person parameters precisely but also detect the rater effect efficiently. GR-MLFM integrates the advantages of the many-facets Rasch model, the multilevel random coefficient model, and the grade response model. Like other multi-facet models, however, GR-MLFM treats the rater effect as static: it yields only the overall rater effect across time stages, not the specific rater effect at each time stage. In fact, when a rating task extends over several hours or several days, concern may arise about the comparability of ratings both between and within raters over time (Wolfe, Moulder, & Myford, 2001). This phenomenon is called DRIFT, short for differential rater functioning over time.

Myford and Wolfe (2009) developed the separate model (SM), based on the many-facets Rasch model and including time as one of its facets, to estimate the specific rater effect at each time stage. With this model, the separate severity at each time stage and the trend in severity can be obtained for each rater. Nevertheless, SM cannot identify the factors that affect rater drift. Many other studies have aimed at detecting the drift effect with generalizability theory or other item response models. However, these methods and models can only detect the rater drift effect; they cannot reveal the factors behind it.

To detect rater drift and simultaneously identify the factors that affect it, the authors construct the grade response multilevel DRIFT model (GR-MLDM). This model extends GR-MLFM and inherits the concepts of SM, and therefore combines the advantages of both. Two simulation studies, one using the rater fixed effect model and one using the rater random effect model, are conducted to evaluate the reasonableness and feasibility of GR-MLDM. The fixed effect model has four types of parameters (discrimination, difficulty, ability, and rater severity over time) and no interaction between raters and time stages, which means that the overall severity of raters remains the same across time. The random effect model allows interactions between raters and time stages, so the dynamic "rater drift" can be detected.

The results show that: (1) GR-MLDM, in both its fixed effect and random effect forms, can estimate the item and person parameters precisely. (2) Compared with the fixed effect model, the random effect model detects rater drift more precisely; furthermore, because it includes the interaction between raters and time stages, the random effect model appears more suitable for large-scale assessments. In future work, GR-MLDM will be applied to real data to verify its applicability. In addition, predictors can be added to the random effect model to form the full model, which can evaluate the factors that affect rater drift in practice.
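The contrast between the fixed effect and random effect variants can be illustrated with a small sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration: the logit form `a * (theta - b_k - severity)`, the function names, and all numeric values are hypothetical and are not taken from the paper. The sketch only shows how a rater-by-time term lets severity, and hence the expected score, change across time stages.

```python
import math


def grm_category_probs(theta, a, thresholds, severity):
    """Graded-response category probabilities for a single rating.

    ASSUMPTION (not from the source): the cumulative logit is
    a * (theta - b_k - severity), where `severity` is the rater's
    severity at the current time stage. Thresholds must be ascending.
    """
    # Cumulative probabilities P(score >= k) for k = 1..K-1.
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b - severity)))
           for b in thresholds]
    # Pad with P(score >= 0) = 1 and P(score >= K) = 0, then difference.
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


def expected_score(probs):
    """Mean rating implied by a vector of category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical examinee and item, for illustration only.
theta, a, thresholds = 0.5, 1.2, [-1.0, 0.0, 1.0]
n_stages = 3

# Fixed-effect sketch: no rater-by-time interaction, so each rater's
# severity is the same at every time stage.
rater_effect = {"R1": -0.3, "R2": 0.4}
fixed_severity = {r: [s] * n_stages for r, s in rater_effect.items()}

# Random-effect sketch: a hypothetical rater-by-time deviation gives
# each rater an individual severity trajectory (the "drift").
drift = {"R1": [0.0, 0.3, 0.6], "R2": [0.0, -0.1, -0.2]}
random_severity = {
    r: [rater_effect[r] + drift[r][t] for t in range(n_stages)]
    for r in rater_effect
}


def score_trajectory(severities):
    """Expected score of the same response at each time stage."""
    return [expected_score(grm_category_probs(theta, a, thresholds, s))
            for s in severities]


fixed_scores = {r: score_trajectory(s) for r, s in fixed_severity.items()}
random_scores = {r: score_trajectory(s) for r, s in random_severity.items()}

for r in rater_effect:
    print(r, "fixed:", [round(x, 3) for x in fixed_scores[r]],
          "random:", [round(x, 3) for x in random_scores[r]])
```

Under these assumed values, the same response receives an identical expected score at every stage in the fixed effect sketch, while in the random effect sketch a rater who grows more severe awards progressively lower expected scores.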
Source: Journal of Psychological Science (《心理科学》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2018, Issue 1, pp. 196-203 (8 pages)
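Funding: Supported by the Zhejiang Provincial Natural Science Foundation (LY15C090003) and the Ministry of Education Humanities and Social Sciences Research Planning Fund (16YJA190002).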
Keywords: rater drift (differential rater functioning over time), rater drift model (grade response multilevel DRIFT model), fixed effect, random effect
