Interrater reliability (IRR) statistics, like Cohen’s kappa, measure agreement between raters beyond what is expected by chance when classifying items into categories. While Cohen’s kappa has been widely used, it has several limitations, prompting the development of Gwet’s agreement statistic, an alternative “kappa” statistic which models chance agreement via an “occasional guessing” model. However, we show that Gwet’s formula for estimating the proportion of agreement due to chance is itself biased for intermediate levels of agreement, despite overcoming the limitations of Cohen’s kappa at high and low agreement levels. We derive a maximum likelihood estimator for the occasional guessing model that yields an unbiased estimator of the IRR, which we call the maximum likelihood kappa (κML). The key result is that the chance agreement probability under the occasional guessing model is simply equal to the observed rate of disagreement between raters. The κML statistic provides a theoretically principled approach to quantifying IRR that addresses the limitations of previous κ coefficients. Given the widespread use of IRR measures, having an unbiased estimator is important for reliable inference across domains where rater judgments are analyzed.
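To make the stated key result concrete, the sketch below computes a chance-corrected agreement statistic in which the chance-agreement probability is set to the observed disagreement rate, as the abstract describes. It is a minimal illustration, not the paper's derivation: the function name `kappa_ml` and the example data are hypothetical, and it assumes κML takes the standard chance-corrected form (p_a − p_c) / (1 − p_c), which the abstract does not spell out.

```python
import numpy as np

def kappa_ml(ratings_a, ratings_b):
    """Hypothetical sketch of the maximum likelihood kappa (kappa_ML).

    Assumes the standard chance-corrected form
        kappa = (p_a - p_c) / (1 - p_c),
    with the chance-agreement probability p_c taken to be the observed
    disagreement rate (1 - p_a), per the key result stated in the abstract.
    """
    a = np.asarray(ratings_a)
    b = np.asarray(ratings_b)
    p_a = np.mean(a == b)      # observed agreement rate between the two raters
    p_c = 1.0 - p_a            # chance agreement = observed disagreement rate
    if p_a == 0.0:             # degenerate case: raters never agree
        return float("-inf")
    return (p_a - p_c) / (1.0 - p_c)   # simplifies to (2 * p_a - 1) / p_a

# Example: two raters classifying ten items into categories 0, 1, 2
rater1 = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
rater2 = [0, 1, 2, 0, 0, 2, 1, 1, 2, 1]
print(kappa_ml(rater1, rater2))  # observed agreement 0.8 -> statistic 0.75
```

Under this assumed form, the statistic depends only on the observed agreement rate, which is what makes it simple to estimate directly from the data.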