How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution?

Authors

David Quarfoot, Richard A. Levine

Citation

David Quarfoot and Richard A. Levine, How Robust Are Multirater Interrater Reliability Indices to Changes in Frequency Distribution?, American Statistician, 70(4), 2016, pp. 373-384

Journal title

American Statistician

ISSN journal

00031305

Volume

70

Issue

4

Year of publication

2016

Pages

373 - 384

Database

ACNP

Abstract

Interrater reliability studies are used in a diverse set of fields. Often, these investigations involve three or more raters, and thus require the use of indices such as Fleiss's kappa, Conger's kappa, or Krippendorff's alpha. Through two motivating examples, one theoretical and one from practice, this article exposes limitations of these indices when the units to be rated are not well-distributed across the rating categories. Then, using a Monte Carlo simulation and information visualizations, we argue for the use of two alternative indices, the Brennan-Prediger coefficient and Gwet's AC2, because the agreement levels reported by these indices are more robust to variation in the distribution of units that raters encounter. The article concludes by exploring the complex, interwoven relationship between the number of levels in a rating instrument, the agreement level present among raters, and the distribution of units that are to be scored.
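
The sensitivity described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or simulation. It assumes the standard unweighted (nominal) forms of Fleiss's kappa and the multirater Brennan-Prediger coefficient, and the two small data sets are hypothetical, constructed only so that both have the same observed agreement but different category distributions.

```python
# Hypothetical illustration: counts[i][j] is the number of raters who
# placed unit i in category j. Every unit is rated by the same number of raters.

def observed_agreement(counts):
    """Mean proportion of agreeing rater pairs per unit."""
    n = sum(counts[0])  # raters per unit
    per_unit = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    return sum(per_unit) / len(per_unit)

def fleiss_kappa(counts):
    """Unweighted Fleiss's kappa: chance agreement from pooled category shares."""
    n = sum(counts[0])
    k = len(counts[0])
    total = len(counts) * n
    shares = [sum(row[j] for row in counts) / total for j in range(k)]
    p_e = sum(s * s for s in shares)
    p_o = observed_agreement(counts)
    return (p_o - p_e) / (1 - p_e)

def brennan_prediger(counts):
    """Unweighted Brennan-Prediger: chance agreement fixed at 1/k."""
    k = len(counts[0])
    p_o = observed_agreement(counts)
    return (p_o - 1 / k) / (1 - 1 / k)

# Three raters, two categories, ten units; one unit has a 2-vs-1 split.
skewed = [[3, 0]] * 9 + [[2, 1]]                   # nearly all units in category 0
balanced = [[3, 0]] * 5 + [[0, 3]] * 4 + [[2, 1]]  # units spread over both categories

for name, data in (("skewed", skewed), ("balanced", balanced)):
    print(name, round(fleiss_kappa(data), 3), round(brennan_prediger(data), 3))
```

In both data sets the raters disagree on a single unit, so the observed agreement is identical. Fleiss's kappa nevertheless drops sharply for the skewed set because its chance-agreement term depends on the pooled category shares, while the Brennan-Prediger value is unchanged because its chance term depends only on the number of categories.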