R. L. Johnson et al., "The relation between score resolution methods and interrater reliability: An empirical study of an analytic scoring rubric," Applied Measurement in Education, 13(2), 2000, pp. 121-138
When the raters of constructed-response items, such as writing samples, disagree on the level of proficiency exhibited in an item, testing agencies must resolve the score discrepancy before computing an operational score for release to the public. Several forms of score resolution are used throughout the assessment industry. In this study, we selected 4 of the more common forms of score resolution that were reported in a national survey of testing agencies and investigated the effect that each form of resolution has on the interrater reliability associated with the resulting operational scores. It is shown that some forms of resolution can be associated with higher reliability than other forms and that some forms may be associated with artificially inflated interrater reliability. Moreover, it is shown that the choice of resolution method may affect the percentage of papers that are defined as passing in a high-stakes assessment.
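
The abstract does not name the 4 resolution forms the study compared. As an illustrative sketch only, the snippet below implements two forms commonly described in the scoring literature - averaging the two ratings, and third-rater adjudication of discrepant pairs - and shows how the choice can shift both a simple interrater-reliability index (Pearson r between the two original ratings) and the passing rate at a cut score. The scores, the tolerance rule, and the cut score are all hypothetical, not taken from the study.

    # Illustrative sketch, not the study's method or data.
    def pearson(x, y):
        """Pearson correlation between two equal-length score lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def resolve_by_mean(r1, r2):
        """Operational score = mean of the two original ratings."""
        return [(a + b) / 2 for a, b in zip(r1, r2)]

    def resolve_by_third_rater(r1, r2, r3, tolerance=1):
        """Hypothetical adjudication rule: if the first two ratings differ
        by more than `tolerance`, average a third rating with whichever of
        the two original ratings lies closer to it."""
        resolved = []
        for a, b, c in zip(r1, r2, r3):
            if abs(a - b) <= tolerance:
                resolved.append((a + b) / 2)
            else:
                keep = a if abs(a - c) <= abs(b - c) else b
                resolved.append((keep + c) / 2)
        return resolved

    # Invented 6-point rubric scores for ten papers from three raters.
    r1 = [4, 3, 5, 2, 6, 3, 4, 5, 2, 4]
    r2 = [4, 5, 5, 3, 4, 3, 5, 5, 4, 4]
    r3 = [4, 4, 5, 3, 5, 3, 4, 5, 3, 4]

    print("interrater r (original ratings):", round(pearson(r1, r2), 3))

    # Passing rate at a hypothetical cut score under each resolution form.
    cut = 4.0
    for name, scores in [("mean", resolve_by_mean(r1, r2)),
                         ("third-rater", resolve_by_third_rater(r1, r2, r3))]:
        rate = sum(s >= cut for s in scores) / len(scores)
        print(f"passing rate ({name} resolution, cut = {cut}): {rate:.0%}")

Running the sketch on these invented scores shows the two forms producing different passing rates at the same cut score, which is the kind of operational consequence the abstract points to; the study's actual comparisons and reliability estimates are in the article itself.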