The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. When decisions from two judges about a student's performance on comparable tasks correlate, the decisions have been considered reproducible. However, when judges differ in expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for the observable differences are made. In this study, data were analyzed with the Facets model and provided evidence that judges grade differently, whether or not the scores they give correlate well. This outcome suggests that adjustments for differences in judge severity should be made before student measures are estimated, in order to produce reproducible decisions for certification, achievement, or promotion.