Researchers assessing interrater agreement for ratings of a single
target have increasingly used the $r_{WG(j)}$ index, but have found it
can display irregular behavior. Mathematical analyses show this problem
arises from the use of random response, operationalized by the variance
of a uniform distribution ($s_{EU}^{2}$), as the baseline of comparison.
These analyses suggest that researchers should continue to use
$r_{WG(j)}$ as a summary measure of interrater agreement, but should use
maximum dissensus as a reference distribution for computing $r_{WG(j)}$.
Although values of $s_{EU}^{2}$ can be descriptively misleading, they
provide an important inferential baseline. Thus, $s_{EU}^{2}$ should be
used in computing $\chi^{2}$ tests of the departure of the observed
response variance from random responding. Researchers should also
examine interrater agreement as a theoretical variable in its own right,
investigating the causes and consequences of rater dissensus.
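
The contrast between the two reference distributions can be illustrated with a minimal Python sketch. It is not taken from the article: the function names are hypothetical, the multi-item formula is the standard James, Demaree, and Wolf form of the index, the maximum-dissensus variance assumes half of the raters at each scale extreme, and the chi-square statistic assumes the usual variance-ratio form with K - 1 degrees of freedom for K raters. Observed variances use the K - 1 denominator, so with few raters they can slightly exceed the maximum-dissensus value.

```python
from statistics import mean, variance


def uniform_variance(n_options: int) -> float:
    """Variance of a discrete uniform distribution over n_options categories."""
    return (n_options ** 2 - 1) / 12


def max_dissensus_variance(low: float, high: float) -> float:
    """Variance when half of the raters choose each scale extreme (assumed form)."""
    return ((high - low) / 2) ** 2


def rwg_j(ratings_by_item, reference_variance: float) -> float:
    """Multi-item agreement index against a chosen reference variance.

    ratings_by_item holds one list of rater scores per item; with a single
    item the expression reduces to 1 - s^2 / reference_variance.
    """
    j = len(ratings_by_item)
    # Mean observed item variance relative to the reference variance.
    ratio = mean(variance(item) for item in ratings_by_item) / reference_variance
    agreement = j * (1 - ratio)
    return agreement / (agreement + ratio)


def chi_square_vs_uniform(ratings, n_options: int):
    """Variance-ratio statistic against random (uniform) responding.

    Returns (statistic, degrees_of_freedom); assumed form (K - 1) s^2 / s_EU^2.
    """
    df = len(ratings) - 1
    return df * variance(ratings) / uniform_variance(n_options), df


# Hypothetical data: six raters, one item, five-point scale.
ratings = [[1, 1, 2, 4, 5, 5]]
print("uniform baseline:       ", rwg_j(ratings, uniform_variance(5)))
print("max-dissensus baseline: ", rwg_j(ratings, max_dissensus_variance(1, 5)))
print("chi-square vs. uniform: ", chi_square_vs_uniform(ratings[0], 5))
```

In this made-up example the observed variance exceeds the uniform variance, so the index computed against the uniform baseline is negative, which is the kind of irregular behavior described above. Against the maximum-dissensus baseline it stays within the unit interval. The uniform variance still serves in the chi-square statistic as the inferential benchmark.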