Murphy and DeShon (2000) show that interrater correlations do not provide reasonable estimates of the reliability of job performance ratings, and suggest that better estimates can be obtained by applying the methods of generalizability theory. Schmidt, Viswesvaran, and Ones (2000) criticize our suggestions as radical and argue that (a) the reliability of ratings should be evaluated using the parallel test model rather than the more general and more realistic generalizability model, (b) reliability and validity are distinct concepts that should not be confused, and (c) measurement models have little to do with substantive models of the processes that generate scores on a test or measure. All three of these ideas were once part of the psychometric mainstream, but progress in psychometrics over the last 3 decades has moved the field well beyond these assumptions and approaches. Modern psychometric theory calls for close linkages between measurement models and substantive models of the phenomena being measured.