Ordinal categorical assessments are common in medical practice and in research. Variability in such measurements amongst raters making the assessments can be problematic. In this paper we consider how such variability can be described statistically. We review three current approaches, including kappa-type statistics, loglinear models for agreement, and latent class agreement models, and discuss their limitations. We present a new graphical approach to describing interrater variability that involves a simple frequency distribution display of the category probabilities. The method enables description of interrater variability when raters are a random sample from some population, as opposed to the traditional setting in which only a few selected raters provide assessments. Advantages of this approach relative to current approaches include the following: (1) it provides a simple visual summary of the rating data, (2) description is closely linked to familiar methods for describing variability in continuous measurements, (3) interpretation is straightforward, and (4) a large sample of raters can be accommodated with ease. We illustrate the method on simulated ordinal data representing radiologists' ratings of mammography images and on rating data from a national image reading study of mammography screening.
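To make the graphical idea concrete, the following is a minimal illustrative sketch, not the paper's exact procedure: it simulates ordinal ratings from a hypothetical random sample of raters, computes each rater's observed probability of using each category, and displays a frequency distribution of those probabilities for each category. All names, sample sizes, and the Dirichlet heterogeneity model are assumptions chosen purely for illustration.

```python
# Illustrative sketch (assumed setup, not the authors' implementation):
# simulate raters, compute per-rater category-use probabilities, and show
# their frequency distributions across the rater sample.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_raters, n_images, n_categories = 50, 100, 5  # hypothetical sample sizes

# Rater-specific category probabilities (Dirichlet heterogeneity across the
# rater population) and the resulting ordinal ratings; synthetic data only.
true_probs = rng.dirichlet(alpha=[2, 3, 4, 3, 2], size=n_raters)
ratings = np.array([rng.choice(n_categories, size=n_images, p=p)
                    for p in true_probs])

# Each rater's observed probability of assigning each category.
observed_probs = np.stack([(ratings == k).mean(axis=1)
                           for k in range(n_categories)], axis=1)

# Frequency-distribution display: one histogram of category probabilities
# per ordinal category, summarizing interrater variability.
fig, axes = plt.subplots(1, n_categories, figsize=(15, 3), sharey=True)
for k, ax in enumerate(axes):
    ax.hist(observed_probs[:, k], bins=10, range=(0, 1), edgecolor="black")
    ax.set_title(f"Category {k + 1}")
    ax.set_xlabel("P(category)")
axes[0].set_ylabel("Number of raters")
fig.tight_layout()
plt.show()
```

In this sketch, a narrow histogram for a category indicates raters use that category with similar frequency, while a wide or multimodal histogram signals substantial interrater variability, mirroring how spread is summarized for continuous measurements.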