Six radiologists used continuous scales to rate 529 chest-film cases for li
kelihood of five different types of abnormalities (interstitial disease, no
dule, pneumothorax, alveolar infiltrate, and rib fracture) in each of six r
eplicated readings, yielding 36 separate ratings of each case for the five
abnormalities. Separate data analyses of all cases and subsets of the diffi
cult/subtle cases for each abnormality estimated the relative gains in accu
racy (linear-scaled area below the ROC curve) obtained by averaging the cas
e-ratings across (a) six independent replications by each reader (25% gain)
, (b) six different readers within each replication (34% gain), or (c) all
36 readings (48% gain). Although accuracy differed among both readers and a
bnormalities, ROC curves for the median ratings showed similar relative gai
ns in accuracy, somewhat greater than those predicted from the measured rat
ing correlations. A model for variance components in the observer's latent
decision variable could predict these gains from measured correlations in t
he single ratings of cases. Depending on whether the model's estimates were
based on realized accuracy gains or on rating correlations, about 48% or 3
9% of each reader's total decision variance (summed variance for positive a
nd negative cases) consisted of random (within-reader) error that was uncor
related between replications, another 10% or 14% came from idiosyncratic re
sponses to individual cases, and about 43% or 47% was systematic variation
that all readers found in the sampled cases. © 2000 American Association
of Physicists in Medicine. [S0094-2405(00)00608-8].
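
The variance-component reasoning summarized above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes an equal-variance Gaussian latent decision variable, uses the correlation-based variance shares quoted above (47% systematic, 14% idiosyncratic, 39% random) as illustrative inputs, picks an arbitrary separation between normal and abnormal cases, averages ratings by their mean, and reports the plain area under the ROC curve rather than the linear-scaled area used in the study. All names (`simulate`, `auc`, `predicted_auc`, `compare`) are hypothetical.

```python
import math
import random

# Illustrative variance shares (fractions of one reader's total decision
# variance), taken from the correlation-based estimates quoted in the abstract.
VAR_CASE = 0.47         # systematic variation that all readers share
VAR_READER_CASE = 0.14  # idiosyncratic reader-by-case response
VAR_RANDOM = 0.39       # within-reader error, uncorrelated between replications

SEPARATION = 1.0  # ASSUMPTION: mean shift of the latent variable for abnormal cases
N_READERS = 6
N_REPS = 6


def simulate(n_cases=4000, prevalence=0.5, seed=0):
    """Return (truth, ratings) with ratings[case][reader][replication]."""
    rng = random.Random(seed)
    truth, ratings = [], []
    for _ in range(n_cases):
        abnormal = rng.random() < prevalence
        base = (SEPARATION if abnormal else 0.0) + rng.gauss(0.0, math.sqrt(VAR_CASE))
        per_reader = []
        for _ in range(N_READERS):
            reader_effect = rng.gauss(0.0, math.sqrt(VAR_READER_CASE))
            per_reader.append([
                base + reader_effect + rng.gauss(0.0, math.sqrt(VAR_RANDOM))
                for _ in range(N_REPS)
            ])
        truth.append(abnormal)
        ratings.append(per_reader)
    return truth, ratings


def auc(scores, truth):
    """Empirical ROC area via the Mann-Whitney rank sum (continuous scores, no ties)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if truth[i])
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def predicted_auc(variance):
    """ROC area for two equal-variance Gaussians separated by SEPARATION."""
    z = SEPARATION / math.sqrt(2.0 * variance)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def compare(n_cases=4000, seed=0):
    """Map each averaging scheme to (simulated AUC, model-predicted AUC)."""
    truth, ratings = simulate(n_cases=n_cases, seed=seed)
    schemes = {
        # one reading by one reader
        "single": ([r[0][0] for r in ratings], 1.0),
        # six replications by the same reader: only random error averages out
        "replications": ([mean(r[0]) for r in ratings],
                         VAR_CASE + VAR_READER_CASE + VAR_RANDOM / N_REPS),
        # six readers within one replication
        "readers": ([mean(reader[0] for reader in r) for r in ratings],
                    VAR_CASE + (VAR_READER_CASE + VAR_RANDOM) / N_READERS),
        # all 36 readings
        "all": ([mean(x for reader in r for x in reader) for r in ratings],
                VAR_CASE + VAR_READER_CASE / N_READERS
                + VAR_RANDOM / (N_READERS * N_REPS)),
    }
    return {name: (auc(scores, truth), predicted_auc(var))
            for name, (scores, var) in schemes.items()}


if __name__ == "__main__":
    for name, (simulated, predicted) in compare().items():
        print(f"{name:13s} simulated={simulated:.3f} predicted={predicted:.3f}")
```

In this sketch, averaging shrinks only the variance components that differ between the readings being combined, which is why pooling across readers is expected to help more than repeating readings by one reader, and why the systematic case variance sets a ceiling that no amount of averaging removes.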