The relevance of test content to practice is essential for credentialing examinations, and one way to ensure it is to collect ratings of item relevance from job incumbents. This study analyzed ratings of the 132 single-best-answer items and 117 multiple true-false item sets that formed the pretest books in a single administration of a medical certifying examination. Ratings collected from 57 practitioners were high (an average of more than 4 on a 5-point scale) and correlated with item difficulty (r = .31 to .34). The relationship between ratings and item discrimination was less clear (r = -.04 to .31). Application of generalizability theory to the ratings showed that reasonable estimates of item, stem, and total test relevance can be obtained with about 10 raters.
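
A minimal sketch of the decision-study logic behind the roughly-10-raters estimate, assuming a standard one-facet items-crossed-with-raters (i x r) design with items as the object of measurement (the variance-component symbols are illustrative; the abstract does not report the study's actual design or estimates):

\[
\hat{E}\rho^2(n_r) = \frac{\hat{\sigma}^2_{i}}{\hat{\sigma}^2_{i} + \dfrac{\hat{\sigma}^2_{ir,e}}{n_r}}
\]

Here \(\hat{\sigma}^2_{i}\) is the estimated variance component for items and \(\hat{\sigma}^2_{ir,e}\) is the item-by-rater interaction confounded with residual error. Because the error term shrinks in proportion to \(1/n_r\), the coefficient rises steeply as the first few raters are added and then flattens; a recommendation of about 10 raters would correspond to the point on this curve where additional raters yield little further gain in dependability.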