In classical test theory, a test is regarded as a sample of items from a domain defined by generating rules or by content, process, and format specifications. If the items are a random sample of the domain, then the percent-correct score on the test estimates the domain score, that is, the expected percent correct for all items in the domain. When the domain is represented by a large set of calibrated items, as in item banking applications, item response theory (IRT) provides an alternative estimator of the domain score by transformation of the IRT scale score on the test. This estimator has the advantage of not requiring the test items to be a random sample of the domain, and of having a simple standard error. We present resampling results in real data demonstrating, for uni- and multidimensional models, that the IRT estimator is also a more accurate predictor of the domain score than the classical percent-correct score. These results have implications for reporting outcomes of educational qualification testing and assessment.
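The transformation from IRT scale score to domain score can be illustrated with a minimal sketch. Assuming a unidimensional two-parameter logistic (2PL) model, the domain score estimate for an examinee with scale score theta is the average of the item response probabilities over all calibrated items in the bank; the item parameters below are hypothetical, generated at random for illustration, and are not from the study's data.

```python
import numpy as np

def irt_domain_score(theta, a, b):
    """Estimate the domain score (expected percent correct over all
    calibrated items in the bank) from an IRT scale score theta.

    Under a 2PL model, the probability of a correct response to item i is
        P_i(theta) = 1 / (1 + exp(-a_i * (theta - b_i))),
    and the domain score estimate is the mean of P_i(theta) over the bank,
    expressed as a percentage.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return 100.0 * p.mean()

# Hypothetical item bank: discriminations a and difficulties b.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=200)
b = rng.normal(0.0, 1.0, size=200)

print(round(irt_domain_score(0.0, a, b), 1))
```

Because the estimate is a smooth function of theta, a standard error for the domain score follows from the standard error of theta by the delta method, which is the "simple standard error" the abstract refers to.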