H. Wainer et al., HOW WELL CAN WE COMPARE SCORES ON TEST FORMS THAT ARE CONSTRUCTED BY EXAMINEES' CHOICE, Journal of Educational Measurement, 31(3), 1994, pp. 183-199
When an exam consists, in whole or in part, of constructed-response items, it is a common practice to allow the examinee to choose a subset of the questions to answer. This procedure is usually adopted so that the limited number of items that can be completed in the allotted time does not unfairly affect the examinee. This results in the de facto administration of several different test forms, where the exact structure of any particular form is determined by the examinee. However, when different forms are administered, a canon of good testing practice requires that those forms be equated to adjust for differences in their difficulty. When the items are chosen by the examinee, traditional equating procedures do not strictly apply due to the nonignorable nature of the missing responses. In this article, we examine the comparability of scores on such tests within an IRT framework. We illustrate the approach with data from the College Board's Advanced Placement Test in Chemistry.