Th. Tsai et al., A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design, APPL MEAS E, 14(1), 2001, pp. 17-30
The primary purpose of this study was to compare bootstrap standard errors
of 5 item response theory (IRT) equating methods for the common-item
nonequivalent groups design. For true-score (Method 1) and observed-score (Method
2) equating, IRT parameters were estimated separately, and a linear scaling
transformation method was used to rescale the IRT parameter estimates for
Form X onto the Form Y scale. For IRT chained true-score equating (Method
3), IRT parameters for Form X and Form Y were estimated separately, and then
IRT chained true-score equating was performed. For the last 2 methods, IRT
parameters for both forms were estimated simultaneously. Using the
simultaneously estimated parameter estimates, IRT true-score (Method 4) and
observed-score (Method 5) equatings were performed. For each method, the
standard deviation was computed over 500 bootstrap replications to obtain the
standard error of IRT equating at each raw score point for the new form. The
estimated bootstrap standard errors for Methods 4 and 5 were slightly less
than those for Methods 1 and 2. Method 3 produced the greatest standard
errors. However, the standard errors for all 5 methods were small enough to
suggest that standard errors of equating less than 0.1 standard deviation
units could be obtained with any method, even with sample sizes of 500.
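
The bootstrap step described in the abstract (resample, re-equate, then take the standard deviation over replications at each raw score point) can be sketched roughly as follows. This is a minimal illustration, not the study's procedure: the function names `linear_equate` and `bootstrap_se` are hypothetical, the data are simulated, and a simple mean-and-SD linear equating on two independent samples is swapped in for the IRT equating methods and the common-item nonequivalent groups design, which are not reproduced here. In an IRT setting, each replication would presumably resample examinees' item responses and rerun calibration, scaling, and equating.

```python
import numpy as np


def linear_equate(x_scores, y_scores, points):
    """Stand-in equating (assumption, not an IRT method): place Form X raw
    scores on the Form Y scale by matching means and standard deviations."""
    slope = y_scores.std(ddof=1) / x_scores.std(ddof=1)
    return y_scores.mean() + slope * (points - x_scores.mean())


def bootstrap_se(x_scores, y_scores, points, equate=linear_equate,
                 n_reps=500, seed=0):
    """Bootstrap standard error of the equated score at each raw score point.

    Each replication resamples examinees with replacement within each group,
    recomputes the equating, and stores the equated scores. The standard
    error is the standard deviation over replications, point by point.
    """
    rng = np.random.default_rng(seed)
    reps = np.empty((n_reps, len(points)))
    for r in range(n_reps):
        xb = rng.choice(x_scores, size=len(x_scores), replace=True)
        yb = rng.choice(y_scores, size=len(y_scores), replace=True)
        reps[r] = equate(xb, yb, points)
    return reps.std(axis=0, ddof=1)


# Simulated raw scores on two 40-item forms, 500 examinees per group
# (illustrative values only, not data from the study).
data_rng = np.random.default_rng(42)
n_items = 40
x = data_rng.binomial(n_items, 0.55, size=500).astype(float)
y = data_rng.binomial(n_items, 0.60, size=500).astype(float)

points = np.arange(n_items + 1, dtype=float)  # every Form X raw score point
se = bootstrap_se(x, y, points)               # 500 replications by default

# Express the standard errors in Form Y standard deviation units, the
# metric used for the 0.1 benchmark mentioned in the abstract.
se_sd_units = se / y.std(ddof=1)
```

Passing a different callable as `equate` would let the same resampling loop wrap another equating method, so only the equating step changes while the standard-error computation stays the same.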