A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design

Citation
Th. Tsai et al., A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design, APPL MEAS E, 14(1), 2001, pp. 17-30
Number of citations
14
Subject categories
Education
Journal title
APPLIED MEASUREMENT IN EDUCATION
ISSN journal
0895-7347
Volume
14
Issue
1
Year of publication
2001
Pages
17 - 30
Database
ISI
SICI code
0895-7347(2001)14:1<17:ACOBSE>2.0.ZU;2-Z
Abstract
The primary purpose of this study was to compare bootstrap standard errors of 5 item response theory (IRT) equating methods for the common-item nonequivalent groups design. For true-score (Method 1) and observed-score (Method 2) equating, IRT parameters were estimated separately, and a linear scaling transformation method was used to rescale the IRT parameter estimates for Form X onto the Form Y scale. For IRT chained true-score equating (Method 3), IRT parameters for Form X and Form Y were estimated separately, and then IRT chained true-score equating was performed. For the last 2 methods, IRT parameters for both forms were estimated simultaneously. Using the simultaneously estimated parameter estimates, IRT true-score (Method 4) and observed-score (Method 5) equatings were performed. For each method, the standard deviation was computed over 500 bootstrap replications to obtain the standard error of IRT equating at each raw score point for the new form. The estimated bootstrap standard errors for Methods 4 and 5 were slightly less than those for Methods 1 and 2. Method 3 produced the greatest standard errors. However, the standard errors for all 5 methods were small enough to suggest that standard errors of equating less than 0.1 standard deviation units could be obtained with any method, even with sample sizes of 500.
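
The bootstrap step described in the abstract (resample examinees, repeat the equating, and take the standard deviation of the equated scores at each raw score point) can be illustrated with a minimal sketch. This is not the study's procedure: the equating function below is a simple linear equating used only as a stand-in for the five IRT methods, the common items of the design are ignored, and the names `linear_equate` and `bootstrap_se`, as well as the simulated 40-item score data, are hypothetical.

```python
import random
import statistics


def linear_equate(x_scores, y_scores, points):
    """Stand-in equating function (simple linear equating, NOT an IRT method).

    Maps each Form X raw score in `points` onto the Form Y scale by
    matching the means and standard deviations of the two samples.
    """
    mean_x = statistics.mean(x_scores)
    mean_y = statistics.mean(y_scores)
    slope = statistics.stdev(y_scores) / statistics.stdev(x_scores)
    return [mean_y + slope * (p - mean_x) for p in points]


def bootstrap_se(x_scores, y_scores, points, equate=linear_equate,
                 n_boot=500, seed=0):
    """Bootstrap standard error of equating at each raw score point.

    Each replication resamples examinees with replacement from both
    groups and reruns the equating; the standard error at a score point
    is the standard deviation of its equated values over replications.
    """
    rng = random.Random(seed)
    replications = []
    for _ in range(n_boot):
        x_boot = rng.choices(x_scores, k=len(x_scores))
        y_boot = rng.choices(y_scores, k=len(y_scores))
        replications.append(equate(x_boot, y_boot, points))
    # Transpose so that each column holds one score point's replications.
    return [statistics.stdev(col) for col in zip(*replications)]


# Hypothetical data: 500 examinees per form on a 40-item test.
data_rng = random.Random(1)
form_x = [sum(data_rng.random() < 0.60 for _ in range(40)) for _ in range(500)]
form_y = [sum(data_rng.random() < 0.65 for _ in range(40)) for _ in range(500)]
points = list(range(41))  # raw scores 0..40 on the new form

se = bootstrap_se(form_x, form_y, points)
```

To compare against the 0.1 standard deviation criterion mentioned in the abstract, each entry of `se` would be divided by the standard deviation of the Form Y scores. With an actual IRT method, the `equate` argument would be replaced by a function that re-estimates item parameters on each bootstrap sample before equating.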