A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design

Authors

Tsai, TH; Hanson, BA; Kolen, MJ; Forsyth, RA

Citation

Th. Tsai et al., A comparison of bootstrap standard errors of IRT equating methods for the common-item nonequivalent groups design, APPL MEAS E, 14(1), 2001, pp. 17-30

Subject categories

Education

Journal title

APPLIED MEASUREMENT IN EDUCATION

ISSN journal

0895-7347

Volume

14

Issue

1

Year of publication

2001

Pages

17 - 30

Database

ISI

SICI code

0895-7347(2001)14:1<17:ACOBSE>2.0.ZU;2-Z

Abstract

The primary purpose of this study was to compare bootstrap standard errors of 5 item response theory (IRT) equating methods for the common-item nonequivalent groups design. For true-score (Method 1) and observed-score (Method 2) equating, IRT parameters were estimated separately, and a linear scaling transformation method was used to rescale the IRT parameter estimates for Form X onto the Form Y scale. For IRT chained true-score equating (Method 3), IRT parameters for Form X and Form Y were estimated separately, and then IRT chained true-score equating was performed. For the last 2 methods, IRT parameters for both forms were estimated simultaneously. Using these simultaneously estimated parameters, IRT true-score (Method 4) and observed-score (Method 5) equatings were performed. For each method, the standard deviation was computed over 500 bootstrap replications to obtain the standard error of IRT equating at each raw score point for the new form. The estimated bootstrap standard errors for Methods 4 and 5 were slightly less than those for Methods 1 and 2. Method 3 produced the largest standard errors. However, the standard errors for all 5 methods were small enough to suggest that standard errors of equating less than 0.1 standard deviation units could be obtained with any method, even with sample sizes of 500.
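The bootstrap procedure described in the abstract (resample examinees, re-equate, then take the standard deviation of the equated scores at each raw score point over 500 replications) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The study's IRT equating, which would re-estimate item parameters in every replication, is replaced here by a hypothetical stand-in: chained linear equating through the anchor score. Resampling separately within each group is assumed, because each group takes only one form in this design. The function names (`chained_linear`, `bootstrap_se`, `simulate_group`), the form lengths and the simulated data are illustrative only. The "standard deviation units" are assumed to be Form Y raw-score SDs.

```python
import numpy as np


def chained_linear(group_x, group_y, x_points):
    """Hypothetical stand-in for the study's IRT equating: chained linear
    equating through the anchor (common-item nonequivalent groups design).

    group_x: (n1, 2) array of [Form X total, anchor score], group that took Form X
    group_y: (n2, 2) array of [Form Y total, anchor score], group that took Form Y
    x_points: Form X raw score points to equate
    """
    x1, v1 = group_x[:, 0], group_x[:, 1]
    y2, v2 = group_y[:, 0], group_y[:, 1]
    # Link Form X to the anchor scale using the group that took Form X
    v = v1.mean() + v1.std(ddof=1) / x1.std(ddof=1) * (x_points - x1.mean())
    # Link the anchor scale to Form Y using the group that took Form Y
    return y2.mean() + y2.std(ddof=1) / v2.std(ddof=1) * (v - v2.mean())


def bootstrap_se(group_x, group_y, equate, x_points, n_reps=500, seed=0):
    """Bootstrap standard error of equating at each Form X raw score point."""
    rng = np.random.default_rng(seed)
    reps = np.empty((n_reps, len(x_points)))
    for r in range(n_reps):
        # Resample examinees (whole rows) with replacement, separately per group
        bx = group_x[rng.integers(0, len(group_x), size=len(group_x))]
        by = group_y[rng.integers(0, len(group_y), size=len(group_y))]
        reps[r] = equate(bx, by, x_points)
    # Standard error = SD of the equated scores across the replications
    return reps.std(axis=0, ddof=1)


def simulate_group(rng, n, n_unique, n_anchor, mean_ability):
    """Toy data (illustrative only): each examinee has one success
    probability; unique-item and anchor scores are binomial counts."""
    ability = rng.normal(mean_ability, 1.0, size=n)
    p = 1.0 / (1.0 + np.exp(-ability))
    unique = rng.binomial(n_unique, p)
    anchor = rng.binomial(n_anchor, p)
    # Internal anchor: the anchor items also count toward the total score
    return np.column_stack([unique + anchor, anchor])


data_rng = np.random.default_rng(1)
group_x = simulate_group(data_rng, 500, 30, 10, 0.0)  # 500 examinees per group
group_y = simulate_group(data_rng, 500, 30, 10, 0.2)
points = np.arange(0, 41)  # Form X raw scores 0..40

se = bootstrap_se(group_x, group_y, chained_linear, points)
# Express the standard errors in Form Y raw-score SD units (assumed convention)
se_sd_units = se / group_y[:, 0].std(ddof=1)
print(np.round(se_sd_units, 3))
```

Resampling whole rows keeps each examinee's total and anchor scores paired, so the anchor-based link is recomputed consistently in every replication. For the IRT methods compared in the study, the body of the loop would instead re-estimate item parameters, apply the scale transformation (or the simultaneous calibration), and rerun the true-score or observed-score equating. That step is where nearly all of the computational cost would lie.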