R. D. Penfield, Assessing differential item functioning among multiple groups: A comparison of three Mantel-Haenszel procedures, Applied Measurement in Education, 14(3), 2001, pp. 235-259
It is often the case in performing a differential item functioning (DIF)
analysis that comparisons are made between a single reference group and
multiple focal groups. Conducting a separate test of DIF for each focal
group has several undesirable qualities: (a) the Type I error rate will
exceed the intended nominal level if the level of significance for each
individual test is not appropriately adjusted, (b) the power may not be as
high as that of a single test that assesses DIF among all groups
simultaneously, and (c) substantial time and computing resources are
required. These drawbacks are potentially avoided by using a procedure that
has the capacity to assess DIF across all groups simultaneously. In this
study I compare the performance of three methods of assessing DIF across
multiple demographic groups: the Mantel-Haenszel chi-square statistic with
no adjustment to the alpha level, the Mantel-Haenszel chi-square statistic
with a Bonferroni-adjusted alpha level, and the Generalized Mantel-Haenszel
statistic (GMH), which offers a single test of significance across all
groups. Simulations were conducted in which there was a single reference
group and 1, 2, 3, or 4 focal groups, with from 1 to all of the focal
groups in a given condition experiencing DIF. Additional conditions that
were varied included group size, focal group ability distribution, and
magnitude of matching criterion contamination. The results suggest that GMH
is in general the most appropriate procedure because its Type I error rate
remained at the nominal level of 0.05, and its power was consistently among
the highest.
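
The first two procedures compared in the abstract can be illustrated with a minimal Python sketch. It is not the study's simulation code. It assumes the textbook continuity-corrected Mantel-Haenszel chi-square (1 degree of freedom) computed from one 2x2 table per matching-score stratum. The function names (`mh_chi_square`, `familywise_error`, `flag_dif`) and the table layout are hypothetical, chosen only for illustration. The familywise error formula additionally assumes independent tests, which is only an approximation when every focal group is compared against the same reference group. The GMH statistic is not implemented here.

```python
import math

def mh_chi_square(tables):
    """Continuity-corrected Mantel-Haenszel chi-square (df = 1).

    tables: one (a, b, c, d) tuple per matching-score stratum, where
      a = reference correct, b = reference incorrect,
      c = focal correct,     d = focal incorrect.
    Returns (statistic, p_value).
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # stratum carries no information
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d
        sum_a += a
        sum_e += n_ref * m1 / t                           # E(a) under no DIF
        sum_v += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))  # Var(a)
    if sum_v == 0:
        return 0.0, 1.0
    stat = max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

def familywise_error(alpha, k):
    """Chance of at least one false positive over k tests.
    Assumption: the k tests are independent."""
    return 1 - (1 - alpha) ** k

def flag_dif(tables_by_focal_group, alpha=0.05, bonferroni=True):
    """Run one MH test per focal group; optionally use alpha / k per test."""
    k = len(tables_by_focal_group)
    per_test_alpha = alpha / k if bonferroni else alpha
    return {name: mh_chi_square(tables)[1] < per_test_alpha
            for name, tables in tables_by_focal_group.items()}
```

Under the independence assumption, `familywise_error(0.05, 4)` is about 0.185, which shows why four unadjusted tests at 0.05 exceed the nominal level, while the Bonferroni option tests each focal group at 0.0125.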