M. Roznowski and J. Reith, Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement?, EDUC PSYC M, 59(2), 1999, pp. 248-269
This study investigated effects of retaining test items manifesting
differential item functioning (DIF) on aspects of the measurement quality
and validity of that test's scores. DIF was evaluated using the
Mantel-Haenszel procedure, which allows one to detect items that function
differently in two groups of examinees at constant levels of the trait.
Multiple composites of DIF- and non-DIF-containing items were created to
examine the impact of DIF on the measurement, validity, and predictive
relations involving those composites. Criteria used were the American
College Testing composite, the Scholastic Aptitude Test (SAT) verbal
(SATV), quantitative (SATQ), composite (SATC), and grade point average rank
percentile. Results indicate measurement quality of tests is not seriously
degraded when items manifesting DIF are retained, even when number of items
in the compared composites has been controlled. Implications of results are
discussed within the framework of multiple determinants of item responses.
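
The Mantel-Haenszel procedure named above can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not say how examinees were matched or which flagging thresholds were applied. The sketch assumes examinees are grouped into strata of comparable trait level (in practice often total test score) and that each stratum supplies a 2x2 table of group by item correctness. The function names, the `strata` layout, and the counts are hypothetical. The common odds ratio and the -2.35 ln(alpha) rescaling (the "MH D-DIF" delta metric) are the standard Mantel-Haenszel DIF statistics.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Pooled (common) odds ratio across matched strata.

    `strata` is an iterable of 4-tuples, one per trait-level stratum:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    This layout is an assumption made for illustration.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        total = ref_right + ref_wrong + foc_right + foc_wrong
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(alpha):
    """Rescale the odds ratio to the delta metric.

    Negative values mean the item is harder for the focal group
    than for matched members of the reference group.
    """
    return -2.35 * math.log(alpha)


# Hypothetical counts for one item, three trait-level strata.
example_strata = [
    (30, 20, 20, 30),
    (40, 10, 30, 20),
    (45, 5, 40, 10),
]
alpha = mantel_haenszel_odds_ratio(example_strata)
print(f"MH odds ratio: {alpha:.3f}, MH D-DIF: {mh_d_dif(alpha):.3f}")
```

An odds ratio near 1 (D-DIF near 0) indicates that matched examinees in the two groups have similar odds of answering the item correctly, which is the "no DIF" case.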