Purpose. To apply differential item functioning (DIF) procedures to investigate station gender bias in multiple-station tests of clinical skills, and to compare these results with those obtained by comparing the station-score distributions of men and women examinees.

Method. The data were from 23 stations used in the selection of seven successive cohorts (1987-1993) of candidates to the Ontario Pre-Internship Program for graduates of foreign medical schools. Each station had been used on at least three occasions, with a minimum of about 210 candidates per station. Each station score was expressed in two forms, binary and continuous. DIF was assessed with the Mantel-Haenszel procedure for the binary scores and with analysis of covariance for the continuous scores. For each station, DIF effect sizes were calculated and compared with the gender-group mean differences.

Results. Using the binary scores, significant DIF was observed for three stations; using the continuous scores, significant DIF was observed for five stations. Significant gender differences in station scores were observed for nine stations, eight of which favored women. Overall, the direction of DIF favored men in more stations than it favored women, whereas women demonstrated higher levels of ability.

Conclusion. The results suggest the importance of using a DIF approach to control for the "ability factor" in studies of this kind: although significant gender differences were observed in the continuous score distributions of nine stations, these differences generally did not indicate station gender bias.
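The two DIF procedures named in the Method section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not specify the matching variable, so examinees are assumed to be grouped into strata of a matching score (for example, total test score); which gender is the reference group is left to the caller; and the Mantel-Haenszel effect size is expressed on the ETS delta scale (MH D-DIF = -2.35 ln α_MH), a common convention that may differ from the metric the authors used. The function names are hypothetical. The ANCOVA part returns only the covariate-adjusted group difference (the group coefficient in a regression of station score on matching score and group, with a common slope), without a significance test.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF for one binary-scored station (illustrative sketch).

    records: iterable of (stratum, group, passed), where stratum is the
    matching-score level (an assumption), group is "ref" or "focal",
    and passed is 0 or 1. Returns (alpha_mh, mh_d_dif, mh_chi2).
    """
    # Per-stratum 2x2 counts: [ref pass, ref fail, focal pass, focal fail]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, group, passed in records:
        offset = 0 if group == "ref" else 2
        tables[stratum][offset + (0 if passed else 1)] += 1

    num = den = 0.0                 # pieces of the MH common odds ratio
    sum_a = sum_ea = sum_var = 0.0  # pieces of the MH chi-square
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a one-examinee stratum carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc, m_pass, m_fail = a + b, c + d, a + c, b + d
        sum_a += a
        sum_ea += n_ref * m_pass / n
        sum_var += n_ref * n_foc * m_pass * m_fail / (n * n * (n - 1))
    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("degenerate tables: odds ratio or variance undefined")

    alpha_mh = num / den                   # > 1 favors the reference group
    mh_d_dif = -2.35 * math.log(alpha_mh)  # assumed ETS delta scale; < 0 favors reference
    mh_chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var  # 1 df, continuity-corrected
    return alpha_mh, mh_d_dif, mh_chi2


def ancova_adjusted_difference(ref, focal):
    """Focal-minus-reference difference in a continuous station score,
    adjusted for a matching covariate using a common within-group slope.

    ref, focal: lists of (covariate, score) pairs.
    """
    def moments(pairs):
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        return mx, my, sxx, sxy

    mx_r, my_r, sxx_r, sxy_r = moments(ref)
    mx_f, my_f, sxx_f, sxy_f = moments(focal)
    slope = (sxy_r + sxy_f) / (sxx_r + sxx_f)  # pooled within-group slope
    return (my_f - my_r) - slope * (mx_f - mx_r)
```

In this sketch, a station on which the focal group scores higher only because it is more able overall yields an adjusted difference near zero. That is the distinction the Conclusion draws between raw gender differences in score distributions and DIF.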