In many psychological experiments, interaction effects in factorial analysis of variance (ANOVA) designs are estimated from total scores derived from classical test theory. However, interaction effects can be reduced or eliminated by nonlinear monotonic transformations of the dependent variable. Although cross-over interactions cannot be eliminated by such transformations, the meaningfulness of other interactions hinges on achieving a level of measurement at which nonlinear transformations are not permissible (i.e., at least the interval level). According to contemporary item response theory (IRT), classical total test scores do not provide interval-level measurement. Nevertheless, IRT models are rarely applied to obtain better measurement properties and hence more meaningful interaction effects. This paper specifies several conditions under which interaction effects estimated from classical total scores, rather than from IRT trait scores, can be misleading. Using asymptotic expectations derived from an IRT model, interaction effects of zero on the IRT trait scale were often not estimated as zero on the total score scale. Further, when nonzero interactions were specified on the IRT trait scale, the interaction effects estimated from the total score scale were biased inward. Test difficulty level determined both the direction and the magnitude of the bias in the interaction effects.
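The two mechanisms described above can be illustrated with a small numerical sketch. It is not the paper's derivation; it assumes a Rasch (one-parameter logistic) model as the IRT model, noiseless expected total scores, a 2x2 design, and hypothetical cell means and item difficulties chosen only for illustration. Part 1 shows a non-crossover interaction vanishing under a log transformation while a cross-over interaction keeps its sign. Part 2 places additive (zero-interaction) cell means on the trait scale and computes the interaction contrast on the expected-total-score scale for an easy, a moderate, and a hard hypothetical test.

```python
import math


def interaction_contrast(cells):
    """2x2 interaction contrast: (c11 - c12) - (c21 - c22)."""
    return (cells[0][0] - cells[0][1]) - (cells[1][0] - cells[1][1])


def transform(cells, f):
    """Apply a monotonic function f to every cell mean."""
    return [[f(x) for x in row] for row in cells]


# Part 1: monotonic transformations (hypothetical cell means).
# Multiplicative pattern: interaction on the raw scale, none after log.
ordinal = [[4.0, 2.0], [2.0, 1.0]]
# Cross-over pattern: the interaction keeps its sign under any
# strictly increasing transformation.
crossover = [[2.0, 1.0], [1.0, 2.0]]


# Part 2: expected total scores under an assumed Rasch model.
def rasch_prob(theta, b):
    """Probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_total(theta, difficulties):
    """Expected number-correct score at trait level theta."""
    return sum(rasch_prob(theta, b) for b in difficulties)


def make_test(center, n_items=11, spacing=0.2):
    """Hypothetical test: evenly spaced difficulties around `center`."""
    half = (n_items - 1) / 2
    return [center + (k - half) * spacing for k in range(n_items)]


# Additive trait-scale cell means: each factor shifts theta by +/-0.5,
# so the interaction contrast on the theta scale is exactly zero.
theta_cells = [[1.0, 0.0], [0.0, -1.0]]

if __name__ == "__main__":
    print("ordinal, raw:", interaction_contrast(ordinal))
    print("ordinal, log:", interaction_contrast(transform(ordinal, math.log)))
    print("crossover, raw:", interaction_contrast(crossover))
    print("crossover, log:", interaction_contrast(transform(crossover, math.log)))
    print("theta scale:", interaction_contrast(theta_cells))
    for label, center in [("easy", -2.0), ("moderate", 0.0), ("hard", 2.0)]:
        items = make_test(center)
        totals = transform(theta_cells, lambda t: expected_total(t, items))
        print(f"{label} test, total-score scale: {interaction_contrast(totals):+.4f}")
```

With these assumptions, the zero trait-scale interaction reappears as a negative contrast for the easy test (scores compressed near the ceiling), a positive contrast for the hard test (scores compressed near the floor), and essentially zero for the test whose difficulties are centered on the cell means, consistent with test difficulty governing the direction of the distortion.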