S. Greenland and W. D. Finkle, A critical look at methods for handling missing covariates in epidemiologic regression analyses, American Journal of Epidemiology, 142(12), 1995, pp. 1255-1264
Epidemiologic studies often encounter missing covariate values. While
simple methods such as stratification on missing-data status,
conditional-mean imputation, and complete-subject analysis are commonly
employed for handling this problem, several studies have shown that these
methods can be biased under reasonable circumstances. The authors review
these results in the context of logistic regression and present
simulation experiments showing the limitations of the methods. The method
based on missing-data indicators can exhibit severe bias even when the
data are missing completely at random, and regression
(conditional-mean) imputation can be inordinately sensitive to model
misspecification.
Even complete-subject analysis can outperform these methods. More
sophisticated methods, such as maximum likelihood, multiple imputation,
and weighted estimating equations, have been given extensive attention
in the statistics literature. While these methods are superior to
simple methods, they are not commonly used in epidemiology, no doubt due
to their complexity and the lack of packaged software to apply them.
The authors contrast the results of multiple imputation with those of
simple methods in the analysis of a case-control study of endometrial
cancer, and they find a meaningful difference in results for age at
menarche. In general, the authors recommend that epidemiologists avoid using
the missing-indicator method and use more sophisticated methods
whenever a large proportion of data are missing.
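
As a rough illustration of what the simple methods named in the abstract do to a data set before any regression is fitted, the sketch below prepares one covariate in three ways. It is not the authors' code: the function names and the toy data are hypothetical, and unconditional mean filling is used as a simplified stand-in for the regression (conditional-mean) imputation the abstract describes.

```python
# Hypothetical toy data: (covariate, outcome) pairs; None marks a missing covariate.
rows = [(12.0, 1), (None, 0), (14.0, 0), (None, 1), (13.0, 1)]


def complete_subject(rows):
    """Complete-subject analysis: drop every subject with a missing covariate."""
    return [(x, y) for x, y in rows if x is not None]


def missing_indicator(rows):
    """Missing-indicator method: set missing values to 0 and add a 0/1 flag column."""
    return [((0.0 if x is None else x), (1 if x is None else 0), y) for x, y in rows]


def mean_imputation(rows):
    """Simplified imputation: fill missing values with the mean of the observed ones.

    Assumption: this is an unconditional mean, not the regression-based
    conditional mean discussed in the abstract.
    """
    observed = [x for x, _ in rows if x is not None]
    mean = sum(observed) / len(observed)
    return [((mean if x is None else x), y) for x, y in rows]


cs = complete_subject(rows)    # (covariate, outcome) for observed subjects only
mi = missing_indicator(rows)   # (covariate, missing flag, outcome) for all subjects
im = mean_imputation(rows)     # (filled covariate, outcome) for all subjects
```

Each prepared data set would then be passed to a logistic regression. The sketch only shows how the inputs differ and says nothing about the resulting bias, which is what the simulation experiments in the paper address.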