N.J. Horton and S.R. Lipsitz, Multiple imputation in practice: Comparison of software packages for regression models with missing variables, The American Statistician, 55(3), 2001, pp. 244-254
Missing data frequently complicates data analysis for scientific
investigations. The development of statistical methods to address missing
data has been an active area of research in recent decades. Multiple
imputation, originally proposed by Rubin in a public use dataset setting,
is a general purpose method for analyzing datasets with missing data that
is broadly applicable to a variety of missing data settings. We review
multiple imputation as an analytic strategy for missing data. We describe
and evaluate a number of software packages that implement this procedure,
and contrast the interface, features, and results. We compare the packages,
and detail shortcomings and useful features. The comparisons are
illustrated using examples from an artificial dataset and a study of child
psychopathology. We suggest additional features as well as discuss
limitations and cautions to consider when using multiple imputation as an
analytic strategy for incomplete data settings.