D. Spiegelman et al., Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs, J AM STAT A, 95(449), 2000, pp. 51-61
In epidemiological studies, continuous covariates often are measured with
error and categorical covariates often are misclassified. Using the logistic
regression model to represent the relationship between the binary outcome
and the perfectly measured and classified covariates, the model for the
observed main study data is derived. This derivation relies on the
assumption that the error in the continuous covariates is multivariate
normally distributed and uses a chain of logistic regression models to
describe the misclassification processes. These model assumptions are
empirically verified in the validation study, where the misclassified and
mismeasured covariates are validated using perfectly measured and classified
data. The full data likelihood, including contributions from both the main
study and the validation study, is maximized to obtain the maximum
likelihood estimates for the parameters of the underlying logistic
regression model and of the measurement error model and reclassification
models simultaneously. Standard asymptotic theory is applied. An example of
this methodology is presented from the Nurses' Health Study investigating
the relationship between cumulative incidence of breast cancer and saturated
fat, total energy, and alcohol intake. A detailed simulation study was
conducted to investigate the small-sample properties of these
likelihood-based estimates and inferential quantities. No single
estimation/inference option performed satisfactorily when the main
study/validation study size was representative of that typically encountered
in practice; when the validation study was at least twice the usual size,
features of asymptotic optimality were more apparent. By example and through
simulation, the procedures appeared to be robust to misspecification of the
order of the chain of conditional measurement error/reclassification models.
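
The structure of the full data likelihood described above can be sketched as follows. This is an illustrative outline only, not the paper's own notation. It assumes Y is the binary outcome, X the perfectly measured and classified covariates, W their observed error-prone versions, beta the logistic regression parameters, and alpha the parameters of the measurement error and reclassification models. It also assumes that W carries no information about Y once X is known, and that the outcome is observed for validation study subjects. The block uses amsmath.

```latex
% Illustrative sketch; all symbols are assumed notation, not taken from the paper.
\begin{align*}
L(\beta,\alpha)
  &= \prod_{i \in \text{main}} \Pr(Y_i \mid W_i;\beta,\alpha)
     \;\times \prod_{j \in \text{validation}}
     \Pr(Y_j \mid X_j;\beta)\, f(X_j \mid W_j;\alpha), \\
\Pr(Y_i \mid W_i;\beta,\alpha)
  &= \int \Pr(Y_i \mid x;\beta)\, f(x \mid W_i;\alpha)\, dx, \\
\Pr(Y = 1 \mid x;\beta)
  &= \frac{\exp(\beta_0 + \beta_1^{\top} x)}{1 + \exp(\beta_0 + \beta_1^{\top} x)}.
\end{align*}
```

For categorical components of X the integral becomes a sum over categories. Under these assumptions, f(x | w; alpha) would factor into a chain of conditional models, one per covariate, and the order of that chain is the modeling choice whose misspecification the abstract reports the procedures to be robust against.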