Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs

Citation
D. Spiegelman et al., Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs, Journal of the American Statistical Association, 95(449), 2000, pp. 51-61
Number of citations
26
Subject categories
Mathematics
Volume
95
Issue
449
Year of publication
2000
Pages
51 - 61
Database
ISI
Abstract
In epidemiological studies, continuous covariates are often measured with error and categorical covariates are often misclassified. Using the logistic regression model to represent the relationship between the binary outcome and the perfectly measured and classified covariates, the model for the observed main study data is derived. This derivation relies on the assumption that the error in the continuous covariates is multivariate normally distributed and uses a chain of logistic regression models to describe the misclassification processes. These model assumptions are empirically verified in the validation study, where the misclassified and mismeasured covariates are validated against perfectly measured and classified data. The full-data likelihood, including contributions from both the main study and the validation study, is maximized to obtain the maximum likelihood estimates for the parameters of the underlying logistic regression model and of the measurement error and reclassification models simultaneously. Standard asymptotic theory is applied. An example of this methodology is presented from the Nurses' Health Study, investigating the relationship between the cumulative incidence of breast cancer and saturated fat, total energy, and alcohol intake. A detailed simulation study was conducted to investigate the small-sample properties of these likelihood-based estimates and inferential quantities. No single estimation/inference option performed satisfactorily when the main study/validation study size was representative of that typically encountered in practice; when the validation study was twice the usual size or larger, features of asymptotic optimality were more apparent. By example and through simulation, the procedures appeared to be robust to misspecification of the order of the chain of conditional measurement error/reclassification models.
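The key step in the abstract, deriving the observed-data model by integrating the true covariate out of the underlying logistic model under normally distributed error, can be illustrated with a minimal Python sketch. This is not the authors' method or code. It assumes a single continuous covariate, a hypothetical linear-normal calibration model X | Z=z ~ N(alpha0 + alpha1*z, sigma^2) of the kind that could be estimated in a validation study, and numerical integration by Gauss-Hermite quadrature. The names `observed_prob` and `main_study_loglik` are illustrative only; misclassified categorical covariates and the validation-study likelihood contribution are omitted.

```python
import numpy as np

# Gauss-Hermite nodes and weights for the weight function exp(-t^2).
_NODES, _WEIGHTS = np.polynomial.hermite.hermgauss(40)


def expit(u):
    """Logistic function 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))


def observed_prob(z, beta, alpha, sigma):
    """P(Y=1 | Z=z) after integrating out the true covariate X.

    Hypothetical assumptions for illustration:
      underlying model:   logit P(Y=1 | X) = beta[0] + beta[1] * X
      calibration model:  X | Z=z ~ N(alpha[0] + alpha[1] * z, sigma^2)
    """
    mean = np.asarray(alpha[0] + alpha[1] * np.asarray(z, dtype=float))
    # Substituting x = mean + sqrt(2) * sigma * t turns E[f(X) | Z=z]
    # into a weighted Gauss-Hermite sum divided by sqrt(pi).
    x = mean[..., None] + np.sqrt(2.0) * sigma * _NODES
    return (expit(beta[0] + beta[1] * x) @ _WEIGHTS) / np.sqrt(np.pi)


def main_study_loglik(y, z, beta, alpha, sigma):
    """Bernoulli log-likelihood of main-study records (y, z) under the observed-data model."""
    p = observed_prob(z, beta, alpha, sigma)
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p)))
```

With `sigma = 0` the sketch reduces to ordinary logistic regression on the calibrated covariate; as `sigma` grows, the observed-data probabilities are pulled toward 0.5, which is why naive analyses of mismeasured covariates tend to attenuate the estimated effect.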