Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs

Authors

Spiegelman, D; Rosner, B; Logan, R

Citation

D. Spiegelman et al., Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs, J AM STAT A, 95(449), 2000, pp. 51-61

Subject categories

Mathematics

Journal title

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

Volume

95

Issue

449

Year of publication

2000

Pages

51 - 61

Database

ISI

Abstract

In epidemiological studies, continuous covariates are often measured with error and categorical covariates are often misclassified. Using the logistic regression model to represent the relationship between the binary outcome and the perfectly measured and classified covariates, the model for the observed main study data is derived. This derivation relies on the assumption that the error in the continuous covariates is multivariate normally distributed and uses a chain of logistic regression models to describe the misclassification processes. These model assumptions are empirically verified in the validation study, where the misclassified and mismeasured covariates are validated using perfectly measured and classified data. The full data likelihood, including contributions from both the main study and the validation study, is maximized to obtain the maximum likelihood estimates for the parameters of the underlying logistic regression model and of the measurement error model and reclassification models simultaneously. Standard asymptotic theory is applied. An example of this methodology is presented from the Nurses' Health Study investigating the relationship between cumulative incidence of breast cancer and saturated fat, total energy, and alcohol intake. A detailed simulation study was conducted to investigate the small-sample properties of these likelihood-based estimates and inferential quantities. No single estimation/inference option performed satisfactorily when the main study/validation study size was representative of that typically encountered in practice; when the validation study was twice the usual size or larger, features of asymptotic optimality were more apparent. By example and through simulation, the procedures appeared to be robust to misspecification of the order of the chain of conditional measurement error/reclassification models.
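
As a rough illustration of the full data likelihood described in the abstract, the following minimal sketch covers a deliberately reduced case: a single binary covariate X, its misclassified surrogate W, and one logistic reclassification model for X given W. It is not the authors' implementation. The function names, the two-parameter models, and the assumption that W carries no information about the outcome once X is known are all simplifications introduced here; the paper itself treats several covariates, normally distributed error in continuous covariates, and a chain of logistic models.

```python
import math


def expit(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def p_outcome(y, x, beta):
    """P(Y = y | X = x) under the logistic outcome model (hypothetical helper)."""
    p1 = expit(beta[0] + beta[1] * x)
    return p1 if y == 1 else 1.0 - p1


def p_true_given_surrogate(x, w, gamma):
    """P(X = x | W = w) under an assumed logistic reclassification model."""
    p1 = expit(gamma[0] + gamma[1] * w)
    return p1 if x == 1 else 1.0 - p1


def full_loglik(beta, gamma, main, validation):
    """Log-likelihood over main-study (y, w) and validation-study (y, x, w) records."""
    ll = 0.0
    for y, w in main:
        # True X is unobserved in the main study: sum over its two possible values.
        ll += math.log(sum(
            p_outcome(y, x, beta) * p_true_given_surrogate(x, w, gamma)
            for x in (0, 1)
        ))
    for y, x, w in validation:
        # Validation records observe X, so both model factors contribute directly.
        ll += math.log(p_outcome(y, x, beta))
        ll += math.log(p_true_given_surrogate(x, w, gamma))
    return ll


# Made-up records, purely for illustration.
main_study = [(1, 0), (0, 1), (1, 1), (0, 0)]
validation_study = [(1, 1, 1), (0, 0, 0), (1, 0, 1)]
print(full_loglik((-1.0, 0.7), (-1.0, 2.5), main_study, validation_study))
```

Maximizing `full_loglik` jointly over `beta` and `gamma` (for example with a general-purpose numerical optimizer) would give simultaneous estimates of the outcome model and the reclassification model in this reduced setting.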