A simulation study of confounding in generalized linear models for air pollution epidemiology

Citation
C. Chen et al., A simulation study of confounding in generalized linear models for air pollution epidemiology, ENVIR H PER, 107(3), 1999, pp. 217-222
Number of citations
17
Subject categories
Environment/Ecology, Pharmacology & Toxicology
Journal title
ENVIRONMENTAL HEALTH PERSPECTIVES
Journal ISSN
0091-6765
Volume
107
Issue
3
Year of publication
1999
Pages
217 - 222
Database
ISI
SICI code
0091-6765(199903)107:3<217:ASSOCI>2.0.ZU;2-R
Abstract
Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. Model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads not only to erroneous estimated coefficients but also to a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter less than or equal to 10 μm and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize the risk of model underfit or misfit.
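The overfit, underfit, and misfit cases described in the abstract can be illustrated with a small synthetic experiment. The following is a minimal sketch, not the authors' simulation design: the coefficient values, the correlation `rho`, the sample size, the Gaussian covariates, and the helper names `fit_poisson` and `simulate` are all assumptions made for illustration. Two correlated causal variables (`x1`, `x2`) generate Poisson counts, a third correlated variable (`x3`) has no causal effect, and a log-link Poisson regression is fitted by Newton/IRLS iterations for each choice of covariates.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by Newton/IRLS steps.

    Assumes the first column of X is an intercept column of ones.
    Returns (coefficients, standard errors).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (X * mu[:, None])  # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (X * mu[:, None]))
    return beta, np.sqrt(np.diag(cov))


def simulate(n=5000, rho=0.7, b1=0.2, b2=0.2, seed=0):
    """Fit correct, overfit, underfit, and misfit models to one synthetic data set.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Three equicorrelated standard-normal covariates.
    corr = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
    x1, x2, x3 = rng.multivariate_normal(np.zeros(3), corr, size=n).T
    # Only x1 and x2 are causal; x3 is merely correlated with them.
    y = rng.poisson(np.exp(1.0 + b1 * x1 + b2 * x2))

    ones = np.ones(n)
    designs = {
        "correct": [ones, x1, x2],       # exactly the causal variables
        "overfit": [ones, x1, x2, x3],   # causal variables plus a noncausal one
        "underfit": [ones, x1],          # one causal variable omitted
        "misfit": [ones, x3],            # only the noncausal variable
    }
    results = {}
    for name, cols in designs.items():
        beta, se = fit_poisson(np.column_stack(cols), y)
        results[name] = {"beta": beta, "se": se, "z": beta / se}
    return results


if __name__ == "__main__":
    for name, r in simulate().items():
        print(name, np.round(r["beta"], 3), np.round(r["z"], 1))
```

Under these assumptions, one would expect the overfitted model to recover the `x1` coefficient with a larger standard error than the correct model, the underfitted model to absorb part of the omitted `x2` effect into the `x1` coefficient, and the misfitted model to assign `x3` a large test statistic despite its having no causal effect. This matches the qualitative pattern the abstract reports, though the magnitudes depend entirely on the assumed settings.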