Schildcrout, Jonathan S. and Heagerty, Patrick J., Regression analysis of longitudinal binary data with time-dependent environmental covariates: bias and efficiency, Biostatistics (Oxford. Print), 6(4), 2005, pp. 633-652
Generalized estimating equations (Liang and Zeger, 1986) is a widely used, moment-based procedure to estimate marginal regression parameters. However, a subtle and often overlooked point is that valid inference requires the mean for the response at time t to be expressed properly as a function of the complete past, present, and future values of any time-varying covariate. For example, with environmental exposures it may be necessary to express the response as a function of multiple lagged values of the covariate series. Despite the fact that multiple lagged covariates may be predictive of outcomes, researchers often focus interest on parameters in a 'cross-sectional' model, where the response is expressed as a function of a single lag in the covariate series. Cross-sectional models yield parameters with simple interpretations and avoid issues of collinearity associated with multiple lagged values of a covariate. Pepe and Anderson (1994) showed that parameter estimates for time-varying covariates may be biased unless the mean, given all past, present, and future covariate values, is equal to the cross-sectional mean or unless independence estimating equations are used. Although working independence avoids potential bias, many authors have shown that a poor choice for the response correlation model can lead to highly inefficient parameter estimates. The purpose of this paper is to study the bias-efficiency trade-off associated with working correlation choices for application with binary response data. We investigate data characteristics or design features (e.g. cluster size, overall response association, functional form of the response association, covariate distribution, and others) that influence the small and large sample characteristics of parameter estimates obtained from several different weighting schemes or equivalently 'working' covariance models. We find that the impact of covariance model choice depends highly on the specific structure of the data features, and that key aspects should be examined before choosing a weighting scheme.
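The condition attributed to Pepe and Anderson (1994) can be sketched in symbols. The notation here is an assumption for illustration and is not taken from the record: Y_it and X_it denote the response and covariate for subject i at time t, mu_i the vector of cross-sectional means, D_i its derivative with respect to the regression parameter beta, and V_i the working covariance matrix.

```latex
% Assumed notation (not from the source): subject i = 1, ..., N; times t = 1, ..., n_i.
% GEE with working covariance V_i solves the estimating equation
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \{ Y_i - \mu_i(\beta) \} = 0,
\qquad D_i = \partial \mu_i / \partial \beta^{\top},
% where the cross-sectional mean model is
\mu_{it}(\beta) = E[ Y_{it} \mid X_{it} ].
% When V_i is not diagonal, the equation is unbiased only if the
% full-covariate conditional mean equals the cross-sectional mean:
E[ Y_{it} \mid X_{it} ] = E[ Y_{it} \mid X_{i1}, \ldots, X_{i n_i} ].
```

With a diagonal V_i (working independence), each residual Y_it - mu_it is weighted only by a function of X_it, so every term has mean zero under the cross-sectional model alone. With a non-diagonal V_i, residuals at time t are also weighted by functions of X_is for s not equal to t, which is why the stronger condition is needed.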