P. Boyle et al., EVALUATING THE GOODNESS-OF-FIT IN MODELS OF SPARSE MEDICAL DATA - A SIMULATION APPROACH, International journal of epidemiology, 26(3), 1997, pp. 651-656
Background. Epidemiological studies of rare events, which are common in the medical literature, often involve modelling sparse data sets. Assessing the fit of these models may be complicated by the large numbers of observed zeros in the data set.

Methods. Poisson models, fitted as generalized linear models, were used to investigate the referral patterns of patients suffering from end-stage renal failure in south west Wales. The usual method for assessing the goodness of fit is to compare the deviance with a chi-squared distribution with appropriate degrees of freedom. However, this test may be invalid when the data set is sparse, as the deviance values may be unusually low compared to the degrees of freedom. This would suggest that there is a problem with underdispersion when, in fact, the large numbers of zeros in the data set make the comparison with the chi-squared distribution unreliable. A simulation approach is advocated as an alternative method of assessing model fit in these situations.

Results. Three models are considered in detail here. The first modelled the total referrals in each of the 245 wards in the study area and included two explanatory variables. These observations were not unusually sparse, and both the chi-squared goodness-of-fit test and the simulation methodology outlined here suggested that the model did not fit. The second model included the population 'at risk' as an offset, and the model improved considerably. Both the chi-squared test and the simulation approach suggested that this model did fit. Finally, the data were disaggregated into five age groups, providing 1225 observations and a very sparse data set. According to the chi-squared goodness-of-fit test, the deviance was very low, suggesting that the model was underdispersed. Using simulated data, it was shown that the deviance was not unusually low and that the model fitted the data reasonably well.

Conclusion. In cases where the data set being modelled is sparse, it is useful to test the goodness of fit of a Poisson model using a simulation approach, rather than relying on the chi-squared test.
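
The abstract does not give the details of the simulation procedure, so the following Python sketch is only one plausible reading of it: fit the Poisson model, simulate new data sets from the fitted means, refit each one, and compare the observed deviance with the simulated deviances instead of with a chi-squared distribution. To stay self-contained it assumes the simplest possible model, an intercept plus a log population offset, for which the fitted rate has a closed form. The data are hypothetical (not the study data), and the names poisson_draw, fit_rate, poisson_deviance and simulate_deviances are illustrative.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (suits small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def fit_rate(counts, exposure):
    """Closed-form MLE of the common rate in an intercept-plus-offset Poisson model."""
    return sum(counts) / sum(exposure)


def poisson_deviance(counts, fitted):
    """Residual deviance 2*sum(y*log(y/mu) - (y - mu)); the log term is 0 when y = 0."""
    total = 0.0
    for y, mu in zip(counts, fitted):
        if y > 0:
            total += y * math.log(y / mu)
        total -= y - mu
    return 2.0 * total


def simulate_deviances(counts, exposure, n_sim=200, seed=0):
    """Return the observed deviance and the deviances of data simulated from the fit."""
    rng = random.Random(seed)
    fitted = [fit_rate(counts, exposure) * e for e in exposure]
    observed = poisson_deviance(counts, fitted)
    simulated = []
    for _ in range(n_sim):
        sim_counts = [poisson_draw(rng, mu) for mu in fitted]
        sim_rate = fit_rate(sim_counts, exposure)  # refit the model to each simulated set
        sim_fitted = [sim_rate * e for e in exposure]
        simulated.append(poisson_deviance(sim_counts, sim_fitted))
    return observed, simulated


# Hypothetical sparse data: 1225 cells with expected counts well below 1.
rng = random.Random(1)
exposure = [rng.uniform(50, 150) for _ in range(1225)]
counts = [poisson_draw(rng, 0.0005 * e) for e in exposure]

observed, simulated = simulate_deviances(counts, exposure)
df = len(counts) - 1  # one fitted parameter (the common rate)
share_as_low = sum(d <= observed for d in simulated) / len(simulated)
print(f"zeros: {counts.count(0)} of {len(counts)}")
print(f"observed deviance {observed:.1f} on {df} degrees of freedom")
print(f"mean simulated deviance {sum(simulated) / len(simulated):.1f}")
print(f"share of simulated deviances <= observed: {share_as_low:.2f}")
```

With data this sparse, the observed deviance falls far below the degrees of freedom even though the data were generated from the fitted model class, which is the pattern the abstract describes as apparent underdispersion. The share of simulated deviances at or below the observed value indicates whether the observed deviance is actually unusual. For a model with explanatory variables, fit_rate would be replaced by a full generalized linear model fit.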