EVALUATING THE GOODNESS-OF-FIT IN MODELS OF SPARSE MEDICAL DATA - A SIMULATION APPROACH

Citation
P. Boyle et al., EVALUATING THE GOODNESS-OF-FIT IN MODELS OF SPARSE MEDICAL DATA - A SIMULATION APPROACH, International journal of epidemiology, 26(3), 1997, pp. 651-656
Number of citations
9
Subject Categories
Public, Environmental & Occupational Health
ISSN journal
0300-5771
Volume
26
Issue
3
Year of publication
1997
Pages
651 - 656
Database
ISI
SICI code
0300-5771(1997)26:3<651:ETGIMO>2.0.ZU;2-M
Abstract
Background. Epidemiological studies of rare events, which are common in the medical literature, often involve modelling sparse data sets. Assessing the fit of these models may be complicated by the large numbers of observed zeros in the data set.
Methods. Poisson models, fitted as generalized linear models, were used to investigate the referral patterns of patients suffering from end-stage renal failure in south west Wales. The usual method for assessing the goodness of fit is to compare the deviance with a chi-squared distribution with appropriate degrees of freedom. However, this test may be invalid when the data set is sparse, as the deviance values may be unusually low compared to the degrees of freedom. This would suggest that there is a problem with underdispersion when, in fact, the large numbers of zeros in the data set make the comparison with the chi-squared distribution unreliable. A simulation approach is advocated as an alternative method of assessing model fit in these situations.
Results. Three models are considered in detail here. The first modelled the total referrals in each of the 245 wards in the study area and included two explanatory variables. These observations were not unusually sparse and both the chi-squared goodness of fit test and the simulation methodology outlined here suggested that the model did not fit. The second model included the population 'at risk' as an offset and the model improved considerably. Both the chi-squared test and the simulation approach suggested that this model did fit. Finally, the data were disaggregated into five age groups providing 1225 observations and a very sparse data set. According to the chi-squared goodness of fit test, the deviance was very low, suggesting that the model was underdispersed. Using simulated data, it was shown that the deviance was not unusually low and that the model fitted the data reasonably well.
Conclusion. In cases where the data set being modelled is sparse, it is useful to test the goodness of fit of a Poisson model using a simulation approach, rather than relying on the chi-squared test.
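
Illustrative sketch. The abstract does not give the details of the simulation procedure, so the following Python code is only one plausible reading of it, not the authors' method: a parametric bootstrap that simulates Poisson counts from the fitted means, refits the model to each simulated data set, and compares the observed deviance with the simulated deviances. To keep it self-contained, the model is assumed to be the simplest possible one (a single common rate with the population 'at risk' as an offset), and all function names and the example data are hypothetical.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def fit_rate_model(counts, populations):
    """Fitted means for an intercept-only Poisson model with a population offset.

    Assumption: with no covariates the maximum-likelihood rate has the closed
    form sum(counts) / sum(populations), so no iterative GLM fit is needed.
    """
    rate = sum(counts) / sum(populations)
    return [rate * n for n in populations]


def poisson_deviance(counts, fitted):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    total = 0.0
    for y, mu in zip(counts, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


def simulated_deviance_test(counts, populations, n_sim=500, seed=0):
    """Compare the observed deviance with deviances of simulated data sets.

    Returns the observed deviance and the proportions of simulated deviances
    that are >= and <= the observed value.
    """
    rng = random.Random(seed)
    fitted = fit_rate_model(counts, populations)
    observed = poisson_deviance(counts, fitted)
    simulated = []
    for _ in range(n_sim):
        fake = [poisson_draw(mu, rng) for mu in fitted]
        refit = fit_rate_model(fake, populations)  # refit to each simulated set
        simulated.append(poisson_deviance(fake, refit))
    upper = sum(d >= observed for d in simulated) / n_sim
    lower = sum(d <= observed for d in simulated) / n_sim
    return observed, upper, lower


if __name__ == "__main__":
    # Hypothetical sparse data: 1225 cells, most of which will contain zero events.
    rng = random.Random(1)
    populations = [rng.randint(200, 2000) for _ in range(1225)]
    counts = [poisson_draw(0.0001 * n, rng) for n in populations]
    observed, upper, lower = simulated_deviance_test(counts, populations, n_sim=200)
    print("observed deviance:", round(observed, 1), "on", len(counts) - 1, "df")
    print("share of simulated deviances >= observed:", upper)
    print("share of simulated deviances <= observed:", lower)
```

In this sketch a small upper proportion would point to lack of fit, and a small lower proportion to a deviance that really is unusually low. When neither proportion is small, a deviance far below its degrees of freedom reflects the sparseness of the data rather than underdispersion. A model with explanatory variables would need a proper generalized linear model fit in place of `fit_rate_model`.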