Dealing with missing, abnormal and incoherent data in E3N cohort study

Citation
S. Garcia-Acosta and F. Clavel-Chapelon, Dealing with missing, abnormal and incoherent data in E3N cohort study, REV EPIDEM, 47(6), 1999, pp. 515-523
Number of citations
13
Subject categories
Environmental Medicine & Public Health
Journal title
REVUE D EPIDEMIOLOGIE ET DE SANTE PUBLIQUE
Journal ISSN
0398-7620
Volume
47
Issue
6
Year of publication
1999
Pages
515 - 523
Database
ISI
SICI code
0398-7620(199912)47:6<515:DWMAAI>2.0.ZU;2-Q
Abstract
Background: The E3N Study ("Etude Epidemiologique aupres de femmes de la Mutuelle Generale de l'Education Nationale") is a cohort study of cancer risk factors in 100,000 women. Even if the incidence of problematic (missing, incoherent, etc.) data is low, any multivariate analysis based only on subjects with complete data would rely on too small a sample, which would not necessarily be representative of the studied population. Results could thus be biased.
Methods: Our handling of problematic data includes: 1) the identification of problematic data: locating these data, looking for their source and differentiating the processes by which they arose, 2) the definition of the methodology, and 3) the implementation of the methods: cold-deck, and multiple imputation for Missing At Random data.
Results: We looked at the number of individuals for whom an analysis of 19 variables could be undertaken. The management of missing data made an additional one fourth of the cohort usable, i.e. 74.6% of individuals instead of 50.5%. Moreover, for 89.0% of subjects, at most one variable (out of the 19 studied) has a missing value.
Conclusions: The main difficulty lies not so much in the choice and implementation of methods to deal with problematic data as in the identification of the process by which they arose. Most of what was gained was due to the simplest methods: cold-deck and deductive method.
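Illustrative sketch (not taken from the article): the two simplest methods named in the abstract can be pictured as follows. Deductive imputation fills a value that follows logically from other answers, and cold-deck imputation fills a value from an external source, here assumed to be an earlier questionnaire from the same subject. The variable names, the smoking rule and the toy records are all hypothetical, and multiple imputation is not shown.

```python
from typing import Optional

# One questionnaire record; None marks a missing answer.
Record = dict[str, Optional[float]]


def deductive_fill(rec: Record) -> Record:
    """Fill values that follow logically from other answers."""
    out = dict(rec)
    # Hypothetical rule: a never-smoker smokes 0 cigarettes per day.
    if out.get("cigs_per_day") is None and out.get("ever_smoked") == 0:
        out["cigs_per_day"] = 0
    return out


def cold_deck_fill(rec: Record, external: Record) -> Record:
    """Fill missing values from an external source (assumed here to be
    an earlier questionnaire answered by the same subject)."""
    out = dict(rec)
    for key, value in out.items():
        if value is None and external.get(key) is not None:
            out[key] = external[key]
    return out


def complete_fraction(records: list[Record]) -> float:
    """Share of records with no missing value on any variable."""
    complete = sum(all(v is not None for v in r.values()) for r in records)
    return complete / len(records)


# Hypothetical toy data: four subjects, three variables.
current = [
    {"ever_smoked": 1, "cigs_per_day": 10, "height_cm": 160},
    {"ever_smoked": 0, "cigs_per_day": None, "height_cm": 165},
    {"ever_smoked": 1, "cigs_per_day": 5, "height_cm": None},
    {"ever_smoked": None, "cigs_per_day": None, "height_cm": 158},
]
# Earlier answers for the same subjects, in the same order.
earlier = [{}, {}, {"height_cm": 170}, {}]

filled = [
    cold_deck_fill(deductive_fill(rec), old)
    for rec, old in zip(current, earlier)
]

before = complete_fraction(current)
after = complete_fraction(filled)
print(f"complete before: {before:.0%}, after: {after:.0%}")
```

In this toy data the share of complete subjects rises because one record is resolved by the deductive rule and one by the earlier questionnaire; the last record stays incomplete, which is the kind of case the abstract reserves for multiple imputation.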