Background: The E3N Study ("Etude Epidemiologique aupres de femmes de la Mutuelle Generale de l'Education Nationale") is a cohort study of cancer risk factors in 100,000 women. Even though the incidence of problematic (missing, incoherent, etc.) data is low, any multivariate analysis restricted to subjects with complete data would rely on too small a sample, one that would not necessarily be representative of the study population. Results could thus be biased.
Methods: Our handling of problematic data comprises: 1) identifying the problematic data: locating them, tracing their source, and distinguishing the processes that generated them; 2) defining the methodology; and 3) implementing the methods: cold-deck imputation, and multiple imputation for Missing At Random data.
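
As an illustration of the cold-deck step, here is a minimal sketch in Python. It assumes that a missing value is replaced by the value recorded for the same subject in an external source, such as an earlier questionnaire. The subject identifiers, the values, and the function name `cold_deck` are hypothetical and are not taken from the E3N data or procedures.

```python
from typing import Dict, Optional

# Hypothetical values of one variable per subject (None marks a missing value).
current: Dict[str, Optional[float]] = {"s1": 162.0, "s2": None, "s3": None}

# Hypothetical external ("cold") source, e.g. an earlier questionnaire.
earlier: Dict[str, Optional[float]] = {"s1": 161.0, "s2": 170.0}


def cold_deck(
    values: Dict[str, Optional[float]],
    donor: Dict[str, Optional[float]],
) -> Dict[str, Optional[float]]:
    """Fill each missing value with the same subject's value from the donor source."""
    filled = {}
    for subject_id, value in values.items():
        if value is None:
            # The value stays missing when the donor source has nothing either.
            value = donor.get(subject_id)
        filled[subject_id] = value
    return filled


imputed = cold_deck(current, earlier)
```

In this sketch observed values are never overwritten, and a subject with no donor value remains missing and would be left to another method, such as multiple imputation.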
Results: We examined the number of individuals for whom an analysis of 19 variables could be undertaken. Managing the missing data made an additional one fourth of the cohort usable: 74.6% of individuals instead of 50.5%. Moreover, 89.0% of subjects have missing data for at most one of the 19 variables studied.
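
The two proportions reported here (subjects with complete data, and subjects with at most one missing variable) can be computed along the following lines. This is only a sketch on toy records with three hypothetical variables; the variable names and values are illustrative assumptions, not study data.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def missing_count(record: Record) -> int:
    """Number of variables with a missing value (None) in one subject's record."""
    return sum(value is None for value in record.values())


def summarize(records: List[Record]) -> Tuple[float, float]:
    """Return (share of complete subjects, share with at most one missing variable)."""
    n = len(records)
    complete = sum(missing_count(r) == 0 for r in records)
    at_most_one = sum(missing_count(r) <= 1 for r in records)
    return complete / n, at_most_one / n


# Toy data: hypothetical variables, not E3N variables.
toy = [
    {"height": 162.0, "weight": 58.0, "age": 51.0},
    {"height": None, "weight": 64.0, "age": 47.0},
    {"height": None, "weight": None, "age": 55.0},
    {"height": 170.0, "weight": 70.0, "age": 60.0},
]

share_complete, share_at_most_one = summarize(toy)
```

Running the same summary before and after imputation gives the kind of comparison reported above.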
Conclusions: The main difficulty lies not so much in choosing and implementing methods to deal with problematic data as in identifying the process that generated them. Most of the gain came from the simplest methods: cold-deck imputation and the deductive method.
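
To make the deductive method concrete, the sketch below fills a missing value only when it follows logically from another answer given by the same subject. The rule used here (no pregnancy implies zero children) and the field names `ever_pregnant` and `n_children` are hypothetical examples, not rules or variables documented for E3N.

```python
from typing import Any, Dict


def deduce_n_children(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with n_children deduced when it can be.

    Hypothetical rule: a subject who reports never having been pregnant
    has zero children. Any other missing value is left untouched.
    """
    result = dict(record)
    if result.get("n_children") is None and result.get("ever_pregnant") is False:
        result["n_children"] = 0
    return result


deduced = deduce_n_children({"ever_pregnant": False, "n_children": None})
unchanged = deduce_n_children({"ever_pregnant": True, "n_children": None})
```

A rule of this kind adds no statistical uncertainty, which may be part of why such simple methods account for most of the gain.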