Background Capture-recapture (CR) methods are increasingly used to estimate
the size of human populations, including those with diabetes. Few studies
have examined which demographic details are needed to match patients across
the lists used in these techniques, or what the optimum number of lists is.
Methods Six lists of known diabetic patients attending different medical
settings during the study year were obtained. The effects on total
enumeration after aggregation of these lists were examined using increasing
numbers of demographic data items as patient identifiers. The CR estimates
of prevalence were obtained using 15 different combinations of two lists.
Estimates were obtained after log-linear modelling for interdependence
between different combinations of three and four lists, and after combining
the six available lists into three logical lists.
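The matching and two-list steps described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the field names (`first`, `family`, `dob`), the helper functions and the toy records are all hypothetical, and Chapman's form of the two-list estimator is an assumption, since the abstract does not state which two-list formula was used.

```python
from itertools import combinations

def match_key(record, fields):
    """Build a normalised matching key from the chosen identifier fields."""
    return tuple(str(record[f]).strip().lower() for f in fields)

def enumerate_unique(lists, fields):
    """Count distinct patients across all lists under the given matching fields."""
    return len({match_key(r, fields) for lst in lists for r in lst})

def chapman_estimate(n1, n2, m):
    """Chapman's form of the two-list (Lincoln-Petersen) estimator.

    n1, n2: number of patients on each list; m: number found on both.
    This choice of estimator is an assumption, not taken from the source.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def two_list_estimates(lists, fields):
    """Return a population estimate for every pair of lists."""
    keyed = [{match_key(r, fields) for r in lst} for lst in lists]
    estimates = {}
    for i, j in combinations(range(len(keyed)), 2):
        overlap = len(keyed[i] & keyed[j])
        estimates[(i, j)] = chapman_estimate(len(keyed[i]), len(keyed[j]), overlap)
    return estimates

# Hypothetical toy records: two different people share a name but not a birth date.
list_a = [
    {"first": "Ann", "family": "Lee", "dob": "1950-01-01"},
    {"first": "Bob", "family": "Ray", "dob": "1948-03-02"},
]
list_b = [
    {"first": "ann", "family": "lee", "dob": "1962-07-09"},
    {"first": "Bob", "family": "Ray", "dob": "1948-03-02"},
]

print(enumerate_unique([list_a, list_b], ["first", "family"]))         # 2
print(enumerate_unique([list_a, list_b], ["first", "family", "dob"]))  # 3
```

Adding an identifier can only keep or raise the count of distinct keys, which is why the enumerated total grows as matching criteria are added. With six lists, `two_list_estimates` returns one estimate for each of the 15 possible pairs.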
Results For matching patients, adding date of birth to first name and
family name as matching criteria increased the total of identified patients
from 2500 to 2585 (3% increase), corresponding to a period prevalence of
1.5% (95% CI: 1.41-1.52%). Addition of further identifiers, such as partial
postcode, only increased the estimate by a further 15 patients (0.5%), and
more detailed matching with full postcode introduced uncertainty. The use
of two-list CR yielded widely varying estimates of the total diabetic
population, from 1379 (95% CI: 435-2273) to 9554 (95% CI: 7291-10 983).
Log-linear modelling using different combinations of three and four lists
produced estimates of 5074 (95% CI: 4417-5947) and 5578 (95% CI: 4918-7081),
respectively, after compensating for statistical interdependence between
the lists used. The appropriate condensation of six available lists into
three lists for modelling yielded an estimate of 5492 (95% CI: 4870-6285),
corresponding to a CR-adjusted period prevalence of 3.1% (95% CI:
3.03-3.19%).
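To illustrate how three lists allow for dependence between pairs of lists, the sketch below uses the closed-form estimate for the log-linear model that includes all pairwise interactions but no three-way interaction. The abstract does not say which log-linear model was selected, so this model choice, the function name and the counts are assumptions made for illustration only. No confidence interval is computed.

```python
def three_list_estimate(counts):
    """Estimate total population size from three overlapping lists.

    counts maps presence patterns (a, b, c), each 0 or 1, to observed numbers.
    The (0, 0, 0) cell (patients on no list) is unobserved and is estimated
    under the assumption of no three-way interaction between the lists.
    """
    numerator = (counts[(1, 1, 1)] * counts[(1, 0, 0)]
                 * counts[(0, 1, 0)] * counts[(0, 0, 1)])
    denominator = counts[(1, 1, 0)] * counts[(1, 0, 1)] * counts[(0, 1, 1)]
    missing = numerator / denominator
    observed = sum(n for pattern, n in counts.items() if pattern != (0, 0, 0))
    return observed + missing

# Hypothetical counts, not taken from the study.
example_counts = {
    (1, 1, 1): 10,
    (1, 1, 0): 30,
    (1, 0, 1): 20,
    (0, 1, 1): 25,
    (1, 0, 0): 200,
    (0, 1, 0): 150,
    (0, 0, 1): 100,
}

print(three_list_estimate(example_counts))  # 2535.0 (535 observed + 2000 estimated missing)
```

With only two lists there is no spare information to estimate an interaction term, so independence has to be assumed; a third list is what makes the pairwise dependence terms estimable.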
Conclusions In a Western population, the only demographic data required for
matching patients on lists used for CR methods are first name, family name
and date of birth, if unique identifiers such as social security numbers
are not available. Two lists alone do not produce reliable data, and at
least three lists are needed to allow for modelling for 'dependence'
between datasets. The use of more than three lists does not substantially
alter the absolute value or confidence of enumeration, and multiple lists
(if available) should be condensed into three lists for use in CR
calculations.
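Condensing several lists into three can be sketched as a set union within each group, followed by tabulating each patient's presence pattern across the condensed lists. The grouping shown here is hypothetical: the abstract does not specify which of the six lists were combined, and the integer patient keys stand in for whatever matching key is used.

```python
from collections import Counter

def condense(lists, groups):
    """Merge lists of patient keys into fewer lists by set union within each group."""
    return [set().union(*(lists[i] for i in group)) for group in groups]

def capture_histories(lists):
    """Count patients by presence pattern, e.g. (1, 0, 1) = on lists 1 and 3 only."""
    everyone = set().union(*lists)
    return Counter(tuple(int(p in lst) for lst in lists) for p in everyone)

# Hypothetical patient keys on six source lists.
six_lists = [{1, 2}, {2, 3}, {4}, {4, 5}, {6}, {1, 6}]

# Hypothetical grouping: lists 0+1, 2+3 and 4+5 are treated as one list each.
three_lists = condense(six_lists, [(0, 1), (2, 3), (4, 5)])
print(three_lists)  # [{1, 2, 3}, {4, 5}, {1, 6}]

histories = capture_histories(three_lists)
print(histories[(1, 0, 0)], histories[(1, 0, 1)])  # 2 1
```

The resulting pattern counts are the input a three-list log-linear model would need.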