Probabilistic record linkage: Relationships between file sizes, identifiers, and match weights

Citation
Lj. Cook et al., Probabilistic record linkage: Relationships between file sizes, identifiers, and match weights, METH INF M, 40(3), 2001, pp. 196-203
Number of citations
18
Subject categories
Research/Laboratory Medicine & Medical Technology
Journal title
METHODS OF INFORMATION IN MEDICINE
Journal ISSN
0026-1270
Volume
40
Issue
3
Year of publication
2001
Pages
196 - 203
Database
ISI
SICI code
0026-1270(200107)40:3<196:PRLRBF>2.0.ZU;2-D
Abstract
This study investigates relationships between file sizes, amounts of information contained in commonly used record linkage variables, and the amount of information needed for a successful probabilistic linkage project. We present an equation predicting the amount of information needed for a successful linkage project. Match weights for variables commonly used in record linkage are measured using artificially created databases. Linkage algorithms were successful when the sum of minimum weights for variables used in a linkage exceeded the predicted cutoff. Linkage results were acceptable when this sum was near the predicted cutoff. This technique enables researchers to determine if enough information exists to perform a successful probabilistic linkage.
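The comparison described in the abstract, summing per-variable match weights and checking the total against a cutoff, can be sketched as follows. This is a minimal illustration and not the authors' method: it assumes the standard Fellegi-Sunter agreement weight log2(m/u), where m and u are the probabilities that a field agrees among true matches and among non-matches. The abstract does not give the paper's cutoff equation, so the cutoff is passed in as a plain number. The function names, field names, and m/u values are all hypothetical.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Fellegi-Sunter agreement weight in bits: log2(m / u).

    m: probability the field agrees when two records truly match.
    u: probability the field agrees when two records do not match.
    """
    if not (0.0 < m <= 1.0 and 0.0 < u <= 1.0):
        raise ValueError("m and u must lie in (0, 1]")
    return math.log2(m / u)


def total_weight(fields: dict[str, tuple[float, float]]) -> float:
    """Sum the agreement weights over all linkage variables."""
    return sum(agreement_weight(m, u) for m, u in fields.values())


def has_enough_information(
    fields: dict[str, tuple[float, float]], cutoff: float
) -> bool:
    """Return True when the summed weight reaches the supplied cutoff.

    The cutoff is an input here (assumption): the abstract mentions a
    predictive equation for it but does not state the equation.
    """
    return total_weight(fields) >= cutoff


if __name__ == "__main__":
    # Hypothetical (m, u) pairs, chosen only for illustration.
    fields = {
        "sex": (0.99, 0.5),
        "birth_year": (0.95, 0.0125),
        "postcode": (0.90, 0.001),
    }
    print(f"total weight: {total_weight(fields):.2f} bits")
    print("enough for cutoff 15:", has_enough_information(fields, 15.0))
```

To approximate the "minimum weight" of a variable in this sketch, one would supply the u value of that variable's most common category, since the most common value gives the smallest agreement weight. That reading of "minimum weight" is an assumption, not something the abstract defines.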