A METHOD FOR CALIBRATING FALSE-MATCH RATES IN RECORD LINKAGE

Authors
Citation
T. R. Belin and D. B. Rubin, A METHOD FOR CALIBRATING FALSE-MATCH RATES IN RECORD LINKAGE, Journal of the American Statistical Association, 90(430), 1995, pp. 694-707
Number of citations
34
Subject Categories
Statistics & Probability
Volume
90
Issue
430
Year of publication
1995
Pages
694 - 707
Database
ISI
SICI code
Abstract
Specifying a record-linkage procedure requires both (1) a method for measuring closeness of agreement between records, typically a scalar weight, and (2) a rule for deciding when to classify records as matches or nonmatches based on the weights. Here we outline a general strategy for the second problem, that is, for accurately estimating false-match rates for each possible cutoff weight. The strategy uses a model where the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches. An EM algorithm for fitting mixtures of transformed-normal distributions is used to find posterior modes; associated posterior variability is due to uncertainty about specific normalizing transformations as well as uncertainty in the parameters of the mixture model, the latter being calculated using the SEM algorithm. This mixture-model calibration method is shown to perform well in an applied setting with census data. Further, a simulation experiment reveals that, across a wide variety of settings not satisfying the model's assumptions, the procedure is slightly conservative on average in the sense of overstating false-match rates, and the one-sided confidence coverage (i.e., the proportion of times that these interval estimates cover or overstate the actual false-match rate) is very close to the nominal rate.
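
The mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the weights are already approximately normal within each class, so it fits a plain two-component normal mixture by EM, skips the normalizing transformations, and makes no attempt at the SEM-based variance calculation. The function names (fit_two_normal_mixture, false_match_rate) and the simulated weights are hypothetical and chosen only for illustration. Given fitted parameters, the estimated false-match rate at a cutoff is the share of the mixture mass above the cutoff that belongs to the false-match component.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x, mu, sigma):
    # Upper-tail probability P(X > x) for X ~ N(mu, sigma^2).
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def fit_two_normal_mixture(weights, n_iter=200):
    """EM for a two-component normal mixture (illustrative assumption:
    no transformation of the weights is needed).
    Component 0 = false matches (lower weights), 1 = true matches.
    Returns (p_true, [mu0, mu1], [sigma0, sigma1])."""
    xs = sorted(weights)
    n = len(xs)
    half = n // 2
    # Crude starting values: split the sorted weights at the median.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    p = 0.5  # mixing proportion of true matches
    for _ in range(n_iter):
        # E-step: posterior probability that each weight is a true match.
        resp = []
        for x in xs:
            a = (1.0 - p) * normal_pdf(x, mu[0], sigma[0])
            b = p * normal_pdf(x, mu[1], sigma[1])
            resp.append(b / (a + b) if a + b > 0.0 else 0.5)
        # M-step: responsibility-weighted means, variances, proportion.
        n1 = sum(resp)
        n0 = n - n1
        mu = [
            sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
            sum(r * x for r, x in zip(resp, xs)) / n1,
        ]
        var0 = sum((1.0 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(1e-6, math.sqrt(var0)), max(1e-6, math.sqrt(var1))]
        p = n1 / n
    return p, mu, sigma


def false_match_rate(cutoff, p, mu, sigma):
    """Estimated fraction of false matches among pairs with weight > cutoff."""
    false_mass = (1.0 - p) * normal_sf(cutoff, mu[0], sigma[0])
    true_mass = p * normal_sf(cutoff, mu[1], sigma[1])
    total = false_mass + true_mass
    return false_mass / total if total > 0.0 else float("nan")


if __name__ == "__main__":
    # Simulated weights, purely for illustration (not census data).
    rng = random.Random(0)
    sample = [rng.gauss(-5.0, 2.0) for _ in range(300)]
    sample += [rng.gauss(8.0, 2.0) for _ in range(700)]
    p_hat, mu_hat, sigma_hat = fit_two_normal_mixture(sample)
    for c in (-2.0, 0.0, 2.0):
        print(c, false_match_rate(c, p_hat, mu_hat, sigma_hat))
```

Scanning the cutoff over a grid and reading off the estimated rate at each value gives a calibration curve of the kind the abstract describes. The interval estimates discussed there would additionally require the uncertainty calculations that this sketch omits.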