T. R. Belin and D. B. Rubin, A METHOD FOR CALIBRATING FALSE-MATCH RATES IN RECORD LINKAGE, Journal of the American Statistical Association, 90(430), 1995, pp. 694-707
Specifying a record-linkage procedure requires both (1) a method for
measuring closeness of agreement between records, typically a scalar
weight, and (2) a rule for deciding when to classify records as matches
or nonmatches based on the weights. Here we outline a general strategy
for the second problem, that is, for accurately estimating false-match
rates for each possible cutoff weight. The strategy uses a model where
the distribution of observed weights is viewed as a mixture of weights
for true matches and weights for false matches. An EM algorithm for
fitting mixtures of transformed-normal distributions is used to find
posterior modes; associated posterior variability is due to uncertainty
about specific normalizing transformations as well as uncertainty in
the parameters of the mixture model, the latter being calculated using
the SEM algorithm. This mixture-model calibration method is shown to
perform well in an applied setting with census data. Further, a
simulation experiment reveals that, across a wide variety of settings
not satisfying the model's assumptions, the procedure is slightly
conservative on average in the sense of overstating false-match rates,
and the one-sided confidence coverage (i.e., the proportion of times
that these interval estimates cover or overstate the actual false-match
rate) is very close to the nominal rate.
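
The mixture idea in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes plain normal components (no normalizing transformation), fits point estimates only with a basic EM loop, and omits the SEM-based posterior variability entirely. The function names (`fit_two_normal_mixture`, `false_match_rate`) and the simulated weights are hypothetical, chosen only for illustration. The false-match rate at a cutoff is taken here to be the modeled share of false matches among pairs whose weight exceeds the cutoff.

```python
import math
import numpy as np


def norm_pdf(x, mu, sigma):
    """Normal density, vectorized over x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_upper_tail(x, mu, sigma):
    """P(X > x) for a normal variable, for a scalar x."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def fit_two_normal_mixture(weights, n_iter=200):
    """Basic EM for a two-component normal mixture (illustrative only).

    Returns (lam, (mu_true, sd_true), (mu_false, sd_false)), where lam is
    the estimated proportion of true matches. Assumption: true matches
    tend to have the higher weights, which fixes the component labels.
    """
    w = np.asarray(weights, dtype=float)
    med = np.median(w)
    lo, hi = w[w <= med], w[w > med]
    lam = 0.5
    mu_f, sd_f = lo.mean(), lo.std() + 1e-6
    mu_t, sd_t = hi.mean(), hi.std() + 1e-6
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        p_t = lam * norm_pdf(w, mu_t, sd_t)
        p_f = (1.0 - lam) * norm_pdf(w, mu_f, sd_f)
        r = p_t / (p_t + p_f)
        # M-step: weighted updates of mixing proportion, means, and sds.
        lam = r.mean()
        mu_t = np.sum(r * w) / np.sum(r)
        mu_f = np.sum((1.0 - r) * w) / np.sum(1.0 - r)
        sd_t = math.sqrt(np.sum(r * (w - mu_t) ** 2) / np.sum(r))
        sd_f = math.sqrt(np.sum((1.0 - r) * (w - mu_f) ** 2) / np.sum(1.0 - r))
    return lam, (mu_t, sd_t), (mu_f, sd_f)


def false_match_rate(cutoff, lam, true_params, false_params):
    """Modeled share of false matches among pairs with weight > cutoff."""
    t = lam * norm_upper_tail(cutoff, *true_params)
    f = (1.0 - lam) * norm_upper_tail(cutoff, *false_params)
    return f / (t + f)


# Simulated weights (made-up parameters, not census data).
rng = np.random.default_rng(0)
sim_weights = np.concatenate([
    rng.normal(10.0, 2.0, size=700),  # true matches
    rng.normal(0.0, 3.0, size=300),   # false matches
])
lam_hat, true_hat, false_hat = fit_two_normal_mixture(sim_weights)
for c in (2.0, 5.0, 8.0):
    print(c, false_match_rate(c, lam_hat, true_hat, false_hat))
```

Evaluating `false_match_rate` over a grid of cutoffs gives a calibration curve from which a cutoff meeting a target error rate could be read off; the interval estimates described in the abstract would require the additional uncertainty calculations that this sketch leaves out.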