Data quality has many dimensions, one of which is accuracy. Accuracy is
usually compromised by errors accidentally or intentionally introduced into
a database system. These errors result in inconsistent, incomplete, or erroneous
data elements. For example, a small variation in the representation of a
data object produces a unique instantiation of the object being represented.
To improve the accuracy of the data stored in a database system,
we need to compare them either with real-world counterparts or with other
data stored in the same or a different system. In this paper, we address
the problem of matching records that refer to the same entity by computing
their similarity. Exact record matching has limited applicability in this
context, since even simple errors such as character transpositions cannot be
captured in the record-linking process. Our methodology deploys advanced
data-mining techniques for dealing with the high computational and inferential
complexity of approximate record matching. (C) 2000 Elsevier Science Inc. All
rights reserved.
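
To illustrate why exact matching misses simple errors such as character transpositions, the following minimal Python sketch compares two records field by field. It uses the optimal string alignment variant of the Damerau-Levenshtein distance, which counts a swap of two adjacent characters as a single edit. This is not the methodology of the paper, which the abstract does not specify; the function names, the normalization by the longer field length, and the unweighted average over shared fields are all assumptions made for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Allowed edits: insertion, deletion, substitution, and transposition
    of two adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "mi" versus "im".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def record_similarity(r1: dict, r2: dict) -> float:
    """Unweighted mean similarity over shared fields (hypothetical scheme)."""
    fields = r1.keys() & r2.keys()
    if not fields:
        return 0.0
    total = sum(
        field_similarity(str(r1[f]).lower(), str(r2[f]).lower()) for f in fields
    )
    return total / len(fields)


if __name__ == "__main__":
    r1 = {"name": "John Smith", "city": "Boston"}
    r2 = {"name": "John Simth", "city": "Boston"}  # transposed "mi"
    print("exact match:", r1 == r2)
    print("similarity:", record_similarity(r1, r2))
```

Here the two records fail an exact comparison because of a single transposition in the name field, while the similarity score stays close to 1. Any threshold for declaring a match would have to be chosen for the data at hand.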