Automating the approximate record-matching process

Citation
V.S. Verykios et al., Automating the approximate record-matching process, INF SCI, 126(1-4), 2000, pp. 83-98
Number of citations
20
Subject Categories
Information Technology & Communication Systems
Journal title
INFORMATION SCIENCES
ISSN journal
0020-0255
Volume
126
Issue
1-4
Year of publication
2000
Pages
83 - 98
Database
ISI
SICI code
0020-0255(200007)126:1-4<83:ATARP>2.0.ZU;2-J
Abstract
Data quality has many dimensions, one of which is accuracy. Accuracy is usually compromised by errors accidentally or intentionally introduced in a database system. These errors result in inconsistent, incomplete, or erroneous data elements. For example, a small variation in the representation of a data object produces a unique instantiation of the object being represented. In order to improve the accuracy of the data stored in a database system, we need to compare them either with real-world counterparts or with other data stored in the same or a different system. In this paper, we address the problem of matching records which refer to the same entity by computing their similarity. Exact record matching has limited applicability in this context since even simple errors like character transpositions cannot be captured in the record-linking process. Our methodology deploys advanced data-mining techniques for dealing with the high computational and inferential complexity of approximate record matching. (C) 2000 Elsevier Science Inc. All rights reserved.
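The abstract's point about character transpositions can be illustrated with a minimal sketch. It is not the paper's method: it assumes an optimal string alignment (restricted Damerau-Levenshtein) distance as the field comparator, an unweighted mean over fields as the record similarity, and an arbitrary threshold of 0.85. The function names, field names, and sample records are all hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions, and transpositions of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ki" versus "ik".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def field_similarity(a: str, b: str) -> float:
    """Map the edit distance to a similarity in [0, 1]."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def record_similarity(r1: dict, r2: dict, fields: list) -> float:
    """Unweighted mean of per-field similarities (an assumed scoring rule)."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def is_match(r1: dict, r2: dict, fields: list, threshold: float = 0.85) -> bool:
    """Declare a match when similarity reaches the assumed threshold."""
    return record_similarity(r1, r2, fields) >= threshold


# Hypothetical records: the second name has "k" and "i" transposed.
FIELDS = ["name", "city"]
rec_a = {"name": "Verykios", "city": "Philadelphia"}
rec_b = {"name": "Veryikos", "city": "Philadelphia"}
rec_c = {"name": "Smith", "city": "Boston"}

exact_match = rec_a == rec_b                    # exact comparison misses the pair
approx_match = is_match(rec_a, rec_b, FIELDS)   # similarity-based comparison
unrelated_match = is_match(rec_a, rec_c, FIELDS)
```

Here the exact comparison treats rec_a and rec_b as different entities, while the similarity score tolerates the single transposition. Comparing every pair of records this way is quadratic in the number of records, which hints at the computational cost the abstract mentions.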