Reconciling schemas of disparate data sources: A machine-learning approach

Authors

Doan, AH; Domingos, P; Halevy, A

Citation

Ah. Doan et al., Reconciling schemas of disparate data sources: A machine-learning approach, SIGMOD RECORD, 30(2), 2001, pp. 509-520

Subject categories

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

0163-5808

Volume

30

Issue

2

Year of publication

2001

Pages

509 - 520

Database

ISI

SICI code

0163-5808(200106)30:2<509:RSODDS>2.0.ZU;2-U

Abstract

A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
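
The train-then-combine workflow in the abstract can be illustrated with a minimal Python sketch. It is not LSD's implementation: the class `TokenLearner`, the functions `combine` and `match`, the mediated-schema labels `PRICE` and `PHONE`, the sample data, and the equal hand-set weights are all hypothetical, chosen only to show one base learner working on element names, another on data values, and a simple weighted average standing in for the meta-learner. Domain constraints and the XML structure learner are not covered.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase a string and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class TokenLearner:
    """Toy base learner: scores labels by token overlap with training text."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> token counts

    def train(self, examples):
        # examples: (text, mediated-schema label) pairs from user-mapped sources
        for text, label in examples:
            self.counts[label].update(tokens(text))

    def predict(self, text):
        toks = tokens(text)
        raw = {}
        for label, counter in self.counts.items():
            total = sum(counter.values())
            raw[label] = sum(counter[t] for t in toks) / total if total else 0.0
        norm = sum(raw.values())
        if norm == 0:
            # No evidence either way: fall back to a uniform distribution.
            return {label: 1.0 / len(raw) for label in raw}
        return {label: score / norm for label, score in raw.items()}


def combine(predictions, weights):
    """Toy stand-in for a meta-learner: weighted average of base scores."""
    combined = defaultdict(float)
    for pred, weight in zip(predictions, weights):
        for label, score in pred.items():
            combined[label] += weight * score
    total_weight = sum(weights)
    return {label: score / total_weight for label, score in combined.items()}


# Training phase: mappings supplied by the user for a first source (made up).
name_learner = TokenLearner()   # looks at schema element names
name_learner.train([("listed-price", "PRICE"), ("contact-phone", "PHONE")])

data_learner = TokenLearner()   # looks at data values
data_learner.train([("$250,000", "PRICE"), ("(206) 555 0100", "PHONE")])


def match(element_name, sample_value):
    """Matching phase: apply both learners to a new source element."""
    preds = [name_learner.predict(element_name),
             data_learner.predict(sample_value)]
    scores = combine(preds, weights=[0.5, 0.5])  # weights are an assumption
    return max(scores, key=scores.get)


print(match("asking-price", "$310,000"))
print(match("agent-phone", "(425) 555 0199"))
```

Because each learner only has to expose `train` and `predict`, a further learner exploiting another kind of information could be added to the list passed to `combine` without changing the rest of the sketch, which mirrors the extensibility the abstract describes.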