A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information, either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners and then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine-learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and we develop a novel learner that utilizes the structural information in XML documents. Our approach is thus distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains and show that LSD proposes semantic mappings with a high degree of accuracy.
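To make the "apply the learners, then combine their predictions with a meta-learner" step concrete, here is a minimal Python sketch of one way such a combination could work for a single source-schema element. It assumes each base learner returns a confidence score per mediated-schema label and that the meta-learner is a linear combination with one already-trained weight per (learner, label) pair. The learner names, labels, and weight values are hypothetical, and this is an illustrative sketch rather than LSD's actual implementation.

```python
from typing import Dict

# Assumed format: mediated-schema label -> confidence score.
Prediction = Dict[str, float]


def combine_predictions(
    predictions: Dict[str, Prediction],
    weights: Dict[str, Dict[str, float]],
) -> Prediction:
    """Linearly combine base-learner predictions for one source element.

    predictions[learner][label] is that learner's confidence that the source
    element matches `label`; weights[learner][label] is an assumed,
    previously trained meta-learner weight for that (learner, label) pair.
    Missing entries are treated as 0.0.
    """
    labels = {label for pred in predictions.values() for label in pred}
    combined: Prediction = {}
    for label in labels:
        combined[label] = sum(
            weights.get(learner, {}).get(label, 0.0) * pred.get(label, 0.0)
            for learner, pred in predictions.items()
        )
    return combined


def best_label(combined: Prediction) -> str:
    """Return the mediated-schema label with the highest combined score."""
    return max(combined, key=lambda label: combined[label])


# Hypothetical learners, labels, and weights, chosen only for illustration.
predictions = {
    "name_learner": {"address": 0.6, "phone": 0.4},
    "content_learner": {"address": 0.3, "phone": 0.7},
}
weights = {
    "name_learner": {"address": 0.5, "phone": 0.2},
    "content_learner": {"address": 0.4, "phone": 0.9},
}

combined = combine_predictions(predictions, weights)
# address = 0.5*0.6 + 0.4*0.3, phone = 0.2*0.4 + 0.9*0.7, so "phone" wins.
print(best_label(combined))
```

One appeal of a per-label linear combination is that it fits the extensible architecture described above: plugging in another learner only adds one more weight per label, leaving the existing learners untouched.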