Developing intelligent tools for integrating information extracted from multiple heterogeneous sources is a challenging issue that must be addressed in order to effectively exploit the numerous sources available on-line in global information systems.
In this paper, we propose intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources. For information extraction, we introduce ODLI3, an object-oriented language with an underlying Description Logic, derived from the ODMG standard. ODLI3 descriptions of the source schemas are first exploited to set up a Common Thesaurus for the sources. Information integration is then performed semiautomatically by exploiting the knowledge in the Common Thesaurus and the ODLI3 descriptions of the source schemas, using a combination of clustering techniques and Description Logics. This integration process gives rise to a virtual integrated view of the underlying sources, for which mapping rules and integrity constraints are specified to handle heterogeneity.
The integration techniques described in the paper are provided within the MOMIS system, which is based on a conventional wrapper/mediator architecture. (C) 2001 Elsevier Science B.V. All rights reserved.