New applications of information systems, such as electronic commerce and he
althcare information systems, need to integrate a large number of heterogen
eous databases over computer networks. Answering a query in these applicati
ons usually involves selecting relevant information sources and generating
a query plan to combine the data automatically. As significant progress has
been made in source selection and plan generation, the critical issue has
been shifting to query optimization. This paper presents a semantic query o
ptimization (SQO) approach to optimizing query plans of heterogeneous multi
database systems. This approach provides global optimization for query plan
s as well as local optimization for subqueries that retrieve data from indi
vidual database sources. An important feature of our local optimization alg
orithm is that we prove necessary and sufficient conditions to eliminate an
unnecessary join in a conjunctive query of arbitrary join topology. This f
eature allows our optimizer to utilize more expressive relational rules to
provide a wider range of possible optimizations than previous work in SQO.
The local optimization algorithm also features a new data structure called
AND-OR implication graphs to facilitate the search for optimal queries. The
se features allow the global optimization to effectively use semantic knowl
edge to reduce data transmission cost. We have implemented this approach in
to the PESTO query plan optimizer as a part of the SIMS information mediato
r. Experimental results demonstrate that PESTO can provide significant savi
ngs in query execution cost over query plan execution without optimization.