Distributed data processing is becoming a reality. Businesses want to do it
for many reasons, and they often must do it in order to stay competitive.
While much of the infrastructure for distributed data processing is already
there (e.g., modern network technology), a number of issues make distribut
ed data processing still a complex undertaking: (1) distributed systems can
become very large, involving thousands of heterogeneous sites including PC
s and mainframe server machines; (2) the state of a distributed system chan
ges rapidly because the load of sites varies over time and new sites are ad
ded to the system; (3) legacy systems need to be integrated-such legacy sys
tems usually have not been designed for distributed data processing and now
need to interact with other (modern) systems in a distributed environment.
This paper presents the state of the art of query processing for distribute
d database and information systems. The paper presents the "textbook" archi
tecture for distributed query processing and a series of techniques that ar
e particularly useful for distributed database systems. These techniques in
clude special join techniques, techniques to exploit intraquery parallelism
, techniques to reduce communication costs, and techniques to exploit cachi
ng and replication of data. Furthermore, the paper discusses different kind
s of distributed systems such as client-server, middleware (multitier), and
heterogeneous database systems, and shows how query processing works in th
ese systems.