The state of the art in distributed query processing

Authors
D. Kossmann
Citation
D. Kossmann, The state of the art in distributed query processing, ACM C SURV, 32(4), 2000, pp. 422-469
Number of citations
162
Subject categories
Computer Science & Engineering
Journal title
ACM COMPUTING SURVEYS
Journal ISSN
0360-0300
Volume
32
Issue
4
Year of publication
2000
Pages
422 - 469
Database
ISI
SICI code
0360-0300(200012)32:4<422:TSOTAI>2.0.ZU;2-J
Abstract
Distributed data processing is becoming a reality. Businesses want to do it for many reasons, and they often must do it in order to stay competitive. While much of the infrastructure for distributed data processing is already there (e.g., modern network technology), a number of issues make distributed data processing still a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intraquery parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems, and shows how query processing works in these systems.