A LOAD-BALANCED PARALLEL SORTING ALGORITHM FOR SHARED-NOTHING ARCHITECTURES

Citation
A. Kumar et al., A LOAD-BALANCED PARALLEL SORTING ALGORITHM FOR SHARED-NOTHING ARCHITECTURES, DISTRIBUTED AND PARALLEL DATABASES, 3(1), 1995, pp. 37-68
Citations number
29
Categorie Soggetti
Computer Sciences, Special Topics","Computer Science Theory & Methods","Computer Science Information Systems
ISSN journal
09268782
Volume
3
Issue
1
Year of publication
1995
Pages
37 - 68
Database
ISI
SICI code
0926-8782(1995)3:1<37:ALPSAF>2.0.ZU;2-A
Abstract
With the popularity of parallel database machines based on the shared- nothing architecture, it has become important to find external sorting algorithms which lead to a load-balanced computation, i.e., balanced execution, communication and output. If during the course of the sorti ng algorithm each processor is equally loaded, parallelism is fully ex ploited. Similarly, balanced communication will not congest the networ k traffic. Since sorting can be used to support a number of other rela tional operations (joins, duplicate elimination, building indexes etc. ) data skew produced by sorting can further lead to execution skew at later stages of these operations. In this paper we present a lend-bala nced parallel sorting algorithm for shared-nothing architectures. It i s a multiple-input multiple-output algorithm with four stages, based o n a generalization of Batcher's odd-even merge. At each stage the n ke ys are evenly distributed among the p processors (i.e., there is no fi nal sequential merge phase) and the distribution of keys between stage s ensures against network congestion. There is no assumption made on t he key distribution and the algorithm performs equally well in the pre sence of duplicate keys. Hence our approach always guarantees its perf ormance, as long as n is greater than p(3), which is the case of inter est for sorting large relations. In addition, processors can be added incrementally.