SIMPLE RANDOMIZED MERGE SORT ON PARALLEL DISKS/

Citation
Rd. Barve et al., SIMPLE RANDOMIZED MERGE SORT ON PARALLEL DISKS/, Parallel computing, 23(4-5), 1997, pp. 601-631
Citations number
20
Categorie Soggetti
Computer Sciences","Computer Science Theory & Methods
Journal title
ISSN journal
01678191
Volume
23
Issue
4-5
Year of publication
1997
Pages
601 - 631
Database
ISI
SICI code
0167-8191(1997)23:4-5<601:SRMSOP>2.0.ZU;2-W
Abstract
We consider the problem of sorting a file of N records on the D-disk m odel of parallel I/O in which there are two sources of parallelism, Re cords are transferred to and from disk concurrently in blocks of B con tiguous records. In each I/O operation, up to one block can be transfe rred to or from each of the D-disks in parallel, We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forec ast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks, SRM exhibits a limited use of randomizatio n and also has a useful deterministic version. Generalizing the techni que of forecasting, our algorithm is able to read in, at any time, the 'right' block from any disk and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the 'right' blocks fr om memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundam ental inefficiencies of previous mergesort algorithms, By analysis of generalized maximum occupancy problems we are able to derive an analyt ical upper bound on SRM's expected overhead valid for arbitrary inputs , The upper bound derived on expected I/O performance of SRM indicates that SRM is provably better than disk-striped mergesort (DSM) for rea listic parameter values D, M and B. Average-case simulations show furt her improvement on the analytical upper bound. Unlike previously propo sed optimal sorting algorithms, SRM outperforms DSM even when the numb er D of parallel disks is small.