Modeling parallel bandwidth: Local versus global restrictions

Citation
M. Adler et al., Modeling parallel bandwidth: Local versus global restrictions, ALGORITHMIC, 24(3-4), 1999, pp. 381-404
Citations number
52
Categorie Soggetti
Engineering Mathematics
Journal title
ALGORITHMICA
ISSN journal
01784617 → ACNP
Volume
24
Issue
3-4
Year of publication
1999
Pages
381 - 404
Database
ISI
SICI code
0178-4617(199907/08)24:3-4<381:MPBLVG>2.0.ZU;2-U
Abstract
Recently there has been an increasing interest in models of parallel comput ation that account for the bandwidth limitations in communication networks. Some models (e.g., ssp, LOGP, and QSM) account for bandwidth limitations u sing a per-processor parameter g > I. such that each processor can send/rec eive at most h messages in g . h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter,ll < p, such that the p processors can send at most ill messages in total at each step. This paper provides the first detailed study of the algorithmic implication s; of modeling parallel bandwidth as a per-processor (local) limitation ver sus an aggregate (global) limitation. We consider a number of basic problem s such as broadcasting, parity, summation, and sorting, and give several ne w upper and lower time bounds that demonstrate the advantage of globally li mited models over locally limited models given the same aggregate bandwidth (i.e., p . 1/g = ill). In general, globally limited models have a possible advantage whenever there is an imbalance in the number of messages sent/rec eived by the processors. To exploit this advantage, the processors must sch edule the sending of messages so as to respect the aggregate bandwidth limi t. We present a new parallel scheduling algorithm for globally limited mode ls that enable an unknown, arbitrarily unbalanced set of messages to he sen t through the limited bandwidth within a ( I + epsilon) factor of the optim al off-line schedule with high probability, even if the penalty for overloa ding the network is an exponential function of the overload. We also presen t a near-optimal algorithm for the case where long messages must be sent as hits in consecutive time steps, as well as for the case where new messages to be sent arrive dynamically over an infinite time line. These results co nsider hl,th message passing (distributed memory) and shared memory scenari os, and improve upon the best results for the locally limited model by a fa ctor of Theta(g). Finally, we present results quantifying the power of conc urrent reads in a globally limited bandwidth setting, including chewing an Omega(p, Ig,m/m Ig p) time separation between the exclusive-read and the co ncurrent-read PRAM(m) models, which, when iii much less than p, greatly imp roves upon the 2(Omega(root 1g p)) separation known previously.