Recently there has been an increasing interest in models of parallel comput
ation that account for the bandwidth limitations in communication networks.
Some models (e.g., ssp, LOGP, and QSM) account for bandwidth limitations u
sing a per-processor parameter g > I. such that each processor can send/rec
eive at most h messages in g . h time. Other models (e.g., PRAM(m)) account
for bandwidth limitations as an aggregate parameter,ll < p, such that the
p processors can send at most ill messages in total at each step.
This paper provides the first detailed study of the algorithmic implication
s; of modeling parallel bandwidth as a per-processor (local) limitation ver
sus an aggregate (global) limitation. We consider a number of basic problem
s such as broadcasting, parity, summation, and sorting, and give several ne
w upper and lower time bounds that demonstrate the advantage of globally li
mited models over locally limited models given the same aggregate bandwidth
(i.e., p . 1/g = ill). In general, globally limited models have a possible
advantage whenever there is an imbalance in the number of messages sent/rec
eived by the processors. To exploit this advantage, the processors must sch
edule the sending of messages so as to respect the aggregate bandwidth limi
t. We present a new parallel scheduling algorithm for globally limited mode
ls that enable an unknown, arbitrarily unbalanced set of messages to he sen
t through the limited bandwidth within a ( I + epsilon) factor of the optim
al off-line schedule with high probability, even if the penalty for overloa
ding the network is an exponential function of the overload. We also presen
t a near-optimal algorithm for the case where long messages must be sent as
hits in consecutive time steps, as well as for the case where new messages
to be sent arrive dynamically over an infinite time line. These results co
nsider hl,th message passing (distributed memory) and shared memory scenari
os, and improve upon the best results for the locally limited model by a fa
ctor of Theta(g). Finally, we present results quantifying the power of conc
urrent reads in a globally limited bandwidth setting, including chewing an
Omega(p, Ig,m/m Ig p) time separation between the exclusive-read and the co
ncurrent-read PRAM(m) models, which, when iii much less than p, greatly imp
roves upon the 2(Omega(root 1g p)) separation known previously.