This paper explores the problem of solving triangular linear systems on par
allel distributed-memory machines. Working within the LogP model, tight asy
mptotic bounds for solving these systems using forward/backward substitutio
n are presented, Specifically, lower bounds on execution time independent o
f the data layout, lower bounds for data layouts in which the number of dat
a items per processor is bounded, and lower bounds for specific data layout
s commonly used in designing parallel algorithms for this problem are prese
nted in this paper. Furthermore, algorithms are provided which have running
limes within a constant factor of the lower bounds described. One interest
ing result is that the popular two-dimensional block matrix layout necessar
ily results in significantly longer running times than simpler one-dimensio
nal schemes. Finally, a generalization of the lower bounds to banded triang
ular linear systems is presented. (C) 2000 Academic Press.