We provide optimal within a constant explicit upper bounds on the make
span of schedules for tree-structured programs on mesh arrays of proce
ssors, and provide polynomial-time algorithms to find schedules with m
akespan matching these bounds. In particular, we show how to find, in
polynomial time, a (nonpreemptive) schedule for a binary tree dag with
n unit execution time tasks and height h on a d-dimensional mesh arra
y with m processors and links of unit bandwidth and unit propagation d
elay whose makespan is O(n/m + n(l/(d+1)) + h), i.e., optimal within a
constant factor. Further, we extend these schedules to bounded degree
forest dags with arbitrary positive integer execution time tasks and
to meshes when the propagation delay of all the links is an arbitrary
positive integer. Thus, we provide a polynomial-time approximation alg
orithm for an NP-hard problem, with a performance ratio that is a cons
tant. We also show how to schedule tree dags on any parallel architect
ure that satisfies certain natural, not very restrictive, conditions t
hat are satisfied by most parallel architectures used in practice. Let
epsilon be a fixed positive real number. We provide polynomial time c
omputable schedules for binary tree dags with n unit execution time ta
sks and height h is not an element of (g(n)n(-epsilon), g(n) log n) on
any parallel architecture satisfying those conditions, with unit band
width and unit propagation delay Links, with optimal up to a constant
makespan O(g(n) + h), where g is a function that depends only on that
architecture. The number of processors used is optimal within a consta
nt factor if h less than or equal to g(n)n(-epsilon), and is optimal w
ithin an O(log n) factor if h greater than or equal to g(n) log n. As
an example, for hypercube and complete binary tree architectures, we a
chieve optimal within a constant makespan O(h) when h = Omega(log(2) n
), using an optimal within an O (log n) factor number of processors. F
urther, we extend these schedules to the case of bounded-degree forest
dags with tasks of arbitrary positive integer execution times and arc
hitectures when the propagation delay of all the links is a given arbi
trary positive integer.