We show a simple parallel acceleration (from about 2n to about 1.4 square-root n log n parallel arithmetic steps) of the straightforward parallelization of the substitution algorithm for a nonsingular triangular linear system of n equations. This requires increasing the overall number of flops (the potential work) of the former algorithm by a factor of less than 3. The previous parallel acceleration of the substitution algorithm in [1] was slower than ours by a factor of log n.
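
For orientation, the baseline mentioned above (the straightforward parallelization of substitution) can be sketched as follows. This is only an illustrative sketch of the baseline, not the accelerated algorithm. The function name `solve_lower_triangular`, the choice of a lower triangular system, and the step-counting model (one parallel step per batch of independent operations, with unboundedly many processors) are assumptions made for this example. Under that model the count is 2n - 1, consistent with "about 2n".

```python
def solve_lower_triangular(L, b):
    """Column-oriented forward substitution for L x = b.

    L is a nonsingular lower triangular matrix given as a list of rows.
    Returns (x, steps), where steps counts parallel arithmetic steps
    under the assumed model: all independent operations of one kind
    are performed simultaneously and cost one step.
    """
    n = len(b)

    # One parallel step: every division by a diagonal entry is independent,
    # so the whole system is scaled to unit diagonal at once.
    A = [[L[i][j] / L[i][i] for j in range(i)] for i in range(n)]
    x = [b[i] / L[i][i] for i in range(n)]
    steps = 1

    for j in range(n - 1):
        # x[j] is final here. Eliminate it from all later equations.
        # One parallel multiplication step:
        products = [A[i][j] * x[j] for i in range(j + 1, n)]
        # One parallel subtraction step:
        for k, i in enumerate(range(j + 1, n)):
            x[i] -= products[k]
        steps += 2

    return x, steps
```

For example, `solve_lower_triangular([[2, 0, 0], [1, 1, 0], [4, 2, 4]], [2, 3, 12])` returns the solution `[1.0, 2.0, 1.0]` with a step count of 5, that is, 2n - 1 for n = 3. The sequential loop over `j` is what makes the step count grow linearly in n, and it is this dependence that a parallel acceleration has to shorten at the cost of extra flops.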