Pipelining is normally associated with shared-memory and vector computers and is rarely used as an algorithm design technique for distributed-memory architectures. In this paper we show how pipelining enables communication and computation to be overlapped on a distributed-memory parallel computer (a 128-processor T800 Parsytec SuperCluster), yielding a significant speedup. A linear solver based on Givens rotations is selected and parallelized using two different techniques. A non-overlapping algorithm using collective communication, such as optimized broadcast and collection, is compared with a pipelined (overlapping) algorithm using only simple point-to-point communication between neighbouring processors. Both algorithms use the same computational modules, which have been identified and extracted from the sequential code.