Reducing communication cost is crucial to achieving good performance o
n scalable parallel machines. This paper presents a new compiler algor
ithm for global analysis and optimization of communication in data-par
allel programs. Our algorithm is distinct from existing approaches in
that rather than handling loop-nests and array references one by one,
it considers all communication in a procedure and their interactions u
nder different placements before making a final decision on the placem
ent of any communication. It exploits the flexibility resulting from t
his advanced analysis to eliminate redundancy, reduce the number of me
ssages, and reduce contention for cache and communication buffers, all
in a unified framework. In contrast, single loop-nest analysis often
retains redundant communication, and more aggressive dataflow analysis
on array sections can generate too many messages or cache and buffer
contention. The algorithm has been implemented in the IBM pHPF compile
r for High Performance Fortran. During compilation, the number of mess
ages per processor goes down by as much as a factor of nine for some H
PF programs. We present performance results for the IBM SP2 and a netw
ork of Spare workstations (NOW) connected by a Myrinet switch. In many
cases, the communication cost is reduced by a factor of two.