The Fortran D compiler uses data decomposition specifications to
automatically translate Fortran programs for execution on MIMD
distributed-memory machines. This paper introduces and classifies a
number of advanced optimizations needed to achieve acceptable
performance; they are analyzed and empirically evaluated for stencil
computations. Communication optimizations reduce communication overhead
by decreasing the number of messages and hide communication overhead by
overlapping the cost of remaining messages with local computation.
Parallelism optimizations exploit parallel and pipelined computations
and may need to restructure the computation to increase parallelism.
Profitability formulas are derived for each optimization. Empirical
results show that exploiting parallelism for pipelined computations,
reductions, and scans is vital. Message vectorization, collective
communication, and efficient coarse-grain pipelining also significantly
affect performance. The scalability of communication and parallelism
optimizations is analyzed. The effectiveness of communication
optimizations is dictated by the ratio of communication to computation
in the program. An optimization strategy is developed based on these
analyses. (C) 1994 Academic Press, Inc.
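
To make the idea of reducing overhead by decreasing the number of messages concrete, the sketch below compares sending a stencil boundary element by element against sending it as one vectorized message. It is an illustration only: the linear cost model, the constants ALPHA and BETA, and all function names are assumptions introduced here and are not the profitability formulas derived in the paper.

```python
# Illustrative linear message-cost model (an assumption, not the paper's
# formulas): cost = startup + per-element transfer.
ALPHA = 100.0  # assumed per-message startup cost, arbitrary time units
BETA = 1.0     # assumed per-element transfer cost, same units


def message_cost(n_elements):
    """Cost of one message carrying n_elements under the assumed model."""
    return ALPHA + BETA * n_elements


def unvectorized_cost(n_boundary):
    """One single-element message for each boundary element."""
    return n_boundary * message_cost(1)


def vectorized_cost(n_boundary):
    """A single message carrying the whole boundary."""
    return message_cost(n_boundary)


if __name__ == "__main__":
    # Print both costs for a few hypothetical boundary sizes.
    for n in (1, 10, 100, 1000):
        print(n, unvectorized_cost(n), vectorized_cost(n))
```

Under this assumed model the two strategies cost the same for a one-element boundary, and the gap grows with boundary size because the startup cost is paid once instead of once per element.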