Irregular loop nests in which the loop bounds are determined dynamical
ly by indexed arrays are difficult to compile into expressive parallel
constructs, such as segmented scans and reductions. In this paper we
describe a suite of transformations to automatically parallelize such
irregular loop nests, even in the presence of recurrences. We describe
a simple, general loop flattening transformation, along with new opti
mizations which make it a viable compiler transformation. A robust rec
urrence parallelization technique is coupled to the loop flattening tr
ansformation, allowing parallelization of segmented reductions, scans,
and combining-sends over arbitrary associative operators. We discuss
the implementation and performance results of the transformations in a
parallelizing Fortran 77 compiler for the Gray C90 supercomputer In p
articular we focus on important sparse matrix-vector multiplication ke
rnels, for one of which we are able to automatically derive an algorit
hm used by one of the fastest library routines available.