Global locality optimization is a technique for improving the cache perform
ance of a sequence of loop nests through a combination of loop and data lay
out transformations. Pure loop transformations are restricted by data depen
dencies and may not be very successful in optimizing imperfectly nested loo
ps and explicitly parallelized programs. Although pure data transformations
are not constrained by data dependencies, the impact of a data transformat
ion on an array might be program-wide; that is, it can affect all the refer
ences to that array in all the loop nests. Therefore. in this paper we argu
e for an integrated approach that employs both loop and data transformation
s. The method enjoys the advantages of most of the previous techniques for
enhancing locality and is efficient. In our approach, the loop nests in a p
rogram are processed one by one and the data layout constraints obtained fr
om one nest are propagated for optimizing the remaining loop nests. We show
a simple and effective matrix-based framework to implement this process. T
he search space that we consider for possible loop transformations can be r
epresented by general nonsingular linear transformation matrices and the da
ta layouts that we consider are those that can be expressed using hyperplan
es. Experiments with several floating-point programs on an 8-processor SGI
Origin 2000 distributed-shared-memory machine demonstrate the efficacy of o
ur approach. (C) 1999 Academic Press.