The delivered performance on modern processors that employ deep memory hier
archies is closely related to the performance of the memory subsystem. Comp
iler optimizations aimed at improving cache locality are critical in realiz
ing the performance potential of powerful processors. For scientific applic
ations, several loop transformations have been shown to be useful in improv
ing both temporal and spatial locality. Recently, there has been some work
in the area of data layout optimizations, i.e., changing the memory layouts
of multidimensional arrays from the language-defined default such as colum
n-major storage in Fortran. The effect of such memory layout decisions is o
n the spatial locality characteristics of loop nests. While data layout tra
nsformations are not constrained by data dependences, they have no effect o
n temporal locality. On the other hand, loop transformations are not readil
y applicable to imperfect loop nests and are constrained by data dependence
s. More importantly, loop transformations affect the memory access patterns
of all the arrays accessed in a loop nest and, as a result, the locality c
haracteristics of some of the arrays may worsen. This paper presents a tech
nique based on integer linear programming (ILP) that attempts to derive the
best combination of loop and data layout transformations. Prior attempts t
o unify loop and data layout transformations for programs consisting of a s
equence of loop nests have been based on heuristics not only for transforma
tions for a single loop nest but also for the sequence in which loop nests
will be considered. The ILP formulation presented here obviates the need fo
r such heuristics and gives us a bar against which the heuristic algorithms
can be compared, More importantly, our approach is able to transform memor
y layouts dynamically during program execution. This is particularly useful
in applications whose disjoint code segments demand different layouts for
a given array. In addition, we show how this formulation can be extended to
address the false sharing problem in a multiprocessor environment. The key
data structure we introduce is the memory layout graph (MLG) that allows u
s to formulate the problems as path problems. The paper discusses the relat
ionship of this ILP approach based on the memory layout graphs to other wor
k in the area including our previous work. Experimental results on a MIPS R
10000-based system demonstrate the benefits of this approach and show that
the use of the ILP formulation does not increase the compilation time signi
ficantly.