We present a simple and novel framework for generating blocked codes
for high-performance machines with a memory hierarchy. Unlike
traditional compiler techniques like tiling, which are based on
reasoning about the control flow of programs, our techniques are based
on reasoning directly about the flow of data through the memory
hierarchy. Our data-centric transformations permit a more direct
solution to the problem of enhancing data locality than current
control-centric techniques do, and generalize easily to multiple
levels of memory hierarchy. We buttress these claims with performance
numbers for standard benchmarks from the problem domain of dense
numerical linear algebra. The simplicity and intuitive appeal of our
approach should make it attractive to compiler writers as well as to
library writers.