With the ever-widening performance gap between processors and main memory,
cache memory, which is used to bridge this gap, is becoming more and more s
ignificant. Caches work well for programs that exhibit sufficient locality.
Other programs, however, have reference patterns that fail to exploit the
cache, thereby suffering heavily from high memory latency. In order to get
high cache efficiency and achieve good program performance, efficient memor
y accessing behavior is necessary. In fact, for many programs, program tran
sformations or source-code changes can radically alter memory access patter
ns, significantly improving cache performance. Both hand-tuning and compile
r optimization techniques are often used to transform codes to improve cach
e utilization. Unfortunately, cache conflicts are difficult to predict and
estimate, precluding effective transformations. Hence, effective transforma
tions require detailed knowledge about the frequency and causes of cache mi
sses in the code. This article describes methods for generating and solving
Cache Miss Equations (CMEs) that give a detailed representation of cache b
ehavior, including conflict misses, in loop-oriented scientific code. Imple
mented within the SUIF compiler frame-work, our approach extends traditiona
l compiler reuse analysis to generate linear Diophantine equations that sum
marize each loop's memory behavior. While solving these equations is in gen
eral difficult, we show that is also unnecessary, as mathematical technique
s for manipulating Diophantine equations allow us to relatively easily comp
ute and/or reduce the number of possible solutions, where each solution cor
responds to a potential cache miss. The mathematical precision of CMEs allo
ws us to find true optimal solutions for transformations such as blocking o
r padding. The generality of CMEs also allows us to reason about interactio
ns between transformations applied in concert. The article also gives examp
les of their use to determine array padding and offset amounts that minimiz
e cache misses, and to determine optimal blocking factors for tiled code. O
verall, these equations represent an analysis framework that offers the gen
erality and precision needed for detailed compiler optimizations.