This paper formulates and shows how to solve the problem of selecting
the cache size and depth of cache pipelining that maximize the
performance of a given instruction-set architecture. The solution
combines trace-driven architectural simulations and the timing analysis
of the physical implementation of the cache. Increasing cache size tends
to improve performance, but this improvement is limited because cache access
time increases with cache size. This trade-off results in an optimization
problem we refer to as multilevel optimization, because it requires the
simultaneous consideration of two levels of machine abstraction: the
architectural level and the physical implementation level. The
introduction of pipelining permits the use of larger caches without
increasing their apparent access time; however, the bubbles caused by
load and branch delays limit this technique. In this paper we also show
how multilevel optimization can be applied to pipelined systems if
software- and hardware-based strategies are considered for hiding the
branch and load delays. The multilevel optimization technique is
illustrated with the design of a pipelined cache for a high clock rate
MIPS-based architecture. The results of this design exercise show that,
because processors with pipelined caches can have shorter CPU cycle
times and larger caches, a significant performance advantage is gained
by using two or three pipeline stages to fetch data from the cache. Of
course, the results are only optimal for the implementation technologies
chosen for the design exercise; other choices could result in quite
different optimal designs. The exercise serves primarily to illustrate
the steps in the design of pipelined caches using multilevel
optimization; however, it does exemplify the importance of pipelined
caches if high clock rate processors are to achieve high performance.
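
The trade-off described above can be made concrete with a small numerical sketch. The Python below is not the paper's method or data: it replaces the trace-driven simulations and timing analysis with two toy closed-form models, and every constant and function name in it is a hypothetical assumption chosen only for illustration. It sweeps cache size and pipeline depth and picks the pair that minimizes time per instruction, taken here as CPI multiplied by cycle time.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
BASE_MISS_RATE = 0.10       # miss rate of a 1 KB cache (assumed)
MISS_PENALTY_NS = 100.0     # main-memory refill time (assumed)
REFS_PER_INSTR = 1.3        # instruction fetch plus data references (assumed)
LATCH_OVERHEAD_NS = 0.5     # per-stage pipeline latch overhead (assumed)
MIN_CPU_CYCLE_NS = 2.0      # cycle-time floor set by the rest of the CPU (assumed)
DELAY_CPI_PER_STAGE = 0.12  # unhidden load/branch delay cycles per instruction
                            # added by each extra cache stage (assumed)


def miss_rate(size_kb):
    """Architectural level: toy model in which miss rate falls with sqrt(size)."""
    return BASE_MISS_RATE / math.sqrt(size_kb)


def access_time_ns(size_kb):
    """Physical level: toy model in which access time grows with log2(size)."""
    return 3.0 + 1.0 * math.log2(size_kb)


def cycle_time_ns(size_kb, depth):
    """Cycle time when the cache access is split into `depth` pipeline stages."""
    per_stage = access_time_ns(size_kb) / depth + LATCH_OVERHEAD_NS
    return max(MIN_CPU_CYCLE_NS, per_stage)


def time_per_instruction_ns(size_kb, depth):
    """Combine both levels: CPI (misses plus pipeline bubbles) times cycle time."""
    t_cycle = cycle_time_ns(size_kb, depth)
    miss_cycles = math.ceil(MISS_PENALTY_NS / t_cycle)
    cpi = (1.0
           + REFS_PER_INSTR * miss_rate(size_kb) * miss_cycles
           + DELAY_CPI_PER_STAGE * (depth - 1))
    return cpi * t_cycle


def best_design(sizes_kb, depths):
    """Exhaustively search the (size, depth) grid for the lowest time per instruction."""
    candidates = ((s, d) for s in sizes_kb for d in depths)
    return min(candidates, key=lambda sd: time_per_instruction_ns(*sd))


SIZES_KB = [2 ** k for k in range(0, 9)]  # 1 KB to 256 KB
DEPTHS = [1, 2, 3, 4]

if __name__ == "__main__":
    size, depth = best_design(SIZES_KB, DEPTHS)
    tpi = time_per_instruction_ns(size, depth)
    print(f"best: {size} KB cache, {depth} stage(s), {tpi:.2f} ns/instruction")
```

Because the toy models are arbitrary, the reported optimum carries no meaning beyond this sketch; substituting measured miss rates and timing-analysis results for `miss_rate` and `access_time_ns` would change the answer, just as a different implementation technology would.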