In this paper, we focus on low-power design techniques for high-performance
processors at the architectural and compiler levels. We focus mainly on
developing methods for reducing the energy dissipated in the on-chip caches.
Energy dissipated in caches represents a substantial portion of the energy
budget of today's processors. Extrapolating current trends, this portion is
likely to increase in the near future, since the devices devoted to the
caches occupy an increasingly large percentage of the total area of the
chip.
We propose a method that uses an additional minicache located between the
I-Cache and the central processing unit (CPU) core and buffers instructions
that are nested within loops and would otherwise be fetched continuously
from the I-Cache. This mechanism is combined with code modifications, through
the compiler, that greatly simplify the required hardware, eliminate
unnecessary instruction fetching, and consequently reduce signal switching
activity and the dissipated energy.
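As a rough illustration of the idea, the following minimal sketch counts how
many instruction fetches would be served by the I-Cache and how many by the
small loop buffer. It is not the design evaluated in this paper: the function
name simulate_fetches, the FetchStats record, the address trace, the set of
compiler-marked loop addresses, and the simple fill-until-full policy are all
assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FetchStats:
    icache_fetches: int = 0  # fetches served by the large I-Cache
    lcache_fetches: int = 0  # fetches served by the small loop buffer


def simulate_fetches(trace, loop_addrs, lcache_capacity):
    """Count where each fetch in an address trace is served.

    trace           -- instruction addresses in execution order (hypothetical)
    loop_addrs      -- addresses the compiler is assumed to have marked
                       as belonging to loops
    lcache_capacity -- number of instructions the loop buffer can hold
    """
    stats = FetchStats()
    resident = set()  # addresses currently held in the loop buffer
    for addr in trace:
        if addr in resident:
            # Hit in the loop buffer: the I-Cache is not accessed.
            stats.lcache_fetches += 1
            continue
        # Miss in the loop buffer: the fetch goes to the I-Cache.
        stats.icache_fetches += 1
        # Assumed policy: copy marked instructions in until the buffer is full.
        if addr in loop_addrs and len(resident) < lcache_capacity:
            resident.add(addr)
    return stats


# Two straight-line instructions, then a 4-instruction loop run 10 times.
example_trace = [0, 1] + [10, 11, 12, 13] * 10
example_stats = simulate_fetches(example_trace, {10, 11, 12, 13}, 8)
print(example_stats)  # FetchStats(icache_fetches=6, lcache_fetches=36)
```

In this toy trace only the first loop iteration touches the I-Cache; the
remaining nine iterations are served entirely from the loop buffer, which is
the effect the proposed mechanism aims for.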
We show that the additional cache, dubbed L-Cache, is much smaller and
simpler than the I-Cache when the compiler assumes the role of allocating
instructions to it. Through simulation, we show that for the SPECfp95
benchmarks, the I-Cache remains disabled most of the time, and the "cheaper"
extra cache is used instead. We also propose different techniques that are
better adapted to nonnumeric nonloop-intensive code.