To understand processor performance, it is essential to use metrics that are intuitive, and it is essential to be familiar with a few aspects of a simple scalar pipeline before attempting to understand more complex structures. This paper shows that cycles per instruction (CPI) is a simple dot product of event frequencies and event penalties, and that it is far more intuitive than its more popular cousin, instructions per cycle (IPC). CPI is separable into three components that account for the inherent work, the pipeline, and the memory hierarchy, respectively. Each of these components is a fixed upper limit, or "hard bound," for the corresponding superscalar component. In the last decade, the memory-hierarchy component has become the dominant one of the three, and in the next decade, queueing at the memory data bus will become a very significant part of it. In reaction to this trend, an evolution in bus protocols will ensue; this paper provides a general sketch of those protocols. An underlying theme in this paper is that power constraints have been a driving force in computer architecture since the first computers were built fifty years ago. In CMOS technology, power constraints will shape future microarchitecture in a positive and surprising way. Specifically, a resurgence of the RISC approach is expected in high-performance design, which will cause the client and server microarchitectures to converge.
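The dot-product view of CPI can be sketched in a few lines of Python. The event names, frequencies (events per instruction), and penalties (cycles per event) below are hypothetical values chosen only for illustration, not measurements from this paper; the grouping into inherent work, pipeline, and memory hierarchy follows the three CPI components described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One kind of event: how often it occurs and what it costs."""
    name: str
    frequency: float  # events per instruction (hypothetical value)
    penalty: float    # cycles per event (hypothetical value)


def cpi(events):
    """CPI contribution as the dot product of event frequencies and penalties."""
    return sum(e.frequency * e.penalty for e in events)


def cpi_breakdown(components):
    """Map each component name to its CPI contribution."""
    return {name: cpi(events) for name, events in components.items()}


# Hypothetical event mix for a simple scalar pipeline (illustration only).
components = {
    "inherent work": [Event("instruction issue", 1.0, 1.0)],
    "pipeline": [
        Event("taken-branch bubble", 0.15, 2.0),
        Event("load-use interlock", 0.10, 1.0),
    ],
    "memory hierarchy": [
        Event("L1 miss, L2 hit", 0.04, 10.0),
        Event("L2 miss", 0.005, 100.0),
    ],
}

parts = cpi_breakdown(components)
total = sum(parts.values())  # the components simply add up to total CPI
ipc = 1.0 / total            # IPC is the reciprocal, so it does not decompose additively

for name, value in parts.items():
    print(f"{name}: {value:.2f}")
print(f"total CPI: {total:.2f}")
print(f"IPC: {ipc:.2f}")
```

Because the components add, halving the hypothetical L2 miss penalty in this example lowers CPI by exactly 0.25 cycles (0.005 × 50) regardless of the other terms, whereas the resulting change in IPC depends on the total CPI. This additivity illustrates one reason CPI can be easier to reason about than IPC.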