This paper describes the main features and functions of the Pentium® 4
processor microarchitecture. We present the front end of the machine,
including its new form of instruction cache, called the trace cache, and
describe the out-of-order execution engine, including a low-latency,
double-pumped arithmetic logic unit (ALU) that runs at 4 GHz. We also
discuss the memory subsystem, including the low-latency Level 1 data cache,
which is accessed in two clock cycles. We then describe some of the key
features that contribute to the Pentium® 4 processor's floating-point and
multimedia performance. Finally, we provide key performance numbers for
this processor, comparing it to the Pentium® III processor.