This quad-issue processor achieves 1-GHz operation through improved dynamic
circuit techniques in critical paths and a more extensive on-chip memory
system which scales in both bandwidth and latency. Critical logic paths use
domino, delayed clocked domino, and logic embedded in dynamic flip-flops for
minimum delay. A 64-KB sum-addressed memory data cache combines the address
offset add with the cache decode, allowing the average memory latency to
scale by more than the clock ratio. Memory bandwidth is improved by using
wave-pipelined SRAM designs for on-chip caches and a write cache for store
traffic. Memory power is controlled without increased latency by use of
delayed-reset logic decoders. The chip operates at 1000 MHz and dissipates
less than 80 W from a 1.6-V supply. It contains 23 million transistors
(12 million in RAM cells) on a 244-mm² die.
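
The idea behind a sum-addressed decode, folding the base-plus-offset add into the row select, can be illustrated in software. The sketch below is a generic, textbook-style carry-free test of whether `a + b` equals a given row index `k`; it is not taken from this design, and the function name, parameter names, and 4-bit width are assumptions made only for illustration. Each bit position checks that the carry-in it would need to produce `k` matches the carry-out its lower neighbor would generate, so no carry has to ripple across the word.

```python
def sum_addressed_match(a: int, b: int, k: int, width: int) -> bool:
    """Return True if (a + b) mod 2**width == k, without a carry chain.

    Hypothetical helper for illustration; names and width are assumptions.
    """
    mask = (1 << width) - 1
    a, b, k = a & mask, b & mask, k & mask

    # Carry-in each bit would need so that a_i ^ b_i ^ carry_i == k_i.
    required_carry_in = a ^ b ^ k

    # Carry-out each bit would produce if it received that required carry-in:
    # generated when both inputs are 1, or propagated when exactly one input
    # is 1 and k_i is 0 (which implies the required carry-in was 1).
    carry_out = (a & b) | ((a | b) & ~k & mask)

    # Bit i's required carry-in must equal bit i-1's carry-out; bit 0 takes 0.
    return required_carry_in == ((carry_out << 1) & mask)


def select_row(base: int, offset: int, width: int) -> int:
    """Model a decoder: every row evaluates its own match independently."""
    hits = [k for k in range(1 << width)
            if sum_addressed_match(base, offset, k, width)]
    # Exactly one row matches for any pair of inputs.
    return hits[0]
```

In this model every row evaluates its match condition from purely local bit-pair terms, which is why the add and the decode can overlap instead of running back to back.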