A superscalar RISC processor contains 2.8 million transistors on a
16.2 mm x 16.5 mm die and is fabricated in a 3.3 V/0.5 µm BiCMOS
technology. To take advantage of superscalar performance without incurring
the penalty of a slower clock or a longer pipeline, a tag bit is
implemented in the instruction cache to indicate dependency between two
instructions. Our superscalar design yields a performance gain of up to
37% with only a 3.5% area overhead.
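The tag-bit idea can be illustrated with a small behavioral sketch. It assumes a hypothetical instruction format (one optional destination register, up to two source registers) and an assumed dependency rule of read-after-write or write-after-write between adjacent instructions. It is not the circuit described above. The point it shows is that the dependency check is done once, when a line is filled into the cache, so the issue stage only reads a stored bit instead of comparing register fields in the critical path. The names `Instr`, `dependency_tag`, `predecode_line`, and `issue_schedule` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded RISC instruction (assumed format)."""
    op: str
    dest: Optional[int]     # destination register, or None if nothing is written
    srcs: Tuple[int, ...]   # source registers read by the instruction


def dependency_tag(first: Instr, second: Instr) -> bool:
    """Return True if `second` must not issue in the same cycle as `first`.

    Assumed rule: a read-after-write or write-after-write hazard on a
    register. Register 0 is assumed hardwired to zero, so writing it
    creates no dependency.
    """
    if first.dest is None or first.dest == 0:
        return False
    return first.dest in second.srcs or first.dest == second.dest


def predecode_line(line: List[Instr]) -> List[bool]:
    """Compute one tag bit per instruction when a cache line is filled.

    tags[i] describes the pair (line[i], line[i + 1]). The last instruction
    has no successor inside the line, so its bit is set conservatively.
    """
    tags = []
    for i in range(len(line)):
        if i + 1 < len(line):
            tags.append(dependency_tag(line[i], line[i + 1]))
        else:
            tags.append(True)
    return tags


def issue_schedule(line: List[Instr], tags: List[bool]) -> List[List[str]]:
    """Group instructions into issue cycles using only the stored tag bits."""
    cycles = []
    i = 0
    while i < len(line):
        if i + 1 < len(line) and not tags[i]:
            cycles.append([line[i].op, line[i + 1].op])  # dual issue
            i += 2
        else:
            cycles.append([line[i].op])                  # single issue
            i += 1
    return cycles


if __name__ == "__main__":
    line = [
        Instr("add", 1, (2, 3)),
        Instr("sub", 4, (5, 6)),
        Instr("lw", 7, (1,)),
        Instr("or", 8, (7, 9)),   # reads r7, written by the lw before it
    ]
    tags = predecode_line(line)
    print(tags)                        # [False, False, True, True]
    print(issue_schedule(line, tags))  # [['add', 'sub'], ['lw'], ['or']]
```

In this sketch, `add` and `sub` share a cycle because neither touches the other's registers. The `or` instruction is held back because its tag records a dependency on the preceding load. Pairing any adjacent instructions, rather than only aligned pairs, is a simplifying assumption of the example.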