In this paper, we examine some critical design features of a trace cache
fetch engine for a 16-wide issue processor and evaluate their effects on
performance. We evaluate path associativity, partial matching, and inactive
issue, all of which are straightforward extensions to the trace cache. We
also examine features such as the fill unit and branch predictor design. In
our final analysis, we show that the trace cache mechanism attains a 28
percent performance improvement over an aggressive single-block fetch
mechanism and a 15 percent improvement over a sequential multiblock
mechanism.