In this paper, we evaluate the performance of high-bandwidth cache
organizations that employ multiple cache ports, multiple-cycle hit times,
and cache port efficiency enhancements, such as load all and a line buffer,
to find the organization that provides the best processor performance. We
measure processor performance as the execution time of a dynamic
superscalar processor running realistic benchmarks that include operating
system references. Relative to a cache limited to a single port without
enhancements, two cache ports increase processor performance by 25 percent.
Adding a line buffer and load all to a single-ported cache lets the
processor achieve 91 percent of the performance of the same processor with
a two-ported cache. When the processor is not limited to a single cache
port, the results show that a large dual-ported multicycle pipelined SRAM
cache with a line buffer maximizes processor performance. A large pipelined
cache provides both a low miss rate and a high CPU clock frequency.
Dual-porting the cache and using a line buffer provide the bandwidth needed
by a dynamic superscalar processor. The line buffer makes the pipelined
dual-ported cache the best option by increasing cache port bandwidth and
hiding cache latency.
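
As a rough illustration of how a line buffer can raise effective port bandwidth, the sketch below counts how many loads in an address trace must use a cache port. It is a minimal toy model, not the organization evaluated in this paper: it assumes the line buffer is a small LRU-managed set of recently loaded cache lines and that a load hitting the buffer needs no port access. The function name `port_accesses`, the 32-byte line size, and the address trace are all hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 32  # assumed cache line size, chosen only for illustration


def port_accesses(addresses, buffer_lines=0, line_bytes=LINE_BYTES):
    """Count loads that must use a cache port.

    buffer_lines=0 models a cache with no line buffer, so every load
    consumes a port access. Otherwise, a load whose line is already in
    the (assumed LRU) line buffer is served without touching the port.
    """
    buffer = OrderedDict()  # line number -> None, oldest entry first
    accesses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in buffer:
            buffer.move_to_end(line)  # refresh LRU position on a hit
            continue
        accesses += 1  # buffer miss: the load goes to the cache port
        if buffer_lines > 0:
            buffer[line] = None
            if len(buffer) > buffer_lines:
                buffer.popitem(last=False)  # evict least recently used line
    return accesses


# Hypothetical trace of load addresses with some spatial locality.
loads = [0, 8, 16, 64, 72, 4, 128, 68]
print(port_accesses(loads))                  # 8 port accesses, no buffer
print(port_accesses(loads, buffer_lines=2))  # 4 port accesses, 2-line buffer
```

In this toy trace, half of the loads are absorbed by the two-line buffer, which frees port cycles for other references. The model ignores stores, timing, and cache misses.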