Prefetching is one approach to reducing the latency of memory operations in
modern computer systems. In this paper, we describe the Markov prefetcher.
This prefetcher acts as an interface between the on-chip and off-chip cache
and can be added to existing computer designs. The Markov prefetcher is
distinguished by prefetching multiple reference predictions from the memory
subsystem, and then prioritizing the delivery of those references to the
processor. This design results in a prefetching system that provides good
coverage, is accurate, and produces timely results that can be effectively
used by the processor. We also explored a range of techniques that can be used
to reduce the bandwidth demands of prefetching, leading to improved memory
system performance. In our cycle-level simulations, the Markov prefetcher
reduces the overall execution stalls due to instruction and data memory
operations by an average of 54 percent for various commercial benchmarks while
only using two-thirds the memory of a demand-fetch cache organization.
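
To make the idea of multiple, prioritized predictions concrete, the following
is a minimal sketch in Python of a first-order Markov predictor over a stream
of miss addresses. It is an illustration under stated assumptions, not the
hardware design evaluated here: the class name MarkovPredictor, the parameters
max_entries and fanout, and the most-recent-first ordering of successors are
all hypothetical choices made for this example.

```python
from collections import OrderedDict


class MarkovPredictor:
    """Toy first-order Markov predictor over a miss-address stream.

    Assumption: each table entry maps a miss address to the addresses that
    followed it, ordered most recently seen first. That order is used as the
    prefetch priority.
    """

    def __init__(self, max_entries=1024, fanout=4):
        self.max_entries = max_entries  # hypothetical table capacity
        self.fanout = fanout            # hypothetical predictions per entry
        self.table = OrderedDict()      # miss address -> list of successors
        self.prev = None                # previous miss address, if any

    def observe(self, miss_addr):
        """Record a miss; return prefetch candidates, highest priority first."""
        if self.prev is not None:
            successors = self.table.pop(self.prev, [])
            if miss_addr in successors:
                successors.remove(miss_addr)
            successors.insert(0, miss_addr)   # newest successor goes first
            del successors[self.fanout:]      # keep at most `fanout` predictions
            self.table[self.prev] = successors
            if len(self.table) > self.max_entries:
                # Evict the entry that was updated least recently.
                self.table.popitem(last=False)
        self.prev = miss_addr
        return list(self.table.get(miss_addr, []))


if __name__ == "__main__":
    predictor = MarkovPredictor(fanout=2)
    for addr in [0x100, 0x200, 0x100, 0x300, 0x100]:
        candidates = predictor.observe(addr)
        print(hex(addr), [hex(c) for c in candidates])
```

Each call to observe returns a list that a prefetch unit could issue in order,
so the first element stands in for the highest-priority reference. A real
design would also have to bound the bandwidth consumed by these requests,
which this sketch does not model.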