As microprocessor speeds increase at a breathtaking rate, the
gap between processor and memory system performance continues to widen.
To alleviate this problem, all modern processors contain caches, but
even using caches, processors cannot achieve their peak performance.
We propose a mechanism, smart caching, which extends the power of
conventional memory subsystems by including a prefetch unit. This
prefetch unit is responsible for efficiently using the available
memory bandwidth by fetching memory data before they are actually
needed. Prefetching allows high-level application knowledge to
increase memory performance, which is currently constraining the
performance of most systems. While prefetching does not reduce the
latency of memory accesses, it hides this latency by overlapping
memory access and instruction execution.
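
The latency-hiding claim can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the smart caching design itself: the function name `run_cycles`, its parameters, and the fully pipelined memory (any number of fetches in flight, no bandwidth limit) are all hypothetical choices made for illustration. Every access still takes `latency` cycles; prefetching only changes when the fetch is issued.

```python
def run_cycles(n_items, latency, compute, distance):
    """Return the total cycles needed to process n_items (toy model).

    distance == 0 models demand fetching: the fetch for an item is issued
    only when the processor needs it, so every access stalls for `latency`.
    distance > 0 models prefetching: the fetch for item i is issued when
    processing of item i - distance starts (at cycle 0 for the first items).
    Assumption: memory is fully pipelined, so fetches never queue.
    """
    start = [0] * n_items  # cycle at which processing of item i begins
    now = 0
    for i in range(n_items):
        if distance == 0:
            issue = now                  # demand fetch
        elif i < distance:
            issue = 0                    # initial prefetches issued up front
        else:
            issue = start[i - distance]  # prefetch issued `distance` items ahead
        ready = issue + latency          # per-access latency is unchanged
        now = max(now, ready)            # stall only if the data is not there yet
        start[i] = now
        now += compute                   # execute on the fetched data
    return now


if __name__ == "__main__":
    for d in (0, 2, 5):
        print(d, run_cycles(100, 50, 10, d))
```

In this model, 100 items with a latency of 50 cycles and 10 cycles of computation each take 6000 cycles with demand fetching and 1050 cycles with a prefetch distance of 5, because the distance times the compute time covers the full latency. A shorter distance hides only part of the latency.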