Prefetching has been shown to be one of several effective approaches for tolerating large memory latencies. Hardware-based prefetching schemes handle prefetching at run time without compiler intervention, whereas software-directed prefetching inserts prefetch instructions into the code based on static data analysis. In this paper, we consider a prefetch engine called Hare, which handles prefetches at run time and is built on top of the data pipeline in the on-chip data cache of high-performance processors. The key design feature is that the engine is programmable by user code, so that software-prefetching techniques can also be employed to exploit the benefits of prefetching. The engine launches prefetches ahead of the current execution point, as tracked by the program counter. We evaluate the proposed scheme through trace-driven simulation, taking area and cycle-time factors into account to assess cost-effectiveness. Our performance results show that the prefetch engine can significantly reduce the data access penalty with only a small prefetching overhead. (C) 1997 Elsevier Science B.V.