As the disparity between processor and memory speeds continues to grow, mem
ory latency is becoming an increasingly important performance bottleneck. W
hile software-controlled prefetching is an attractive technique for tolerat
ing this latency, its success has been limited thus far to array-based nume
ric codes. In this paper, we expand the scope of automatic compiler-inserte
d prefetching to also include the recursive data structures commonly found
in pointer-based applications. We propose three compiler-based prefetching
schemes, and automate the most widely applicable scheme (greedy prefetching
) in an optimizing research compiler. Our experimental results demonstrate
that compiler-inserted prefetching can offer significant performance gains
on both uniprocessors and large-scale shared-memory multiprocessors.