Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications

Authors

Mowry, TC; Luk, CK

Citation

T.C. Mowry and C.K. Luk, Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications, IEEE COMPUT, 49(4), 2000, pp. 369-384

Citations number

Subject Categories

Computer Science & Engineering

Journal title

IEEE TRANSACTIONS ON COMPUTERS

ISSN journal

0018-9340

Volume

49

Issue

4

Year of publication

2000

Pages

369 - 384

Database

ISI

SICI code

0018-9340(200004)49:4<369:UWCPIT>2.0.ZU;2-I

Abstract

Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise, the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in nonnumeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in nonnumeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state-of-the-art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency, or else inserts overhead that is unnecessary. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling. Our experimental results demonstrate that roughly half of the 21 nonnumeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns and control-flow correlation succeeds because cache outcomes are often call-chain dependent. Finally, we suggest a number of ways to exploit correlation profiling in practice and demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than summary profiling information.
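
Illustrative sketch (not part of the original record, and not the authors' implementation): the contrast the abstract draws between summary profiling and self correlation can be shown on a toy hit/miss trace for a single static reference. The function names, the two-outcome history length, and the synthetic trace are all assumptions made for illustration only.

```python
from collections import defaultdict


def summary_miss_ratio(outcomes):
    """Summary profiling: a single miss ratio for the static reference."""
    return sum(outcomes) / len(outcomes)


def self_correlation_profile(outcomes, history_len=2):
    """Miss ratio conditioned on the reference's own previous outcomes.

    `outcomes` is a list of 0 (hit) / 1 (miss) values for one static
    reference; `history_len` is a hypothetical parameter chosen here.
    """
    counts = defaultdict(lambda: [0, 0])  # history -> [misses, total]
    for i in range(history_len, len(outcomes)):
        history = tuple(outcomes[i - history_len:i])
        counts[history][0] += outcomes[i]
        counts[history][1] += 1
    return {h: misses / total for h, (misses, total) in counts.items()}


# Synthetic trace (assumption): the pattern miss, hit, hit repeated 8 times.
trace = [1, 0, 0] * 8

summary = summary_miss_ratio(trace)
profile = self_correlation_profile(trace)

print(f"summary miss ratio: {summary:.3f}")
for history, ratio in sorted(profile.items()):
    print(f"after {history}: miss ratio {ratio:.1f}")
```

On this toy trace the summary profile reports an intermediate miss ratio of about one third, which says nothing about which dynamic instances miss, whereas conditioning on the two preceding outcomes separates the instances that always miss from those that always hit. Real traces are unlikely to separate this cleanly.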