Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications

Authors
Citation
Tc. Mowry and Ck. Luk, Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications, IEEE COMPUT, 49(4), 2000, pp. 369-384
Number of citations
18
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON COMPUTERS
ISSN journal
00189340
Volume
49
Issue
4
Year of publication
2000
Pages
369 - 384
Database
ISI
SICI code
0018-9340(200004)49:4<369:UWCPIT>2.0.ZU;2-I
Abstract
Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise, the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in nonnumeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in nonnumeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state of the art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency or else inserts overhead that is unnecessary. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling. Our experimental results demonstrate that roughly half of the 21 nonnumeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns, and control-flow correlation succeeds because cache outcomes are often call-chain dependent.
Finally, we suggest a number of ways to exploit correlation profiling in practice and demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than summary profiling information.
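The contrast between summary profiling and two of the correlation schemes named in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes a profile is just a sequence of per-instance outcomes for one static reference (True = miss). Each scheme is scored by the fraction of dynamic instances that would be predicted correctly if every profile key were assigned its majority outcome. The function names, the history length `k`, and the toy traces are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Tuple

def majority_accuracy(keyed: Iterable[Tuple[Hashable, bool]]) -> float:
    """Hypothetical helper: fraction of instances predicted correctly when
    each key is assigned its most frequent outcome (hit or miss)."""
    table = defaultdict(Counter)
    for key, is_miss in keyed:
        table[key][is_miss] += 1
    total = sum(sum(c.values()) for c in table.values())
    if total == 0:
        raise ValueError("empty profile")
    correct = sum(max(c.values()) for c in table.values())
    return correct / total

def summary_profile(outcomes: List[bool]) -> float:
    # Summary profiling: a single key for every instance of the static
    # reference, so all dynamic instances receive the same prediction.
    return majority_accuracy((None, m) for m in outcomes)

def self_correlation(outcomes: List[bool], k: int = 1) -> float:
    # Self correlation (assumed form): key each instance on the same
    # reference's previous k hit/miss outcomes.
    return majority_accuracy(
        (tuple(outcomes[i - k:i]), outcomes[i]) for i in range(k, len(outcomes))
    )

def control_flow_correlation(events: List[Tuple[Tuple[str, ...], bool]]) -> float:
    # Control-flow correlation (assumed form): key each instance on the
    # call chain that led to the access.
    return majority_accuracy(events)

if __name__ == "__main__":
    # Hypothetical trace: a reference that alternates miss, hit, miss, hit, ...
    alternating = [True, False] * 50
    print(summary_profile(alternating))         # 0.5
    print(self_correlation(alternating, k=1))   # 1.0

    # Hypothetical trace: always misses when reached via one call chain,
    # always hits when reached via another.
    events = [(("main", "build"), True)] * 30 + [(("main", "search"), False)] * 70
    print(control_flow_correlation(events))     # 1.0
```

In this framing, summary profiling is the degenerate case of correlation profiling with a single key. Both toy references have intermediate miss ratios (50% and 30%), which is exactly where a per-reference ratio cannot separate the instances that miss from those that hit, while a history or call-chain key can.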