ITA
ENG

Run-time cache bypassing

Authors

Johnson, TL Connors, DA Merten, MC Hwu, WMW

Citation

Tl. Johnson et al., Run-time cache bypassing, IEEE COMPUT, 48(12), 1999, pp. 1338-1354

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

IEEE TRANSACTIONS ON COMPUTERS

ISSN journal

00189340 → ACNP

Volume

Issue

Year of publication

1999

Pages

1338 - 1354

Database

ISI

SICI code

0018-9340(199912)48:12<1338:RCB>2.0.ZU;2-3

Abstract

The growing disparity between processor and memory performance has made cac he misses increasingly expensive. Additionally, data and instruction caches are not always used efficiently, resulting in large numbers of cache misse s. Therefore, the importance of cache performance improvements at each leve l of the memory hierarchy will continue to grow. In numeric programs, there are several known compiler techniques for optimizing data cache performanc e. However, integer (nonnumeric) programs often have irregular access patte rns that are more difficult for the compiler to optimize. In the past, cach e management techniques such as cache bypassing were implemented manually a t the machine-language-programming level. As the available chip area grows, it makes sense to spend more resources to allow intelligent control over t he cache management. In this paper, we present an approach to improving cac he effectiveness, taking advantage of the growing chip area, utilizing run- time adaptive cache management techniques, optimizing both performance and cost of implementation. Specifically, we are aiming to increase data cache effectiveness for integer programs. We propose a microarchitecture scheme w here the hardware determines data placement within the cache hierarchy base d on dynamic referencing behavior. This scheme is fully compatible with exi sting Instruction Set Architectures. This paper examines the theoretical up per bounds on the cache hit ratio that cache bypassing can provide for inte ger applications, including several Windows applications with OS activity. Then, detailed trace-driven simulations of the integer applications are use d to show that the implementation described in this paper can achieve perfo rmance close to that of the upper bound.