This paper describes a method for improving the performance of a large
direct-mapped cache by reducing the number of conflict misses. Our
solution consists of two components: an inexpensive hardware device
called a Cache Miss Lookaside (CML) buffer, which detects conflicts by
recording and summarizing a history of cache misses, and a software
policy within the operating system's virtual memory system, which
removes conflicts by dynamically remapping pages whenever large numbers
of conflict misses are detected. Using trace-driven simulation of
applications and the operating system, we show that a CML buffer
enables a large direct-mapped cache to perform nearly as well as a
two-way set-associative cache of equivalent size and speed, but with
lower hardware cost and complexity.
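
The division of labor between the two components can be illustrated with a minimal sketch. It is not the design evaluated in this paper: the per-page miss counters, the names CMLBuffer, page_color and choose_remap, and the values of NUM_COLORS and MISS_THRESHOLD are all assumptions chosen for illustration. The buffer only counts misses per physical page; the policy looks for heavily missing pages that map to the same region of the cache and proposes moving all but one of them to free pages that map elsewhere.

```python
from collections import Counter, defaultdict

NUM_COLORS = 16      # assumed: cache size divided by page size
MISS_THRESHOLD = 4   # assumed: misses before a page counts as "hot"


def page_color(physical_page):
    """Cache region (color) that a physical page maps to in a direct-mapped cache."""
    return physical_page % NUM_COLORS


class CMLBuffer:
    """Hypothetical software model of the buffer: a miss counter per physical page."""

    def __init__(self):
        self.miss_counts = Counter()

    def record_miss(self, physical_page):
        self.miss_counts[physical_page] += 1

    def conflicting_pages(self, threshold=MISS_THRESHOLD):
        """Return {color: [pages]} for colors shared by two or more hot pages."""
        by_color = defaultdict(list)
        for page, misses in self.miss_counts.items():
            if misses >= threshold:
                by_color[page_color(page)].append(page)
        return {c: sorted(p) for c, p in by_color.items() if len(p) > 1}


def choose_remap(conflicts, free_pages):
    """Propose {old_page: new_page} moves to free pages of a non-conflicting color."""
    remaps = {}
    busy_colors = set(conflicts)
    available = list(free_pages)
    for color, pages in sorted(conflicts.items()):
        for page in pages[1:]:  # leave the first page of each color in place
            for candidate in available:
                if page_color(candidate) not in busy_colors:
                    remaps[page] = candidate
                    busy_colors.add(page_color(candidate))
                    available.remove(candidate)
                    break  # stop scanning once this page has a target
    return remaps


# Example: pages 3 and 19 share a color and miss repeatedly; page 7 misses once.
cml = CMLBuffer()
for _ in range(5):
    cml.record_miss(3)
    cml.record_miss(19)
cml.record_miss(7)
proposed = choose_remap(cml.conflicting_pages(), free_pages=[35, 40])
```

In this sketch the operating system would then copy each remapped page to its new physical page and update the page tables; those steps are omitted here.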