AN EVALUATION OF MEMORY CONSISTENCY MODELS FOR SHARED-MEMORY SYSTEMS WITH ILP PROCESSORS

Citation
V.S. Pai et al., AN EVALUATION OF MEMORY CONSISTENCY MODELS FOR SHARED-MEMORY SYSTEMS WITH ILP PROCESSORS, ACM SIGPLAN NOTICES, 31(9), 1996, pp. 12-23
Number of citations
27
Subject Categories
Computer Sciences; Computer Science, Software, Graphics, Programming
Journal title
ACM SIGPLAN NOTICES
Volume
31
Issue
9
Year of publication
1996
Pages
12 - 23
Database
ISI
SICI code
Abstract
Relaxed consistency models have been shown to significantly outperform sequential consistency for single-issue, statically scheduled processors with blocking reads. However, current microprocessors aggressively exploit instruction-level parallelism (ILP) using methods such as multiple issue, dynamic scheduling, and non-blocking reads. Researchers have conjectured that two techniques, hardware-controlled non-binding prefetching and speculative loads, have the potential to equalize the hardware performance of memory consistency models on such processors. This paper performs the first detailed quantitative comparison of several implementations of sequential consistency and release consistency optimized for aggressive ILP processors. Our results indicate that hardware prefetching and speculative loads dramatically improve the performance of sequential consistency. However, the gap between sequential consistency and release consistency depends on the cache write policy and the complexity of the cache-coherence protocol implementation. In most cases, release consistency significantly outperforms sequential consistency, but for two applications, the use of a write-back primary cache and a more complex cache-coherence protocol nearly equalizes the performance of the two models. We also observe that the existing techniques, which require on-chip hardware modifications, enhance the performance of release consistency only to a small extent. We propose two new software techniques - fuzzy acquires and selective acquires - to achieve more overlap than allowed by the previous implementations of release consistency. To enhance methods for overlapping acquires, we also propose a technique to eliminate control dependences caused by an acquire loop, using a small amount of off-chip hardware called the synchronization buffer.
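
Illustrative note (not part of the original record): the abstract refers to acquires, releases, and the control dependence created by an acquire loop. The sketch below shows what such a loop looks like at the source level, using the acquire/release orderings of the C++ memory model as a language-level analogue. It is not the hardware implementation evaluated in the paper, and it does not show fuzzy acquires, selective acquires, or the synchronization buffer. All names (Mailbox, produce, consume, run_handoff) are hypothetical and chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared structure: one ordinary data word plus a flag
// used purely for synchronization.
struct Mailbox {
    int payload = 0;                 // ordinary shared data
    std::atomic<bool> ready{false};  // synchronization flag
};

// Producer side: write the data, then perform a release store on the flag.
// The release ordering keeps the payload write from being reordered
// after the flag write.
void produce(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer side: an acquire loop. The loop branch depends on the value
// returned by the acquire load, so the payload read below is
// control-dependent on the loop exiting.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();   // spin until the flag is set
    }
    return box.payload;              // ordered after the successful acquire
}

// Runs one producer and one consumer thread and returns the value
// the consumer observed.
int run_handoff(int value) {
    Mailbox box;
    int seen = 0;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { produce(box, value); });
    producer.join();
    consumer.join();
    return seen;
}
```

Calling run_handoff(42) from a main function returns the value handed over by the producer. Under these orderings, only the flag accesses are constrained; ordinary accesses between synchronization points remain free to overlap, which is the kind of latitude a release-consistent system can exploit.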