Shared-memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared-memory multiprocessors is the nonuniform memory access machine with private caches and a cache coherence protocol (CC-NUMA). Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose to cache misses and invalidations. The authors examine cost-performance trade-offs for each of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and a hybrid update/invalidate protocol with a write cache. The four optimizations differ in which application behaviors they attack, what hardware resources they require, and what constraints they impose on the application software. The authors measured the performance improvement from the four optimizations in isolation and in combination, weighing the trade-offs in hardware and programming complexity. Although one combination of the proposed optimizations (prefetching and migratory sharing detection) can boost a sequentially consistent machine to perform as well as a machine with release consistency, release consistency offers significant performance improvements across a broad application domain at little extra complexity in the machine design. Moreover, a combination of sequential prefetching and the hybrid update/invalidate protocol with a write cache cuts the execution time of a sequentially consistent machine in half with fairly modest changes to the second-level cache and the cache protocol. The authors expect that designers will increasingly turn to the release consistency model.