Multigrain shared memory

Citation
D. Yeung et al., Multigrain shared memory, ACM Transactions on Computer Systems, 18(2), 2000, pp. 154-196
Number of citations
33
Subject Categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON COMPUTER SYSTEMS
Journal ISSN
0734-2071
Volume
18
Issue
2
Year of publication
2000
Pages
154 - 196
Database
ISI
SICI code
0734-2071(200005)18:2<154:MSM>2.0.ZU;2-T
Abstract
Parallel workstations, each comprising tens of processors based on shared memory, promise cost-effective scalable multiprocessing. This article explores the coupling of such small- to medium-scale shared-memory multiprocessors through software over a local area network to synthesize larger shared-memory systems. We call these systems Distributed Shared-memory MultiProcessors (DSMPs). This article introduces the design of a shared-memory system that uses multiple granularities of sharing, called MGS, and presents a prototype implementation of MGS on the MIT Alewife multiprocessor. Multigrain shared memory enables the collaboration of hardware and software shared memory, thus synthesizing a single transparent shared-memory address space across a cluster of multiprocessors. The system leverages the efficient support for fine-grain cache-line sharing within multiprocessor nodes as often as possible, and resorts to coarse-grain page-level sharing across nodes only when absolutely necessary. Using our prototype implementation of MGS, an in-depth study of several shared-memory applications is conducted to understand the behavior of DSMPs. Our study is the first to comprehensively explore the DSMP design space, and to compare the performance of DSMPs against all-software and all-hardware DSMs on a single experimental platform. Keeping the total number of processors fixed, we show that applications execute up to 85% faster on a DSMP as compared to an all-software DSM. We also show that all-hardware DSMs hold a significant performance advantage over DSMPs on challenging applications, between 159% and 1014%. However, program transformations to improve data locality for these applications allow DSMPs to almost match the performance of an all-hardware multiprocessor of the same size.