The MIT Alewife Machine

Citation
A. Agarwal et al., "The MIT Alewife Machine," Proceedings of the IEEE, vol. 87, no. 3, 1999, pp. 430-444
Citation count
34
Subject Categories
Electrical & Electronics Engineering
Journal title
PROCEEDINGS OF THE IEEE
ISSN journal
0018-9219
Volume
87
Issue
3
Year of publication
1999
Pages
430-444
Database
ISI
SICI code
0018-9219(199903)87:3<430:TMAM>2.0.ZU;2-4
Abstract
A variety of models for parallel architectures, such as shared memory, message passing, and data flow, have converged in the recent past to a hybrid architecture form called distributed shared memory (DSM). By using a combination of hardware and software mechanisms, DSM combines the nice features of all the above models and is able to achieve both the scalability of message-passing machines and the programmability of shared memory systems. Alewife, an early prototype of such DSM architectures, uses a hybrid of software and hardware mechanisms to support coherent shared memory, efficient user-level messaging, fine-grain synchronization, and latency tolerance.

Alewife supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. Four mechanisms combine to achieve Alewife's goals of scalability and programmability: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency-tolerance mechanisms, including block multithreading and prefetching, mask unavoidable delays due to communication.

Extensive results from microbenchmarks, together with over a dozen complete applications running on a 32-node prototype, demonstrate that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives. Our results further show that messaging and shared memory operations are both important because each helps the programmer to achieve the best performance for various machine configurations. Block multithreading and prefetching improve performance significantly, and language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.