A variety of models for parallel architectures, such as shared memory, message passing, and data flow, have recently converged into a hybrid architectural form called distributed shared memory (DSM). By using a combination of hardware and software mechanisms, DSM combines the desirable features of these models, achieving both the scalability of message-passing machines and the programmability of shared-memory systems. Alewife, an early prototype of such DSM architectures, uses a hybrid of software and hardware mechanisms to support coherent shared memory, efficient user-level messaging, fine-grain synchronization, and latency tolerance.
Alewife supports up to 512 processing nodes connected by a scalable and cost-effective mesh network at a constant cost per node. Four mechanisms combine to achieve Alewife's goals of scalability and programmability: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating-system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency-tolerance mechanisms, including block multithreading and prefetching, mask unavoidable delays due to communication.
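As a concrete, hedged illustration of the latency-tolerance idea, the following C sketch uses the GCC/Clang intrinsic __builtin_prefetch to request data several iterations ahead of its use, so that the latency of a (possibly remote) memory access overlaps with computation on earlier elements. This is a modern software-prefetching analogue, not Alewife's actual prefetch mechanism, and the look-ahead distance of 8 elements is an arbitrary assumption.

/* Software prefetching: overlap memory latency with computation.
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang intrinsic,
 * used here only to illustrate the general technique. */
#include <stddef.h>

double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n) {                          /* look ahead 8 elements */
            __builtin_prefetch(&a[i + 8], 0, 1);  /* 0 = read, 1 = low temporal locality */
            __builtin_prefetch(&b[i + 8], 0, 1);
        }
        sum += a[i] * b[i];
    }
    return sum;
}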
Extensive results from microbenchmarks, together with over a dozen complete applications running on a 32-node prototype, demonstrate that integrating message passing with shared memory enables a cost-efficient solution to the cache-coherence problem and provides a rich set of programming primitives. Our results further show that message-passing and shared-memory operations are both important, because each helps the programmer achieve the best performance for different machine configurations. Block multithreading and prefetching improve performance significantly, and language constructs that let programmers express fine-grain synchronization can improve performance by more than a factor of two.
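To make the notion of fine-grain synchronization concrete, the following C11 sketch shows word-granularity producer/consumer synchronization in the spirit of full/empty bits: the consumer waits only for the one datum it needs rather than for a coarse barrier. The slot_t type and its operations are hypothetical illustrations, not Alewife's language constructs or hardware interface.

#include <stdatomic.h>

/* One synchronization flag per word of data: fine-grain, not barrier-based. */
typedef struct {
    atomic_int full;   /* 0 = empty, 1 = full */
    int value;
} slot_t;

/* Producer: store the value, then mark the slot full. */
void slot_write(slot_t *s, int v)
{
    s->value = v;
    atomic_store_explicit(&s->full, 1, memory_order_release);
}

/* Consumer: spin until this particular slot is full, then read it. */
int slot_read(slot_t *s)
{
    while (atomic_load_explicit(&s->full, memory_order_acquire) == 0)
        ;  /* wait only for this one datum */
    return s->value;
}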