Hz. Shan and Jp. Singh, A comparison of MPI, SHMEM and cache-coherent shared address space programming models on a tightly-coupled multiprocessors, INT J P PRO, 29(3), 2001, pp. 283-318
We compare the performance of three major programming models on a modern,
64-processor hardware cache-coherent machine, one of the two major types of
platforms upon which high-performance computing is converging. We focus on
applications that are either regular, predictable or at least do not
require fine-grained dynamic replication of irregularly accessed data.
Within this class, we use programs with a range of important communication
patterns. We examine whether the basic parallel algorithm and communication
structuring approaches needed for best performance are similar or different
among the models, whether some models have substantial performance
advantages over others as problem size and number of processors change,
what the sources of these performance differences are, where the programs
spend their time, and whether substantial improvements can be obtained by
modifying either the application programming interfaces or the
implementations of the programming models on this type of tightly-coupled
multiprocessor platform.