Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. The compile-time transformation eliminates global and static variables in C code by converting them to node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address-space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI and built on the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.