In this article we recount the sequence of steps by which MPICH, a high-performance, portable implementation of the Message-Passing Interface (MPI) standard, was ported to the NEC SX-4, a high-performance parallel supercomputer. Each step in the sequence raised issues that are important for shared-memory programming in general and that shed light on both MPICH and the SX-4. The result is a low-latency, very high-bandwidth implementation of MPI for the NEC SX-4. In the process, MPICH was also improved in several general ways.