In multiprocessor systems, processing nodes contain a processor, some
cache memory, and a share of the system memory, and connect through a
scalable interconnection. The system memory partitions may be shared
(shared memory) or disjoint (message passing). Within each class of
systems, many architectural variations are possible. Fair comparisons
among systems are difficult without a common hardware platform to
implement the different
architectures. RPM (Rapid Prototyping engine for Multiprocessors), a
hardware emulator for the rapid prototyping of various multiprocessor
architectures, provides this platform. The authors describe its
architecture, performance, and prototyping methodology. Reprogrammable
controllers implemented with field-programmable gate arrays emulate
the target machine's hardware. The processors, memories, and
interconnections
are off the shelf, and their relative speeds can be modified to
emulate various component technologies. The authors also compare RPM
with other rapid prototyping approaches. Because emulation is orders
of magnitude faster than simulation, an emulator can run problems with
large
data sets that are more representative of the workloads for which the
target machine is designed. An emulator also supports more reliable
performance evaluation and design. Finally, because an emulator is a
real computer with its own I/O, every emulation is an actual
incarnation of
the target and can run several different workloads.