Cache-coherent nonuniform memory access (CC-NUMA) machines have been
shown to be a promising paradigm for exploiting distributed execution.
CC-NUMA systems can provide the performance typically associated with
parallel machines without the high cost associated with parallel
programming, because a CC-NUMA machine presents a single image of
memory. Past research on CC-NUMA machines has focused on modifications
to the memory hierarchy, interconnect topology, and memory consistency
protocols, all of which are critical to achieving scalable performance.
The research described here expands this focus to operating system
structures that can increase system scalability. We describe a
hardware/software prototyping study that investigates how changes to
the operating system of a commercial IBM AS/400(R) system can provide
scalable performance when running transaction processing workloads.
The project was a joint effort between researchers at the IBM Thomas J.
Watson Research Center and a team from the AS/400 development
laboratory in Rochester, Minnesota. This paper describes various
aspects of the project, including the changes made to the operating
system to enable scalable performance and the associated hardware and
software performance tools developed to identify bottlenecks in the
existing operating system structures.