Several cache-coherent shared-memory multiprocessors have been
developed that are scalable and offer a very tight coupling between
the processing resources. They are therefore quite attractive for use
as compute servers for multiprogramming and parallel application
workloads. Process scheduling and memory management, however, remain
challenging due to the distributed main memory found on such
machines. This paper examines the effects of OS scheduling and page
migration policies on the
performance of such compute servers. Our experiments are done on the
Stanford DASH, a distributed-memory cache-coherent multiprocessor. We
show that for our multiprogramming workloads consisting of sequential
jobs, the traditional Unix scheduling policy does very poorly. In
contrast, a policy incorporating cluster and cache affinity along with
a simple page-migration algorithm offers up to two-fold performance
improvement. For our workloads consisting of multiple parallel
applications, we compare space-sharing policies that divide the
processors among the applications to time-slicing policies.
Space-sharing policies can achieve better processor
utilization due to the operating point effect, but time-slicing
policies benefit strongly from user-level data distribution. Our
initial experience with automatic page migration suggests that
policies based only on TLB miss information can be quite effective,
and useful for addressing the data distribution problems of
space-sharing schedulers.