Efficient synchronization is important for achieving good performance in parallel
programs, especially on large-scale multiprocessors. Most synchronization algorithms
have been designed to run on a dedicated machine, with one application process per
processor, and can suffer serious performance degradation in the presence of
multiprogramming. Problems arise when running processes block or, worse, busy-wait
for action on the part of a process that the scheduler has chosen not to run. We show
that these problems are particularly severe for scalable synchronization algorithms
based on distributed data structures. We then describe and evaluate a set of
algorithms that perform well in the presence of multiprogramming while maintaining
good performance on dedicated machines. We consider both large and small machines,
with a particular focus on scalability, and examine mutual-exclusion locks,
reader-writer locks, and barriers. Our algorithms vary in the degree of support
required from the kernel scheduler. We find that while it is possible to avoid
pathological performance problems using previously proposed kernel mechanisms, a
modest additional widening of the kernel/user interface can make scheduler-conscious
synchronization algorithms significantly simpler and faster, with performance on
dedicated machines comparable to that of scheduler-oblivious algorithms.