Queue-based spin locks allow programs with busy-wait synchronization to sca
le to very large multiprocessors, without fear of starvation or performance
-destroying contention. So-called try locks, traditionally based on non-sca
lable test-and-set locks, allow a process to abandon its attempt to acquire
a lock after a given amount of time. The process can then pursue an altern
ative code path, or yield the processor to some other process.
We demonstrate that it is possible to obtain both scalability and bounded w
aiting, using variants of the queue-based locks of Craig, Landin, and Hager
sten, and of Mellor-Crummey and Scott. A process that decides to stop waiti
ng for one of these new locks can "link itself out of line" atomically. Sin
gle-processor experiments reveal performance penalties of 50-100% for the C
LH and MCS try locks in comparison to their standard versions, this margina
l cost decreases with larger numbers of processors.
We have also compared our queue-based locks to a. traditional test-and-test
;-and-set lock with exponential backoff and timeout. At modest (non-zero) l
evels of contention, the queued locks sacrifice cache locality for fairness
, resulting in a worst-case 3X performance penalty. At high levels of conte
ntion, however, they display a 1.5-2X performance advantage, with significa
ntly more regular timings and significantly higher rates of acquisition pri
or to timeout.