Multiprocessor computer systems are becoming increasingly important as
vehicles for solving computationally expensive problems. Synchronizat
ion among the processors is achieved with a variety of clock configura
tions. A new notion of fault-tolerance for clock synchronization algor
ithms is defined, tailored to the requirements and failure patterns of
shared memory multiprocessors. Algorithms in this class can tolerate
ally number of trapping processors, where a napping processor can Fail
by repeatedly ceasing operation for an arbitrary time interval and th
en resume operation without necessarily recognizing that a fault has o
ccurred. These algorithms guarantee that, for some fixed k, once. proc
essor P has been working correctly for at least k time, then as long a
s P continues to work correctly, (I) P does not adjust its clock, and
(2) P's clock agrees with the clock of every other processor that has
also been working correctly for at least k time. Because a working pro
cessor must synchronize in a fixed amount of time regardless of the ac
tions of the other processors, these algorithms are called wait-free.
Another useful type of fault-tolerance is called self-stabilization: s
tarting with an arbitrary state of the system, a self-stabilizing algo
rithm eventually reaches a point after which it correctly performs its
task. Two wait-free clock synchronization algorithms are presented fo
r a model with global clock pulses. The first one is self-stabilizing;
the second one is not but it converges more quickly than the first on
e. The self-stabilizing algorithm requires each processor's communicat
ion register contents to be a part of the processor's state. This last
requirement is proven necessary. A wait-free clock synchronization al
gorithm is also presented for a model. with local clock pulses. This a
lgorithm is not self-stabilizing.