P. Ramanathan and K. G. Shin, "Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System," IEEE Transactions on Software Engineering, 19(6), 1993, pp. 571-583.
This paper proposes a new approach to checkpointing and rollback recovery in a distributed computing system that uses a common time base. First, a common time base is established in the system using a hardware clock synchronization algorithm. This common time base is then combined with the idea of pseudo-recovery points to develop a checkpointing algorithm with the following advantages: 1) less waiting for commitment when establishing recovery lines, 2) fewer messages exchanged, and 3) lower memory requirements. These advantages are assessed quantitatively with a probabilistic model.
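To illustrate why a common time base helps, the following is a minimal sketch of generic time-based checkpointing. It is not the authors' pseudo-recovery-point algorithm, whose details the abstract does not give. It assumes, hypothetically, that every process checkpoints when its synchronized clock reaches a multiple of `period`, and that a receiver's clock reading at receipt is never more than `skew` below the sender's reading at send. The names `Message`, `checkpoint_index`, `orphan_messages`, and `in_guard_window` are illustrative only. A message is an "orphan" with respect to a recovery line if it was sent after the sender's k-th checkpoint but received before the receiver's k-th checkpoint, which would make that recovery line inconsistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A message event, timestamped with (hypothetical) synchronized local clock readings."""
    sender: int
    receiver: int
    send_time: float  # sender's synchronized clock reading when the message was sent
    recv_time: float  # receiver's synchronized clock reading when the message arrived


def checkpoint_index(local_time: float, period: float) -> int:
    """Index of the latest checkpoint taken at or before local_time,
    assuming checkpoints are taken at local times 0, period, 2*period, ..."""
    return int(local_time // period)


def orphan_messages(messages, period):
    """Return the messages that cross some recovery line 'backwards':
    sent after the sender's k-th checkpoint but received before the
    receiver's k-th checkpoint."""
    return [
        m for m in messages
        if checkpoint_index(m.send_time, period) > checkpoint_index(m.recv_time, period)
    ]


def in_guard_window(send_time: float, period: float, skew: float) -> bool:
    """True if a send falls within `skew` after a checkpoint boundary.
    Under the skew assumption above, only such sends can produce orphans."""
    return send_time % period < skew
```

For example, with `period = 10.0`, a message sent at sender time 10.02 and received at receiver time 9.99 (the receiver's clock lags slightly) is reported by `orphan_messages`, while messages whose receive reading is at least their send reading never are, because checkpoint indices are nondecreasing in clock time. Any orphan must therefore have been sent less than `skew` after a checkpoint boundary. A simple, generic remedy is to delay sends in that window, which `in_guard_window` detects. Whether and how the paper's pseudo-recovery points handle this case is not stated in the abstract.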