St. Hsu and Rc. Chang, CONTINUOUS CHECKPOINTING - JOINING THE CHECKPOINTING WITH VIRTUAL MEMORY PAGING, Software, practice & experience, 27(9), 1997, pp. 1103-1120
Checkpointing is a basic mechanism for backward error recovery in
fault-tolerant systems. A checkpointed process periodically stops
execution and saves its state to files. To reduce the file sizes, only
data modified between two consecutive checkpoints is saved. However,
existing approaches do not consider operating system paging activity,
which, if ignored, may double the number of disk accesses required to
checkpoint non-resident dirty pages. In this paper, we propose
continuous checkpointing, which combines the checkpoint facility with
virtual
memory paging operations. Thus, checkpointing is continuous during the
lifetime of a process without extra overhead. Checkpoint size is no
longer proportional to application size, but rather is bounded by the
resident dirty pages. Experimental results show that disk accesses can
be reduced by about 80% when checkpointing large applications.
(C) 1997 by
John Wiley & Sons, Ltd.
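
The disk-access argument in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation; the Page class, both function names, and the per-page costs are assumptions made for illustration. It assumes that, at checkpoint time, a resident dirty page costs one write, that a conventional incremental checkpoint must first read a non-resident dirty page back from swap before writing it (two accesses), and that under continuous checkpointing the page-out write already served as the checkpoint write, so a non-resident dirty page costs nothing extra.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Page:
    """Hypothetical page descriptor used only for this illustration."""
    dirty: bool      # modified since the previous checkpoint
    resident: bool   # currently held in physical memory


def conventional_accesses(pages: Iterable[Page]) -> int:
    """Disk accesses at checkpoint time when paging is ignored.

    Assumed costs: a resident dirty page needs 1 write; a non-resident
    dirty page needs 1 read (page-in) plus 1 write (checkpoint file).
    """
    total = 0
    for page in pages:
        if not page.dirty:
            continue  # clean pages are skipped by incremental checkpointing
        total += 1 if page.resident else 2
    return total


def continuous_accesses(pages: Iterable[Page]) -> int:
    """Disk accesses at checkpoint time when page-outs double as checkpoints.

    Assumed costs: only resident dirty pages still need 1 write each;
    non-resident dirty pages were already saved when they were paged out.
    """
    return sum(1 for page in pages if page.dirty and page.resident)


if __name__ == "__main__":
    pages = [
        Page(dirty=True, resident=True),
        Page(dirty=True, resident=False),
        Page(dirty=True, resident=False),
        Page(dirty=False, resident=True),
    ]
    print("conventional:", conventional_accesses(pages))
    print("continuous:  ", continuous_accesses(pages))
```

Under these assumed costs the continuous count depends only on the resident dirty pages, which mirrors the bound stated in the abstract. The model says nothing about the measured 80% reduction, which comes from the paper's experiments.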