Staggered consistent checkpointing

Authors
N.H. Vaidya
Citation
N.H. Vaidya, Staggered consistent checkpointing, IEEE Trans. Parallel Distrib. Syst., 10(7), 1999, pp. 694-702
Number of citations
24
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN journal
1045-9219
Volume
10
Issue
7
Year of publication
1999
Pages
694 - 702
Database
ISI
SICI code
1045-9219(199907)10:7<694:SCC>2.0.ZU;2-9
Abstract
A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. The traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead. This paper presents a simple approach to arbitrarily stagger the checkpoints. Our approach requires that the processes take consistent logical checkpoints, as compared to consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.
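The abstract's central idea, that a consistent logical checkpoint can be assembled from physical checkpoints taken at different times plus a log of received messages, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: processes are assumed to be piecewise deterministic (state changes only when a message is received), messages are delivered instantly, and the names `Process`, `staggered_checkpoint`, and the integer "state" are hypothetical. Processes save their physical checkpoints one at a time, so only one writes to stable storage at any moment, and each then logs the messages it receives until all processes mark the logical checkpoint. Recovery restores the physical checkpoint and replays the log.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Process:
    """Hypothetical piecewise-deterministic process (toy model)."""
    pid: int
    state: int = 0                        # toy state: running sum of received values
    physical_ckpt: Optional[int] = None   # state written to "stable storage"
    log: List[int] = field(default_factory=list)  # messages received after the physical checkpoint
    logging: bool = False

    def receive(self, value: int) -> None:
        # Deterministic state transition driven only by the received message.
        self.state += value
        if self.logging:
            self.log.append(value)

    def take_physical_checkpoint(self) -> None:
        # Save state and start logging subsequent receives.
        self.physical_ckpt = self.state
        self.log = []
        self.logging = True

    def close_logical_checkpoint(self) -> int:
        # Logical checkpoint = physical checkpoint + logged messages; stop logging.
        self.logging = False
        return self.state

    def recover(self) -> int:
        # Restore the physical checkpoint, then replay the log to reach the logical checkpoint.
        assert self.physical_ckpt is not None, "no physical checkpoint taken"
        self.state = self.physical_ckpt
        for value in self.log:
            self.state += value
        return self.state


def staggered_checkpoint(procs: List[Process],
                         traffic: List[List[Tuple[int, int]]]) -> List[int]:
    """traffic[i] lists (dest, value) deliveries that happen after process i checkpoints."""
    for i, proc in enumerate(procs):
        proc.take_physical_checkpoint()   # one process at a time: staggered writes
        for dest, value in traffic[i]:
            procs[dest].receive(value)    # computation continues between checkpoints
    # All processes mark the logical checkpoint at the same cut.
    return [p.close_logical_checkpoint() for p in procs]


procs = [Process(pid) for pid in range(3)]
procs[0].receive(5)
procs[1].receive(7)
logical = staggered_checkpoint(procs, [[(1, 2), (2, 3)], [(0, 4)], []])
procs[2].receive(100)                     # work after the logical checkpoint, lost on rollback
recovered = [p.recover() for p in procs]
print(logical, recovered)                 # [9, 9, 3] [9, 9, 3]
```

Because replaying the log reproduces each process's state at the logical checkpoint, the physical checkpoints need not be taken at the same time. A real implementation must also account for messages in transit across the cut; this sketch sidesteps that by delivering messages instantly.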