Communication-based prevention of useless checkpoints in distributed computations

Citation
Jm. Helary et al., Communication-based prevention of useless checkpoints in distributed computations, DIST COMPUT, 13(1), 2000, pp. 29-43
Number of citations
32
Subject categories
Computer Science & Engineering
Journal title
DISTRIBUTED COMPUTING
Journal ISSN
0178-2770
Volume
13
Issue
1
Year of publication
2000
Pages
29 - 43
Database
ISI
SICI code
0178-2770(200001)13:1<29:CPOUCI>2.0.ZU;2-Y
Abstract
A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design communication-induced checkpointing protocols that direct processes to take additional local (forced) checkpoints to ensure no local checkpoint is useless. The paper first proves two properties related to integer timestamps which are associated with each local checkpoint. The first property is a necessary and sufficient condition that these timestamps must satisfy for no checkpoint to be useless. The second property provides an easy timestamp-based determination of consistent global checkpoints. Then, a general communication-induced checkpointing protocol is proposed. This protocol, derived from the two previous properties, actually defines a family of timestamp-based communication-induced checkpointing protocols. It is shown that several existing checkpointing protocols for the same problem are particular instances of the general protocol. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in "consistent global checkpoint"-based distributed applications such as the detection of stable or unstable properties and the determination of distributed breakpoints.
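The abstract does not spell out the rules of the general protocol. Purely as an illustration of what a timestamp-based, communication-induced rule can look like, the sketch below assumes a simple scheme: each process keeps an integer clock, piggybacks it on every message, and takes a forced checkpoint before delivering a message whose timestamp exceeds its own clock. The class `Process` and its methods are hypothetical names chosen for this sketch, and the rule shown is an assumption rather than the paper's protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process with an integer checkpoint clock (illustrative only)."""
    pid: int
    clock: int = 0
    # Each entry is (timestamp, kind); kind is "initial", "basic" or "forced".
    checkpoints: list = field(default_factory=list)

    def __post_init__(self):
        # Every process starts with an initial checkpoint stamped 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # A basic checkpoint is taken independently; it advances the clock.
        self.clock += 1
        self.checkpoints.append((self.clock, "basic"))

    def send(self):
        # The current clock value is piggybacked on the outgoing message.
        return self.clock

    def receive(self, piggybacked):
        # Assumed rule: if the message carries a larger timestamp, take a
        # forced checkpoint with that timestamp before delivering it.
        if piggybacked > self.clock:
            self.clock = piggybacked
            self.checkpoints.append((self.clock, "forced"))
        # Message delivery to the application would happen here.


p, q = Process(0), Process(1)
p.basic_checkpoint()   # p takes a basic checkpoint stamped 1
q.receive(p.send())    # q is forced to checkpoint before delivery
q.receive(p.send())    # same timestamp again: no new checkpoint
```

In this sketch a forced checkpoint is taken only when the incoming timestamp is strictly larger than the receiver's clock, so repeated messages with the same timestamp add no further checkpoints.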