
Communication-based prevention of useless checkpoints in distributed computations

Authors

Helary, JM; Mostefaoui, A; Netzer, RHB; Raynal, M

Citation

Jm. Helary et al., Communication-based prevention of useless checkpoints in distributed computations, DIST COMPUT, 13(1), 2000, pp. 29-43

Citations number

Subject categories

Computer Science & Engineering

Journal title

DISTRIBUTED COMPUTING

ISSN journal

0178-2770

Volume

13

Issue

1

Year of publication

2000

Pages

29 - 43

Database

ISI

SICI code

0178-2770(200001)13:1<29:CPOUCI>2.0.ZU;2-Y

Abstract

A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following problem: given a set of processes that take (basic) local checkpoints in an independent and unknown way, design communication-induced checkpointing protocols that direct processes to take additional (forced) local checkpoints so that no local checkpoint is useless. The paper first proves two properties related to the integer timestamps associated with each local checkpoint. The first property is a necessary and sufficient condition that these timestamps must satisfy for no checkpoint to be useless. The second property provides an easy timestamp-based determination of consistent global checkpoints. Then, a general communication-induced checkpointing protocol is proposed. This protocol, derived from the two previous properties, defines a family of timestamp-based communication-induced checkpointing protocols. It is shown that several existing checkpointing protocols for the same problem are particular instances of the general protocol. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in distributed applications based on consistent global checkpoints, such as the detection of stable or unstable properties and the determination of distributed breakpoints.
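The abstract does not spell out the protocol itself. The sketch below is an illustrative assumption, not the paper's general protocol: it implements one simple, classic index-based rule in the style of Briatico, Ciuffoletti and Simoncini. Each process keeps an integer timestamp, increments it at every basic checkpoint, and piggybacks it on every message. A receiver whose timestamp is smaller than the piggybacked one takes a forced checkpoint carrying that value before delivering the message. For this particular rule, taking at each process the first checkpoint whose timestamp is at least x yields a consistent global checkpoint. The sketch checks that every checkpoint belongs to such a global checkpoint, i.e. none is useless. The names `Process`, `simulate`, `global_checkpoint` and `is_consistent`, the random schedule, and the treatment of each final local state as a checkpoint with infinite timestamp are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process running a hypothetical index-based CIC rule."""
    pid: int
    ts: int = 0      # timestamp of the latest local checkpoint
    clock: int = 0   # number of send/receive events executed so far
    # each checkpoint is (timestamp, position, kind); position = events before it
    checkpoints: list = field(default_factory=list)

    def __post_init__(self):
        self.checkpoints.append((0, 0, "initial"))

    def basic_checkpoint(self):
        # basic checkpoint: taken independently, bumps the timestamp by one
        self.ts += 1
        self.checkpoints.append((self.ts, self.clock, "basic"))

    def send(self, dest):
        # piggyback the current timestamp on the message
        msg = (self.pid, self.clock, dest, self.ts)
        self.clock += 1
        return msg

    def receive(self, msg, delivered):
        sender, send_pos, _, m_ts = msg
        if m_ts > self.ts:
            # forced checkpoint, taken before the message is delivered
            self.ts = m_ts
            self.checkpoints.append((self.ts, self.clock, "forced"))
        delivered.append((sender, send_pos, self.pid, self.clock))
        self.clock += 1


def simulate(n=3, steps=200, seed=1):
    rng = random.Random(seed)
    procs = [Process(i) for i in range(n)]
    in_transit, delivered = [], []
    for _ in range(steps):
        p = rng.choice(procs)
        r = rng.random()
        if r < 0.2:
            p.basic_checkpoint()
        elif r < 0.6:
            dest = rng.choice([q for q in range(n) if q != p.pid])
            in_transit.append(p.send(dest))
        else:
            mine = [m for m in in_transit if m[2] == p.pid]
            if mine:  # non-FIFO delivery of any pending message
                msg = rng.choice(mine)
                in_transit.remove(msg)
                p.receive(msg, delivered)
    # assumption: each final local state acts as a checkpoint with infinite timestamp
    for p in procs:
        p.checkpoints.append((float("inf"), p.clock, "final"))
    return procs, delivered


def global_checkpoint(procs, x):
    """For each process, its first checkpoint whose timestamp is >= x."""
    return [next(c for c in p.checkpoints if c[0] >= x) for p in procs]


def is_consistent(cut, delivered):
    """True if no message is received before the cut but sent after it."""
    for sender, send_pos, receiver, recv_pos in delivered:
        if recv_pos < cut[receiver][1] and send_pos >= cut[sender][1]:
            return False
    return True


# Every non-final checkpoint belongs to a consistent global checkpoint.
procs, delivered = simulate()
for p in procs:
    for c in p.checkpoints[:-1]:
        cut = global_checkpoint(procs, c[0])
        assert cut[p.pid] == c and is_consistent(cut, delivered)
print("no useless checkpoints")
```

Because timestamps strictly increase at each process, the first checkpoint with timestamp at least `c[0]` is `c` itself. The forced checkpoint ensures that any message sent after a checkpoint with timestamp at least x is received only after the receiver also has such a checkpoint, so the selected cut has no orphan messages.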